From me at louie.lu Tue Aug 1 09:01:30 2017
From: me at louie.lu (Louie Lu)
Date: Tue, 1 Aug 2017 21:01:30 +0800
Subject: [Python-ideas] "any" and "all" support multiple arguments
Message-ID:

Hi all,

The "min" and "max" built-in functions support two styles of args:

min(...)
    min(iterable, *[, default=obj, key=func]) -> value
    min(arg1, arg2, *args, *[, key=func]) -> value

But "any" and "all" only support an iterable:

all(iterable, /)
    Return True if bool(x) is True for all values x in the iterable.

I'm not sure if this has been discussed before, but can "any" and "all"
support the min/max-like "arg1, arg2, *args" style?

Thanks,
Louie.

From p.f.moore at gmail.com Tue Aug 1 09:24:47 2017
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 1 Aug 2017 14:24:47 +0100
Subject: [Python-ideas] "any" and "all" support multiple arguments
In-Reply-To:
References:
Message-ID:

On 1 August 2017 at 14:01, Louie Lu wrote:
> I'm not sure if this has been discussed before, but can "any" and "all"
> support the min/max-like "arg1, arg2, *args" style?

I don't see any particular reason why not, but is there a specific use
case for this or is it just a matter of consistency? Unlike max and
min, we already have operators in this case (and/or). I'd imagine that
if I had a use for any(a, b, c) I'd write it as a or b or c, and for
all(a, b, c) I'd write a and b and c.

Paul

From markusmeskanen at gmail.com Tue Aug 1 09:32:38 2017
From: markusmeskanen at gmail.com (Markus Meskanen)
Date: Tue, 1 Aug 2017 16:32:38 +0300
Subject: [Python-ideas] "any" and "all" support multiple arguments
In-Reply-To:
References:
Message-ID:

I'd be more interested in supporting the "key" function:

any(users, key=User.is_admin)

As opposed to:

any(user.is_admin() for user in users)

1.8.2017 16.07 "Louie Lu" wrote:

Hi all,

The "min" and "max" built-in functions support two styles of args:

min(...)
    min(iterable, *[, default=obj, key=func]) -> value
    min(arg1, arg2, *args, *[, key=func]) -> value

But "any" and "all" only support an iterable:

all(iterable, /)
    Return True if bool(x) is True for all values x in the iterable.

I'm not sure if this has been discussed before, but can "any" and "all"
support the min/max-like "arg1, arg2, *args" style?

Thanks,
Louie.
_______________________________________________
Python-ideas mailing list
Python-ideas at python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ned at nedbatchelder.com Tue Aug 1 09:43:04 2017
From: ned at nedbatchelder.com (Ned Batchelder)
Date: Tue, 1 Aug 2017 09:43:04 -0400
Subject: [Python-ideas] "any" and "all" support multiple arguments
In-Reply-To:
References:
Message-ID:

I find it frustrating that they always return booleans. It would be more
useful if any() returned the first true value it finds. This seems like a
backward-compatible-enough change to me... :)

--Ned.

On 8/1/17 9:32 AM, Markus Meskanen wrote:
> I'd be more interested in supporting the "key" function:
>
> any(users, key=User.is_admin)
>
> As opposed to:
>
> any(user.is_admin() for user in users)
>
> 1.8.2017 16.07 "Louie Lu" wrote:
>
> Hi all,
>
> The "min" and "max" built-in functions support two styles of args:
>
> min(...)
>     min(iterable, *[, default=obj, key=func]) -> value
>     min(arg1, arg2, *args, *[, key=func]) -> value
>
> But "any" and "all" only support an iterable:
>
> all(iterable, /)
>     Return True if bool(x) is True for all values x in the iterable.
>
> I'm not sure if this has been discussed before, but can "any" and "all"
> support the min/max-like "arg1, arg2, *args" style?
>
> Thanks,
> Louie.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Tue Aug 1 11:16:02 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 2 Aug 2017 01:16:02 +1000
Subject: [Python-ideas] "any" and "all" support multiple arguments
In-Reply-To:
References:
Message-ID:

On 1 August 2017 at 23:43, Ned Batchelder wrote:
> I find it frustrating that they always return booleans. It would be more
> useful if any() returned the first true value it finds. This seems like a
> backward-compatible-enough change to me... :)

While I'm not sure how to interpret that smiley, I figure it's worth
making it explicit that this is decidedly *not* true given
type-dependent serialisation protocols like JSON:

    >>> import json
    >>> class MyClass:
    ...     def __bool__(self):
    ...         return True
    ...
    >>> json.dumps(any([MyClass()]))
    'true'
    >>> json.dumps(MyClass())
    Traceback (most recent call last):
      ...
    TypeError: Object of type 'MyClass' is not JSON serializable

The idea of elevating first_true from its current status as an
itertools recipe to actually being an itertools module API has
certainly come up before, though.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com Tue Aug 1 11:28:24 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 2 Aug 2017 01:28:24 +1000
Subject: [Python-ideas] "any" and "all" support multiple arguments
In-Reply-To:
References:
Message-ID:

On 1 August 2017 at 23:24, Paul Moore wrote:
> On 1 August 2017 at 14:01, Louie Lu wrote:
>> I'm not sure if this has been discussed before, but can "any" and "all"
>> support the min/max-like "arg1, arg2, *args" style?
>
> I don't see any particular reason why not, but is there a specific use
> case for this or is it just a matter of consistency? Unlike max and
> min, we already have operators in this case (and/or). I'd imagine that
> if I had a use for any(a, b, c) I'd write it as a or b or c, and for
> all(a, b, c) I'd write a and b and c.

Right, the main correspondence here is with "sum()": folks can't write
"sum(a, b, c)", but they can write "a + b + c".

The various container constructors are also consistent in only taking
an iterable, with multiple explicit items being expected to use the
syntactic forms (e.g. [a, b, c], {a, b, c}, (a, b, c))

The same rationale holds for any() and all(): supporting multiple
positional arguments would be redundant with the existing binary
operator syntax, with no clear reason to ever prefer one option over
the other.

Cheers,
Nick.
-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From cpitclaudel at gmail.com Tue Aug 1 12:57:30 2017
From: cpitclaudel at gmail.com (Clément Pit-Claudel)
Date: Tue, 1 Aug 2017 18:57:30 +0200
Subject: [Python-ideas] "any" and "all" support multiple arguments
In-Reply-To:
References:
Message-ID:

On 2017-08-01 17:28, Nick Coghlan wrote:
> Right, the main correspondence here is with "sum()": folks can't write
> "sum(a, b, c)", but they can write "a + b + c".
>
> The various container constructors are also consistent in only taking
> an iterable, with multiple explicit items being expected to use the
> syntactic forms (e.g. [a, b, c], {a, b, c}, (a, b, c))
>
> The same rationale holds for any() and all(): supporting multiple
> positional arguments would be redundant with the existing binary
> operator syntax, with no clear reason to ever prefer one option over
> the other.

Isn't there a difference, though, insofar as we don't have a '+/sum' or
'and/all' equivalent of [a, b, *c]?
You need to write 1 + 3 + sum(xs), or a and b and all(ys). Or, of
course, any(chain([a], [b], c)), but that is not pretty.

Clément.

From lucas.wiman at gmail.com Tue Aug 1 13:22:03 2017
From: lucas.wiman at gmail.com (Lucas Wiman)
Date: Tue, 1 Aug 2017 10:22:03 -0700
Subject: [Python-ideas] "any" and "all" support multiple arguments
In-Reply-To:
References:
Message-ID:

On Tue, Aug 1, 2017 at 6:01 AM, Louie Lu wrote:

> [...]
> I'm not sure if this has been discussed before, but can "any" and "all"
> support the min/max-like "arg1, arg2, *args" style?

Can this be done consistently? For example consider x=[[]]. Then all(x)
where x is interpreted as an iterable should be False, but all(x) where x
is interpreted as a single argument should be True.

This inconsistency already exists for max:

>>> max({1, 2})
2
>>> max({1, 2}, {1})
set([1, 2])

However, it doesn't seem like there's a good reason to add an
inconsistency to the API for any/all.

- Lucas
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tjreedy at udel.edu Tue Aug 1 16:51:41 2017
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 1 Aug 2017 16:51:41 -0400
Subject: [Python-ideas] "any" and "all" support multiple arguments
In-Reply-To:
References:
Message-ID:

On 8/1/2017 9:01 AM, Louie Lu wrote:
> Hi all,
>
> The "min" and "max" built-in functions support two styles of args:
>
> min(...)
>     min(iterable, *[, default=obj, key=func]) -> value
>     min(arg1, arg2, *args, *[, key=func]) -> value

To me, two APIs is a nuisance. For one thing, default has to be keyword
only and not just optional. Compare with sum:

>>> sum((2,3,4),5)
14
>>> min((2,3,4),5)  # Py3
Traceback (most recent call last):
  File "", line 1, in
    min((2,3,4),5)
TypeError: '<' not supported between instances of 'int' and 'tuple'
>>> min((2,3,4),5)  # Py2
5
>>> min(5, (2,3,4))
5

I believe that a version of the second was in original Python (and at
least in 1.3) whereas the first was added later, likely with the new
iterator protocol (2.2). In any case, with *unpacking in displays, the
second is no longer needed.

>>> min(4,3, *[1,2])
1
>>> min((4,3, *[1,2]))
1

If I am correct, perhaps the doc for max and min in
https://docs.python.org/3/library/functions.html#max should mention
that the 2nd is derived from the original syntax, kept for back
compatibility (rather than a new innovation, to be imitated). I would
rather get rid of the exceptional case than emulate it.
> But "any" and "all" only support an iterable:
>
> all(iterable, /)
>     Return True if bool(x) is True for all values x in the iterable.

As Nick pointed out, this is standard now.

>>> list((1,2,3))
[1, 2, 3]
>>> list(1,2,3)
Traceback (most recent call last):
  File "", line 1, in
    list(1,2,3)
TypeError: list() takes at most 1 argument (3 given)

-- 
Terry Jan Reedy

From mistersheik at gmail.com Tue Aug 1 22:48:58 2017
From: mistersheik at gmail.com (Neil Girdhar)
Date: Tue, 1 Aug 2017 19:48:58 -0700 (PDT)
Subject: [Python-ideas] "any" and "all" support multiple arguments
In-Reply-To:
References:
Message-ID: <3288d6b5-ebc7-4837-96b2-311e4ad1e69f@googlegroups.com>

On Tuesday, August 1, 2017 at 12:58:24 PM UTC-4, Clément Pit-Claudel wrote:
>
> On 2017-08-01 17:28, Nick Coghlan wrote:
> > Right, the main correspondence here is with "sum()": folks can't write
> > "sum(a, b, c)", but they can write "a + b + c".
> >
> > The various container constructors are also consistent in only taking
> > an iterable, with multiple explicit items being expected to use the
> > syntactic forms (e.g. [a, b, c], {a, b, c}, (a, b, c))
> >
> > The same rationale holds for any() and all(): supporting multiple
> > positional arguments would be redundant with the existing binary
> > operator syntax, with no clear reason to ever prefer one option over
> > the other.
>
> Isn't there a difference, though, insofar as we don't have a '+/sum' or
> 'and/all' equivalent of [a, b, *c]?
> You need to write 1 + 3 + sum(xs), or a and b and all(ys). Or, of course,
> any(chain([a], [b], c)), but that is not pretty.
>

a or b or any(c) seems clear to me.

> Clément.
> _______________________________________________
> Python-ideas mailing list
> Python... at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Wed Aug 2 11:06:22 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 3 Aug 2017 01:06:22 +1000
Subject: [Python-ideas] "any" and "all" support multiple arguments
In-Reply-To:
References:
Message-ID:

On 2 August 2017 at 02:57, Clément Pit-Claudel wrote:
> On 2017-08-01 17:28, Nick Coghlan wrote:
>> The same rationale holds for any() and all(): supporting multiple
>> positional arguments would be redundant with the existing binary
>> operator syntax, with no clear reason to ever prefer one option over
>> the other.
>
> Isn't there a difference, though, insofar as we don't have a '+/sum' or
> 'and/all' equivalent of [a, b, *c]?
> You need to write 1 + 3 + sum(xs), or a and b and all(ys). Or, of course,
> any(chain([a], [b], c)), but that is not pretty.

Function calls create an argument tuple anyway, so writing "any(a, b,
*ys)" wouldn't actually be significantly more efficient than the
current "any((a, b, *ys))" (note the doubled parentheses). You'd
potentially save the allocation of a single element tuple to hold the
full tuple, but single element tuples are pretty cheap in the grand
scheme of things, and Python interpreter implementations often attempt
to avoid creating one in the single-positional argument case (since
they'd just need to unpack it again to stick it in the corresponding
parameter slot).
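For example, the doubled-parentheses spelling already works today, but
note that the unpacking in the tuple display happens eagerly, before
any() sees a single value (a quick interactive sketch):

    >>> a, b = False, False
    >>> ys = iter([False, True])
    >>> any((a, b, *ys))    # one tuple argument, not three positional args
    True
    >>> next(ys, "exhausted")    # ys was consumed up front by the unpacking
    'exhausted'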
This means that in the case where what you actually want is lazy iteration over the trailing iterable, then you have to use the itertools.chain form: "any(chain((a, b), ys))" The chained binary operator forms also both seem clearer to me than either "sum(1, 3, *xs)" or "any(a, b, *ys)", as those formulations require that the reader know a Python-specific idiosyncratic concept and notation (iterable unpacking), while the binary operator based forms can be interpreted correctly based solely on knowledge of either arithmetic ("+", "sum") or logic ("and", "all"). So while this is an entirely reasonable design question to ask, it turns out there are a few good reasons not to actually make the change: - it doesn't add expressiveness to the language (the binary operator forms already exist, as does the double-parenthesis form) - it doesn't add readability to the language (the iterable unpacking form requires more assumed knowledge than the binary operator form) - it doesn't improve the efficiency of the language (iterable unpacking is an eager operation, not a lazy one, even in function calls) - min() and max() are actually the odd ones out here (for historical reasons), not any(), all() Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From paul_laos at outlook.com Fri Aug 4 03:39:56 2017 From: paul_laos at outlook.com (Paul Laos) Date: Fri, 4 Aug 2017 07:39:56 +0000 Subject: [Python-ideas] Pseudo methods Message-ID: Hi folks I was thinking about how sometimes, a function sometimes acts on classes, and behaves very much like a method. Adding new methods to classes existing classes is currently somewhat difficult, and having pseudo methods would make that easier. Code example: (The syntax can most likely be improved upon) def has_vowels(self: str): for vowel in ["a", "e,", "i", "o", "u"]: if vowel in self: return True This allows one to wring `string.has_vowels()` instead of `has_vowels(string)`, which would make it easier to read, and would make it easier to add functionality to existing classes, without having to extend them. This would be useful for builtins or imported libraries, so one can fill in "missing" methods. * Simple way to extend classes * Improves readability * Easy to understand ~Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From antoine.rozo at gmail.com Fri Aug 4 03:59:42 2017 From: antoine.rozo at gmail.com (Antoine Rozo) Date: Fri, 4 Aug 2017 09:59:42 +0200 Subject: [Python-ideas] Pseudo methods In-Reply-To: References: Message-ID: Hi, With this kind of feature, you never know which methods are included in the class (depending of which modules have been loaded). I don't think this is a good idea. 2017-08-04 9:39 GMT+02:00 Paul Laos : > Hi folks > I was thinking about how sometimes, a function sometimes acts on classes, > and > behaves very much like a method. Adding new methods to classes existing > classes > is currently somewhat difficult, and having pseudo methods would make that > easier. > > Code example: (The syntax can most likely be improved upon) > def has_vowels(self: str): > for vowel in ["a", "e,", "i", "o", "u"]: > if vowel in self: return True > > This allows one to wring `string.has_vowels()` instead of > `has_vowels(string)`, > which would make it easier to read, and would make it easier to add > functionality to existing classes, without having to extend them. This > would be > useful for builtins or imported libraries, so one can fill in "missing" > methods. 
> > * Simple way to extend classes > * Improves readability > * Easy to understand > > ~Paul > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- Antoine Rozo -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Fri Aug 4 04:16:01 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 4 Aug 2017 09:16:01 +0100 Subject: [Python-ideas] Pseudo methods In-Reply-To: References: Message-ID: On 4 August 2017 at 08:39, Paul Laos wrote: > Hi folks > I was thinking about how sometimes, a function sometimes acts on classes, > and behaves very much like a method. Adding new methods to classes existing > classes is currently somewhat difficult, and having pseudo methods would make that > easier. Adding new methods to classes is deliberately (somewhat) difficult, as it makes it harder to locate the definition of a method. If you need to see the code for a method, you'd expect to look in the class definition. Making it common for people to put method definitions outside the class definition harms supportability by breaking that assumption. > Code example: (The syntax can most likely be improved upon) > def has_vowels(self: str): > for vowel in ["a", "e,", "i", "o", "u"]: > if vowel in self: return True > > This allows one to wring `string.has_vowels()` instead of > `has_vowels(string)`, > which would make it easier to read, That's very much a subjective view. Personally, I don't see "string.has_vowels()" as being any easier to read - except in the sense that it tells me that I can find the definition of has_vowels in the class definition of str (and I can find its documentation in the documentation of the str type). And your proposal removes this advantage! > and would make it easier to add > functionality to existing classes, without having to extend them. This would > be useful for builtins or imported libraries, so one can fill in "missing" > methods. This is a common technique in other languages like Ruby, but is considered specialised and somewhat of an advanced technique (monkeypatching) in Python. As you say yourself, the syntax will make it *easier* to do this - it's already possible, so the change doesn't add any new capabilities. Adding new syntax to the language typically needs a much stronger justification (either in terms of enabling fundamentally new techniques, or providing a significantly more natural spelling of something that's widely used and acknowledged as a common programming idiom). Sorry, but I'm -1 on this change. It doesn't let people do anything they can't do now, on the contrary it makes it simpler to use a technique which has readability and supportability problems, which as a result will mean that people will be inclined to use the approach without properly considering the consequences. Paul From steve at pearwood.info Fri Aug 4 07:32:33 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 4 Aug 2017 21:32:33 +1000 Subject: [Python-ideas] Pseudo methods In-Reply-To: References: Message-ID: <20170804113233.GS3149@ando.pearwood.info> Hi Paul, and welcome! On Fri, Aug 04, 2017 at 07:39:56AM +0000, Paul Laos wrote: > Hi folks > I was thinking about how sometimes, a function sometimes acts on classes, and > behaves very much like a method. I'm not really sure what you mean by "acts on classes". 
I can only think of a function which takes a class as a parameter, and modifies the class. Like a class decorator. Or possibly a classmethod. But that's not what you seem to mean below. So I'm not quite certain I understand your proposal. > Adding new methods to classes existing classes > is currently somewhat difficult, If the class is written in Python, it isn't difficult at all, it is trivially easy. First define your method: def method(self, arg): pass Then inject it onto the class using ordinary attribute assignment: TheClass.method = method And we're done! If the class is a built-in, or otherwise written in C, then "somewhat difficult" is an understatement. I think it can't be done at all. > and having pseudo methods would make that easier. I'm not sure that "easier" in this case would be better. > Code example: (The syntax can most likely be improved upon) > def has_vowels(self: str): > for vowel in ["a", "e,", "i", "o", "u"]: > if vowel in self: return True How does Python, and for that matter the human reader, know which class or classes that method is injected into? My guess is it looks at the annotation. But that's a big change: annotations are currently guaranteed to have no runtime semantics (apart from being stored in the function's __annotation__ attribute). I'm not saying that can't be done, but there may be consequences we haven't thought of. If we say dir(str), will "has_vowels" show up? How about vars(str)? How does this interact with metaclasses? > This allows one to wring `string.has_vowels()` instead of `has_vowels(string)`, > which would make it easier to read, Well that's one opinion. > and would make it easier to add > functionality to existing classes, without having to extend them. This would be > useful for builtins or imported libraries, so one can fill in "missing" methods. http://www.virtuouscode.com/2008/02/23/why-monkeypatching-is-destroying-ruby/ I think monkeypatching is great, so long as I'm the only one that does it. When other people do it, invariably they introduce bugs into my code by monkeypatching other things I didn't expect to be monkeypatched. > * Simple way to extend classes > * Improves readability > * Easy to understand I'll agree with the first one of those, if by "simple" you mean "somebody else did all the work to make this syntax do what I want it to do". The work behind the scenes is not likely to be simple: for starters, allowing monkeypatching of built-ins is likely going to require a rather big re-design of the Python interpreter. -- Steve From pobocks at gmail.com Fri Aug 4 09:00:15 2017 From: pobocks at gmail.com (David Mayo) Date: Fri, 4 Aug 2017 09:00:15 -0400 Subject: [Python-ideas] Collection type argument for argparse where nargs != None Message-ID: A friend of mine (@bcjbcjbcj on twitter) came up with an idea for an argparse improvement that I'd like to propose for inclusion. Currently, argparse with nargs= collects arguments into a list (or a list of lists in the case of action="append"). I would like to propose adding a "collection type" argument to the store and append actions and to add_argument, consisting of a callable that would be applied to the list of type-converted args before adding them to the Namespace. This would allow for alternate constructors (e.g. set), for modifying the list (e.g. with sorted), or to do checking of properties expected across all components of the argument at parse time. 
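For illustration, usage might look something like this (a sketch only -
"collection_type" is a placeholder for whatever keyword name we'd settle
on):

    import argparse

    parser = argparse.ArgumentParser()
    # proposed: gather the type-converted values into a set, not a list
    parser.add_argument("--tag", nargs="*", collection_type=set)
    # proposed: sort multi-value arguments at parse time
    parser.add_argument("--point", nargs=2, type=int, collection_type=sorted)

    args = parser.parse_args(["--tag", "a", "b", "a", "--point", "9", "3"])
    # args.tag == {"a", "b"}; args.point == [3, 9]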
I've worked up a set of examples in this gist: https://gist.github.com/ pobocks/bff0bea494f2b7ec7eba1e8ae281b888 And a rough implementation here: https://github.com/python/ cpython/compare/master...pobocks:argparse_colltype I think this would be genuinely useful, and would require very little change to argparse, which should be backwards compatible provided that the default for the collection type is list, or None with list specified if None. Thank you all for your time in considering this, - Dave Mayo @pobocks on twitter, github, various others -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsbueno at python.org.br Fri Aug 4 09:20:55 2017 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Fri, 4 Aug 2017 10:20:55 -0300 Subject: [Python-ideas] Pseudo methods In-Reply-To: <20170804113233.GS3149@ando.pearwood.info> References: <20170804113233.GS3149@ando.pearwood.info> Message-ID: Had not this been discussed here earlier this year? (And despite there being perceived dangers to readability in the long term, was accepted?) Here it is on an archive: https://mail.python.org/pipermail/python-ideas/2017-February/044551.html And anyway - along that discussion, despite dislikng the general idea, I got convinced that creating an outside method that makes "super" or "__class__" work was rather complicated. Maybe we could just have a decorator for that, that would properly create the __class__ cell? js -><- On 4 August 2017 at 08:32, Steven D'Aprano wrote: > Hi Paul, and welcome! > > > On Fri, Aug 04, 2017 at 07:39:56AM +0000, Paul Laos wrote: > > Hi folks > > I was thinking about how sometimes, a function sometimes acts on > classes, and > > behaves very much like a method. > > I'm not really sure what you mean by "acts on classes". I can only think > of a function which takes a class as a parameter, and modifies the > class. Like a class decorator. Or possibly a classmethod. But that's not > what you seem to mean below. So I'm not quite certain I understand your > proposal. > > > > Adding new methods to classes existing classes > > is currently somewhat difficult, > > If the class is written in Python, it isn't difficult at all, it is > trivially easy. First define your method: > > def method(self, arg): > pass > > > Then inject it onto the class using ordinary attribute assignment: > > TheClass.method = method > > And we're done! > > If the class is a built-in, or otherwise written in C, then "somewhat > difficult" is an understatement. I think it can't be done at all. > > > > and having pseudo methods would make that easier. > > I'm not sure that "easier" in this case would be better. > > > > Code example: (The syntax can most likely be improved upon) > > def has_vowels(self: str): > > for vowel in ["a", "e,", "i", "o", "u"]: > > if vowel in self: return True > > > How does Python, and for that matter the human reader, know which > class or classes that method is injected into? My guess is it looks at > the annotation. But that's a big change: annotations are currently > guaranteed to have no runtime semantics (apart from being stored in the > function's __annotation__ attribute). I'm not saying that can't be done, > but there may be consequences we haven't thought of. > > If we say dir(str), will "has_vowels" show up? > > How about vars(str)? > > How does this interact with metaclasses? > > > > > This allows one to wring `string.has_vowels()` instead of > `has_vowels(string)`, > > which would make it easier to read, > > Well that's one opinion. 
> > and would make it easier to add
> > functionality to existing classes, without having to extend them. This
> > would be useful for builtins or imported libraries, so one can fill in
> > "missing" methods.
>
> http://www.virtuouscode.com/2008/02/23/why-monkeypatching-is-destroying-ruby/
>
> I think monkeypatching is great, so long as I'm the only one that does
> it. When other people do it, invariably they introduce bugs into my code
> by monkeypatching other things I didn't expect to be monkeypatched.
>
> > * Simple way to extend classes
> > * Improves readability
> > * Easy to understand
>
> I'll agree with the first one of those, if by "simple" you mean
> "somebody else did all the work to make this syntax do
> what I want it to do".
>
> The work behind the scenes is not likely to be simple: for starters,
> allowing monkeypatching of built-ins is likely going to require a rather
> big re-design of the Python interpreter.
>
> --
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From p.f.moore at gmail.com Fri Aug 4 09:31:48 2017
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 4 Aug 2017 14:31:48 +0100
Subject: [Python-ideas] Pseudo methods
In-Reply-To:
References: <20170804113233.GS3149@ando.pearwood.info>
Message-ID:

On 4 August 2017 at 14:20, Joao S. O. Bueno wrote:
> Had not this been discussed here earlier this year?
>
> (And despite there being perceived dangers to readability in the long term,
> was accepted?)
>
> Here it is on an archive:
> https://mail.python.org/pipermail/python-ideas/2017-February/044551.html

From a very brief review of the end of that thread, it looks like it
was agreed that a PEP might be worthwhile - it was expected to be
rejected, though, and the PEP would simply document the discussion and
the fact that the idea was rejected. This agrees with my recollection
of the discussion, as well. But as far as I'm aware, no-one ever wrote
that PEP. (Not surprising, I guess, as it's hard to get enthusiastic
about proposing an idea you know in advance will be rejected).

Paul

From jsbueno at python.org.br Fri Aug 4 09:42:21 2017
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Fri, 4 Aug 2017 10:42:21 -0300
Subject: [Python-ideas] Pseudo methods
In-Reply-To:
References: <20170804113233.GS3149@ando.pearwood.info>
Message-ID:

On 4 August 2017 at 10:31, Paul Moore wrote:
> On 4 August 2017 at 14:20, Joao S. O. Bueno wrote:
> > Had not this been discussed here earlier this year?
> >
> > (And despite there being perceived dangers to readability in the long
> > term, was accepted?)
> >
> > Here it is on an archive:
> > https://mail.python.org/pipermail/python-ideas/2017-February/044551.html
>
> From a very brief review of the end of that thread, it looks like it
> was agreed that a PEP might be worthwhile - it was expected to be
> rejected, though, and the PEP would simply document the discussion and
> the fact that the idea was rejected. This agrees with my recollection
> of the discussion, as well. But as far as I'm aware, no-one ever wrote
> that PEP. (Not surprising, I guess, as it's hard to get enthusiastic
> about proposing an idea you know in advance will be rejected).
>
Nonetheless, a third party module with some decorators to allow doing
that "the right way" might be useful.
If one is willing to write, or retrieve, a candidate for that.
(I don't think it is possible to inject the __class__ cell in a clean
way, though)

js
-><-

> Paul
> -------------- next part --------------
An HTML attachment was scrubbed...
URL:

From steve at pearwood.info Fri Aug 4 10:37:01 2017
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 5 Aug 2017 00:37:01 +1000
Subject: [Python-ideas] Pseudo methods
In-Reply-To:
References: <20170804113233.GS3149@ando.pearwood.info>
Message-ID: <20170804143700.GU3149@ando.pearwood.info>

On Fri, Aug 04, 2017 at 10:20:55AM -0300, Joao S. O. Bueno wrote:
> Had not this been discussed here earlier this year?
>
> (And despite there being perceived dangers to readability in the long term,
> was accepted?)
>
> Here it is on an archive:
> https://mail.python.org/pipermail/python-ideas/2017-February/044551.html

I don't read this as the same proposal. For starters, I don't believe
that it was intended to allow monkey-patching of builtins. Another is
that the syntax is much more explicit about where the method is going:

def MyClass.method(self, arg): ...

is clearly a method of MyClass. There was, if I recall, some open
discussion of whether arbitrary assignment targets should be allowed:

def module.func(x or None)[23 + n].attr.__type__.method(self, arg): ...

or if we should intentionally limit the allowed syntax, like we do for
decorators. My vote is for intentionally limiting it to a single dotted
name, like MyClass.method.

> And anyway - along that discussion, despite dislikng the general idea, I
> got convinced that
> creating an outside method that makes "super" or "__class__" work was
> rather complicated.

Complicated is an understatement. It's horrid :-)

Here's the problem: we can successfully inject methods into a class:

# -----%<-----

class Parent:
    def spam(self):
        return "spam"

class Child(Parent):
    def food(self):
        return 'yummy ' + self.spam()

c = Child()
c.food()  # returns 'yummy spam' as expected

# inject a new method
def spam(self):
    return 'spam spam spam'

Child.spam = spam
c.food()  # returns 'yummy spam spam spam' as expected

# -----%<-----

But not if you use the zero-argument form of super():

# -----%<-----

del Child.spam  # revert to original

def spam(self):
    s = super().spam()
    return ' '.join([s]*3)

Child.spam = spam
c.food()

# -----%<-----

This raises:

RuntimeError: super(): __class__ cell not found

This is the simplest thing I've found that will fix it:

# -----%<-----

del Child.spam  # revert to original again

def outer():
    __class__ = Child
    def spam(self):
        s = super().spam()
        return ' '.join([s]*3)
    return spam

Child.spam = outer()
c.food()  # returns 'yummy spam spam spam' as expected

# -----%<-----

It's probably possible to wrap this up in a decorator that takes Child
as argument, but I expect it will probably require messing about with
the undocumented FunctionType constructor to build up a new closure
from the bits and pieces scavenged from the decorated function.

> Maybe we could just have a decorator for that, that would properly create
> the __class__ cell?

I expect it's possible. A challenge to somebody who wants to get their
hands dirty.
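In the meantime, here's an untested sketch of the easy way out: a
decorator that injects the method, but side-steps the missing cell
entirely by relying on the explicit two-argument form of super():

# -----%<-----

def inject(cls):
    """Bind the decorated function in as a method of cls."""
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func
    return decorator

@inject(Child)
def spam(self):
    # the explicit two-argument super() needs no __class__ cell
    s = super(Child, self).spam()
    return ' '.join([s]*3)

# -----%<-----

That dodges the challenge rather than meeting it, of course.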
-- Steve From brett at python.org Fri Aug 4 12:33:49 2017 From: brett at python.org (Brett Cannon) Date: Fri, 04 Aug 2017 16:33:49 +0000 Subject: [Python-ideas] Collection type argument for argparse where nargs != None In-Reply-To: References: Message-ID: I'm not a heavy argparse user so take my opinion with a grain of salt (and I do appreciate the time you put into proposing this), but I'm not seeing the usefulness to classify this as so pragmatic as to outweigh adding one more thing to explain about argparse. Since you're proposing just having a callable to use after constructing the list couldn't you just do e.g. `args.stuff = frozenset(args.stuff)` instead and just be explicit about it? On Fri, Aug 4, 2017, 06:01 David Mayo, wrote: > A friend of mine (@bcjbcjbcj on twitter) came up with an idea for an > argparse improvement that I'd like to propose for inclusion. > > Currently, argparse with nargs= collects arguments into > a list (or a list of lists in the case of action="append"). I would like to > propose adding a "collection type" argument to the store and append actions > and to add_argument, consisting of a callable that would be applied to the > list of type-converted args before adding them to the Namespace. This would > allow for alternate constructors (e.g. set), for modifying the list (e.g. > with sorted), or to do checking of properties expected across all > components of the argument at parse time. > > I've worked up a set of examples in this gist: > https://gist.github.com/pobocks/bff0bea494f2b7ec7eba1e8ae281b888 > > And a rough implementation here: > https://github.com/python/cpython/compare/master...pobocks:argparse_colltype > > I think this would be genuinely useful, and would require very little > change to argparse, which should be backwards compatible provided that the > default for the collection type is list, or None with list specified if > None. > > Thank you all for your time in considering this, > > - Dave Mayo > @pobocks on twitter, github, various others > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pobocks at gmail.com Fri Aug 4 12:56:13 2017 From: pobocks at gmail.com (David Mayo) Date: Fri, 4 Aug 2017 12:56:13 -0400 Subject: [Python-ideas] Collection type argument for argparse where nargs != None In-Reply-To: References: Message-ID: I mean, it's definitely possible, but I'd argue that's actually not any more explicit - and, in fact, args.stuff = something(args.stuff) is arguably less explicit because it's just an arbitrary transform, rather than being called out as "this is the wrapper element for these args." The places where I see doing transforms after as substantially worse than this: 1. any case where a single parser is being used in multiple scripts, or being extended. Moving this kind of thing out of the parser means the logic has to be replicated outside the parser everywhere it's called. 2. validation of multiple arguments against each other - once you're out of the parser, you have to write separate error handling code instead of just throwing the right exception. 
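To make point 2 concrete, this is the sort of check I have in mind - a
callable that validates the parsed values against each other, which
could be handed straight to add_argument under the proposal (a sketch;
"collection_type" is again just a placeholder name):

    def ordered_bounds(values):
        low, high = values
        if low >= high:
            raise ValueError("LOW must be less than HIGH")
        return (low, high)

    # proposed: parser.add_argument("--bounds", nargs=2, type=int,
    #                               collection_type=ordered_bounds)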
- Dave On Fri, Aug 4, 2017 at 12:33 PM, Brett Cannon wrote: > I'm not a heavy argparse user so take my opinion with a grain of salt (and > I do appreciate the time you put into proposing this), but I'm not seeing > the usefulness to classify this as so pragmatic as to outweigh adding one > more thing to explain about argparse. Since you're proposing just having a > callable to use after constructing the list couldn't you just do e.g. > `args.stuff = frozenset(args.stuff)` instead and just be explicit about it? > > On Fri, Aug 4, 2017, 06:01 David Mayo, wrote: > >> A friend of mine (@bcjbcjbcj on twitter) came up with an idea for an >> argparse improvement that I'd like to propose for inclusion. >> >> Currently, argparse with nargs= collects arguments >> into a list (or a list of lists in the case of action="append"). I would >> like to propose adding a "collection type" argument to the store and append >> actions and to add_argument, consisting of a callable that would be applied >> to the list of type-converted args before adding them to the Namespace. >> This would allow for alternate constructors (e.g. set), for modifying the >> list (e.g. with sorted), or to do checking of properties expected across >> all components of the argument at parse time. >> >> I've worked up a set of examples in this gist: https://gist.github.com/ >> pobocks/bff0bea494f2b7ec7eba1e8ae281b888 >> >> And a rough implementation here: https://github.com/python/ >> cpython/compare/master...pobocks:argparse_colltype >> >> I think this would be genuinely useful, and would require very little >> change to argparse, which should be backwards compatible provided that the >> default for the collection type is list, or None with list specified if >> None. >> >> Thank you all for your time in considering this, >> >> - Dave Mayo >> @pobocks on twitter, github, various others >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at barrys-emacs.org Sat Aug 5 06:49:46 2017 From: barry at barrys-emacs.org (Barry) Date: Sat, 5 Aug 2017 11:49:46 +0100 Subject: [Python-ideas] HTTP compression support for http.server In-Reply-To: References: Message-ID: Does you code allow suporting more then gzip? For example Brotli compression is becoming inmportant for some web apps. Barry > On 24 Jul 2017, at 17:30, Chris Angelico wrote: > >> On Tue, Jul 25, 2017 at 2:20 AM, Chris Barker wrote: >> On Thu, Jul 20, 2017 at 12:15 AM, Pierre Quentel >> wrote: >>> - if so, should it be supported by default ? It is the case in the PR, >>> where a number of content types, eg text/html, are compressed if the user >>> agent accepts the gzip "encoding" >> >> >> I'm pretty wary of compression happening by default -- i.e. someone runs >> exactly the same code with a newer version of Python, and suddenly some >> content is getting compressed. > > FWIW I'm quite okay with that. HTTP already has a mechanism for > negotiating compression (Accept-Encoding), designed to be compatible > with servers that don't support it. Any time a server gains support > for something that clients already support, it's going to start > happening as soon as you upgrade. 
> > Obviously this kind of change won't be happening in a bugfix release > of Python, so it would be part of the regular checks when you upgrade > from 3.6 to 3.7 - it'll be in the NEWS file and so on, so you read up > on it before you upgrade. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Aug 7 02:36:58 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Mon, 7 Aug 2017 15:36:58 +0900 Subject: [Python-ideas] Collection type argument for argparse where nargs != None In-Reply-To: References: Message-ID: <22920.2698.483865.970120@turnbull.sk.tsukuba.ac.jp> David Mayo writes: > I mean, it's definitely possible, but I'd argue that's actually not any > more explicit - and, in fact, args.stuff = something(args.stuff) is > arguably less explicit because it's just an arbitrary transform, rather > than being called out as "this is the wrapper element for these > args." The problem is third parties trying to read and work with the code, who now have to go read not only the definition of the parser, but the definition of the wrapper element (which is an arbitrary transform with a specified role). I think argparse is complex enough already. For both of your use cases (argparsers as reusable components and validation of collection arguments), I don't see why they can't be done in a subclass. This argument doesn't kill your proposal, and I'm just one rando, but FWIW I'm -0.5 on it for now. Steve From pierre.quentel at gmail.com Mon Aug 7 04:19:30 2017 From: pierre.quentel at gmail.com (Pierre Quentel) Date: Mon, 7 Aug 2017 10:19:30 +0200 Subject: [Python-ideas] HTTP compression support for http.server In-Reply-To: References: Message-ID: 2017-08-05 12:49 GMT+02:00 Barry : > Does you code allow suporting more then gzip? For example Brotli > compression is becoming inmportant for some web apps. > > Barry > In the latest version of the Pull Request, only gzip is supported. But your comment makes me think that the code should probably be more modular so that subclasses of SimpleHTTPRequestHandler could handle other algorithms. -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Mon Aug 7 04:48:45 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 7 Aug 2017 10:48:45 +0200 Subject: [Python-ideas] Pseudo methods In-Reply-To: References: Message-ID: Ruby provides this feature. A friend who is a long term user of Rails complained that Rails abuses this and it's a mess in practice. So I dislike this idea. Victor 2017-08-04 9:39 GMT+02:00 Paul Laos : > Hi folks > I was thinking about how sometimes, a function sometimes acts on classes, > and > behaves very much like a method. Adding new methods to classes existing > classes > is currently somewhat difficult, and having pseudo methods would make that > easier. > > Code example: (The syntax can most likely be improved upon) > def has_vowels(self: str): > for vowel in ["a", "e,", "i", "o", "u"]: > if vowel in self: return True > > This allows one to wring `string.has_vowels()` instead of > `has_vowels(string)`, > which would make it easier to read, and would make it easier to add > functionality to existing classes, without having to extend them. 
This would > be > useful for builtins or imported libraries, so one can fill in "missing" > methods. > > * Simple way to extend classes > * Improves readability > * Easy to understand > > ~Paul > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From fakedme+py at gmail.com Mon Aug 7 15:30:05 2017 From: fakedme+py at gmail.com (Soni L.) Date: Mon, 7 Aug 2017 16:30:05 -0300 Subject: [Python-ideas] Generator syntax hooks? Message-ID: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> The generator syntax, (x for x in i if c), currently always creates a new generator. I find this quite inefficient: {x for x in integers if 1000 <= x < 1000000} # never completes, because it's trying to iterate over all integers What if, somehow, object `integers` could hook the generator and produce the equivalent of {x for x in range(1000, 1000000)}, which does complete? What if, (x for x in integers if 1000 <= x < 1000000), was syntax sugar for (x for x in range(1000, 1000000))? (I like mathy syntax. Do you like mathy syntax?) From rosuav at gmail.com Mon Aug 7 16:14:52 2017 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 8 Aug 2017 06:14:52 +1000 Subject: [Python-ideas] Generator syntax hooks? In-Reply-To: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> Message-ID: On Tue, Aug 8, 2017 at 5:30 AM, Soni L. wrote: > The generator syntax, (x for x in i if c), currently always creates a new > generator. I find this quite inefficient: > > {x for x in integers if 1000 <= x < 1000000} # never completes, because it's > trying to iterate over all integers > > What if, somehow, object `integers` could hook the generator and produce the > equivalent of {x for x in range(1000, 1000000)}, which does complete? > > What if, (x for x in integers if 1000 <= x < 1000000), was syntax sugar for > (x for x in range(1000, 1000000))? > > (I like mathy syntax. Do you like mathy syntax?) I don't. I prefer to stick with the syntax we already have. The alternative is a more verbose way to identify a range, plus you need a new global "integers" which implies that you could iterate over "reals" the same way (after all, mathematics doesn't mind you working with a subset of reals the same way you'd work with a subset of ints). And good luck iterating over all the reals. :) ChrisA From chris.barker at noaa.gov Mon Aug 7 19:06:32 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 7 Aug 2017 19:06:32 -0400 Subject: [Python-ideas] Generator syntax hooks? In-Reply-To: References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> Message-ID: On Mon, Aug 7, 2017 at 4:14 PM, Chris Angelico wrote: > On Tue, Aug 8, 2017 at 5:30 AM, Soni L. wrote: > > The generator syntax, (x for x in i if c), currently always creates a new > > generator. that's what it's for -- I'm confused as to what the problem is. > > {x for x in integers if 1000 <= x < 1000000} # never completes, because > it's > > trying to iterate over all integers > this is a set comprehension -- but what is "integers"? is it a generator? in which case, it should take an argument so it knows when to end. Or if it's really that symple, that's what range() is for. However, similarly, I find that sometimes I want to iterate over a slice of a sequence, but do'nt want to actually make the slice first. 
So there is itertools.islice() If "integers" is a sequence: {x for x in integers[1000:10000]} makes an unneeded copy of that slice. {x for x in itertools.islice(integers, 1000, 10000)} will iterate on the fly, and not make any extra copies. It would be nice to have an easier access to an "slice iterator" though -- one of these days I may write up a proposal for that. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Aug 7 19:35:37 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 8 Aug 2017 09:35:37 +1000 Subject: [Python-ideas] Generator syntax hooks? In-Reply-To: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> Message-ID: <20170807233537.GC3149@ando.pearwood.info> Hi Soni, and welcome! On Mon, Aug 07, 2017 at 04:30:05PM -0300, Soni L. wrote: > What if, (x for x in integers if 1000 <= x < 1000000), was syntax sugar > for (x for x in range(1000, 1000000))? If you want the integers from 1000 to 1000000, use: range(1000, 1000000) Don't waste your time slowing down the code with an unnecessary and pointless wrapper that does nothing but pass every value on unchanged: (x for x in range(1000, 1000000)) # waste of time and effort -- Steve From fakedme+py at gmail.com Mon Aug 7 19:56:20 2017 From: fakedme+py at gmail.com (Soni L.) Date: Mon, 7 Aug 2017 20:56:20 -0300 Subject: [Python-ideas] Generator syntax hooks? In-Reply-To: <20170807233537.GC3149@ando.pearwood.info> References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> <20170807233537.GC3149@ando.pearwood.info> Message-ID: <9e9978d7-1910-4453-e085-8f419fbc2dda@gmail.com> On 2017-08-07 08:35 PM, Steven D'Aprano wrote: > Hi Soni, and welcome! > > On Mon, Aug 07, 2017 at 04:30:05PM -0300, Soni L. wrote: > >> What if, (x for x in integers if 1000 <= x < 1000000), was syntax sugar >> for (x for x in range(1000, 1000000))? > If you want the integers from 1000 to 1000000, use: > > range(1000, 1000000) > > Don't waste your time slowing down the code with an unnecessary and > pointless wrapper that does nothing but pass every value on unchanged: > > (x for x in range(1000, 1000000)) # waste of time and effort > > > Actually, those have different semantics! >>> x = range(1, 10) >>> list(x) [1, 2, 3, 4, 5, 6, 7, 8, 9] >>> list(x) [1, 2, 3, 4, 5, 6, 7, 8, 9] >>> x = (x for x in range(1, 10)) >>> list(x) [1, 2, 3, 4, 5, 6, 7, 8, 9] >>> list(x) [] From stefan_ml at behnel.de Tue Aug 8 03:48:17 2017 From: stefan_ml at behnel.de (Stefan Behnel) Date: Tue, 8 Aug 2017 09:48:17 +0200 Subject: [Python-ideas] Generator syntax hooks? In-Reply-To: <9e9978d7-1910-4453-e085-8f419fbc2dda@gmail.com> References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> <20170807233537.GC3149@ando.pearwood.info> <9e9978d7-1910-4453-e085-8f419fbc2dda@gmail.com> Message-ID: Soni L. schrieb am 08.08.2017 um 01:56: > On 2017-08-07 08:35 PM, Steven D'Aprano wrote: >> Hi Soni, and welcome! >> >> On Mon, Aug 07, 2017 at 04:30:05PM -0300, Soni L. wrote: >> >>> What if, (x for x in integers if 1000 <= x < 1000000), was syntax sugar >>> for (x for x in range(1000, 1000000))? 
>> If you want the integers from 1000 to 1000000, use:
>>
>> range(1000, 1000000)
>>
>> Don't waste your time slowing down the code with an unnecessary and
>> pointless wrapper that does nothing but pass every value on unchanged:
>>
>> (x for x in range(1000, 1000000)) # waste of time and effort
>
> Actually, those have different semantics!
>
>>>> x = range(1, 10)
>>>> list(x)
> [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>> list(x)
> [1, 2, 3, 4, 5, 6, 7, 8, 9]
>
>>>> x = (x for x in range(1, 10))
>>>> list(x)
> [1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>> list(x)
> []

In that case, use iter(range(1000, 1000000)).

range() creates an iterable, which is iterable more than once.
iter(range()) creates an iterator from that iterable, which has the
semantics that you apparently wanted.

Stefan

From turnbull.stephen.fw at u.tsukuba.ac.jp Tue Aug 8 14:45:31 2017
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Wed, 9 Aug 2017 03:45:31 +0900
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: <9e9978d7-1910-4453-e085-8f419fbc2dda@gmail.com>
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
	<20170807233537.GC3149@ando.pearwood.info>
	<9e9978d7-1910-4453-e085-8f419fbc2dda@gmail.com>
Message-ID: <22922.1739.304051.790882@turnbull.sk.tsukuba.ac.jp>

>>>>> Soni L. writes:

> Steven d'Aprano writes:
> > range(1000, 1000000)
> > (x for x in range(1000, 1000000)) # waste of time and effort

> Actually, those have different semantics!

That's not really important. As Stefan Behnel points out, it's simple
(and efficient) to get iterator semantics by using iter().

The big issue here is that Python is not the kind of declarative
language where (x for x in int if 1_000 ≤ x ≤ 1_000_000)[1] is natural
to write, let alone easy to implement efficiently. Aside from the
problem of (x for x in float if 1_000 ≤ x ≤ 1_000_000) (where the answer
is "just don't do that"), I can't think of any unbounded collections in
Python that aren't iterables, except some types. That makes Steven's
criticism pretty compelling.

If you need to design a collection's __iter__ specially to allow it to
decide whether the subset that satisfies some condition is exhausted,
why not just subclass some appropriate existing collection with a more
appropriate __iter__?

Footnotes:
[1] See what I did there? ;-)

From ncoghlan at gmail.com Wed Aug 9 00:18:58 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 9 Aug 2017 14:18:58 +1000
Subject: [Python-ideas] Pseudo methods
In-Reply-To:
References:
Message-ID:

On 7 August 2017 at 18:48, Victor Stinner wrote:
> Ruby provides this feature. A friend who is a long term user of Rails
> complained that Rails abuses this and it's a mess in practice. So I
> dislike this idea.

Right, Python's opinionated design guidance is to clearly distinguish
between "data first" designs using methods on objects and "algorithm
first" designs using functools.singledispatch (or similar mechanisms),
since they place different constraints on how new implementations are
added, and where you should look for more information about how an
algorithm works.

Part of the intent behind this guidance is to better enable local
reasoning about a piece of code:

    from my_string_utils import has_vowels

    if has_vowels(input("Enter a word: ")):
        print("Contains vowels!")
    else:
        print("Does not contain vowels!")

Here, it is clear that if we want to know more about what "has_vowels"
does, or if we want to request changes to how it works, then
"my_string_utils" is where we need to go next.
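The "algorithm first" spelling preserves that property, since the
generic function is still just an ordinary name imported from an
ordinary module (a quick functools.singledispatch sketch):

    from functools import singledispatch

    @singledispatch
    def describe(obj):
        return "something of type " + type(obj).__name__

    @describe.register(str)
    def _(text):
        return "a string of length " + str(len(text))

Readers tracing the behaviour of describe() still end up at the module
that defines it.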
By contrast, that's significantly less clear if our string utils module were to implicitly modify the behaviour of input() or builtin strings: import my_string_utils if input("Enter a word: ").has_vowels(): print("Contains vowels!") else: print("Does not contain vowels!") To analyse and investigate this code, we need to "just know" that: - the result of "input()" doesn't normally have a "has_vowels()" method - therefore, importing "my_string_utils" must have either replaced the input builtin or mutated the str type - therefore, "my_string_utils" is probably the place to go for more information on "has_vowels" If our import line had instead looked like "import my_string_utils, my_other_utils", we'd have to go look at both of them to figure out where the "has_vowels()" method might be coming from (and hope it wasn't happening further down as a side effect of one of the modules *they* imported). Injecting methods rather than writing functions that dispatch on the type of their first argument also creates new opportunities for naming conflicts: while "my_string_utils.has_vowels" and "your_string_utils.has_vowels" can happily coexist in the same program without conflicts, there's only one "input" builtin, and only one "str" builtin. Can this level of explicitness be an obstacle at times? Yes, it can, especially for testing and interactive use, which is why Python offers features like wildcard imports, runtime support for monkeypatching of user-defined types, and runtime support for dynamically replacing builtins and module globals. However, the concerns around the difficulties of complexity management in the face of implicit action at a distance remain valid, so those features all fall into the category of "supported, but not encouraged, except in specific circumstances". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Aug 9 01:06:54 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 9 Aug 2017 15:06:54 +1000 Subject: [Python-ideas] Generator syntax hooks? In-Reply-To: References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> Message-ID: On 8 August 2017 at 09:06, Chris Barker wrote: > It would be nice to have an easier access to an "slice iterator" though -- > one of these days I may write up a proposal for that. An idea I've occasionally toyed with [1] is some kind of "iterview" that wraps around an arbitrary iterable and produces lazy itertools based results rather than immediate views or copies. However, my experience is also that folks are *really* accustomed to syntactic operations on containers producing either full live views (e.g. memoryview or numpy slices, range as a dynamically computed container), or actual copies (builtin container types). Having them produce consumable iterators instead then gets confusing due to the number of operations that will implicitly consume them (including simple "x in y" checks). The OP's proposal doesn't fit into that category though: rather it's asking about the case where we have an infinite iterator (e.g. itertools.count(0)), and want to drop items until they start meeting some condition (i.e. itertools.dropwhile) and then terminate the iterator as soon as another condition is no longer met (i.e. itertools.takewhile). 
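Spelled out with the existing itertools building blocks, that
combination looks like this (a sketch):

    from itertools import count, dropwhile, takewhile

    candidates = dropwhile(lambda x: x < 1000, count(0))     # skip values below 1000
    in_range = takewhile(lambda x: x < 1000000, candidates)  # stop at the first x >= 1000000
    wanted = set(in_range)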
Right now, getting the "terminate when false" behaviour requires the
use of takewhile:

    set(itertools.takewhile(lambda x: x < 1000000, itertools.count(1000)))

In these cases, the standard generator expression syntax is an
attractive nuisance because it *looks* right from a mathematical
perspective, but hides an infinite loop:

    {x for x in itertools.count(0) if 1000 <= x < 1000000}

The most credible proposal to address this that I've seen is to borrow
the "while" keyword in its "if not x: break" interpretation to get:

    {x for x in itertools.count(0) if 1000 <= x while x < 1000000}

which would be compiled as equivalent to:

    result = set()
    for x in itertools.count(0):
        if 1000 <= x:
            result.add(x)
        if not x < 1000000:
            break

(and similarly for all of the other comprehension variants)

There aren't any technical barriers I'm aware of to implementing that,
with the main historical objection being that instead of the
comprehension level while clause mapping to a while loop directly the
way the for and if clauses map to their statement level counterparts,
it would instead map to the conditional break in the expanded
loop-and-a-half form:

    while True:
        if not condition:
            break

While it's taken me a long time to come around to the idea, "Make
subtle infinite loops in mathematical code easier to avoid" *is* a
pretty compelling user-focused justification for incurring that extra
complexity at the language design level.

Cheers,
Nick.

[1] https://mail.python.org/pipermail/python-ideas/2010-April/006983.html

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From guido at python.org Wed Aug 9 01:38:22 2017
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 Aug 2017 22:38:22 -0700
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
 
Message-ID: 

On Tue, Aug 8, 2017 at 10:06 PM, Nick Coghlan wrote:

> On 8 August 2017 at 09:06, Chris Barker wrote:
> > It would be nice to have an easier access to an "slice iterator" though
> --
> > one of these days I may write up a proposal for that.
>
> An idea I've occasionally toyed with [1] is some kind of "iterview"
> that wraps around an arbitrary iterable and produces lazy itertools
> based results rather than immediate views or copies.
>
> However, my experience is also that folks are *really* accustomed to
> syntactic operations on containers producing either full live views
> (e.g. memoryview or numpy slices, range as a dynamically computed
> container), or actual copies (builtin container types). Having them
> produce consumable iterators instead then gets confusing due to the
> number of operations that will implicitly consume them (including
> simple "x in y" checks).
>
> The OP's proposal doesn't fit into that category though: rather it's
> asking about the case where we have an infinite iterator (e.g.
> itertools.count(0)), and want to drop items until they start meeting
> some condition (i.e. itertools.dropwhile) and then terminate the
> iterator as soon as another condition is no longer met (i.e.
> itertools.takewhile).

I don't think that's what the OP meant. The original proposal seemed to
assume that it would be somehow reasonable for the input ("integers" in the
example) to be able to see and parse the condition in the generator
expression ("1000 <= x < 100000" in the example, with "x" somehow known to
be bound to the iteration value). That's at least what I think the remark "I
like mathy syntax" referred to.
> Right now, getting the "terminate when false" behaviour requires the
> use of takewhile:
>
>     set(itertools.takewhile(lambda x: x < 1000000, itertools.count(1000)))
>
> In these cases, the standard generator expression syntax is an
> attractive nuisance because it *looks* right from a mathematical
> perspective, but hides an infinite loop:
>
>     {x for x in itertools.count(0) if 1000 <= x < 1000000}
>
> The most credible proposal to address this that I've seen is to borrow
> the "while" keyword in its "if not x: break" interpretation to get:
>
>     {x for x in itertools.count(0) if 1000 <= x while x < 1000000}
>
> which would be compiled as equivalent to:
>
>     result = set()
>     for x in itertools.count(0):
>         if 1000 <= x:
>             result.add(x)
>         if not x < 1000000:
>             break
>
> (and similarly for all of the other comprehension variants)
>
> There aren't any technical barriers I'm aware of to implementing that,
> with the main historical objection being that instead of the
> comprehension level while clause mapping to a while loop directly the
> way the for and if clauses map to their statement level counterparts,
> it would instead map to the conditional break in the expanded
> loop-and-a-half form:
>
>     while True:
>         if not condition:
>             break
>
> While it's taken me a long time to come around to the idea, "Make
> subtle infinite loops in mathematical code easier to avoid" *is* a
> pretty compelling user-focused justification for incurring that extra
> complexity at the language design level.
>

I haven't come around to this yet. It looks like it will make explaining
comprehensions more complex, since the translation of "while X" into "if not
X: break" feels less direct than the translations of "for x in xs" or "if
pred(x)". (In particular, your proposal seems to require more experience
with mentally translating loops and conditions into jumps -- most regulars
of this forum do that for a living, but I doubt it's second nature for the
OP.)

--
--Guido van Rossum (python.org/~guido)

From tarek at ziade.org Wed Aug 9 03:56:20 2017
From: tarek at ziade.org (=?utf-8?Q?Tarek=20Ziad=C3=A9?=)
Date: Wed, 09 Aug 2017 09:56:20 +0200
Subject: [Python-ideas] Argparse argument deprecation
Message-ID: <1502265380.2026839.1067738128.60EEB727@webmail.messagingengine.com>

Hey,

I don't think there's any helper to deprecate an argument in argparse.

Let's say you have a --foo option in your CLI and want to deprecate it
in the next release before you completely remove it later.

My first thought on how to do this is by adding a new "deprecated" option to
https://docs.python.org/3/library/argparse.html#argparse.ArgumentParser.add_argument

"deprecated" would be a callable that is called after the argument has
been parsed by argparse,
so the developer can decide if they want to issue a deprecation warning,
use the parsed value or override it etc.

Another interesting approach suggested by Doug Hellman, which I like as
much, is a set of higher level options that
provide a deprecation workflow for arguments, see

https://github.com/openstack/oslo.config/blob/master/oslo_config/cfg.py#L441

What do you think?

Cheers
Tarek

--
Tarek Ziadé | coding: https://ziade.org | running: https://foule.es |
twitter: @tarek_ziade

From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Aug 9 04:18:24 2017
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Wed, 9 Aug 2017 17:45:31 +0900
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
 
Message-ID: <22922.50512.758357.265110@turnbull.sk.tsukuba.ac.jp>

Nick Coghlan writes:

 > Right now, getting the "terminate when false" behaviour requires the
 > use of takewhile:
 >
 >     set(itertools.takewhile(lambda x: x < 1000000, itertools.count(1000)))

My objection to this interpretation is different from Guido's (I
think): if you're really thinking in terms of math, sets are
*unordered*, and therefore "takewhile" doesn't guarantee exhaustion of
the desired subset. Another way to put this is that in order to make
it harder to get bit by subtle infloops, you're going to give more
teeth to "Miller time came early"[1] bugs.

This may be a bigger issue than some may think, because sets and dicts
are iterable, and order of iteration is arbitrary (at best
history-dependent).

Footnotes:
[1] American beer commercial claiming that real men go to drink beer
after a full day's work.

From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Aug 9 04:19:41 2017
From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Wed, 9 Aug 2017 17:19:41 +0900
Subject: [Python-ideas] Pseudo methods
In-Reply-To: 
References: 
Message-ID: <22922.50589.497352.85140@turnbull.sk.tsukuba.ac.jp>

Nick Coghlan writes:

 > To analyse and investigate this code, we need to "just know" that:

You can of course hope that help(input().has_vowels) will tell you
where to find it. If it doesn't, well, shame on you for depending on
source-unavailable software that you don't understand. ;-)

I'm with you on implementing this feature; I don't like it. But I
don't think the discoverability situation is as dire as you suggest.

From desmoulinmichel at gmail.com Wed Aug 9 05:23:45 2017
From: desmoulinmichel at gmail.com (Michel Desmoulin)
Date: Wed, 9 Aug 2017 11:23:45 +0200
Subject: [Python-ideas] Argparse argument deprecation
In-Reply-To: <1502265380.2026839.1067738128.60EEB727@webmail.messagingengine.com>
References: <1502265380.2026839.1067738128.60EEB727@webmail.messagingengine.com>
Message-ID: <4972cbcc-0972-4d6f-4b80-4e66a448e76a@gmail.com>

+1, but I would make "deprecated" either a warning, an exception or a
callable.

This way, to create a simple deprecation, you just provide
DeprecationWarning('This will be gone in the next release'), or
ValueError('This has been removed in 2.X, use "stuff" instead') if you
decide it's gone for good.

But if you need a custom behavior, you pass in a callable.

On 09/08/2017 at 09:56, Tarek Ziadé wrote:
> Hey,
>
> I don't think there's any helper to deprecate an argument in argparse.
>
> Let's say you have a --foo option in your CLI and want to deprecate it
> in the next release before you completely remove it later.
>
> My first thought on how to do this is by adding a new "deprecated" option to
> https://docs.python.org/3/library/argparse.html#argparse.ArgumentParser.add_argument
>
> "deprecated" would be a callable that is called after the argument has
> been parsed by argparse,
> so the developer can decide if they want to issue a deprecation warning,
> use the parsed value or override it etc.
>
> Another interesting approach suggested by Doug Hellman, which I like as
> much, is a set of higher level options that
> provide a deprecation workflow for arguments, see
>
> https://github.com/openstack/oslo.config/blob/master/oslo_config/cfg.py#L441
>
> What do you think?
>
> Cheers
> Tarek
>

From ned at nedbatchelder.com Wed Aug 9 05:50:47 2017
From: ned at nedbatchelder.com (Ned Batchelder)
Date: Wed, 9 Aug 2017 05:50:47 -0400
Subject: [Python-ideas] Argparse argument deprecation
In-Reply-To: <1502265380.2026839.1067738128.60EEB727@webmail.messagingengine.com>
References: <1502265380.2026839.1067738128.60EEB727@webmail.messagingengine.com>
Message-ID: 

On 8/9/17 3:56 AM, Tarek Ziadé wrote:
> Hey,
>
> I don't think there's any helper to deprecate an argument in argparse.
>
> Let's say you have a --foo option in your CLI and want to deprecate it
> in the next release before you completely remove it later.
>
> My first thought on how to do this is by adding a new "deprecated" option to
> https://docs.python.org/3/library/argparse.html#argparse.ArgumentParser.add_argument
>
> "deprecated" would be a callable that is called after the argument has
> been parsed by argparse,
> so the developer can decide if they want to issue a deprecation warning,
> use the parsed value or override it etc.

I don't see why this is something that argparse has to do. The
semantics of options is handled by the rest of the program. Why would
the parser be issuing these warnings? Let argparse parse the options,
then let other code deal with what they *mean*.

--Ned.

From desmoulinmichel at gmail.com Wed Aug 9 05:54:23 2017
From: desmoulinmichel at gmail.com (Michel Desmoulin)
Date: Wed, 9 Aug 2017 11:54:23 +0200
Subject: [Python-ideas] Argparse argument deprecation
In-Reply-To: 
References: <1502265380.2026839.1067738128.60EEB727@webmail.messagingengine.com>
 
Message-ID: <14d3d469-319a-7e70-a8bf-054de57197bc@gmail.com>

Argparse is not just about parsing, it's about providing convenient
tooling associated with parsing.

Otherwise you would not have an automatically generated "usage" message
or a "--help" command.

Following your definition, those are not parsing. But they are here,
because we all end up coding them anyway.

On 09/08/2017 at 11:50, Ned Batchelder wrote:
> On 8/9/17 3:56 AM, Tarek Ziadé wrote:
>> Hey,
>>
>> I don't think there's any helper to deprecate an argument in argparse.
>>
>> Let's say you have a --foo option in your CLI and want to deprecate it
>> in the next release before you completely remove it later.
>>
>> My first thought on how to do this is by adding a new "deprecated" option to
>> https://docs.python.org/3/library/argparse.html#argparse.ArgumentParser.add_argument
>>
>> "deprecated" would be a callable that is called after the argument has
>> been parsed by argparse,
>> so the developer can decide if they want to issue a deprecation warning,
>> use the parsed value or override it etc.
>
> I don't see why this is something that argparse has to do. The
> semantics of options is handled by the rest of the program. Why would
> the parser be issuing these warnings? Let argparse parse the options,
> then let other code deal with what they *mean*.
>
> --Ned.
From ned at nedbatchelder.com Wed Aug 9 06:59:18 2017
From: ned at nedbatchelder.com (Ned Batchelder)
Date: Wed, 9 Aug 2017 06:59:18 -0400
Subject: [Python-ideas] Argparse argument deprecation
In-Reply-To: <14d3d469-319a-7e70-a8bf-054de57197bc@gmail.com>
References: <1502265380.2026839.1067738128.60EEB727@webmail.messagingengine.com>
 <14d3d469-319a-7e70-a8bf-054de57197bc@gmail.com>
Message-ID: <491f4b14-8dbd-804e-c1e5-970640df954b@nedbatchelder.com>

OK, then on a more pragmatic note: why is it easier to write a callback
than to write a simple if statement after the parsing? Generating help
is complex, and a common task that is closely tied to the syntax of the
options, so it makes sense for argparse to do it. Deprecation is
neither complex, common, nor closely tied to the syntax of the options.

Another note about the proposal: calling it "deprecated" seems odd,
since the proposal is really just a general-purpose callback. argparse
isn't generating the warning, your callback function would be doing it.
Why name it "deprecated"? How is this different than the "action"
keyword argument that argparse already provides?

--Ned.

On 8/9/17 5:54 AM, Michel Desmoulin wrote:
> Argparse is not just about parsing, it's about providing convenient
> tooling associated with parsing.
>
> Otherwise you would not have an automatically generated "usage" message
> or a "--help" command.
>
> Following your definition, those are not parsing. But they are here,
> because we all end up coding them anyway.
>
> On 09/08/2017 at 11:50, Ned Batchelder wrote:
>> On 8/9/17 3:56 AM, Tarek Ziadé wrote:
>>> Hey,
>>>
>>> I don't think there's any helper to deprecate an argument in argparse.
>>>
>>> Let's say you have a --foo option in your CLI and want to deprecate it
>>> in the next release before you completely remove it later.
>>>
>>> My first thought on how to do this is by adding a new "deprecated" option to
>>> https://docs.python.org/3/library/argparse.html#argparse.ArgumentParser.add_argument
>>>
>>> "deprecated" would be a callable that is called after the argument has
>>> been parsed by argparse,
>>> so the developer can decide if they want to issue a deprecation warning,
>>> use the parsed value or override it etc.
>> I don't see why this is something that argparse has to do. The
>> semantics of options is handled by the rest of the program. Why would
>> the parser be issuing these warnings? Let argparse parse the options,
>> then let other code deal with what they *mean*.
>>
>> --Ned.
From desmoulinmichel at gmail.com Wed Aug 9 07:16:53 2017
From: desmoulinmichel at gmail.com (Michel Desmoulin)
Date: Wed, 9 Aug 2017 13:16:53 +0200
Subject: [Python-ideas] Argparse argument deprecation
In-Reply-To: <491f4b14-8dbd-804e-c1e5-970640df954b@nedbatchelder.com>
References: <1502265380.2026839.1067738128.60EEB727@webmail.messagingengine.com>
 <14d3d469-319a-7e70-a8bf-054de57197bc@gmail.com>
 <491f4b14-8dbd-804e-c1e5-970640df954b@nedbatchelder.com>
Message-ID: 

On 09/08/2017 at 12:59, Ned Batchelder wrote:
> OK, then on a more pragmatic note: why is it easier to write a callback
> than to write a simple if statement after the parsing? Generating help
> is complex, and a common task that is closely tied to the syntax of the
> options, so it makes sense for argparse to do it. Deprecation is
> neither complex, common, nor closely tied to the syntax of the options.
>
> Another note about the proposal: calling it "deprecated" seems odd,
> since the proposal is really just a general-purpose callback. argparse
> isn't generating the warning, your callback function would be doing it.
> Why name it "deprecated"? How is this different than the "action"
> keyword argument that argparse already provides?

I imagine something like:

    def deprecation_callback(warn, forbid):
        warn('This is deprecated')
        # or call forbid() instead to turn the option into an error

    parser.add_argument('--foo', on_deprecated=deprecation_callback)

This does:

- provide an easy way to warn, or transition to forbid
- allow introspection to list the deprecated options
- deprecated options can be marked as such in the generated --help
- create a complex dynamic deprecation message, or just pass a short
lambda

But indeed I'd like it to be able to do:

    parser.add_argument('--foo', on_deprecated=DeprecationWarning('meh'))
    parser.add_argument('--foo', on_deprecated=ValueError('meh'))

As a shortcut for simple use cases. I still don't know how to make the
distinction between deprecated and removed from the introspection point
of view.

All in all, I think it's an interesting proposal, but I'm not going to
fight over it. If it never happens, I can fit in a bunch of "if"s like
you said.

>
> --Ned.
>
>
> On 8/9/17 5:54 AM, Michel Desmoulin wrote:
>> Argparse is not just about parsing, it's about providing convenient
>> tooling associated with parsing.
>>
>> Otherwise you would not have an automatically generated "usage" message
>> or a "--help" command.
>>
>> Following your definition, those are not parsing. But they are here,
>> because we all end up coding them anyway.
>>
>> On 09/08/2017 at 11:50, Ned Batchelder wrote:
>>> On 8/9/17 3:56 AM, Tarek Ziadé wrote:
>>>> Hey,
>>>>
>>>> I don't think there's any helper to deprecate an argument in argparse.
>>>>
>>>> Let's say you have a --foo option in your CLI and want to deprecate it
>>>> in the next release before you completely remove it later.
>>>>
>>>> My first thought on how to do this is by adding a new "deprecated" option to
>>>> https://docs.python.org/3/library/argparse.html#argparse.ArgumentParser.add_argument
>>>>
>>>> "deprecated" would be a callable that is called after the argument has
>>>> been parsed by argparse,
>>>> so the developer can decide if they want to issue a deprecation warning,
>>>> use the parsed value or override it etc.
>>> I don't see why this is something that argparse has to do. The
>>> semantics of options is handled by the rest of the program. Why would
>>> the parser be issuing these warnings? Let argparse parse the options,
>>> then let other code deal with what they *mean*.
>>>
>>> --Ned.

From tarek at ziade.org Wed Aug 9 07:38:25 2017
From: tarek at ziade.org (=?utf-8?Q?Tarek=20Ziad=C3=A9?=)
Date: Wed, 09 Aug 2017 13:38:25 +0200
Subject: [Python-ideas] Argparse argument deprecation
In-Reply-To: <491f4b14-8dbd-804e-c1e5-970640df954b@nedbatchelder.com>
References: <1502265380.2026839.1067738128.60EEB727@webmail.messagingengine.com>
 <14d3d469-319a-7e70-a8bf-054de57197bc@gmail.com>
 <491f4b14-8dbd-804e-c1e5-970640df954b@nedbatchelder.com>
Message-ID: <1502278705.2072904.1067929320.33FBB792@webmail.messagingengine.com>

> Another note about the proposal: calling it "deprecated" seems odd,
> since the proposal is really just a general-purpose callback. argparse
> isn't generating the warning, your callback function would be doing it.
> Why name it "deprecated"? How is this different than the "action"
> keyword argument that argparse already provides?

That sounds right. Maybe a better approach would be to implement a
custom action by inheriting from argparse.Action
https://docs.python.org/3/library/argparse.html#action
and do all the warning/deprecation work there.
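Something along these lines, maybe (untested sketch, the option name
and message are made up):

    import argparse
    import warnings

    class DeprecateAction(argparse.Action):
        def __call__(self, parser, namespace, values, option_string=None):
            warnings.warn("%s is deprecated" % option_string,
                          DeprecationWarning)
            setattr(namespace, self.dest, values)

    parser = argparse.ArgumentParser()
    parser.add_argument("--foo", action=DeprecateAction)
    args = parser.parse_args(["--foo", "bar"])  # warns, then args.foo == "bar"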
I'll experiment with this idea on my side to see how it goes :)

Cheers
Tarek

From ncoghlan at gmail.com Wed Aug 9 10:54:57 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 10 Aug 2017 00:54:57 +1000
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
 
Message-ID: 

On 9 August 2017 at 15:38, Guido van Rossum wrote:
> On Tue, Aug 8, 2017 at 10:06 PM, Nick Coghlan wrote:
>> The OP's proposal doesn't fit into that category though: rather it's
>> asking about the case where we have an infinite iterator (e.g.
>> itertools.count(0)), and want to drop items until they start meeting
>> some condition (i.e. itertools.dropwhile) and then terminate the
>> iterator as soon as another condition is no longer met (i.e.
>> itertools.takewhile).
>
> I don't think that's what the OP meant. The original proposal seemed to
> assume that it would be somehow reasonable for the input ("integers" in the
> example) to be able to see and parse the condition in the generator
> expression ("1000 <= x < 100000" in the example, with "x" somehow known to
> be bound to the iteration value). That's at least what I think the remark "I
> like mathy syntax" referred to.

Right, I was separating the original request to make "{x for x in
integers if 1000 <= x < 1000000}" work into the concrete proposal to
make exactly *that* syntax work (which I don't think is feasible), and
the slightly more general notion of offering a more math-like syntax
that allows finite sets to be built from infinite iterators by
defining a termination condition in addition to a filter condition.

>> There aren't any technical barriers I'm aware of to implementing that,
>> with the main historical objection being that instead of the
>> comprehension level while clause mapping to a while loop directly the
>> way the for and if clauses map to their statement level counterparts,
>> it would instead map to the conditional break in the expanded
>> loop-and-a-half form:
>>
>>     while True:
>>         if not condition:
>>             break
>>
>> While it's taken me a long time to come around to the idea, "Make
>> subtle infinite loops in mathematical code easier to avoid" *is* a
>> pretty compelling user-focused justification for incurring that extra
>> complexity at the language design level.
>
> I haven't come around to this yet. It looks like it will make explaining
> comprehensions more complex, since the translation of "while X" into "if not
> X: break" feels less direct than the translations of "for x in xs" or "if
> pred(x)". (In particular, your proposal seems to require more experience
> with mentally translating loops and conditions into jumps -- most regulars
> of this forum do that for a living, but I doubt it's second nature for the
> OP.)

Yeah, if we ever did add something like this, I suspect a translation
using takewhile would potentially be easier for at least some users to
understand than the one to a break condition:

    {x for x in itertools.count(0) if 1000 <= x while x < 1000000}

<=>

    result = set()
    for x in itertools.count(0):
        if 1000 <= x:
            result.add(x)
        # If you've never used the loop-and-a-half idiom, it's
        # not obvious why "while <cond>" means "if not <cond>: break"
        if not x < 1000000:
            break

is roughly

    {x for x in itertools.takewhile(itertools.count(0), lambda x: x <
1000000) if 1000 <= x}

<=>

    result = set()
    for x in takewhile(itertools.count(0), lambda x: x < 1000000):
        if 1000 <= x:
            result.add(x)

However, the break condition is the translation that would make sense
at a language *implementation* level (and would hence be the one that
determined the relative location of the while clause in the expression
form).

That discrepancy *still* sets off alarm bells for me (since it's a
clear sign that "how people would think this works" and "how it would
actually work" probably wouldn't match), and I'm also conscious of the
amount of syntactic noise that "takewhile" introduces vs the "while"
keyword.

The counter-argument (which remains valid even against my own change
of heart) is that adding a new comprehension clause doesn't actually
fix the "accidental infinite loop" problem: "{x for x in
itertools.count(0) if 1000 <= x < 1000000}" will still loop forever,
it would just have a nicer fix to get it to terminate (adding a
"while" clause to turn the second filter condition into a termination
condition).
So while I'm +0 where I used to be a firm -1, it's still only a +0 :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From fakedme+py at gmail.com Wed Aug 9 11:49:13 2017 From: fakedme+py at gmail.com (Soni L.) Date: Wed, 9 Aug 2017 12:49:13 -0300 Subject: [Python-ideas] Generator syntax hooks? In-Reply-To: References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> Message-ID: On 2017-08-09 11:54 AM, Nick Coghlan wrote: > On 9 August 2017 at 15:38, Guido van Rossum wrote: >> On Tue, Aug 8, 2017 at 10:06 PM, Nick Coghlan wrote: >>> The OP's proposal doesn't fit into that category though: rather it's >>> asking about the case where we have an infinite iterator (e.g. >>> itertools.count(0)), and want to drop items until they start meeting >>> some condition (i.e. itertools.dropwhile) and then terminate the >>> iterator as soon as another condition is no longer met (i.e. >>> itertools.takewhile). >> I don't think that's what the OP meant. The original proposal seemed to >> assume that it would be somehow reasonable for the input ("integers" in the >> example) to be able to see and parse the condition in the generator >> expression ("1000 <= x < 100000" in the example, with "x" somehow known to >> be bound to the iteration value). That's at least what I think the remark "I >> like mathy syntax" referred to. > Right, I was separating the original request to make "{x for x in > integers if 1000 <= x < 1000000}" work into the concrete proposal to > make exactly *that* syntax work (which I don't think is feasible), and > the slightly more general notion of offering a more math-like syntax > that allows finite sets to be built from infinite iterators by > defining a termination condition in addition to a filter condition. Ok. A concrete proposal would give a read-only 'filter' argument to the iterator somehow, which represents some form of simplified AST of the condition. So e.g. {x for x in integers if (lambda v: 1000 <= v < 1000000)(x)} would never complete, but {x for x in integers if 1000 <= x < 1000000} would. (But perhaps lambda objects should include an AST attribute... Having it for normal functions would introduce too much overhead tho, and then it would no longer be a simplified AST, but rather a complete python AST, which we don't want.) > >>> There aren't any technical barriers I'm aware of to implementing that, >>> with the main historical objection being that instead of the >>> comprehension level while clause mapping to a while loop directly the >>> way the for and if clauses map to their statement level counterparts, >>> it would instead map to the conditional break in the expanded >>> loop-and-a-half form: >>> >>> while True: >>> if not condition: >>> break >>> >>> While it's taken me a long time to come around to the idea, "Make >>> subtle infinite loops in mathematical code easier to avoid" *is* a >>> pretty compelling user-focused justification for incurring that extra >>> complexity at the language design level. >> I haven't come around to this yet. It looks like it will make explaining >> comprehensions more complex, since the translation of "while X" into "if not >> X: break" feels less direct than the translations of "for x in xs" or "if >> pred(x)". (In particular, your proposal seems to require more experience >> with mentally translating loops and conditions into jumps -- most regulars >> of this forum do that for a living, but I doubt it's second nature for the >> OP.) 
> Yeah, if we ever did add something like this, I suspect a translation
> using takewhile would potentially be easier for at least some users to
> understand than the one to a break condition:
>
>     {x for x in itertools.count(0) if 1000 <= x while x < 1000000}
>
> <=>
>
>     result = set()
>     for x in itertools.count(0):
>         if 1000 <= x:
>             result.add(x)
>         # If you've never used the loop-and-a-half idiom, it's
>         # not obvious why "while <cond>" means "if not <cond>: break"
>         if not x < 1000000:
>             break
>
> is roughly
>
>     {x for x in itertools.takewhile(itertools.count(0), lambda x: x <
> 1000000) if 1000 <= x}
>
> <=>
>
>     result = set()
>     for x in takewhile(itertools.count(0), lambda x: x < 1000000):
>         if 1000 <= x:
>             result.add(x)
>
> However, the break condition is the translation that would make sense
> at a language *implementation* level (and would hence be the one that
> determined the relative location of the while clause in the expression
> form).
>
> That discrepancy *still* sets off alarm bells for me (since it's a
> clear sign that "how people would think this works" and "how it would
> actually work" probably wouldn't match), and I'm also conscious of the
> amount of syntactic noise that "takewhile" introduces vs the "while"
> keyword.
>
> The counter-argument (which remains valid even against my own change
> of heart) is that adding a new comprehension clause doesn't actually
> fix the "accidental infinite loop" problem: "{x for x in
> itertools.count(0) if 1000 <= x < 1000000}" will still loop forever,
> it would just have a nicer fix to get it to terminate (adding a
> "while" clause to turn the second filter condition into a termination
> condition).
>
> So while I'm +0 where I used to be a firm -1, it's still only a +0 :)
>
> Cheers,
> Nick.
>

From e4r7hbug at gmail.com Wed Aug 9 13:42:18 2017
From: e4r7hbug at gmail.com (Nate.)
Date: Wed, 09 Aug 2017 17:42:18 +0000
Subject: [Python-ideas] Mimetypes Include application/json
Message-ID: 

Hi,

A friend and I have hit a funny situation with the `mimetypes.py` library
guessing the type for a '.json' file. Is there a reason why '.json' hasn't
been added to the mapping?

Without `mailcap` installed:

    [root at de169da8cc46 /]# python3 -m mimetypes build.json
    I don't know anything about type build.json

With `mailcap` installed:

    [root at de169da8cc46 /]# python3 -m mimetypes build.json
    type: application/json encoding: None

We experimented with adding a mapping for '.json' to 'application/json' to
`mimetypes.py` and it seems to work fine for us. It looks like it has been
registered with IANA and everything.

Proposed diff:

ntangsurat at derigible ~/git/e4r7hbug.cpython/Lib master $ git diff
diff --git a/Lib/mimetypes.py b/Lib/mimetypes.py
index 3d68694864..5919b45a9b 100644
--- a/Lib/mimetypes.py
+++ b/Lib/mimetypes.py
@@ -439,6 +439,7 @@ def _default_mime_types():
     '.jpeg'   : 'image/jpeg',
     '.jpg'    : 'image/jpeg',
     '.js'     : 'application/javascript',
+    '.json'   : 'application/json',
     '.ksh'    : 'text/plain',
     '.latex'  : 'application/x-latex',
     '.m1v'    : 'video/mpeg',
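For completeness, the same mapping can also be added at runtime through
the public API, with no stdlib change required, e.g.:

    import mimetypes

    mimetypes.add_type('application/json', '.json')
    print(mimetypes.guess_type('build.json'))  # ('application/json', None)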
Nate.

From brett at python.org Wed Aug 9 14:17:59 2017
From: brett at python.org (Brett Cannon)
Date: Wed, 09 Aug 2017 18:17:59 +0000
Subject: [Python-ideas] Mimetypes Include application/json
In-Reply-To: 
References: 
Message-ID: 

On Wed, 9 Aug 2017 at 10:43 Nate. wrote:

> Hi,
>
> A friend and I have hit a funny situation with the `mimetypes.py` library
> guessing the type for a '.json' file. Is there a reason why '.json' hasn't
> been added to the mapping?
>

Probably no one thought about it since the module was added in 1997 which
is only 2 years after the creation of JavaScript itself. :)

>
> Without `mailcap` installed:
>
> [root at de169da8cc46 /]# python3 -m mimetypes build.json
> I don't know anything about type build.json
>
> With `mailcap` installed:
>
> [root at de169da8cc46 /]# python3 -m mimetypes build.json
> type: application/json encoding: None
>
> We experimented with adding a mapping for '.json' to 'application/json' to
> `mimetypes.py` and it seems to work fine for us. It looks like it has been
> registered with IANA and everything.
>
> Proposed diff:
>
> ntangsurat at derigible ~/git/e4r7hbug.cpython/Lib master $ git diff
> diff --git a/Lib/mimetypes.py b/Lib/mimetypes.py
> index 3d68694864..5919b45a9b 100644
> --- a/Lib/mimetypes.py
> +++ b/Lib/mimetypes.py
> @@ -439,6 +439,7 @@ def _default_mime_types():
>     '.jpeg' : 'image/jpeg',
>     '.jpg' : 'image/jpeg',
>     '.js' : 'application/javascript',
> +    '.json' : 'application/json',
>     '.ksh' : 'text/plain',
>     '.latex' : 'application/x-latex',
>     '.m1v' : 'video/mpeg',
>

Feel free to file a bug at bugs.python.org and if you aren't too bothered
then submit a PR to github.com/python/cpython (https://devguide.python.org/
has all the details).

From storchaka at gmail.com Wed Aug 9 14:24:49 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 9 Aug 2017 21:24:49 +0300
Subject: [Python-ideas] Mimetypes Include application/json
In-Reply-To: 
References: 
Message-ID: 

09.08.17 21:17, Brett Cannon wrote:
> On Wed, 9 Aug 2017 at 10:43 Nate. wrote:
> A friend and I have hit a funny situation with the `mimetypes.py`
> library
> guessing the type for a '.json' file. Is there a reason why '.json'
> hasn't been added to the mapping?
>
> Probably no one thought about it since the module was added in 1997
> which is only 2 years after the creation of JavaScript itself. :)

No one proposed a patch.

> Feel free to file a bug at bugs.python.org and
> if you aren't too bothered then submit a PR to github.com/python/cpython
> (https://devguide.python.org/ has all
> the details).

https://bugs.python.org/issue30824

From phd at phdru.name Wed Aug 9 14:25:55 2017
From: phd at phdru.name (Oleg Broytman)
Date: Wed, 9 Aug 2017 20:25:55 +0200
Subject: [Python-ideas] Mimetypes Include application/json
In-Reply-To: 
References: 
Message-ID: <20170809182555.GA4079@phdru.name>

On Wed, Aug 09, 2017 at 05:42:18PM +0000, "Nate." wrote:
> A friend and I have hit a funny situation with the `mimetypes.py` library
> guessing the type for a '.json' file. Is there a reason why '.json' hasn't
> been added to the mapping?

My guess is that nobody uses mimetypes without mailcap.

> Without `mailcap` installed:
>
> [root at de169da8cc46 /]# python3 -m mimetypes build.json
> I don't know anything about type build.json
>
> With `mailcap` installed:
>
> [root at de169da8cc46 /]# python3 -m mimetypes build.json
> type: application/json encoding: None
>
> We experimented with adding a mapping for '.json' to 'application/json' to
> `mimetypes.py` and it seems to work fine for us. It looks like it has been
> registered with IANA and everything.
>
> Proposed diff:

Patches should be published at the issue tracker.
> ntangsurat at derigible ~/git/e4r7hbug.cpython/Lib master $ git diff
> diff --git a/Lib/mimetypes.py b/Lib/mimetypes.py
> index 3d68694864..5919b45a9b 100644
> --- a/Lib/mimetypes.py
> +++ b/Lib/mimetypes.py
> @@ -439,6 +439,7 @@ def _default_mime_types():
>     '.jpeg' : 'image/jpeg',
>     '.jpg' : 'image/jpeg',
>     '.js' : 'application/javascript',
> +    '.json' : 'application/json',
>     '.ksh' : 'text/plain',
>     '.latex' : 'application/x-latex',
>     '.m1v' : 'video/mpeg',
>
> Nate.

Oleg.
--
Oleg Broytman http://phdru.name/ phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.

From e4r7hbug at gmail.com Wed Aug 9 14:50:47 2017
From: e4r7hbug at gmail.com (Nate.)
Date: Wed, 09 Aug 2017 18:50:47 +0000
Subject: [Python-ideas] Mimetypes Include application/json
In-Reply-To: 
References: 
Message-ID: 

Oh, fun! Thank you for the guidance. I managed to find a Bug already
created, http://bugs.python.org/issue30824. I'll create a Pull Request
using that Bug.

On Wed, Aug 9, 2017 at 1:18 PM Brett Cannon wrote:

> On Wed, 9 Aug 2017 at 10:43 Nate. wrote:
>
>> Hi,
>>
>> A friend and I have hit a funny situation with the `mimetypes.py` library
>> guessing the type for a '.json' file. Is there a reason why '.json'
>> hasn't been added to the mapping?
>>
>
> Probably no one thought about it since the module was added in 1997 which
> is only 2 years after the creation of JavaScript itself. :)
>
>>
>> Without `mailcap` installed:
>>
>> [root at de169da8cc46 /]# python3 -m mimetypes build.json
>> I don't know anything about type build.json
>>
>> With `mailcap` installed:
>>
>> [root at de169da8cc46 /]# python3 -m mimetypes build.json
>> type: application/json encoding: None
>>
>> We experimented with adding a mapping for '.json' to 'application/json' to
>> `mimetypes.py` and it seems to work fine for us. It looks like it has been
>> registered with IANA and everything.
>>
>> Proposed diff:
>>
>> ntangsurat at derigible ~/git/e4r7hbug.cpython/Lib master $ git diff
>> diff --git a/Lib/mimetypes.py b/Lib/mimetypes.py
>> index 3d68694864..5919b45a9b 100644
>> --- a/Lib/mimetypes.py
>> +++ b/Lib/mimetypes.py
>> @@ -439,6 +439,7 @@ def _default_mime_types():
>>     '.jpeg' : 'image/jpeg',
>>     '.jpg' : 'image/jpeg',
>>     '.js' : 'application/javascript',
>> +    '.json' : 'application/json',
>>     '.ksh' : 'text/plain',
>>     '.latex' : 'application/x-latex',
>>     '.m1v' : 'video/mpeg',
>>
>
> Feel free to file a bug at bugs.python.org and if you aren't too bothered
> then submit a PR to github.com/python/cpython (
> https://devguide.python.org/ has all the details).
>

From chris.barker at noaa.gov Wed Aug 9 16:23:28 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Wed, 9 Aug 2017 13:23:28 -0700
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
 
Message-ID: 

On Tue, Aug 8, 2017 at 10:06 PM, Nick Coghlan wrote:

> On 8 August 2017 at 09:06, Chris Barker wrote:
> > It would be nice to have an easier access to an "slice iterator" though
> --
> > one of these days I may write up a proposal for that.
>
> An idea I've occasionally toyed with [1] is some kind of "iterview"
> that wraps around an arbitrary iterable and produces lazy itertools
> based results rather than immediate views or copies.
>
> However, my experience is also that folks are *really* accustomed to
> syntactic operations on containers producing either full live views
> (e.g. memoryview or numpy slices, range as a dynamically computed
> container), or actual copies (builtin container types). Having them
> produce consumable iterators instead then gets confusing due to the
> number of operations that will implicitly consume them (including
> simple "x in y" checks).
>

I agree -- which is why I'm thinking of only adding a simple "iterable
slice", rather than changing the overall behavior of the container. It
would be quite clear what you are asking for.

> Right now, getting the "terminate when false" behaviour requires the
> use of takewhile:
>

I can't recall the use case(s) at the moment, but I have definitely wanted
a way to break out of a comprehension -- and not always with infinite
iterators.

After all, we have "break" in both for and while loops, so clearly there is
the use case...

If someone comes up with a clean and not confusing (and general purpose)
syntax, I think it would be very useful.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov

From tjreedy at udel.edu Wed Aug 9 17:22:40 2017
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 9 Aug 2017 17:22:40 -0400
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
 
Message-ID: 

On 8/9/2017 10:54 AM, Nick Coghlan wrote:
> On 9 August 2017 at 15:38, Guido van Rossum wrote:
>> On Tue, Aug 8, 2017 at 10:06 PM, Nick Coghlan wrote:
>>> The OP's proposal doesn't fit into that category though: rather it's
>>> asking about the case where we have an infinite iterator (e.g.
>>> itertools.count(0)), and want to drop items until they start meeting
>>> some condition (i.e. itertools.dropwhile) and then terminate the
>>> iterator as soon as another condition is no longer met (i.e.
>>> itertools.takewhile).
>>
>> I don't think that's what the OP meant. The original proposal seemed to
>> assume that it would be somehow reasonable for the input ("integers" in the
>> example) to be able to see and parse the condition in the generator
>> expression ("1000 <= x < 100000" in the example, with "x" somehow known to
>> be bound to the iteration value). That's at least what I think the remark "I
>> like mathy syntax" referred to.
>
> Right, I was separating the original request to make "{x for x in
> integers if 1000 <= x < 1000000}" work into the concrete proposal to
> make exactly *that* syntax work (which I don't think is feasible), and
> the slightly more general notion of offering a more math-like syntax
> that allows finite sets to be built from infinite iterators by
> defining a termination condition in addition to a filter condition.

We already have three nice one liners for that, one of which you gave.

x = set(filter(filter_condition, takewhile(continue_condition, source)))
x = set(x for x in takewhile(continue_condition, source) if filter_condition)
x = {x for x in takewhile(continue_condition, source) if filter_condition}

Replace takewhile with islice(source, max) if the continue condition is
(number seen < max). Add enumerate if the running count is needed
otherwise.

Terminating an infinite iterator and filtering the initial slice are
different operations. The operations are easily composed as they are,
in multiple ways. Trying to mix them together in one jumbled special
syntax is a bad idea to me.
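To make those concrete, with example conditions plugged in (my choices
here, just for illustration):

    from itertools import count, takewhile

    continue_condition = lambda x: x < 1000000
    filter_condition = lambda x: 1000 <= x

    x = {x for x in takewhile(continue_condition, count(0))
         if filter_condition(x)}
    assert x == set(range(1000, 1000000))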
>>> There aren't any technical barriers I'm aware of to implementing that,
>>> with the main historical objection being that instead of the
>>> comprehension level while clause mapping to a while loop directly the
>>> way the for and if clauses map to their statement level counterparts,
>>> it would instead map to the conditional break in the expanded
>>> loop-and-a-half form:
>>>
>>>     while True:
>>>         if not condition:
>>>             break

In other words, aside from other issues, you would have 'while' mean
'do...while' in this one special place. -1.

--
Terry Jan Reedy

From ncoghlan at gmail.com Wed Aug 9 23:24:45 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 10 Aug 2017 13:24:45 +1000
Subject: [Python-ideas] Pseudo methods
In-Reply-To: <22922.50589.497352.85140@turnbull.sk.tsukuba.ac.jp>
References: 
 <22922.50589.497352.85140@turnbull.sk.tsukuba.ac.jp>
Message-ID: 

On 9 August 2017 at 18:19, Stephen J. Turnbull wrote:
> Nick Coghlan writes:
>
> > To analyse and investigate this code, we need to "just know" that:
>
> You can of course hope that help(input().has_vowels) will tell you
> where to find it. If it doesn't, well, shame on you for depending on
> source-unavailable software that you don't understand. ;-)

We can't run "help" when we're reviewing a diff or otherwise reading
code in a situation where interactive help isn't available :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Wed Aug 9 23:30:24 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 10 Aug 2017 13:30:24 +1000
Subject: [Python-ideas] Mimetypes Include application/json
In-Reply-To: 
References: 
Message-ID: 

On 10 August 2017 at 04:24, Serhiy Storchaka wrote:
> 09.08.17 21:17, Brett Cannon wrote:
>>
>> On Wed, 9 Aug 2017 at 10:43 Nate. wrote:
>> A friend and I have hit a funny situation with the `mimetypes.py`
>> library
>> guessing the type for a '.json' file. Is there a reason why '.json'
>> hasn't been added to the mapping?
>>
>> Probably no one thought about it since the module was added in 1997
>> which is only 2 years after the creation of JavaScript itself. :)
>
> No one proposed a patch.

That's not *quite* true - there was at least one proposal a few years
back to modernise the mimetypes list, but the one I was involved in
reviewing got intertwined with a proposal to completely rewrite the
mimetypes module, and the submitter wasn't interested in creating a
more minimalist patch that solved the specific problem (i.e. the list
was pretty out of date) without all the extraneous changes to how the
module actually worked :(

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Thu Aug 10 00:11:48 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 10 Aug 2017 14:11:48 +1000
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
 
Message-ID: 

On 10 August 2017 at 00:54, Nick Coghlan wrote:
> Yeah, if we ever did add something like this, I suspect a translation
> using takewhile would potentially be easier for at least some users to
> understand than the one to a break condition:
>
>     {x for x in itertools.count(0) if 1000 <= x while x < 1000000}
>
> <=>
>
>     result = set()
>     for x in itertools.count(0):
>         if 1000 <= x:
>             result.add(x)
>         # If you've never used the loop-and-a-half idiom, it's
>         # not obvious why "while <cond>" means "if not <cond>: break"
>         if not x < 1000000:
>             break
>
> is roughly
>
>     {x for x in itertools.takewhile(itertools.count(0), lambda x: x <
> 1000000) if 1000 <= x}
>
> <=>
>
>     result = set()
>     for x in takewhile(itertools.count(0), lambda x: x < 1000000):
>         if 1000 <= x:
>             result.add(x)

Ugh, this discrepancy is worse than I thought, since the translation
with that clause order is actually wrong (Terry mentioned this by
pointing out that the proposed syntactic translation implemented
"do...while" ordering). The takewhile example is also wrong, since it
has the arguments in the wrong order.

Fixing both of those issues gives the comparison:

    {x for x in itertools.count(0) while x < 1000000 if 1000 <= x}

<=>

    result = set()
    for x in itertools.count(0):
        # If you've never used the loop-and-a-half idiom, it's not
        # obvious why "while <cond>" means "if <cond>: ... else: break"
        if x < 1000000:
            if 1000 <= x:
                result.add(x)
        else:
            break

is roughly:

    {x for x in itertools.takewhile(lambda x: x < 1000000,
    itertools.count(0)) if 1000 <= x}

<=>

    result = set()
    for x in takewhile(lambda x: x < 1000000, itertools.count(0)):
        if 1000 <= x:
            result.add(x)

And I think that gets me back to pretty much where I was the last time
this came up: a while clause in comprehensions really only makes sense
in combination with a while clause on for loops, where:

    for x in itertools.count(0) while x < 1000000:
        ...

was roughly equivalent to:

    for x in itertools.count(0):
        if x < 1000000:
            ...
        else:
            break

(such that there's only one loop from the point of view of
break/continue/else, but the loop may terminate based on either
exhaustion of the underlying iterator *or* some specific condition
becoming false)

While I do think such a clause would be more readable for more people
than the dropwhile/takewhile equivalents (especially when the latter
end up needing to use lambda expressions), I'm still dubious that these
cases come up often enough to justify the addition of a for-while loop
as a composite construct (the old "dropwhile and takewhile aren't even
common enough to justify being builtins, why should they jump all the
way to syntactic support?" question applies).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From lists at cooperlees.com Thu Aug 10 00:21:56 2017
From: lists at cooperlees.com (Cooper Ry Lees)
Date: Thu, 10 Aug 2017 12:21:56 +0800
Subject: [Python-ideas] PyPI JSON Metadata Standardization for Mirrors
Message-ID: 

Hi all,

First time emailer, so please be kind. Also, if this is not the right
mailing list for PyPA talk, I apologize. Please point me in the right
direction if so.

The main reason I have emailed here is I believe it may be PEP time to
standardize the JSON metadata that PyPI makes available, like what was
done for the `simple API` described in PEP503.
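For anyone who hasn't used it, fetching that per-package metadata looks
something like this (a sketch; the package name is just an example):

    import json
    from urllib.request import urlopen

    # The JSON endpoint PyPI exposes for each package
    with urlopen('https://pypi.python.org/pypi/requests/json') as response:
        metadata = json.loads(response.read().decode('utf-8'))

    print(sorted(metadata))  # e.g. ['info', 'releases', 'urls']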
I've been doing a bit of work on `bandersnatch` (I didn't name it),
which is a PEP 381 mirroring package, and I wanted to enhance it to
also mirror the handy JSON metadata PyPI generates and makes available
@ https://pypi.python.org/pypi/PKG_NAME/json.

I've done a PR on bandersnatch as a POC that mirrors both the PyPI
directory structure (URL/pypi/PKG_NAME/json) and creates a
standardizable URL/json/PKG_NAME that the former symlinks to (to be
served by NGINX / some other proxy). I'm also contemplating naming the
directory 'metadata' rather than JSON, so if some new hotness comes
along / we want to change the format down the line, we're not stuck
with json as the dirname. This PR can be found here:
https://bitbucket.org/pypa/bandersnatch/pull-requests/33/save-json-metadata-to-mirror

My main use case is to write a very simple async 'verifier' tool that
will crawl all the JSON files and then ensure the packages directory on
each of my internal mirrors (I have a mirror per region / datacenter)
has all the files it should. I sync centrally (to save resources on the
PyPI infrastructure) and then rsync out all the diffs to each region /
datacenter, and under some failure scenarios I could miss a file or
many. So I feel using JSON pulled down from the authoritative source
will allow an async job to verify the MD5 of all the package files on
each mirror.

What are people's thoughts here? Is it worth a PEP similar to PEP503
going forward? Can people enhance / share some thoughts on this idea?

Thanks,
Cooper Lees
me at cooperlees.com
https://cooperlees.com/

From ncoghlan at gmail.com Thu Aug 10 00:43:50 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 10 Aug 2017 14:43:50 +1000
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
 
Message-ID: 

On 10 August 2017 at 01:49, Soni L. wrote:
> On 2017-08-09 11:54 AM, Nick Coghlan wrote:
>> Right, I was separating the original request to make "{x for x in
>> integers if 1000 <= x < 1000000}" work into the concrete proposal to
>> make exactly *that* syntax work (which I don't think is feasible), and
>> the slightly more general notion of offering a more math-like syntax
>> that allows finite sets to be built from infinite iterators by
>> defining a termination condition in addition to a filter condition.
>
> Ok. A concrete proposal would give a read-only 'filter' argument to the
> iterator somehow, which represents some form of simplified AST of the
> condition.
>
> So e.g. {x for x in integers if (lambda v: 1000 <= v < 1000000)(x)} would
> never complete, but {x for x in integers if 1000 <= x < 1000000} would. (But
> perhaps lambda objects should include an AST attribute... Having it for
> normal functions would introduce too much overhead tho, and then it would no
> longer be a simplified AST, but rather a complete python AST, which we don't
> want.)

There have been a variety of different "thunking" proposals over the
years, but they've all foundered on the question of what the
*primitive* quoted form should look like, and how the thunks should
subsequently be executed.

For cases like this, where integration with Python's name resolution
mechanism isn't actually required, folks have ended up just using
strings, where the only downside is the fact that syntax highlighters
and other static analysers don't know that the contents are supposed
to be valid Python code.
In a case like this, that might look like:

    {x for x in integers.build_set("1000 <= x < 1000000")}

As with regexes, the cost of dynamically parsing such strings can then
be amortised at runtime through the use of an appropriate caching
strategy.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From steve at pearwood.info Thu Aug 10 09:42:33 2017
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 10 Aug 2017 23:42:33 +1000
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
 
Message-ID: <20170810134232.GE7395@ando.pearwood.info>

On Wed, Aug 09, 2017 at 01:23:28PM -0700, Chris Barker wrote:

> I can't recall the use case(s) at the moment, but I have definitely wanted
> a way to break out of a comprehension -- and not always with infinite
> iterators.
>
> After all, we have "break" in both for and while loops, so clearly there is
> the use case...

Indeed :-)

> If someone comes up with a clean and not confusing (and general purpose)
> syntax, I think it would be very useful.

We used to be able to (ab)use StopIteration to do this:

    def Break():
        raise StopIteration

    # generator expressions only, not list comprehensions
    result = (expression for x in sequence if condition or Break())

but I believe that loophole has been closed in 3.6.

Comprehensions in Clojure have this feature:

http://clojuredocs.org/clojure_core/clojure.core/for

Clojure uses "when" where Python uses "if", giving:

    ;; :when continues through the collection even if some have the
    ;; condition evaluate to false, like filter
    user=> (for [x (range 3 33 2) :when (prime? x)] x)
    (3 5 7 11 13 17 19 23 29 31)

    ;; :while stops at the first collection element that evaluates to
    ;; false, like take-while
    user=> (for [x (range 3 33 2) :while (prime? x)] x)
    (3 5 7)

Translating into Python:

    [x for x in range(3, 33, 2) if is_prime(x)]
    [x for x in range(3, 33, 2) while is_prime(x)]  # hypothetical syntax
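(Both behaviours are checkable today - the first directly, the second
via takewhile; the is_prime here is a naive stand-in:)

    from itertools import takewhile

    def is_prime(n):
        # naive trial division, purely for the demo
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    print([x for x in range(3, 33, 2) if is_prime(x)])
    # [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
    print(list(takewhile(is_prime, range(3, 33, 2))))
    # [3, 5, 7]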
> > Yeah, if we ever did add something like this, I suspect a translation > using takewhile would potentially be easier for at least some users to > understand than the one to a break condition: "Some users"? Sure, why not? There's probably somebody out there who understands takewhile, but if so, I don't know who they are :-) I always have to look at the docs for takewhile to remind myself whether it drops items ("takes them away") while the condition is true, or yields items ("gives items") while the condition is true. > {x for x in itertools.count(0) if 1000 <= x while x < 1000000} > > <=> > > x = set() > for x in itertools.count(0): > if 1000 <= x: > set.add(x) > # If you've never used the loop-and-a-half idiom, it's > # not obvious why "while " means "if not : break" > if not x < 1000000: > break I'd like to take issue with that "not obvious" comment. I think that anyone who knows while loops knows that the loop exits when the condition becomes false. That's exactly the behaviour we get for the (hypothetical) [expr for x in seq while condition] syntax: when the condition is false, the loop and hence the comprehension, exits. For such simple cases, there's no need to think about "loop and a half". The obvious explanation is that the loop exits when the while condition fails. Based on my experience with beginners on the tutor mailing list, and elsewhere, I think there's a definite learning "hump" to get over before people grok even the trivial case of [expression for x in sequence] but once they do, then adding an "if" clause is obvious, and I expect that the same will apply to "when". Once you move beyond the simple case of a single for and no more than a single if (or while), I don't think there's *anything* obvious about comprehension syntax at all, while clause or no while clause. Holding the while clause to a standard that comprehensions already fail (in my opinion) is unfair: [expression for x in seq1 for y in seq2 if pred1 for z in seq3 if pred2 if pred3 if pred4 for w in seq4 while condition for v in seq5] I don't think it's the "while" that tips that over the edge, readability-wise :-) In any case, I think we're all guessing whether or not people will understand the "while condition" syntax. So I've done an informal survey on the Python-Ideas list, and once folks have had a day or so to answer I'll report what they say. It's not a truly scientific UI test, but it's the best I can do. -- Steve From p.f.moore at gmail.com Thu Aug 10 11:39:32 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 10 Aug 2017 16:39:32 +0100 Subject: [Python-ideas] Generator syntax hooks? In-Reply-To: <20170810134232.GE7395@ando.pearwood.info> References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> <20170810134232.GE7395@ando.pearwood.info> Message-ID: On 10 August 2017 at 14:42, Steven D'Aprano wrote: > I don't think it is confusing. Regardless of the implementation, the > meaning of: > > [expression for x in sequence while condition] > > should (I believe) be obvious to anyone who already groks comprehension > syntax. The mapping to a for-loop is admittedly a tad more complex: > > result = [] > for x in sequence: > if not condition: break > result.append(expression) > > but I'm yet to meet anyone who routinely and regularly reads > comprehensions by converting them to for loops like that. And if they > did, all they need do is mentally map "while condition" to "if not > condition: break" and it should all Just Work?. The hard part is the interaction between if and while. 
Consider (expr for var in seq if cond1 while cond2):

This means:

for var in seq:
    if cond1:
        if not cond2: break
        yield expr

Note that unlike all other comprehension clauses (for and if) while
doesn't introduce a new level of nesting. That's an inconsistency, and
while it's minor, it would need clarifying (my original draft of this
email was a mess, because I misinterpreted how if and while would
interact, precisely over this point).

Also, there's a potential issue here - consider

[expr for var in even_numbers() if is_odd(var) while var < 100]

This is an infinite loop, even though it has a finite termination
condition (var < 100), because we only test the termination condition
if var is odd, which it never will be.

Obviously, this is a contrived example. And certainly "don't do that,
then" is a valid response. But my instinct is that people are going to
get this wrong - *especially* in a maintenance environment. That
example could have started off being "for var in count(0)" and then
someone realised they could "optimise" it by omitting odd numbers,
introducing the bug in the process. (And I'm sure real life code could
come up with much subtler examples ;-))

Overall, I agree with Steven's point. It seems pretty obvious what the
intention is, and while it's probably possible to construct examples
that are somewhat unclear,

1. The mechanical rule gives an explicit meaning
2. People shouldn't be writing such complex comprehensions, so if the
rule doesn't give what they expect, they can always rewrite the code
with an explicit (and clearer) loop.

But while I think this says that the above interpretation of while is
the only sensible one, and in general other approaches are unlikely to
be as natural, I *don't* think that it unequivocally says that
allowing while is a good thing. It may still be better to omit it, and
force people to state their intent explicitly (albeit a bit more
verbosely).

Paul

From fakedme+py at gmail.com  Thu Aug 10 11:39:34 2017
From: fakedme+py at gmail.com (Soni L.)
Date: Thu, 10 Aug 2017 12:39:34 -0300
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
Message-ID: 

On 2017-08-10 01:43 AM, Nick Coghlan wrote:
> On 10 August 2017 at 01:49, Soni L. wrote:
>> On 2017-08-09 11:54 AM, Nick Coghlan wrote:
>>> Right, I was separating the original request to make "{x for x in
>>> integers if 1000 <= x < 1000000}" work into the concrete proposal to
>>> make exactly *that* syntax work (which I don't think is feasible), and
>>> the slightly more general notion of offering a more math-like syntax
>>> that allows finite sets to be built from infinite iterators by
>>> defining a termination condition in addition to a filter condition.
>> Ok. A concrete proposal would give a read-only 'filter' argument to the
>> iterator somehow, which represents some form of simplified AST of the
>> condition.
>>
>> So e.g. {x for x in integers if (lambda v: 1000 <= v < 1000000)(x)} would
>> never complete, but {x for x in integers if 1000 <= x < 1000000} would. (But
>> perhaps lambda objects should include an AST attribute... Having it for
>> normal functions would introduce too much overhead tho, and then it would no
>> longer be a simplified AST, but rather a complete python AST, which we don't
>> want.)
> There have been a variety of different "thunking" proposals over the
> years, but they've all foundered on the question of what the
> *primitive* quoted form should look like, and how the thunks should
> subsequently be executed.
> > For cases like this, where integration with Python's name resolution > mechanism isn't actually required, folks have ended up just using > strings, where the only downside is the fact that syntax highlighters > and other static analysers don't know that the contents are supposed > to be valid Python code. In a case like this, that might look like: > > {x for x in integers.build_set("1000 <= x < 1000000")} > > As with regexes, the cost of dynamically parsing such strings can then > be amortised at runtime through the use of an appropriate caching > strategy. I'm pretty sure I read somewhere that lambdas and generators share their syntax, and that syntax is already a subset of python syntax. Would it be too hard to expose that with a "simplified AST" API? > > Cheers, > Nick. > From steve at pearwood.info Thu Aug 10 12:11:42 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 11 Aug 2017 02:11:42 +1000 Subject: [Python-ideas] Generator syntax hooks? In-Reply-To: References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> Message-ID: <20170810161142.GH7395@ando.pearwood.info> On Thu, Aug 10, 2017 at 12:39:34PM -0300, Soni L. wrote: > I'm pretty sure I read somewhere that lambdas and generators share their > syntax, and that syntax is already a subset of python syntax. Would it > be too hard to expose that with a "simplified AST" API? I don't understand what you mean by this. The syntax for lambda is (roughly): lambda parameter-list : expression The syntax for generators is (again, roughly): def name ( parameter-list ) : suite-containing-yield Obviously the generator suite can contain expressions, and both have a parameter-list. What shared syntax are you referring to, and how is it relevant? Or are you referring to generator expressions, rather than generators? ( expression for target in expression ... ) Obviously a Python expression is a Python expression, wherever it is, so a lambda can contain generator expressions, and generator expressions can contain lambdas... And what do you mean by "simplified AST" API? I'm afraid your comment is too abstract for me to understand. -- Steve From fakedme+py at gmail.com Thu Aug 10 12:37:29 2017 From: fakedme+py at gmail.com (Soni L.) Date: Thu, 10 Aug 2017 13:37:29 -0300 Subject: [Python-ideas] Generator syntax hooks? In-Reply-To: <20170810161142.GH7395@ando.pearwood.info> References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> <20170810161142.GH7395@ando.pearwood.info> Message-ID: On 2017-08-10 01:11 PM, Steven D'Aprano wrote: > On Thu, Aug 10, 2017 at 12:39:34PM -0300, Soni L. wrote: > >> I'm pretty sure I read somewhere that lambdas and generators share their >> syntax, and that syntax is already a subset of python syntax. Would it >> be too hard to expose that with a "simplified AST" API? > I don't understand what you mean by this. > > The syntax for lambda is (roughly): > > lambda parameter-list : expression > > The syntax for generators is (again, roughly): > > def name ( parameter-list ) : > suite-containing-yield > > > Obviously the generator suite can contain expressions, and both have a > parameter-list. What shared syntax are you referring to, and how is it > relevant? > > Or are you referring to generator expressions, rather than generators? > > ( expression for target in expression ... ) > > Obviously a Python expression is a Python expression, wherever it is, so > a lambda can contain generator expressions, and generator expressions > can contain lambdas... > > And what do you mean by "simplified AST" API? 
I'm afraid your comment is
> too abstract for me to understand.
>
Yes, see, both are expressions. Expression AST is a subset of python
AST, so it's a simplified form of the python AST.

>

From brett at python.org  Thu Aug 10 15:09:34 2017
From: brett at python.org (Brett Cannon)
Date: Thu, 10 Aug 2017 19:09:34 +0000
Subject: [Python-ideas] PyPI JSON Metadata Standardization for Mirrors
In-Reply-To: 
References: 
Message-ID: 

The proper list for this would be distutils-sig as that's where
packaging-related discussions typically occur.

On Wed, 9 Aug 2017 at 21:22 Cooper Ry Lees wrote:

> Hi all,
>
> First time emailer, so please be kind. Also, if this is not the right
> mailing list for PyPA talk, I apologize. Please point me in the right
> direction if so. The main reason I have emailed here is I believe it may be
> PEP time to standardize the JSON metadata that PyPI makes available, like
> what was done for the `simple API` described in PEP503.
>
> I've been doing a bit of work on `bandersnatch` (I didn't name it), which
> is a PEP 381 mirroring package and wanted to enhance it to also mirror the
> handy JSON metadata PyPI generates and makes available @
> https://pypi.python.org/pypi/PKG_NAME/json.
>
> I've done a PR on bandersnatch as a POC that mirrors both the PyPI
> directory structure (URL/pypi/PKG_NAME/json) and created a standardizable
> URL/json/PKG_NAME that the former symlinks to (to be served by NGINX / some
> other proxy). I'm also contemplating naming the directory 'metadata' rather
> than JSON so if some new hotness / we want to change the format down the
> line we're not stuck with json as the dirname. This PR can be found here:
> https://bitbucket.org/pypa/bandersnatch/pull-requests/33/save-json-metadata-to-mirror
>
> My main use case is to write a very simple async 'verifier' tool that will
> crawl all the JSON files and then ensure the packages directory on each of
> my internal mirrors (I have a mirror per region / datacenter) have all the
> files they should. I sync centrally (to save resource on the PyPI
> infrastructure) and then rsync out all the diffs to each region /
> datacenter, and under some failure scenarios I could miss a file or many.
> So I feel using JSON pulled down from the authoritative source will allow
> an async job to verify the MD5 of all the package files on each mirror.
>
> What are people's thoughts here? Is it worth a PEP similar to PEP503 going
> forward? Can people enhance / share some thoughts on this idea.
>
> Thanks,
> Cooper Lees
> me at cooperlees.com
> https://cooperlees.com/
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tjreedy at udel.edu  Thu Aug 10 16:03:42 2017
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 10 Aug 2017 16:03:42 -0400
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: <20170810134232.GE7395@ando.pearwood.info>
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
 <20170810134232.GE7395@ando.pearwood.info>
Message-ID: 

On 8/10/2017 9:42 AM, Steven D'Aprano wrote:
> On Wed, Aug 09, 2017 at 01:23:28PM -0700, Chris Barker wrote:
>
>> I can't recall the use case(s) at the moment, but I have definitely wanted
>> a way to break out of a comprehension -- and not always with infinite
>> iterators.
>> After all, we have "break" in both for and while loops, so clearly there is
>> the use case...

In both cases, we use 'break' to mean break.  If we want to break
comprehensions, I think we should continue to use 'break' to mean
break instead of twisting 'while' to mean 'break'.

> [expression for x in sequence while condition]
>
> should (I believe) be obvious to anyone who already groks comprehension
> syntax. The mapping to a for-loop is admittedly a tad more complex:
>
> result = []
> for x in sequence:
>     if not condition: break
>     result.append(expression)

This is the same as

result = []
for x in sequence:
    if condition:
        result.append(expression)
    else:
        break

which could be written

[expression for x in sequence if condition break]

--
Terry Jan Reedy

From chris.barker at noaa.gov  Thu Aug 10 16:25:24 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 10 Aug 2017 13:25:24 -0700
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
 <20170810134232.GE7395@ando.pearwood.info>
Message-ID: 

On Thu, Aug 10, 2017 at 8:39 AM, Paul Moore wrote:

> Also, there's a potential issue
> here - consider
>
> [expr for var in even_numbers() if is_odd(var) while var < 100]
>
> This is an infinite loop, even though it has a finite termination
> condition (var < 100), because we only test the termination condition
> if var is odd, which it never will be.
>

why is the termination only tested if the if clause is True? Could they
not be processed in parallel? or the while first....

so maybe better to do:

[expr for var in even_numbers() while var < 100 if is_odd(var)]

Maybe it's just me, but I would certainly expect the while to have
precedence.

I guess I think of it like this:

"if" is providing a filtering mechanism

"while" is providing a termination mechanism

-- is there a use case anyone can think of when they would want the while
to be applied to the list AFTER filtering?

Obviously, this is a contrived example. And certainly "don't do that,
> then" is a valid response. But my instinct is that people are going to
> get this wrong - *especially* in a maintenance environment.

sure, but would there be an issue if the while were given precedence?

Overall, I agree with Steven's point. It seems pretty obvious what the
> intention is, and while it's probably possible to construct examples
> that are somewhat unclear,
>
> 1. The mechanical rule gives an explicit meaning
> 2. People shouldn't be writing such complex comprehensions, so if the
> rule doesn't give what they expect, they can always rewrite the code
> with an explicit (and clearer) loop.
>

me too -- a direct translation to a for loop isn't necessary to
understand how it works.

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chris.barker at noaa.gov  Thu Aug 10 16:28:12 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Thu, 10 Aug 2017 13:28:12 -0700
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
 <20170810134232.GE7395@ando.pearwood.info>
Message-ID: 

On Thu, Aug 10, 2017 at 1:03 PM, Terry Reedy wrote:

> After all, we have "break" in both for and while loops, so clearly there is
>>> >> > In both cases, we use 'break' to mean break. If we want to break > comprehensions, I think we should continue to use 'break' to mean break > instead of twisting 'while' to mean 'break'. I was thinking that too. >> [expression for x in sequence if condition break] hmm, but if you want to filter, also? [expression for x in sequence if condition if condition break] or [expression for x in sequence if condition break if condition ] both of those seem more confusing to me than while. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Aug 10 16:52:51 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 10 Aug 2017 21:52:51 +0100 Subject: [Python-ideas] Generator syntax hooks? In-Reply-To: References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> <20170810134232.GE7395@ando.pearwood.info> Message-ID: On 10 August 2017 at 21:25, Chris Barker wrote: > On Thu, Aug 10, 2017 at 8:39 AM, Paul Moore wrote: > >> >> Also, there's a potential issue >> here - consider >> >> [expr for var in even_numbers() if is_odd(var) while var < 100] >> >> This is an infinite loop, even though it has a finite termination >> condition (var < 100), because we only test the termination condition >> if var is odd, which it never will be. > > > why is the termination only tested if teh if clause is True? Could then not > be processed in parallel? or the while first.... See? That's my point - the "obvious" interpretation stops being obvious pretty fast... > so maybe better to do: > > [expr for var in even_numbers() while var < 100 if is_odd(var)] That would work. But I bet people's intuition wouldn't immediately lead to that fix (or indeed, necessarily incline them to put the clauses in this order in the first place). > Maybe it's just me, but I would certainly expect the while to have > precedence. > > I guess I think of it like this: > > "if" is providing a filtering mechanism > > "while" is providing a termination mechanism > > -- is there a use case anyone can think of when they would want the while > to be applied to the list AFTER filtering? Probably not, but when you can have multiple FORs, WHILEs and IFs, in any order, explaining the behaviour precisely while still preserving some sense of "filtering comes after termination" is going to be pretty difficult. [expr for var1 in seq1 if cond1 for var2 in seq2 for var3 in seq3 if cond2 if cond3] is legal - stupid, but legal. Now add while clauses randomly in that, and define your expected semantics clearly so a user (and the compiler!) can determine what the resulting mess means. The main benefit of the current "works like a for loop" interpretation is that it's 100% explicit. Nothing will make a mess like the above good code, but at least it's well-defined. Paul From spencerb21 at live.com Thu Aug 10 16:53:18 2017 From: spencerb21 at live.com (Spencer Brown) Date: Thu, 10 Aug 2017 20:53:18 +0000 Subject: [Python-ideas] Generator syntax hooks? In-Reply-To: References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> <20170810134232.GE7395@ando.pearwood.info> , Message-ID: The logical solution to me is to allow any order of while and if, and follow the same 'rule' as multiple for loops - just nest/test those in that order. Then you can have whatever priority you need. 
One question though is how this should handle multiple loops - break all of them, or just the current one? - Spencer Brown On 11 Aug 2017, at 6:27 am, Chris Barker > wrote: On Thu, Aug 10, 2017 at 8:39 AM, Paul Moore > wrote: Also, there's a potential issue here - consider [expr for var in even_numbers() if is_odd(var) while var < 100] This is an infinite loop, even though it has a finite termination condition (var < 100), because we only test the termination condition if var is odd, which it never will be. why is the termination only tested if teh if clause is True? Could then not be processed in parallel? or the while first.... so maybe better to do: [expr for var in even_numbers() while var < 100 if is_odd(var)] Maybe it's just me, but I would certainly expect the while to have precedence. I guess I think of it like this: "if" is providing a filtering mechanism "while" is providing a termination mechanism -- is there a use case anyone can think of when they would want the while to be applied to the list AFTER filtering? Obviously, this is a contrived example. And certainly "don't do that, then" is a valid response. But my instinct is that people are going to get this wrong - *especially* in a maintenance environment. sure, but would there be an issue if teh while were given precedence? Overall, I agree with Steven's point. It seems pretty obvious what the intention is, and while it's probably possible to construct examples that are somewhat unclear, 1. The mechanical rule gives an explicit meaning 2. People shouldn't be writing such complex comprehensions, so if the rule doesn't give what they expect, they can always rewrite the code with an explicit (and clearer) loop. me too -- a direct translation to a for loop isn't necessary to understand how it works. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu Aug 10 18:12:07 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 10 Aug 2017 15:12:07 -0700 Subject: [Python-ideas] Generator syntax hooks? In-Reply-To: References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> <20170810134232.GE7395@ando.pearwood.info> Message-ID: On Thu, Aug 10, 2017 at 1:53 PM, Spencer Brown wrote: > The logical solution to me is to allow any order of while and if, and > follow the same 'rule' as multiple for loops - just nest/test those in that > order. > Actually, I think it would be better to only allow one order, and have the "while" always teeted first -- which may mean it should be placed first for clarity. > Then you can have whatever priority you need. One question though is how > this should handle multiple loops - break all of them, or just the current > one? > just the current one, just like a "break", or for that matter, a nested while... 
-CHB

> - Spencer Brown
>
> On 11 Aug 2017, at 6:27 am, Chris Barker wrote:
>
> On Thu, Aug 10, 2017 at 8:39 AM, Paul Moore wrote:
>
>> Also, there's a potential issue
>> here - consider
>>
>> [expr for var in even_numbers() if is_odd(var) while var < 100]
>>
>> This is an infinite loop, even though it has a finite termination
>> condition (var < 100), because we only test the termination condition
>> if var is odd, which it never will be.
>
> why is the termination only tested if the if clause is True? Could they
> not be processed in parallel? or the while first....
>
> so maybe better to do:
>
> [expr for var in even_numbers() while var < 100 if is_odd(var)]
>
> Maybe it's just me, but I would certainly expect the while to have
> precedence.
>
> I guess I think of it like this:
>
> "if" is providing a filtering mechanism
>
> "while" is providing a termination mechanism
>
> -- is there a use case anyone can think of when they would want the while
> to be applied to the list AFTER filtering?
>
>> Obviously, this is a contrived example. And certainly "don't do that,
>> then" is a valid response. But my instinct is that people are going to
>> get this wrong - *especially* in a maintenance environment.
>
> sure, but would there be an issue if the while were given precedence?
>
>> Overall, I agree with Steven's point. It seems pretty obvious what the
>> intention is, and while it's probably possible to construct examples
>> that are somewhat unclear,
>>
>> 1. The mechanical rule gives an explicit meaning
>> 2. People shouldn't be writing such complex comprehensions, so if the
>> rule doesn't give what they expect, they can always rewrite the code
>> with an explicit (and clearer) loop.
>
> me too -- a direct translation to a for loop isn't necessary to understand
> how it works.
>
> -CHB
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From wes.turner at gmail.com  Thu Aug 10 22:46:05 2017
From: wes.turner at gmail.com (Wes Turner)
Date: Thu, 10 Aug 2017 21:46:05 -0500
Subject: [Python-ideas] PyPI JSON Metadata Standardization for Mirrors
In-Reply-To: 
References: 
Message-ID: 

On Wednesday, August 9, 2017, Cooper Ry Lees wrote:

> Hi all,
>
> First time emailer, so please be kind. Also, if this is not the right
> mailing list for PyPA talk, I apologize. Please point me in the right
> direction if so.
>

Here are some notes re: changing metadata:
https://github.com/pypa/interoperability-peps/issues/31
https://www.google.com/search?q=pep426jsonld

Towards JSONLD is the best approach, I think.
So, that means it would be best, if you need to add additional
metadata (?) and must key things, to also copy the key into an object:

{"thing1": {"@id": "thing1", "url": "..."}}

Instead of just:

{"thing1": {"url": "..."}}

https://github.com/pypa/interoperability-peps/issues/31#issuecomment-233195564

> The main reason I have emailed here is I believe it may be PEP time to
> standardize the JSON metadata that PyPI makes available, like what was done
> for the `simple API` described in PEP503.
>
> I've been doing a bit of work on `bandersnatch` (I didn't name it), which
> is a PEP 381 mirroring package and wanted to enhance it to also mirror the
> handy JSON metadata PyPI generates and makes available @
> https://pypi.python.org/pypi/PKG_NAME/json.
>
> I've done a PR on bandersnatch as a POC that mirrors both the PyPI
> directory structure (URL/pypi/PKG_NAME/json) and created a standardizable
> URL/json/PKG_NAME that the former symlinks to (to be served by NGINX / some
> other proxy). I'm also contemplating naming the directory 'metadata' rather
> than JSON so if some new hotness / we want to change the format down the
> line we're not stuck with json as the dirname. This PR can be found here:
> https://bitbucket.org/pypa/bandersnatch/pull-requests/33/save-json-metadata-to-mirror
>
> My main use case is to write a very simple async 'verifier' tool that will
> crawl all the JSON files and then ensure the packages directory on each of
> my internal mirrors (I have a mirror per region / datacenter) have all the
> files they should. I sync centrally (to save resource on the PyPI
> infrastructure) and then rsync out all the diffs to each region /
> datacenter, and under some failure scenarios I could miss a file or many.
> So I feel using JSON pulled down from the authoritative source will allow
> an async job to verify the MD5 of all the package files on each mirror.
>
> What are people's thoughts here? Is it worth a PEP similar to PEP503 going
> forward? Can people enhance / share some thoughts on this idea.
>

Here are some notes on making this more efficient:
"Add API endpoint to get latest version of all projects"
https://github.com/pypa/warehouse/issues/347

... To http://markmail.org/search/?q=list:org.python.distutils-sig .

>
> Thanks,
> Cooper Lees
> me at cooperlees.com
> https://cooperlees.com/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ncoghlan at gmail.com  Thu Aug 10 23:54:05 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 11 Aug 2017 13:54:05 +1000
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
Message-ID: 

On 11 August 2017 at 01:39, Soni L. wrote:
> I'm pretty sure I read somewhere that lambdas and generators share their
> syntax, and that syntax is already a subset of python syntax. Would it be
> too hard to expose that with a "simplified AST" API?
We already do, via the "mode" argument to the compile builtin and to
ast.parse:

>>> ast.dump(ast.parse("1000 <= x < 1000000", mode="eval"))
"Expression(body=Compare(left=Num(n=1000), ops=[LtE(), Lt()],
comparators=[Name(id='x', ctx=Load()), Num(n=1000000)]))"

>>> ast.parse("import sys", mode="eval")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 1
    import sys
         ^
SyntaxError: invalid syntax

It's a large part of the reason why passing strings around has so far
qualified as "good enough" - providing dedicated syntax for it doesn't
actually increase the language's expressiveness all that much, it just
has the potential to make static analysis easier by eagerly rendering
to an AST rather than having that be handled by the function receiving
the argument.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Fri Aug 11 00:34:53 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 11 Aug 2017 14:34:53 +1000
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
 <20170810134232.GE7395@ando.pearwood.info>
Message-ID: 

On 11 August 2017 at 01:39, Paul Moore wrote:
> On 10 August 2017 at 14:42, Steven D'Aprano wrote:
>> I don't think it is confusing. Regardless of the implementation, the
>> meaning of:
>>
>> [expression for x in sequence while condition]
>>
>> should (I believe) be obvious to anyone who already groks comprehension
>> syntax. The mapping to a for-loop is admittedly a tad more complex:
>>
>> result = []
>> for x in sequence:
>>     if not condition: break
>>     result.append(expression)
>>
>> but I'm yet to meet anyone who routinely and regularly reads
>> comprehensions by converting them to for loops like that. And if they
>> did, all they need do is mentally map "while condition" to "if not
>> condition: break" and it should all Just Work™.
>
> The hard part is the interaction between if and while.
>
> Consider (expr for var in seq if cond1 while cond2):
>
> This means:
>
> for var in seq:
>     if cond1:
>         if not cond2: break
>         yield expr
>
> Note that unlike all other comprehension clauses (for and if) while
> doesn't introduce a new level of nesting. That's an inconsistency, and
> while it's minor, it would need clarifying (my original draft of this
> email was a mess, because I misinterpreted how if and while would
> interact, precisely over this point).

This is actually how I came to the conclusion that if we were ever to
do something like this, the termination condition would need to go
*before* the filter condition:

(expr for var in seq while loop_cond if filter_cond)

<=>

for var in seq:
    if loop_cond:
        if filter_cond:
            yield expr
    else:
        break

With the clauses in that order, the "while" keyword effectively
operates as "if-else-break" the same way it does in a regular while
loop, and could potentially be introduced as a modifying clause on
regular for loops at the same time.

One of the neat things the latter would allow is to make it even
easier to introduce a diagnostic loop counter into while loops:

while condition:
    ...

could become:

for iteration in itertools.count(1) while condition:
    ...

rather than having to implement a manually incremented loop counter
the way you do today.
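For concreteness, that same "termination before filtering" ordering is
already expressible with itertools.takewhile, so the semantics above
can be checked against running code today (a small runnable sketch;
the bounds are shrunk from the earlier example just to keep it cheap):

from itertools import count, takewhile

# Approximates the hypothetical
#     {x for x in count(0) while x < 10000 if 1000 <= x}
# takewhile applies the termination condition before the filter is
# applied, matching the clause ordering described above.
result = {x for x in takewhile(lambda x: x < 10000, count(0)) if 1000 <= x}
assert len(result) == 9000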
> Also, there's a potential issue
> here - consider
>
> [expr for var in even_numbers() if is_odd(var) while var < 100]
>
> This is an infinite loop, even though it has a finite termination
> condition (var < 100), because we only test the termination condition
> if var is odd, which it never will be.

This is another good reason why a termination condition would need to
be checked before the filter condition rather than either after it, or
only when the filter condition was true.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From steve at pearwood.info  Fri Aug 11 00:49:10 2017
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 11 Aug 2017 14:49:10 +1000
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
 <20170810134232.GE7395@ando.pearwood.info>
Message-ID: <20170811044910.GI7395@ando.pearwood.info>

On Thu, Aug 10, 2017 at 01:25:24PM -0700, Chris Barker wrote:
> On Thu, Aug 10, 2017 at 8:39 AM, Paul Moore wrote:
>
> > Also, there's a potential issue
> > here - consider
> >
> > [expr for var in even_numbers() if is_odd(var) while var < 100]
> >
> > This is an infinite loop, even though it has a finite termination
> > condition (var < 100), because we only test the termination condition
> > if var is odd, which it never will be.

I'm not sure why Paul thinks this is an issue. There are plenty of
ways to accidentally write an infinite loop in a comprehension, or a
for loop, already:

[expr for var in even_numbers()]

will do it, if even_numbers is unexpectedly an infinite iterator. Or
you could write:

for num in even_numbers():
    if is_odd(num) and num > 100:
        break

No loop syntax, whether it is functional style (takewhile, map, etc.),
comprehension, or traditional style for loops, enables the programmer
to avoid thinking about what they write.

> why is the termination only tested if the if clause is True? Could they not
> be processed in parallel? or the while first....

Because we're following the standard Python rule of left-to-right
execution. The while clause is tested only if the if clause is true
because it follows the if clause.

I think that there's an argument to be made for the rule: We can have
`if` in a comprehension, or `while`, but not both, in order to limit
complexity. Analogy:

(1) we intentionally limit the decorator @ syntax to a subset of
expressions;

(2) likewise we intentionally allow (but don't encourage)
monkey-patching of Python classes only, not built-ins.

Just because we *can* allow arbitrary code combinations doesn't mean
we *must*. We have a choice to say: "No, you cannot mix `if` and
`while` in the same comprehension. Why? Because we say so. Because it
is confusing if you do." I'd be okay with that rule.

But if we decide to allow arbitrary combinations of for/if/while in
comprehensions, then I think we must keep the same left-to-right rule
we have now. Currently we process multiple for/if clauses
left-to-right:

[expr for x in a if cond for y in b]

is equivalent to:

for x in a:
    if cond:
        for y in b:
            expr

rather than moving the `if` to the end. If you want it at the end, put
it there yourself. Adding `while` shouldn't change that. It would be
crazy-complicated to have a rule: "the presence of a while means the
comprehension is processed in parallel" or "all the while clauses are
processed before (after?) the if clauses, regardless of their order of
appearance."
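As a concrete sanity check of that left-to-right rule, here's a tiny
runnable comparison (the names a and b and the conditions are
throwaway placeholders, not from anyone's proposal):

a = [0, 1, 2]
b = ['x', 'y']

comp = [(x, y) for x in a if x % 2 == 0 for y in b]

# The mechanical left-to-right expansion of the comprehension above:
expanded = []
for x in a:
    if x % 2 == 0:
        for y in b:
            expanded.append((x, y))

assert comp == expanded  # both give [(0, 'x'), (0, 'y'), (2, 'x'), (2, 'y')]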
> so maybe better to do:
>
> [expr for var in even_numbers() while var < 100 if is_odd(var)]

Well sure, that's the *correct* way to write the code:

for var in even_numbers():
    if not (var < 100):
        break
    if is_odd(var):
        results.append(expr)

(for some definition of "correct" -- this is clearly an expensive way
to generate an empty list.)

But in general one might wish to test the if or the while in either
order.

> Maybe it's just me, but I would certainly expect the while to have
> precedence.

Does that apply to these idioms as well?

while cond:
    if flag:
        ...

versus:

if flag:
    while cond:
        ...

I would not expect them to be the same, and nor would I expect these
to be the same:

[expr for x in seq if flag while cond]

[expr for x in seq while cond if flag]

> I guess I think of it like this:
>
> "if" is providing a filtering mechanism
>
> "while" is providing a termination mechanism
>
> -- is there a use case anyone can think of when they would want the while
> to be applied to the list AFTER filtering?

[process(n) for n in numbers while n > 0 if is_odd(n)]

Halt on the first zero or negative number, regardless of whether it is
even or odd, but process only odd numbers.

Paul:
> > Obviously, this is a contrived example. And certainly "don't do that,
> > then" is a valid response. But my instinct is that people are going to
> > get this wrong - *especially* in a maintenance environment.

That's the argument for limiting comprehensions to either `if` or
`while` but not both. And I actually would be okay with that --
especially if we leave open the possibility of relaxing the
prohibition in the future.

But personally, I think that's under-estimating the ability of
programmers to reason about loops. Of course a comprehension with
multiple for/if/while clauses is hard to reason about, and we
shouldn't *encourage* them, but we don't prohibit multiple for/if
clauses. Why should `while` be held to a higher standard?

If we allow people to shoot themselves in the foot by writing complex
list comprehensions with ten `for` loops and seven `if` clauses, why
should we baulk at allowing them a `while` clause as well?

--
Steve

From ncoghlan at gmail.com  Fri Aug 11 00:52:10 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 11 Aug 2017 14:52:10 +1000
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
 <20170810134232.GE7395@ando.pearwood.info>
Message-ID: 

On 11 August 2017 at 06:53, Spencer Brown wrote:
> The logical solution to me is to allow any order of while and if, and follow
> the same 'rule' as multiple for loops - just nest/test those in that order.
> Then you can have whatever priority you need. One question though is how
> this should handle multiple loops - break all of them, or just the current
> one?

This is why I think a for-while construct in comprehensions would
really only make sense in combination with a *statement* level
for-while construct, as the problem we have is:

- a termination condition can't readily use "if" (even in combination
with "break") because that's visually and syntactically ambiguous with
a filter condition
- a naive translation of a "while" based syntax makes it look like a
nested *non-terminating* loop

Both of those problems may be resolved if a "for-while" loop exists as
a top level looping construct that can terminate based on *either* an
iterator being exhausted *or* a condition becoming false.
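To make the intended semantics concrete, such a two-way termination
rule can be prototyped today as a small generator (a rough sketch
only; "bounded" is just an illustrative helper name, not a proposed
API):

def bounded(iterable, condition):
    # Terminate on iterator exhaustion *or* on the condition
    # becoming false, whichever happens first.
    for item in iterable:
        if not condition(item):
            break
        yield item

for item in bounded(range(10), lambda x: x < 4):
    print(item)  # prints 0, 1, 2, 3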
The question then becomes whether or not a "for-while" loop is
actually useful enough to be added as a general construct, given that
we already have "if not condition: break" as a way of modeling a loop
ending early because a condition became false.

One way to gather evidence on that front would be to survey the
standard library for places where we use "break", and see if any of
them would be more readable given a for-while construct, whether as a
statement, or as part of the comprehension syntax.

(Note: I'm not interested enough in the idea to do that evidence
gathering myself, I'm just pointing it out in case anyone is curious
enough to take the time to collect those details)

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From steve at pearwood.info  Fri Aug 11 01:13:36 2017
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 11 Aug 2017 15:13:36 +1000
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
 <20170810134232.GE7395@ando.pearwood.info>
Message-ID: <20170811051336.GJ7395@ando.pearwood.info>

On Fri, Aug 11, 2017 at 02:34:53PM +1000, Nick Coghlan wrote:
> This is actually how I came to the conclusion that if we were ever to
> do something like this, the termination condition would need to go
> *before* the filter condition:

What if you want to check the filter condition before the termination
condition?

I have an iterable of arbitrary objects. I want to ignore anything
that isn't a string, and halt if the string doesn't start with "A".
This is easy:

[expr for s in objects if isinstance(s, str) while s.startswith("A")]

Why should we prohibit expressing this, and instead write it as this?

[expr for s in objects while (s.startswith("A") if isinstance(s, str) else True) if isinstance(s, str)]

Or split into multiple comprehensions?

[expr for s in [obj for obj in objects if isinstance(obj, str)] while s.startswith("A")]

> (expr for var in seq while loop_cond if filter_cond)
>
> <=>
>
> for var in seq:
>     if loop_cond:
>         if filter_cond:
>             yield expr
>     else:
>         break

We can still expand the clauses if they are presented in the opposite
order:

(expr for var in seq if filter_cond while loop_cond)

<=>

for var in seq:
    if filter_cond:
        if loop_cond:
            yield expr
        else:
            break

There's no need to prohibit that. It is meaningful and useful and just
because somebody might accidentally fail to exit an infinite loop is
no reason to ban this.

> This is another good reason why a termination condition would need to
> be checked before the filter condition rather than either after it, or
> only when the filter condition was true.

Why is this a problem that needs solving? Who is to say that an
infinite generator expression isn't exactly what the programmer wants?

If the halting condition is not true, the generator expression will
either keep going until the iterator is exhausted, or it will be an
infinite generator just like the unprocessed, unfiltered source
iterator. This is not necessarily a problem.

--
Steve

From steve at pearwood.info  Fri Aug 11 01:18:34 2017
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 11 Aug 2017 15:18:34 +1000
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: <20170811044910.GI7395@ando.pearwood.info> References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> <20170810134232.GE7395@ando.pearwood.info> <20170811044910.GI7395@ando.pearwood.info> Message-ID: <20170811051834.GK7395@ando.pearwood.info> On Fri, Aug 11, 2017 at 02:49:10PM +1000, Steven D'Aprano wrote: > On Thu, Aug 10, 2017 at 01:25:24PM -0700, Chris Barker wrote: > > I guess I think of it like this: > > > > "if" is providing a filtering mechanism > > > > "while" is providing a termination mechanism > > > > -- is there a use case anyone can think of when they would want the while > > to be applied to the list AFTER filtering? Oops, sorry I had a thinko and read your question in the opposite sense than it actually is. See my response to Nick for an example: I have an iterable of arbitrary objects. I want to ignore anything that isn't a string, and halt if the string doesn't start with "A". [expr for s in objects if isinstance(s, str) while s.startswith("A")] -- Steve From ncoghlan at gmail.com Fri Aug 11 01:28:05 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 11 Aug 2017 15:28:05 +1000 Subject: [Python-ideas] Generator syntax hooks? In-Reply-To: <20170811051336.GJ7395@ando.pearwood.info> References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> <20170810134232.GE7395@ando.pearwood.info> <20170811051336.GJ7395@ando.pearwood.info> Message-ID: On 11 August 2017 at 15:13, Steven D'Aprano wrote: > On Fri, Aug 11, 2017 at 02:34:53PM +1000, Nick Coghlan wrote: >> This is another good reason why a termination condition would need to >> be checked before the filter condition rather than either after it, or >> only when the filter condition was true. > > Why is this a problem that needs solving? Because the most obvious interpretation of a completely independent "while" clause in comprehensions would be as a nested loop inside the outer for loop, not as a nested if-else-break statement. As a result of that, I'm only personally prepared to support for-while comprehensions if they're syntactic sugar for a combined statement level for-while loop that makes it clear why only the "for" clauses in a comprehension create new loops. I *wouldn't* be prepared to support them if they could only be explained in terms of a direct mapping to an if statement and had no statement level counterpart that actually used the "while" keyword. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Fri Aug 11 06:01:22 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 11 Aug 2017 11:01:22 +0100 Subject: [Python-ideas] Generator syntax hooks? In-Reply-To: <20170811044910.GI7395@ando.pearwood.info> References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> <20170810134232.GE7395@ando.pearwood.info> <20170811044910.GI7395@ando.pearwood.info> Message-ID: On 11 August 2017 at 05:49, Steven D'Aprano wrote: > On Thu, Aug 10, 2017 at 01:25:24PM -0700, Chris Barker wrote: >> On Thu, Aug 10, 2017 at 8:39 AM, Paul Moore wrote: >> >> >> > Also, there's a potential issue >> > here - consider >> > >> > [expr for var in even_numbers() if is_odd(var) while var < 100] >> > >> > This is an infinite loop, even though it has a finite termination >> > condition (var < 100), because we only test the termination condition >> > if var is odd, which it never will be. > > I'm not sure why Paul thinks this is an issue. 
There are plenty of ways > to accidentally write an infinite loop in a comprehension, or a for > loop, already: Mostly because I work in a support and maintenance environment, where we routinely see code that *originally* made sense, but which was over time modified in ways that break things - usually precisely because coders who in theory understand how to write such things correctly, end up not taking the time to fully understand the constructs they are modifying. Of course that's wrong, but it's sadly all too common, and for that reason I'm always wary of constructs that need thinking through carefully to understand the implications. Nick's original {x for x in itertools.count(0) if 1000 <= x while x < 1000000} was like that. It was *sort of* obvious that it meant "numbers between 1_000 and 1_000_000, but the interaction between "if" and "while" wasn't clear to me. If I were asked to rush in a change to only pick odd numbers, {x for x in itertools.count(0) if 1000 <= x and is_odd(x) while x < 1000000} seems right to me, but quick - what about edge cases? It's not that I can't get it right, nor is it that I can't test that I *did* get it right, just that this sort of "quick fix" is very common in the sort of real-world coding I see regularly, and a huge advantage of Python is that it's hard to get in a situation where the obvious guess is wrong. Don't get me wrong - I'm not arguing that the sky is falling. Just that this construct isn't as easy to understand as it seems at first (and that hard-to-understand cases appear *before* you hit the point where it's obvious that the statement is too complex and should be refactored. Paul From contact at brice.xyz Fri Aug 11 08:53:11 2017 From: contact at brice.xyz (Brice Parent) Date: Fri, 11 Aug 2017 14:53:11 +0200 Subject: [Python-ideas] Generator syntax hooks? In-Reply-To: References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com> <20170810134232.GE7395@ando.pearwood.info> Message-ID: (re-posting here as I first mistakenly answered directly to Terry. Sorry about that!) Le 10/08/17 ? 22:03, Terry Reedy a ?crit : > [...] > result = [] > for x in sequence: > if condition: > result.append(expression) > else: > break > > which could be written > > [expression for x in sequence if condition break] > It's what I thought too. Adding a `while` clause here just overly complicates the understanding of the comprehension. The `break` keyword is already easily understandable and helps to map the comprehension with the plain for-loop (I like this mapping for its reverse counterpart, as I often start with plain for-loops to rewrite them later to comprehensions when it makes sense). I would probably suggest this instead of Terry's proposal, though: [expression for x in sequence if condition*or *break] (maybe it's what you meant?). I suggest this because it doesn't imply the execution of a statement inside the comprehension, but just to continue the evaluation as it's always done. I admit it feels a bit hacky, but maybe just until we get used to it? -Brice -------------- next part -------------- An HTML attachment was scrubbed... URL: From jhihn at gmx.com Fri Aug 11 10:57:46 2017 From: jhihn at gmx.com (Jason H) Date: Fri, 11 Aug 2017 16:57:46 +0200 Subject: [Python-ideas] Towards harmony with JavaScript? Message-ID: Before I done my firesuit, I'd like to say that I much prefer python and I rail on JS whenever I can. However these days it is quite common to be doing work in both Python and Javascript. 
Harmonizing the two would help JS developers pick up the language, as
well as people like me who are stuck working in JS.

TIOBE has Python at 5 and JS at 8 https://www.tiobe.com/tiobe-index/
Redmonk: 1 and 1, respectively
http://redmonk.com/sogrady/2017/06/08/language-rankings-6-17/
PYPL: 2 and 5 respectively http://pypl.github.io/PYPL.html

While JS is strongly for web (Node.JS, Browsers) and Python has a weak
showing (Tornado, Flask), Python is very popular on everything else on
the backend where JS isn't and isn't likely to be. The point I'm
making is not to choose a 'winner', but to make the observation that,
given the tight clustering of the two languages, there will be
considerable overlap. People like me are asked to do both quite
frequently. So I'd like a little more harmony to aid in my day-to-day.
I have just as many python files as JS files open in my editor at this
moment.

There are several annoyances that, if removed, would go a long way.
1. Object literals: JS: {a:1} vs Python: {'a':1}
Making my fingers dance on ' or " is not a good use of keystrokes, and
it decreases readability. However a counter argument here is what
about when the a is a variable? JS allows o[a] as a way to assign to a
property that is a variable. Python of course offers functions that do
this, but for simple objects, this would very much be appreciated.
The point here is this is

2. Join: JS: [].join(s) vs Python: s.join([])
I've read the justification for putting join on a string, and it makes
sense. But I think we should put it on the list too.

3. Allow C/C++/JS style comments: JS:[ //, /* ] vs Python #
This one is pretty self-explanatory.

Some might want even more harmony, but I don't know the repercussions
of all of that. I think the above could be implemented without
breaking anything. What I do know is that 85% of my friction would be
removed if the above were implemented.

From tritium-list at sdamon.com  Fri Aug 11 11:09:05 2017
From: tritium-list at sdamon.com (Alex Walters)
Date: Fri, 11 Aug 2017 11:09:05 -0400
Subject: [Python-ideas] Towards harmony with JavaScript?
In-Reply-To: 
References: 
Message-ID: <4b04501d312b3$c6c91830$545b4890$@sdamon.com>

> -----Original Message-----
> From: Python-ideas [mailto:python-ideas-bounces+tritium-
> list=sdamon.com at python.org] On Behalf Of Jason H
> Sent: Friday, August 11, 2017 10:58 AM
> To: python-ideas at python.org
> Subject: [Python-ideas] Towards harmony with JavaScript?
>
> Before I don my firesuit, I'd like to say that I much prefer python and I rail
> on JS whenever I can. However these days it is quite common to be doing
> work in both Python and Javascript. Harmonizing the two would help JS
> developers pick up the language, as well as people like me who are stuck
> working in JS.
>
> TIOBE has Python at 5 and JS at 8 https://www.tiobe.com/tiobe-index/
> Redmonk: 1 and 1, respectively
> http://redmonk.com/sogrady/2017/06/08/language-rankings-6-17/
> PYPL: 2 and 5 respectively http://pypl.github.io/PYPL.html
>
> While JS is strongly for web (Node.JS, Browsers) and Python has a weak
> showing (Tornado, Flask),

And Django and pyramid.  And don't forget youtube.  Python has NO
weakness on the web.

> Python is very popular on everything else on the
> backend where JS isn't and isn't likely to be. The point I'm making is not to
> choose a 'winner', but to make the observation that, given the tight
> clustering of the two languages, there will be considerable overlap. People
> like me are asked to do both quite frequently.
So I'd like a little more
> harmony to aid in my day-to-day. I have just as many python files as JS files
> open in my editor at this moment.
>
> There are several annoyances that, if removed, would go a long way.
> 1. Object literals: JS: {a:1} vs Python: {'a':1}
> Making my fingers dance on ' or " is not a good use of keystrokes, and it
> decreases readability. However a counter argument here is what about
> when the a is a variable? JS allows o[a] as a way to assign to a property that
> is a variable. Python of course offers functions that do this, but for simple
> objects, this would very much be appreciated.

Been discussed.  Python will not make the same design flaw as JS in
this case.  If you really want bare keys, do `dict(a=1)`

> The point here is this is
>
> 2. Join: JS: [].join(s) vs Python: s.join([])
> I've read the justification for putting join on a string, and it makes sense.
> But I think we should put it on the list too.

Again, design decision python actually got right - you don't have to
implement a join method, you just have to pass an iterable of strings
to the one join method.  There is no question as to whether an
iterable of strings has a join method - as long as it's iterable, it's
joinable.  This too has been discussed ad nauseum, and is not going to
change.

> 3. Allow C/C++/JS style comments: JS:[ //, /* ] vs Python #
> This one is pretty self-explanatory.

// is valid python syntax (for an operator) - that makes the parser a
pain to implement.  I don't actually see any value at all in unifying
the comment characters... it's not like this is Windows Batch, where
the comment character is `REM` - # is used in a metric ton of
languages.

> Some might want even more harmony, but I don't know the repercussions of
> all of that. I think the above could be implemented without breaking
> anything. What I do know is that 85% of my friction would be removed if the
> above were implemented.
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From rosuav at gmail.com  Fri Aug 11 11:15:46 2017
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 12 Aug 2017 01:15:46 +1000
Subject: [Python-ideas] Towards harmony with JavaScript?
In-Reply-To: 
References: 
Message-ID: 

On Sat, Aug 12, 2017 at 12:57 AM, Jason H wrote:
> Before I don my firesuit, I'd like to say that I much prefer python and
> I rail on JS whenever I can. However these days it is quite common to be
> doing work in both Python and Javascript. Harmonizing the two would help
> JS developers pick up the language, as well as people like me who are
> stuck working in JS.
>
> TIOBE has Python at 5 and JS at 8 https://www.tiobe.com/tiobe-index/
> Redmonk: 1 and 1, respectively http://redmonk.com/sogrady/2017/06/08/language-rankings-6-17/
> PYPL: 2 and 5 respectively http://pypl.github.io/PYPL.html
>
> While JS is strongly for web (Node.JS, Browsers) and Python has a weak
> showing (Tornado, Flask), Python is very popular on everything else on
> the backend where JS isn't and isn't likely to be. The point I'm making
> is not to choose a 'winner', but to make the observation that, given the
> tight clustering of the two languages, there will be considerable
> overlap. People like me are asked to do both quite frequently. So I'd
> like a little more harmony to aid in my day-to-day. I have just as many
> python files as JS files open in my editor at this moment.
>
Python has a number of strong web frameworks - Django is probably the
best known.

> There are several annoyances that, if removed, would go a long way.
> 1. Object literals: JS: {a:1} vs Python: {'a':1}
> Making my fingers dance on ' or " is not a good use of keystrokes, and
> it decreases readability. However a counter argument here is what about
> when the a is a variable? JS allows o[a] as a way to assign to a
> property that is a variable.
> The point here is this is
>

Disagreed. Python is both more consistent and more flexible than JS
here. More flexible in that dict keys can be any hashable type, where
JS object properties are always strings; and more consistent in that a
value is always represented the same way. Consider literals and
variables as dict keys in Python:

# Literal
d = {'a': 1}
print(d['a'])
d['a'] = 2

# Variable
key = 'a'
d = {key: 1}
print(d[key])
d[key] = 2

Contrast JS:

// Literal
d = {a: 1}
console.log(d.a)
d.a = 2

// Variable
key = 'a'
d = {[key]: 1}
console.log(d[key])
d[key] = 2

In Python, a literal string is always in quotes, and an unquoted
symbol is always a name lookup. In JS, you can use the shorthand of
dot notation for literals that are valid symbols, but to use a
variable, you need to switch syntax. (To be fair, this is simply
adding a shorthand that Python doesn't have; you could use square
brackets and string literals in JS too. But people don't do that, so a
programmer has to know to read it using dot notation primarily.)
Coupled with the increased flexibility in what you can have in a dict
key, Python's requirement to quote keys is a small price to pay for
consistency.

> 2. Join: JS: [].join(s) vs Python: s.join([])
> I've read the justification for putting join on a string, and it makes
> sense. But I think we should put it on the list too.

This might be safe to add; but it needs to be well worth adding, since
it's just a different spelling for the exact same thing. -0.

> 3. Allow C/C++/JS style comments: JS:[ //, /* ] vs Python #
> This one is pretty self-explanatory.

If you'd asked for this a few years ago, maybe, but since // is a
division operator, that part of it won't fly. Possibly /* comments */
could be added though.

That's about all that I'd support adding, though.

ChrisA

From jhihn at gmx.com  Fri Aug 11 11:35:29 2017
From: jhihn at gmx.com (Jason H)
Date: Fri, 11 Aug 2017 17:35:29 +0200
Subject: [Python-ideas] Towards harmony with JavaScript?
In-Reply-To: 
References: 
Message-ID: 

Thanks for all the feedback so far, even if it's not the most
enthusiastic response to the ideas.

One thing I missed, and I don't know how I could (total face-palm) is:
4. Other list methods: i.e. and specifically: [].push(item) vs [].append(item)

> Sent: Friday, August 11, 2017 at 10:57 AM
> From: "Jason H"
> To: python-ideas at python.org
> Subject: [Python-ideas] Towards harmony with JavaScript?
>
> Before I don my firesuit, I'd like to say that I much prefer python and
> I rail on JS whenever I can. However these days it is quite common to be
> doing work in both Python and Javascript. Harmonizing the two would help
> JS developers pick up the language, as well as people like me who are
> stuck working in JS.
>
> TIOBE has Python at 5 and JS at 8 https://www.tiobe.com/tiobe-index/
> Redmonk: 1 and 1, respectively http://redmonk.com/sogrady/2017/06/08/language-rankings-6-17/
> PYPL: 2 and 5 respectively http://pypl.github.io/PYPL.html
>
> While JS is strongly for web (Node.JS, Browsers) and Python has a weak showing (Tornado, Flask), Python is very popular on everything else on the backend where JS isn't and isn't likely to be. The point I'm making is not to choose a 'winner', but to make the observation that: given the tight clustering of the two languages, there will be considerable overlap. People like me are asked to do both quite frequently. So I'd like a little more harmony to aid in my day-to-day. I have just as many python files as JS files open in my editor at this moment.
>
> There are several annoyances that if removed, would go a long way.
> 1. Object literals: JS: {a:1} vs Python: {'a':1}
> Making my fingers dance on ' or " is not a good use of keystrokes, and it decreases readability. However a counter argument here is what about when the a is a variable? JS allows o[a] as a way to assign to a property that is a variable. Python of course offers functions that do this, but for simple objects, this would very much be appreciated.
> The point here is this is
>
> 2. Join: JS: [].join(s) vs Python: s.join([])
> I've read the justification for putting join on a string, and it makes sense. But I think we should put it on the list too.
>
> 3. Allow C/C++/JS style comments: JS:[ //, /* ] vs Python #
> This one is pretty self-explanatory.
>
> Some might want even more harmony, but I don't know the repercussions of all of that. I think the above could be implemented without breaking anything. What I do know is that 85% of my friction would be removed if the above were implemented.
>
>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

From rosuav at gmail.com  Fri Aug 11 11:49:39 2017
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 12 Aug 2017 01:49:39 +1000
Subject: [Python-ideas] Towards harmony with JavaScript?
In-Reply-To: 
References: 
Message-ID: 

On Sat, Aug 12, 2017 at 1:35 AM, Jason H  wrote:
> Thanks for all the feedback so far, even if it's not the most enthusiastic response to the ideas.
>
> One thing I missed, and I don't know how I could have (total face-palm), is:
> 4. Other list methods, specifically: [].push(item) vs [].append()
>

Like [].join, this is simply adding a duplicate spelling for something
that already exists. That means that everyone who reads Python code
would have to know both forms. That's not a good use of programmer
time. And unlike [].join, it's purely a duplicate name, not giving you
even the benefit of writing something in either order. So I'm
definitely -1 on this.

Have you considered making JS more like Python instead of the other
way around? You can mess with core data types in JS, adding methods to
them. For example:

String.prototype.join = function(arr) {
    return arr.join(this);
}

var strings = ["Hello", "world"];
console.log(" ".join(strings));

That doesn't require any core language changes, and will cover most of
your issues. (You can simply choose to always use quoted strings for
JS keys, for instance.) You can't implement #comment but you can do
all the rest.

Try that for a while, and then see how the people who collaborate with
you like it. That's really the key.
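(And if you want those spellings on the Python side of your codebase
instead, a quick sketch - the JSList name here is just something I made
up - needs no language change either:

class JSList(list):
    # illustrative only: JS-style spellings layered on top of list
    def join(self, sep):
        return sep.join(map(str, self))
    push = list.append

strings = JSList(["Hello", "world"])
print(strings.join(" "))   # Hello world
strings.push("again")      # same as .append("again")

Same caveat applies, of course: everyone reading it now has to know both
forms.)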
ChrisA

From alberto at metapensiero.it  Fri Aug 11 13:04:42 2017
From: alberto at metapensiero.it (Alberto Berti)
Date: Fri, 11 Aug 2017 19:04:42 +0200
Subject: [Python-ideas] Towards harmony with JavaScript?
References: 
Message-ID: <87shgykpcl.fsf@ender.lizardnet>

>>>>> "Jason" == Jason H  writes:

Jason> While JS is strongly for web (Node.JS, Browsers) and Python has a weak
Jason> showing (Tornado, Flask), Python is very popular on everything else on
Jason> the backend where JS isn't and isn't likely to be. The point I'm
Jason> making is not to choose a 'winner', but to make the observation that:
Jason> given the tight clustering of the two languages, there will be
Jason> considerable overlap. People like me are asked to do both quite
Jason> frequently. So I'd like a little more harmony to aid in my
Jason> day-to-day. I have just as many python files as JS files open in my
Jason> editor at this moment.

I too do much work on both "sides"

Jason> There are several annoyances that if removed, would go a long way.
Jason> 1. Object literals: JS: {a:1} vs Python: {'a':1}
Jason> Making my fingers dance on ' or " is not a good use of keystrokes,
Jason> and it decreases readability. However a counter argument here is what
Jason> about when the a is a variable? JS allows o[a] as a way to assign to
Jason> a property that is a variable. Python of course offers functions that
Jason> do this, but for simple objects, this would very much be appreciated.
Jason> The point here is this is

Jason> 2. Join: JS: [].join(s) vs Python: s.join([])
Jason> I've read the justification for putting join on a string, and it makes sense. But I think we should put it on the list too.

Jason> 3. Allow C/C++/JS style comments: JS:[ //, /* ] vs Python #
Jason> This one is pretty self-explanatory.

Other friction points to me are (just to name a few):

* when you have to check if a variable contains a string, you have to
  check for both "typeof foo == 'string'" and "foo instanceof String"

* you cannot use negative indexes on Array

* when you want to know the length of a sized object you have to know
  (and to remember) how to ask it: Array has .length, newer Map and Set
  objects have .size

For the goal of reducing the friction (the mind switching when working
with both the languages) I have created a tool (
https://github.com/azazel75/metapensiero.pj ) which allows me to write
valid Python and translates this to nice JS while taking care of most of
these nuances. At the same time it doesn't raise any barrier between the
translated code and any other JS library around (and I use them a lot).

When I created it, I wasn't sure if the goal was worthy, but after
developing some large library with it I must say that I'm quite happy
using it and that I had positive feedback from other developers.

I suggest you take a look at it.

cheers,

Alberto

From rosuav at gmail.com  Fri Aug 11 13:37:08 2017
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 12 Aug 2017 03:37:08 +1000
Subject: [Python-ideas] Towards harmony with JavaScript?
In-Reply-To: <87shgykpcl.fsf@ender.lizardnet>
References: <87shgykpcl.fsf@ender.lizardnet>
Message-ID: 

On Sat, Aug 12, 2017 at 3:04 AM, Alberto Berti  wrote:
> For the goal of reducing the friction (the mind switching when working
> with both the languages) I have created a tool (
> https://github.com/azazel75/metapensiero.pj ) which allows me to write
> valid Python and translates this to nice JS while taking care of most of
> these nuances.
> At the same time it doesn't raise any barrier between the
> translated code and any other JS library around (and I use them a lot).

What do you do about all the places where the languages have
significantly different semantics? For instance, a Python integer can
store more values than a Python float (which is broadly compatible
with a JS Number), but in JS, bitwise operations restrict the value to
32-bit. And subscripting or iterating over a string containing astral
(non-BMP) characters will do different things. Or when you use
non-string keys in a dictionary (notably integers). Transpiling is an
extremely dangerous thing to do a partial job of.

ChrisA

From brenbarn at brenbarn.net  Fri Aug 11 14:11:03 2017
From: brenbarn at brenbarn.net (Brendan Barnwell)
Date: Fri, 11 Aug 2017 11:11:03 -0700
Subject: [Python-ideas] Towards harmony with JavaScript?
In-Reply-To: 
References: 
Message-ID: <598DF337.2020605@brenbarn.net>

On 2017-08-11 07:57, Jason H wrote:
> Before I don my firesuit, I'd like to say that I much prefer python
> and I rail on JS whenever I can. However these days it is quite
> common to be doing work in both Python and Javascript. Harmonizing
> the two would help JS developers pick up the language as well as
> people like me that are stuck working in JS as well.

In general I am instinctively opposed to any changes aimed at making
Python more like JavaScript, because I think that overall Python is a
much better designed language than JavaScript, and JavaScript has
numerous profound flaws, so almost anything that makes Python more like
JavaScript is likely to make it worse.

In particular, all of the changes you propose are very minor things
which amount to adding some duplicate or more convenient or
almost-the-same way to do something that can already be done. This kind
of accumulation of confusing alternatives is exactly the kind of thing
that makes JS suck. You have == vs ===, for vs for..in vs for..of,
optional semicolons, and on and on and on. This is because people did
not think about the right way to do things the first time in JS, and
they don't want to break backward compatibility, so they just keep
adding new features to paper over the deeper problems. Happily, Python
avoids the most damaging cases of this, because Python has far fewer
deep problems, and small problems aren't worth the clutter of having
multiple ways to do the same thing.

>1. Object literals: JS: {a:1} vs Python: {'a':1} Making my fingers dance
> on ' or " is not a good use of keystrokes, and it decreases
> readability. However a counter argument here is what about when the a
> is a variable? JS allows o[a] as a way to assign to a property that
> is a variable. Python of course offers functions that do this, but
> for simple objects, this would very much be appreciated. The point
> here is this is

Was your message truncated here? "The point here is this is" what? In
any case, the objection you've already raised is enough to kill this
proposal for me. Being able to use a variable for a key is a huge and
very real difference in functionality between Python and JS. Being able
to not type quotes is a small advantage in comparison to that. You can
already do dict(a=1, b=2) if you really want to.

> 2. Join: JS: [].join(s) vs Python: s.join([]) I've read the
> justification for putting join on a string, and it makes sense. But I
> think we should put it on the list too.

I agree it is confusing at first. Once you know it, you know it.
Also, adding it to list still wouldn't make it available for tuples,
dicts, or any other iterables. (JavaScript "avoided" this problem by not
providing any way to define your own iterables until 2015, so everyone
was stuck using plain arrays.) I do think a case could be made for
designing a more comprehensive iterable class hierarchy that would
provide things like this, but just adding a single method to a single
type isn't worth it.

> 3. Allow C/C++/JS style comments: JS:[ //, /* ] vs Python # This one
> is pretty self-explanatory.

Again, the gain is tiny. Python is already quite a readable language. I
don't see "make it easily writable for people who don't know Python
without looking up how to write comments" as a useful goal. As with
.join(), once you learn that Python uses #, you know it, and it's not
really a problem. Also, as someone else mentioned, // is a valid
operator in Python, making its use as a comment marker potentially
ambiguous.

--
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no
path, and leave a trail."
   --author unknown

From alberto at metapensiero.it  Fri Aug 11 15:13:33 2017
From: alberto at metapensiero.it (Alberto Berti)
Date: Fri, 11 Aug 2017 21:13:33 +0200
Subject: [Python-ideas] Towards harmony with JavaScript?
References: <87shgykpcl.fsf@ender.lizardnet>
Message-ID: <87k229lxya.fsf@ender.lizardnet>

>>>>> "Chris" == Chris Angelico  writes:

Chris> On Sat, Aug 12, 2017 at 3:04 AM, Alberto Berti  wrote:
>> For the goal of reducing the friction (the mind switching when working
>> with both the languages) I have created a tool (
>> https://github.com/azazel75/metapensiero.pj ) which allows me to write
>> valid Python and translates this to nice JS while taking care of most of
>> these nuances. At the same time it doesn't raise any barrier between the
>> translated code and any other JS library around (and I use them a lot).

Chris> What do you do about all the places where the languages have
Chris> significantly different semantics? For instance, a Python integer can
Chris> store more values than a Python float (which is broadly compatible
Chris> with a JS Number), but in JS, bitwise operations restrict the value to
Chris> 32-bit.

As of now, I do nothing. As I said, the goal of the tool is not to
shield you from JS, for this reason it's not meant for beginners (in
either JS or Python). You always manipulate JS objects, but it allows
you to be naive about all that plethora of JS idiosyncrasies (from a
Python POV, at least) that you have to think about when you frequently
switch from python to js.

Because such a list of idiosyncrasies may be subjective, I hope to add
to it a kind of "layered" translation where the user can add their own
set of rules and/or runtime checking or conversion. I've helped port
macropy ( https://github.com/azazel75/macropy ) especially for the
purpose of simplifying AST manipulation, but it's not done yet.

Chris> And subscripting or iterating over a string containing astral
Chris> (non-BMP) characters will do different things.

This is strange... I tested it with javascripthon's embedded interpreter
(which is ES5 compatible) and it indeed shows that for example the
string '?????' isn't correctly (in the python sense) iterated over, but
testing it on an ES6 compatible interpreter (more or less latest V8)
does the right thing. Something has changed between the two.

Chris> Or when you use
Chris> non-string keys in a dictionary (notably integers).
you should use a Map in such a case, as the tool doesn't reimplement
most of the Python data API. That would mean building some wrapping type
that might simplify converting existing Python code to run on JS, but
that would probably mean a more difficult interfacing with JS third
party libraries (React, Angular, name yours here) that I don't want to
reimplement... See here (
https://github.com/azazel75/metapensiero.pj/issues/19 ) for a brief
discussion on this matter.

Chris> Transpiling is an
Chris> extremely dangerous thing to do a partial job of.

Even breathing can be dangerous in some environments...
Bridging two different concepts together is always a partial job...

Again, the use case may seem minimal to you but I can assure that it
helps in the day-to-day work with JS.

From rosuav at gmail.com  Fri Aug 11 15:19:37 2017
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 12 Aug 2017 05:19:37 +1000
Subject: [Python-ideas] Towards harmony with JavaScript?
In-Reply-To: <87k229lxya.fsf@ender.lizardnet>
References: <87shgykpcl.fsf@ender.lizardnet>
 <87k229lxya.fsf@ender.lizardnet>
Message-ID: 

On Sat, Aug 12, 2017 at 5:13 AM, Alberto Berti  wrote:
> Chris> What do you do about all the places where the languages have
> Chris> significantly different semantics? For instance, a Python integer can
> Chris> store more values than a Python float (which is broadly compatible
> Chris> with a JS Number), but in JS, bitwise operations restrict the value to
> Chris> 32-bit.
>
> As of now, I do nothing.
> As I said, the goal of the tool is not to
> shield you from JS, for this reason it's not meant for beginners (in
> either JS or Python). You always manipulate JS objects, but it allows you
> to be naive about all that plethora of JS idiosyncrasies (from a Python
> POV, at least) that you have to think about when you frequently switch from
> python to js.

Speaking as someone whose day job is teaching Python and JavaScript, I
don't like the idea of this kind of thing. You're bringing (some)
Python syntax, but sticking to JS semantics. That means your source
code looks like Python, but runs like JS. You can't afford to ever run
it through a Python interpreter (the semantics will be wrong).

There's already plenty of confusion in the world. I don't want to add
more. It would be far better to base your language on JS syntax, since
it's using JS semantics; just add in a handful of Python features that
you really miss.

ChrisA

From jsbueno at python.org.br  Fri Aug 11 15:42:58 2017
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Fri, 11 Aug 2017 16:42:58 -0300
Subject: [Python-ideas] Towards harmony with JavaScript?
In-Reply-To: 
References: <87shgykpcl.fsf@ender.lizardnet>
 <87k229lxya.fsf@ender.lizardnet>
Message-ID: 

On 11 August 2017 at 16:19, Chris Angelico  wrote:

> On Sat, Aug 12, 2017 at 5:13 AM, Alberto Berti 
> wrote:
> > Chris> What do you do about all the places where the languages have
> > Chris> significantly different semantics? For instance, a Python
> integer can
> > Chris> store more values than a Python float (which is broadly
> compatible
> > Chris> with a JS Number), but in JS, bitwise operations restrict the
> value to
> > Chris> 32-bit.
> >
> > As of now, I do nothing. As I said, the goal of the tool is not to
> > shield you from JS, for this reason it's not meant for beginners (in
> > either JS or Python). You always manipulate JS objects, but it allows
> > you to be naive about all that plethora of JS idiosyncrasies (from a
> > Python POV, at least) that you have to think about when you frequently
> > switch from python to js.
> >
> > Chris> Transpiling is an
> > Chris> extremely dangerous thing to do a partial job of.
> >
> > Even breathing can be dangerous in some environments...
> > Bridging two different concepts together is always a partial job...
> >
> > Again, the use case may seem minimal to you but I can assure that it
> > helps in the day-to-day work with JS.
>
> Speaking as someone whose day job is teaching Python and JavaScript, I
> don't like the idea of this kind of thing. You're bringing (some)
> Python syntax, but sticking to JS semantics. That means your source
> code looks like Python, but runs like JS. You can't afford to ever run
> it through a Python interpreter (the semantics will be wrong).
>
> There's already plenty of confusion in the world. I don't want to add
> more. It would be far better to base your language on JS syntax, since
> it's using JS semantics; just add in a handful of Python features that
> you really miss.
>

Well, I hope you both had at least skimmed over "brython" - it started a
couple of years ago
with somewhat the same "won't do full Python" purpose - but nowadays they
have a
very conformant implementation of Python3 that is transpiled client-side
into working javascript.

(It does use some JS third-party library to be able to implement Python
integers, for example -
but I think one can use a "pragma" like statement to use "native" numbers
for performance)

http://brython.info

ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rosuav at gmail.com  Fri Aug 11 15:47:05 2017
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 12 Aug 2017 05:47:05 +1000
Subject: [Python-ideas] Towards harmony with JavaScript?
In-Reply-To: 
References: <87shgykpcl.fsf@ender.lizardnet>
 <87k229lxya.fsf@ender.lizardnet>
Message-ID: 

On Sat, Aug 12, 2017 at 5:42 AM, Joao S. O. Bueno  wrote:
>
>
> Well, I hope you both had at least skimmed over "brython" - it started a
> couple of years ago
> with somewhat the same "won't do full Python" purpose - but nowadays they
> have a
> very conformant implementation of Python3 that is transpiled client-side
> into working javascript.
>
> (It does use some JS third-party library to be able to implement Python
> integers, for example -
> but I think one can use a "pragma" like statement to use "native" numbers
> for performance)
>
> http://brython.info

I'm aware of Brython; its purpose is not to let you use JS semantics
with Py syntax, but to let you run Python code in a web browser, with
full Python semantics. You'll also find PyPyJS, which does a similar
job - it uses JS code as a form of machine code, JIT compiling to JS.

Taking this off the list as it's no longer on topic.

ChrisA

From rosuav at gmail.com  Fri Aug 11 15:47:51 2017
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 12 Aug 2017 05:47:51 +1000
Subject: [Python-ideas] Towards harmony with JavaScript?
In-Reply-To: 
References: <87shgykpcl.fsf@ender.lizardnet>
 <87k229lxya.fsf@ender.lizardnet>
Message-ID: 

On Sat, Aug 12, 2017 at 5:47 AM, Chris Angelico  wrote:
>
> Taking this off the list as it's no longer on topic.

... at least, I *thought* I was taking it off list. Between me and
Gmail, some thoughts got crossed. Sorry!

ChrisA

From chris.barker at noaa.gov  Fri Aug 11 15:55:44 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 11 Aug 2017 12:55:44 -0700
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
 <20170810134232.GE7395@ando.pearwood.info>
Message-ID: 

[...]

> result = []
> for x in sequence:
>     if condition:
>         result.append(expression)
>     else:
>         break
>
> which could be written
>
> [expression for x in sequence if condition break]
>
> It's what I thought too. Adding a `while` clause here just overly
> complicates the understanding of the comprehension. The `break` keyword is
> already easily understandable and helps to map the comprehension with the
> plain for-loop (I like this mapping for its reverse counterpart, as I often
> start with plain for-loops to rewrite them later to comprehensions when it
> makes sense).
>

having the "if condition" there seems confusing to me, particularly if
you want an if condition as a filter as well:

[expression for x in sequence if condition1 if condition2 break]

which makes me want:

[expression for x in sequence if condition1 breakif condition2]

adding another keyword is a pretty big deal though! would it be possible
to add a keyword ("breakif" in this case) that was ONLY legal in
comprehensions?

Though I still don't think using "while" would really introduce that much
confusion -- sure, it doesn't introduce a new loop, but, as someone
pointed out earlier in this thread, it really is only changing from a
"while do" to a "do while" construct -- so it means pretty much the same
thing.

I agree that scanning a code base to see if there really are many loops
in practice that could use this construct would be a good way to see if
there is any point.

And it would also be interesting to do a survey of "random" folks as to
how they would interpret such a construct -- it's pretty hard for a small
group to know what is and isn't "confusing".

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From markusmeskanen at gmail.com  Fri Aug 11 16:08:37 2017
From: markusmeskanen at gmail.com (Markus Meskanen)
Date: Fri, 11 Aug 2017 23:08:37 +0300
Subject: [Python-ideas] Generator syntax hooks?
In-Reply-To: 
References: <82672ca6-315a-d054-f039-d5c6c7c630b3@gmail.com>
 <20170810134232.GE7395@ando.pearwood.info>
Message-ID: 

Though I still don't think using "while" would really introduce that much
confusion -- sure, it doesn't introduce a new loop, but, as someone
pointed out earlier in this thread, it really is only changing from a
"while do" to a "do while" construct -- so it means pretty much the same
thing.

+1 for the "while" from me too, I don't think most people would find it
confusing (supposing they don't find current [x for x in foo if ...]
confusing either), and introducing a break there is just more of a mess.
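(FWIW, the closest spelling today is itertools.takewhile, which already
gives the proposed cut-off behaviour, just less prettily:

    from itertools import takewhile

    seq = [1, 3, 7, 2]
    result = [x * x for x in takewhile(lambda x: x < 5, seq)]
    # -> [1, 9]; iteration stops at 7, so the trailing 2 is never seen

so the proposal is really about nicer syntax, not new power.)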
To those who say that it might get ugly if you do something like:

[x for y in foo for x in y while x != y if x + y < 100]

Even this still isn't unbearable, and once it gets that hard, maybe you
should consider something else anyway.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chris.barker at noaa.gov  Fri Aug 11 16:10:34 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 11 Aug 2017 13:10:34 -0700
Subject: [Python-ideas] Towards harmony with JavaScript?
In-Reply-To: 
References: <87shgykpcl.fsf@ender.lizardnet>
 <87k229lxya.fsf@ender.lizardnet>
Message-ID: 

>
> Taking this off the list as it's no longer on topic.
>

not totally -- I'm going to add my thoughts:

1) If you want a smoother transition between server-side Python and
in-browser code, maybe you're better off using one of the "python in the
browser" solutions -- there are at least a few viable ones.

2) A JavaScript "object" is quite a different beast than a Python dict,
despite similar syntax for a literal. Making the literals even more
similar would simply add confusion. A JS object is a bit more like a
types.SimpleNamespace in Python, actually.

Making Python look a bit more like JS is NOT a good goal!

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From alberto at metapensiero.it  Fri Aug 11 16:31:50 2017
From: alberto at metapensiero.it (Alberto Berti)
Date: Fri, 11 Aug 2017 22:31:50 +0200
Subject: [Python-ideas] Towards harmony with JavaScript?
References: <87shgykpcl.fsf@ender.lizardnet>
 <87k229lxya.fsf@ender.lizardnet>
Message-ID: <87fucxlubt.fsf@ender.lizardnet>

>>>>> "Chris" == Chris Angelico  writes:

Chris> Speaking as someone whose day job is teaching Python and
Chris> JavaScript,
From rosuav at gmail.com Fri Aug 11 16:58:21 2017 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 12 Aug 2017 06:58:21 +1000 Subject: [Python-ideas] Towards harmony with JavaScript? In-Reply-To: <87fucxlubt.fsf@ender.lizardnet> References: <87shgykpcl.fsf@ender.lizardnet> <87k229lxya.fsf@ender.lizardnet> <87fucxlubt.fsf@ender.lizardnet> Message-ID: On Sat, Aug 12, 2017 at 6:31 AM, Alberto Berti wrote: > It's not really so confusing, most code I wrote with it it's perfectly > understandable Python code. For me, one thing is the language, one other > thing are the libraries or the builtin classes it's usually shipped > with. > > The tool reads valid Python and writes valid ES6 JavaScript. As the > documentation states, it allows you to retain most of Python language > semantics (like for example you can have a working > try...except...finally statement, instead of what vanilla JS gives you) > and some of the library semantics. nothing more, nothing less. Hold on. Make up your mind: > As of now, I do nothing. As I said, the goal of the tool is not to > shield you from JS, for this reason it's not meant for beginners (in > both JS or Python). You always manipulate JS objects, but allows you to > to be naive on all that plethora of JS idiosyncrasies (from a Python pow > at least) that you have to think about when you frequently switch from > python to js. Do you "retain most of Python language semantics", or do you "always manipulate JS objects"? As shown in a previous post, there are some subtle and very dangerous semantic differences between the languages. You can't have it both ways. ChrisA From carl.input at gmail.com Fri Aug 11 18:31:27 2017 From: carl.input at gmail.com (Carl Smith) Date: Fri, 11 Aug 2017 23:31:27 +0100 Subject: [Python-ideas] Towards harmony with JavaScript? In-Reply-To: References: <87shgykpcl.fsf@ender.lizardnet> <87k229lxya.fsf@ender.lizardnet> <87fucxlubt.fsf@ender.lizardnet> Message-ID: Python is not a good fit for the browser, in part, because of the syntax. JavaScript has issues, but its syntax is better suited to creating GUIs in the browser. For example, in browsers everything revolves around a single threaded event loop, so you have a lot of callbacks and event handlers, which makes function expressions really useful, but Python doesn't have expressions that contain blocks, because of significant indentation. As a result, ordinary JS, like this... $(function(){ $("spam").click(function(){ alert("spam clicked") }) }); ...ends up looking like this... def on_ready(): def click_handler(): alert("spam clicked") jQuery("spam").click(click_handler) jQuery(on_ready) JS semantics means JS libraries, which have APIs that assume JS syntax. Python library developers make heavy use of language specific features to define elegant, Pythonic APIs, which is a big part of what makes the language so nice to use. -- Carl Smith carl.input at gmail.com On 11 August 2017 at 21:58, Chris Angelico wrote: > On Sat, Aug 12, 2017 at 6:31 AM, Alberto Berti > wrote: > > It's not really so confusing, most code I wrote with it it's perfectly > > understandable Python code. For me, one thing is the language, one other > > thing are the libraries or the builtin classes it's usually shipped > > with. > > > > The tool reads valid Python and writes valid ES6 JavaScript. 
> > documentation states, it allows you to retain most of Python language
> > semantics (like for example you can have a working
> > try...except...finally statement, instead of what vanilla JS gives you)
> > and some of the library semantics. Nothing more, nothing less.
>
> Hold on. Make up your mind:
>
> > As of now, I do nothing. As I said, the goal of the tool is not to
> > shield you from JS, for this reason it's not meant for beginners (in
> > either JS or Python). You always manipulate JS objects, but it allows
> > you to be naive about all that plethora of JS idiosyncrasies (from a
> > Python POV, at least) that you have to think about when you frequently
> > switch from python to js.
>
> Do you "retain most of Python language semantics", or do you "always
> manipulate JS objects"? As shown in a previous post, there are some
> subtle and very dangerous semantic differences between the languages.
> You can't have it both ways.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From yselivanov.ml at gmail.com  Fri Aug 11 18:37:47 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Fri, 11 Aug 2017 18:37:47 -0400
Subject: [Python-ideas] New PEP 550: Execution Context
Message-ID: 

Hi,

This is a new PEP to implement Execution Contexts in Python.

The PEP is in-flight to python.org, and in the meanwhile can
be read on GitHub:

https://github.com/python/peps/blob/master/pep-0550.rst

(it contains a few diagrams and charts, so please read it there.)

Thank you!
Yury


PEP: 550
Title: Execution Context
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2017
Python-Version: 3.7
Post-History: 11-Aug-2017


Abstract
========

This PEP proposes a new mechanism to manage execution state--the
logical environment in which a function, a thread, a generator,
or a coroutine executes.

A few examples of where reliable state storage is required:

* Context managers like decimal contexts, ``numpy.errstate``,
  and ``warnings.catch_warnings``;

* Storing request-related data such as security tokens and request
  data in web applications;

* Profiling, tracing, and logging in complex and large code bases.

The usual solution for storing state is to use a Thread-local
Storage (TLS), implemented in the standard library as
``threading.local()``. Unfortunately, TLS does not work for
isolating the state of generators or asynchronous code because such
code shares a single thread.


Rationale
=========

Traditionally, Thread-local Storage (TLS) is used for storing the
state. However, the major flaw of using TLS is that it works only
for multi-threaded code. It is not possible to reliably contain
the state within a generator or a coroutine. For example, consider
the following generator::

    def calculate(precision, ...):
        with decimal.localcontext() as ctx:
            # Set the precision for decimal calculations
            # inside this block
            ctx.prec = precision

            yield calculate_something()
            yield calculate_something_else()

Decimal context is using a TLS to store the state, and because
TLS is not aware of generators, the state can leak.
The above code will not work correctly if a user iterates over
the ``calculate()`` generator with different precisions in
parallel::

    g1 = calculate(100)
    g2 = calculate(50)

    items = list(zip(g1, g2))

    # items[0] will be a tuple of:
    #   first value from g1 calculated with 100 precision,
    #   first value from g2 calculated with 50 precision.
    #
    # items[1] will be a tuple of:
    #   second value from g1 calculated with 50 precision,
    #   second value from g2 calculated with 50 precision.

An even scarier example would be using decimals to represent money
in an async/await application: decimal calculations can suddenly
lose precision in the middle of processing a request. Currently,
bugs like this are extremely hard to find and fix.

Another common need for web applications is to have access to the
current request object, or security context, or, simply, the request
URL for logging or submitting performance tracing data::

    async def handle_http_request(request):
        context.current_http_request = request

        await ...
        # Invoke your framework code, render templates,
        # make DB queries, etc, and use the global
        # 'current_http_request' in that code.

        # This isn't currently possible to do reliably
        # in asyncio out of the box.

These examples are just a few out of many where a reliable way to
store context data is absolutely needed.

The inability to use TLS for asynchronous code has led to a
proliferation of ad-hoc solutions, limited to being supported only
by code that was explicitly enabled to work with them. The current
status quo is that any library, including the standard library,
that uses a TLS, will likely not work as expected in asynchronous
code or with generators (see [3]_ as an example issue.)

Some languages that have coroutines or generators recommend
manually passing a ``context`` object to every function, see [1]_
describing the pattern for Go. This approach, however, has limited
use for Python, where we have a huge ecosystem that was built to
work with a TLS-like context. Moreover, passing the context
explicitly does not work at all for libraries like ``decimal`` or
``numpy``, which use operator overloading.

The .NET runtime, which has support for async/await, has a generic
solution to this problem, called ``ExecutionContext`` (see [2]_).
On the surface, working with it is very similar to working with a
TLS, but the former explicitly supports asynchronous code.


Goals
=====

The goal of this PEP is to provide a more reliable alternative to
``threading.local()``. It should be explicitly designed to work
with the Python execution model, equally supporting threads,
generators, and coroutines.

An acceptable solution for Python should meet the following
requirements:

* Transparent support for code executing in threads, coroutines,
  and generators with an easy-to-use API.

* Negligible impact on the performance of the existing code or the
  code that will be using the new mechanism.

* Fast C API for packages like ``decimal`` and ``numpy``.

Explicit is still better than implicit, hence the new APIs should
only be used when there is no option to pass the state explicitly.
With this PEP implemented, it should be possible to update a
context manager like the one below::

    _local = threading.local()

    @contextmanager
    def context(x):
        old_x = getattr(_local, 'x', None)
        _local.x = x
        try:
            yield
        finally:
            _local.x = old_x

to a more robust version that can be reliably used in generators
and async/await code, with a simple transformation::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)


Specification
=============

This proposal introduces a new concept called Execution Context
(EC), along with a set of Python APIs and C APIs to interact
with it.

EC is implemented using an immutable mapping. Every modification
of the mapping produces a new copy of it. To illustrate what this
means, let's compare it to how we work with tuples in Python::

    a0 = ()
    a1 = a0 + (1,)
    a2 = a1 + (2,)

    # a0 is an empty tuple
    # a1 is (1,)
    # a2 is (1, 2)

Manipulating an EC object would be similar::

    a0 = EC()
    a1 = a0.set('foo', 'bar')
    a2 = a1.set('spam', 'ham')

    # a0 is an empty mapping
    # a1 is {'foo': 'bar'}
    # a2 is {'foo': 'bar', 'spam': 'ham'}

In CPython, every thread that can execute Python code has a
corresponding ``PyThreadState`` object. It encapsulates important
runtime information like a pointer to the current frame, and is
used extensively by the ceval loop. We add a new field to
``PyThreadState``, called ``exec_context``, which points to the
current EC object.

We also introduce a set of APIs to work with Execution Context.
In this section we will only cover two functions that are needed to
explain how Execution Context works. See the full list of new APIs
in the `New APIs`_ section.

* ``sys.get_execution_context_item(key, default=None)``: lookup
  ``key`` in the EC of the executing thread. If not found,
  return ``default``.

* ``sys.set_execution_context_item(key, value)``: get the current
  EC of the executing thread. Add a ``key``/``value`` item to it,
  which will produce a new EC object. Set the new object as the
  current one for the executing thread. In pseudo-code::

      tstate = PyThreadState_GET()
      ec = tstate.exec_context
      ec2 = ec.set(key, value)
      tstate.exec_context = ec2

Note that some important implementation details and optimizations
are omitted here, and will be covered in later sections of this PEP.

Now let's see how Execution Contexts work with regular
multi-threaded code, generators, and coroutines.


Regular & Multithreaded Code
----------------------------

For regular Python code, EC behaves just like a thread-local. Any
modification of the EC object produces a new one, which is
immediately set as the current one for the thread state.

.. figure:: pep-0550/functions.png
   :align: center
   :width: 90%

   Figure 1.  Execution Context flow in a thread.

As Figure 1 illustrates, if a function calls
``set_execution_context_item()``, the modification of the
execution context will be visible to all subsequent calls and to
the caller::

    def set_foo():
        set_execution_context_item('foo', 'spam')

    set_execution_context_item('foo', 'bar')
    print(get_execution_context_item('foo'))

    set_foo()
    print(get_execution_context_item('foo'))

    # will print:
    #   bar
    #   spam


Coroutines
----------

Python :pep:`492` coroutines are used to implement cooperative
multitasking. For a Python end-user they are similar to threads,
especially when it comes to sharing resources or modifying
the global state.

An event loop is needed to schedule coroutines. Coroutines that
are explicitly scheduled by the user are usually called Tasks.
When a coroutine is scheduled, it can schedule other coroutines using
an ``await`` expression. In the async/await world, awaiting a
coroutine can be viewed as a different calling convention: Tasks are
similar to threads, and awaiting on coroutines within a Task is
similar to calling functions within a thread.

By drawing a parallel between regular multithreaded code and
async/await, it becomes apparent that any modification of the
execution context within one Task should be visible to all coroutines
scheduled within it. Any execution context modifications, however,
must not be visible to other Tasks executing within the same thread.

To achieve this, a small set of modifications to the coroutine object
is needed:

* When a coroutine object is instantiated, it saves a reference to
  the current execution context object to its
  ``cr_execution_context`` attribute.

* Coroutine's ``.send()`` and ``.throw()`` methods are modified as
  follows (in pseudo-C)::

    if coro->cr_isolated_execution_context:
        # Save a reference to the current execution context
        old_context = tstate->execution_context

        # Set our saved execution context as the current
        # for the current thread.
        tstate->execution_context = coro->cr_execution_context

        try:
            # Perform the actual `Coroutine.send()` or
            # `Coroutine.throw()` call.
            return coro->send(...)
        finally:
            # Save a reference to the updated execution_context.
            # We will need it later, when `.send()` or `.throw()`
            # are called again.
            coro->cr_execution_context = tstate->execution_context

            # Restore thread's execution context to what it was before
            # invoking this coroutine.
            tstate->execution_context = old_context
    else:
        # Perform the actual `Coroutine.send()` or
        # `Coroutine.throw()` call.
        return coro->send(...)

* ``cr_isolated_execution_context`` is a new attribute on coroutine
  objects. Set to ``True`` by default, it makes any execution context
  modifications performed by the coroutine stay visible only to that
  coroutine.

  When the Python interpreter sees an ``await`` instruction, it flips
  ``cr_isolated_execution_context`` to ``False`` for the coroutine
  that is about to be awaited. This makes any changes to the
  execution context made by nested coroutine calls within a Task
  visible throughout the Task.

  Because the top-level coroutine (Task) cannot be scheduled with
  ``await`` (in asyncio you need to call ``loop.create_task()`` or
  ``asyncio.ensure_future()`` to schedule a Task), all execution
  context modifications are guaranteed to stay within the Task.

* We always work with ``tstate->exec_context``. We use
  ``coro->cr_execution_context`` only to store coroutine's execution
  context when it is not executing.

Figure 2 below illustrates how execution context mutations work with
coroutines.

.. figure:: pep-0550/coroutines.png
   :align: center
   :width: 90%

   Figure 2.  Execution Context flow in coroutines.

In the above diagram:

* When "coro1" is created, it saves a reference to the current
  execution context "2".

* If it makes any change to the context, it will have its own
  execution context branch "2.1".

* When it awaits on "coro2", any subsequent changes it does to
  the execution context are visible to "coro1", but not outside
  of it.
In code::

    async def inner_foo():
        print('inner_foo:', get_execution_context_item('key'))
        set_execution_context_item('key', 2)

    async def foo():
        print('foo:', get_execution_context_item('key'))

        set_execution_context_item('key', 1)
        await inner_foo()

        print('foo:', get_execution_context_item('key'))

    set_execution_context_item('key', 'spam')
    print('main:', get_execution_context_item('key'))

    asyncio.get_event_loop().run_until_complete(foo())
    print('main:', get_execution_context_item('key'))

which will output::

    main: spam
    foo: spam
    inner_foo: 1
    foo: 2
    main: spam

Generator-based coroutines (generators decorated with
``types.coroutine`` or ``asyncio.coroutine``) behave exactly like
native coroutines with regard to execution context management:
their ``yield from`` expression is semantically equivalent to
``await``.


Generators
----------

Generators in Python, while similar to coroutines, are used in a
fundamentally different way. They are producers of data, and they
use the ``yield`` expression to suspend/resume their execution.

A crucial difference between ``await coro`` and ``yield value`` is
that the former expression guarantees that the ``coro`` will be
executed to the end, while the latter produces ``value`` and
suspends the generator until it gets iterated again.

Generators share 99% of their implementation with coroutines, and
thus have similar new attributes ``gi_execution_context`` and
``gi_isolated_execution_context``. Similar to coroutines, generators
save a reference to the current execution context when they are
instantiated. They have the same implementation of ``.send()`` and
``.throw()`` methods.

The only difference is that ``gi_isolated_execution_context`` is
always set to ``True``, and is never modified by the interpreter.

The ``yield from o`` expression in regular generators that are not
decorated with ``types.coroutine`` is semantically equivalent to
``for v in o: yield v``.

.. figure:: pep-0550/generators.png
   :align: center
   :width: 90%

   Figure 3.  Execution Context flow in a generator.

In the above diagram:

* When "gen1" is created, it saves a reference to the current
  execution context "2".

* If it makes any change to the context, it will have its own
  execution context branch "2.1".

* When "gen2" is created, it saves a reference to the current
  execution context for it -- "2.1".

* Any subsequent execution context update in "gen2" will only
  be visible to "gen2".

* Likewise, any context changes that "gen1" makes after it created
  "gen2" will not be visible to "gen2".

In code::

    def inner_foo():
        for i in range(3):
            print('inner_foo:', get_execution_context_item('key'))
            set_execution_context_item('key', i)
            yield i

    def foo():
        set_execution_context_item('key', 'spam')
        print('foo:', get_execution_context_item('key'))

        inner = inner_foo()

        while True:
            val = next(inner, None)
            if val is None:
                break
            yield val
            print('foo:', get_execution_context_item('key'))

    set_execution_context_item('key', 'ham')
    print('main:', get_execution_context_item('key'))

    list(foo())

    print('main:', get_execution_context_item('key'))

which will output::

    main: ham
    foo: spam
    inner_foo: spam
    foo: spam
    inner_foo: 0
    foo: spam
    inner_foo: 1
    foo: spam
    main: ham

As we see, any modification of the execution context in a generator
is visible only to the generator itself.

There is one use-case where it is desired for generators to affect
the surrounding execution context: the ``contextlib.contextmanager``
decorator.
To make the following work::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)

we modified ``contextmanager`` to flip the
``gi_isolated_execution_context`` flag to ``False`` on its generator.


Greenlets
---------

Greenlet is an alternative implementation of cooperative
scheduling for Python. Although the greenlet package is not part of
CPython, popular frameworks like gevent rely on it, and it is
important that greenlet can be modified to support execution
contexts.

In a nutshell, greenlet design is very similar to the design of
generators. The main difference is that for generators, the stack
is managed by the Python interpreter. Greenlet works outside of the
Python interpreter, and manually saves some ``PyThreadState``
fields and pushes/pops the C-stack. Since Execution Context is
implemented on top of ``PyThreadState``, it's easy to add
transparent support for it to greenlet.


New APIs
========

Even though this PEP adds a number of new APIs, please keep in mind
that most Python users will likely only ever use two of them:
``sys.get_execution_context_item()`` and
``sys.set_execution_context_item()``.


Python
------

1. ``sys.get_execution_context_item(key, default=None)``: lookup
   ``key`` for the current Execution Context. If not found,
   return ``default``.

2. ``sys.set_execution_context_item(key, value)``: set
   ``key``/``value`` item for the current Execution Context.
   If ``value`` is ``None``, the item will be removed.

3. ``sys.get_execution_context()``: return the current Execution
   Context object: ``sys.ExecutionContext``.

4. ``sys.set_execution_context(ec)``: set the passed
   ``sys.ExecutionContext`` instance as the current one for the
   current thread.

5. ``sys.ExecutionContext`` object.

   Implementation detail: ``sys.ExecutionContext`` wraps a low-level
   ``PyExecContextData`` object. ``sys.ExecutionContext`` has a
   mutable mapping API, abstracting away the real immutable
   ``PyExecContextData``.

   * ``ExecutionContext()``: construct a new, empty, execution
     context.

   * ``ec.run(func, *args)`` method: run ``func(*args)`` in the
     ``ec`` execution context.

   * ``ec[key]``: lookup ``key`` in ``ec`` context.

   * ``ec[key] = value``: assign ``key``/``value`` item to the
     ``ec``.

   * ``ec.get()``, ``ec.items()``, ``ec.values()``, ``ec.keys()``,
     and ``ec.copy()`` are similar to that of ``dict`` object.


C API
-----

The C API is different from the Python one because it operates
directly on the low-level immutable ``PyExecContextData`` object.

1. New ``PyThreadState->exec_context`` field, pointing to a
   ``PyExecContextData`` object.

2. ``PyThreadState_SetExecContextItem`` and
   ``PyThreadState_GetExecContextItem`` similar to
   ``sys.set_execution_context_item()`` and
   ``sys.get_execution_context_item()``.

3. ``PyThreadState_GetExecContext``: similar to
   ``sys.get_execution_context()``. Always returns a
   ``PyExecContextData`` object. If ``PyThreadState->exec_context``
   is ``NULL``, a new and empty one will be created and assigned
   to ``PyThreadState->exec_context``.

4. ``PyThreadState_SetExecContext``: similar to
   ``sys.set_execution_context()``.

5. ``PyExecContext_New``: create a new empty ``PyExecContextData``
   object.

6. ``PyExecContext_SetItem`` and ``PyExecContext_GetItem``.

The exact layout of ``PyExecContextData`` is private, which allows
switching it to a different implementation later. More on that
in the `Implementation Details`_ section.
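To give a feel for the new Python APIs, here is an illustrative
sketch that uses only the functions and methods defined above, with
the semantics specified in this PEP::

    import sys

    # Thread-local-like usage:
    sys.set_execution_context_item('request_id', 42)
    print(sys.get_execution_context_item('request_id'))    # 42

    # Running a function within an explicitly constructed EC:
    ec = sys.ExecutionContext()    # new, empty execution context
    ec['request_id'] = 100

    def handler():
        print(sys.get_execution_context_item('request_id'))

    ec.run(handler)                # prints: 100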
Modifications in Standard Library
=================================

* ``contextlib.contextmanager`` was updated to flip the new
  ``gi_isolated_execution_context`` attribute on the generator.

* ``asyncio.events.Handle`` object now captures the current
  execution context when it is created, and uses the saved
  execution context to run the callback (with
  ``ExecutionContext.run()`` method.) This makes
  ``loop.call_soon()`` run callbacks in the execution context they
  were scheduled in.

No modifications in ``asyncio.Task`` or ``asyncio.Future`` were
necessary.

Some standard library modules like ``warnings`` and ``decimal``
can be updated to use new execution contexts. This will be
considered in separate issues if this PEP is accepted.


Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.


Performance
===========

Implementation Details
----------------------

The new ``PyExecContextData`` object is wrapping a ``dict`` object.
Any modification requires creating a shallow copy of the dict.

While working on the reference implementation of this PEP, we were
able to optimize the ``dict.copy()`` operation **5.5x**, see [4]_ for
details.

.. figure:: pep-0550/dict_copy.png
   :align: center
   :width: 100%

   Figure 4.

Figure 4 shows that the performance of an immutable dict implemented
with shallow copying is expectedly O(n) for the ``set()`` operation.
However, this is tolerable until the dict has more than 100 items
(1 ``set()`` takes about a microsecond.)

Judging by the number of modules that need EC in the Standard
Library, it is likely that real-world Python applications will use
significantly less than 100 execution context variables.

The important point is that the cost of accessing a key in
Execution Context is always O(1).

If the ``set()`` operation performance is a major concern, we discuss
alternative approaches that have O(1) or near-O(1) ``set()``
performance in `Alternative Immutable Dict Implementation`_,
`Faster C API`_, and `Copy-on-write Execution Context`_ sections.


Generators and Coroutines
-------------------------

Using a microbenchmark for generators and coroutines from :pep:`492`
([12]_), it was possible to observe 0.5 to 1% performance degradation.

asyncio echoserver microbenchmarks from the uvloop project [13]_
showed 1-1.5% performance degradation for asyncio code.

asyncpg benchmarks [14]_, which execute more code and are closer to a
real-world application, did not exhibit any noticeable performance
change.


Overall Performance Impact
--------------------------

The total number of changed lines in the ceval loop is 2 -- in the
``YIELD_FROM`` opcode implementation. Only the performance of
generators and coroutines can be affected by the proposal.

This was confirmed by running the Python Performance Benchmark Suite
[15]_, which demonstrated that there is no difference between the
3.7 master branch and this PEP's reference implementation branch
(full benchmark results can be found here [16]_.)


Design Considerations
=====================

Alternative Immutable Dict Implementation
------------------------------------------

Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT)
to implement high performance immutable collections [5]_, [6]_.

Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N)
performance for both ``set()`` and ``get()`` operations, which will
be essentially O(1) for relatively small mappings in EC.

To assess if HAMT can be used for Execution Context, we implemented
it in CPython [7]_.
.. figure:: pep-0550/hamt_vs_dict.png
   :align: center
   :width: 100%

   Figure 5.  Benchmark code can be found here: [9]_.

Figure 5 shows that HAMT indeed displays O(1) performance for all
benchmarked dictionary sizes. For dictionaries with less than 100
items, HAMT is a bit slower than Python dict/shallow copy.

.. figure:: pep-0550/lookup_hamt.png
   :align: center
   :width: 100%

   Figure 6.  Benchmark code can be found here: [10]_.

Figure 6 shows a comparison of lookup costs between Python dict
and an HAMT immutable mapping. HAMT lookup time is 30-40% worse
than Python dict lookups on average, which is a very good result,
considering how well Python dicts are optimized.

Note that, according to [8]_, the HAMT design can be further
improved.

The bottom line is that the current approach of implementing an
immutable mapping with a shallow-copied dict will likely perform
adequately in real-life applications. The HAMT solution is more
future proof, however.

The proposed API is designed in such a way that the underlying
implementation of the mapping can be changed completely without
affecting the Execution Context `Specification`_, which allows
us to switch to HAMT at some point if necessary.


Copy-on-write Execution Context
-------------------------------

The implementation of Execution Context in .NET is different from
this PEP. .NET uses a copy-on-write mechanism and a regular mutable
mapping.

One way to implement this in CPython would be to have two new
fields in ``PyThreadState``:

* ``exec_context`` pointing to the current Execution Context mapping;
* ``exec_context_copy_on_write`` flag, set to ``0`` initially.

The idea is that whenever we are modifying the EC, the copy-on-write
flag is checked, and if it is set to ``1``, the EC is copied.

Modifications to Coroutine and Generator ``.send()`` and ``.throw()``
methods described in the `Coroutines`_ section will be almost the
same, except that in addition to the ``gi_execution_context`` they
will have a ``gi_exec_context_copy_on_write`` flag. When a coroutine
or a generator starts, the flag will be set to ``1``. This will
ensure that any modification of the EC performed within a coroutine
or a generator will be isolated.

This approach has one advantage:

* For an Execution Context that contains a large number of items,
  copy-on-write is a more efficient solution than the shallow-copy
  dict approach.

However, we believe that the copy-on-write disadvantages are more
important to consider:

* Copy-on-write behaviour for generators and coroutines makes
  EC semantics less predictable.

  With the immutable EC approach, generators and coroutines always
  execute in the EC that was current at the moment of their creation.
  Any modifications to the outer EC while a generator or a coroutine
  is executing are not visible to them::

      def generator():
          yield 1
          print(get_execution_context_item('key'))
          yield 2

      set_execution_context_item('key', 'spam')
      gen = iter(generator())
      next(gen)
      set_execution_context_item('key', 'ham')
      next(gen)

  The above script will always print 'spam' with the immutable EC.

  With a copy-on-write approach, the above script will print 'ham'.
  Now, consider that ``generator()`` was refactored to call some
  library function that uses Execution Context::

      def generator():
          yield 1
          some_function_that_uses_decimal_context()
          print(get_execution_context_item('key'))
          yield 2

  Now, the script will print 'spam', because
  ``some_function_that_uses_decimal_context`` forced the EC to copy,
  and the ``set_execution_context_item('key', 'ham')`` line did not
  affect the ``generator()`` code after all.
Copy-on-write Execution Context
-------------------------------

The implementation of Execution Context in .NET is different from
this PEP.  .NET uses a copy-on-write mechanism and a regular mutable
mapping.

One way to implement this in CPython would be to have two new
fields in ``PyThreadState``:

* ``exec_context`` pointing to the current Execution Context mapping;
* ``exec_context_copy_on_write`` flag, set to ``0`` initially.

The idea is that whenever we are modifying the EC, the copy-on-write
flag is checked, and if it is set to ``1``, the EC is copied.

Modifications to the Coroutine and Generator ``.send()`` and
``.throw()`` methods described in the `Coroutines`_ section would be
almost the same, except that in addition to the
``gi_execution_context`` they would have a
``gi_exec_context_copy_on_write`` flag.  When a coroutine or a
generator starts, the flag would be set to ``1``.  This would ensure
that any modification of the EC performed within a coroutine or a
generator is isolated.

This approach has one advantage:

* For an Execution Context that contains a large number of items,
  copy-on-write is a more efficient solution than the shallow-copy
  dict approach.

However, we believe that the copy-on-write disadvantages are more
important to consider:

* Copy-on-write behaviour for generators and coroutines makes
  EC semantics less predictable.

  With the immutable EC approach, generators and coroutines always
  execute in the EC that was current at the moment of their
  creation.  Any modifications to the outer EC while a generator
  or a coroutine is executing are not visible to them::

      def generator():
          yield 1
          print(get_execution_context_item('key'))
          yield 2

      set_execution_context_item('key', 'spam')
      gen = iter(generator())
      next(gen)
      set_execution_context_item('key', 'ham')
      next(gen)

  The above script will always print 'spam' with an immutable EC.

  With a copy-on-write approach, the above script will print 'ham'.
  Now, consider that ``generator()`` was refactored to call some
  library function that uses Execution Context::

      def generator():
          yield 1
          some_function_that_uses_decimal_context()
          print(get_execution_context_item('key'))
          yield 2

  Now the script will print 'spam', because
  ``some_function_that_uses_decimal_context`` forced the EC to copy,
  and the ``set_execution_context_item('key', 'ham')`` line did not
  affect the ``generator()`` code after all.

* Similarly to the previous point, the ``sys.ExecutionContext.run()``
  method would also become less predictable, as
  ``sys.get_execution_context()`` would still return a reference to
  the current mutable EC.

  We can't modify ``sys.get_execution_context()`` to return a shallow
  copy of the current EC, because this would seriously harm the
  performance of ``loop.call_soon()`` and similar places, where
  it is important to propagate the Execution Context.

* Even though copy-on-write requires the execution context object to
  be shallow-copied less frequently, copying will still take place
  in coroutines and generators.  In that case, the HAMT approach
  will perform better for medium to large sized execution contexts.

All in all, we believe that the copy-on-write approach introduces
very subtle corner cases that could lead to bugs that are
exceptionally hard to discover and fix.

The immutable EC solution, in comparison, is always predictable and
easy to reason about.  Therefore we believe that any slight
performance gain that the copy-on-write solution might offer is not
worth it.


Faster C API
------------

Packages like numpy and standard library modules like decimal need
to frequently query the global state for some local context
configuration.  It is important that the APIs they use are as
fast as possible.

The proposed ``PyThreadState_SetExecContextItem`` and
``PyThreadState_GetExecContextItem`` functions need to get the
current thread state with ``PyThreadState_GET()`` (fast) and then
perform a hash lookup (relatively slow).  We can eliminate the hash
lookup by adding three additional C API functions:

* ``Py_ssize_t PyExecContext_RequestIndex(char *key_name)``:
  a function similar to the existing ``_PyEval_RequestCodeExtraIndex``
  introduced in :pep:`523`.  The idea is to request a unique index
  that can later be used to look up context items.

  The ``key_name`` can later be used by ``sys.ExecutionContext`` to
  introspect items added with this API.

* ``PyThreadState_SetExecContextIndexedItem(Py_ssize_t index, PyObject *val)``
  and ``PyThreadState_GetExecContextIndexedItem(Py_ssize_t index)``
  to access an item by its index, avoiding the cost of a hash lookup.


Why does setting a key to None remove the item?
-----------------------------------------------

Consider a context manager::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)

With the ``set_execution_context_item(key, None)`` call removing the
``key``, the user doesn't need to write additional code to remove
the ``key`` if it wasn't in the execution context already.

An alternative design with a ``del_execution_context_item()`` method
would look like the following::

    @contextmanager
    def context(x):
        not_there = object()
        old_x = get_execution_context_item('x', not_there)
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            if old_x is not_there:
                del_execution_context_item('x')
            else:
                set_execution_context_item('x', old_x)


Can we fix ``PyThreadState_GetDict()``?
---------------------------------------

``PyThreadState_GetDict`` is a TLS, and some of its existing users
might depend on it being just a TLS.  Changing its behaviour to follow
the Execution Context semantics would break backwards compatibility.
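A Python model of the indexed-lookup idea from the `Faster C API`_
section above may help: pay the hash lookup once when a module
reserves its index, then use plain list indexing on every access.
All names below are illustrative, not proposed APIs::

    _key_names = []               # index -> key name, for introspection

    def request_index(key_name):
        _key_names.append(key_name)
        return len(_key_names) - 1

    class IndexedContext:
        def __init__(self):
            self._items = [None] * len(_key_names)

        def set_indexed(self, index, value):
            self._items[index] = value    # O(1) list write, no hashing

        def get_indexed(self, index):
            return self._items[index]     # O(1) list read, no hashing

    # e.g. at module import time:
    DECIMAL_CTX_INDEX = request_index('decimal_context')

    ctx = IndexedContext()
    ctx.set_indexed(DECIMAL_CTX_INDEX, 'decimal state')
    assert ctx.get_indexed(DECIMAL_CTX_INDEX) == 'decimal state'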
PEP 521
-------

:pep:`521` proposes an alternative solution to the problem:
enhance the Context Manager Protocol with two new methods,
``__suspend__`` and ``__resume__``.  To make it compatible with
async/await, the Asynchronous Context Manager Protocol would also
need to be extended with ``__asuspend__`` and ``__aresume__``.

This makes it possible to implement context managers like decimal
context and ``numpy.errstate`` for generators and coroutines.

The following code::

    class Context:

        def __enter__(self):
            self.old_x = get_execution_context_item('x')
            set_execution_context_item('x', 'something')

        def __exit__(self, *err):
            set_execution_context_item('x', self.old_x)

would become this::

    class Context:

        def __enter__(self):
            self.old_x = get_execution_context_item('x')
            set_execution_context_item('x', 'something')

        def __suspend__(self):
            set_execution_context_item('x', self.old_x)

        def __resume__(self):
            set_execution_context_item('x', 'something')

        def __exit__(self, *err):
            set_execution_context_item('x', self.old_x)

Besides complicating the protocol, the implementation will likely
negatively impact the performance of coroutines, generators, and any
code that uses context managers, and will notably complicate the
interpreter implementation.  It also does not solve the leaking-state
problem for greenlet/gevent.

:pep:`521` also does not provide any mechanism to propagate state
in a local context, like storing a request object in an HTTP request
handler to have better logging.


Can Execution Context be implemented outside of CPython?
--------------------------------------------------------

Because async/await code needs an event loop to run it, an EC-like
solution can be implemented in a limited way for coroutines.

Generators, on the other hand, do not have an event loop or
trampoline, making it impossible to intercept their ``yield`` points
outside of the Python interpreter.


Reference Implementation
========================

The reference implementation can be found here: [11]_.


References
==========

.. [1] https://blog.golang.org/context

.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx

.. [3] https://github.com/numpy/numpy/issues/9444

.. [4] http://bugs.python.org/issue31179

.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie

.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html

.. [7] https://github.com/1st1/cpython/tree/hamt

.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf

.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd

.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e

.. [11] https://github.com/1st1/cpython/tree/pep550

.. [12] https://www.python.org/dev/peps/pep-0492/#async-await

.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py

.. [14] https://github.com/MagicStack/pgbench

.. [15] https://github.com/python/performance

.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c


Copyright
=========

This document has been placed in the public domain.
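To illustrate the ``asyncio.events.Handle`` change described in the
Modifications in Standard Library section above, here is a
self-contained toy model.  ``ToyContext`` and ``Handle`` below are
simplifications invented for this sketch, not the actual asyncio or
``sys.ExecutionContext`` code::

    # Model of "loop.call_soon() runs callbacks in the execution
    # context in which they were scheduled".
    _current = {'ctx': {}}

    class ToyContext(dict):
        def run(self, func, *args):
            saved = _current['ctx']
            _current['ctx'] = self          # make this EC current
            try:
                return func(*args)
            finally:
                _current['ctx'] = saved     # restore the previous EC

    class Handle:
        def __init__(self, callback, args):
            self._callback = callback
            self._args = args
            # capture the EC that is current at scheduling time
            self._context = ToyContext(_current['ctx'])

        def _run(self):
            return self._context.run(self._callback, *self._args)

    _current['ctx'] = ToyContext(request_id=1)
    h = Handle(lambda: print(_current['ctx'].get('request_id')), ())
    _current['ctx'] = ToyContext(request_id=2)
    h._run()    # prints 1: the context captured at scheduling time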
From antoine.rozo at gmail.com  Fri Aug 11 19:46:29 2017
From: antoine.rozo at gmail.com (Antoine Rozo)
Date: Sat, 12 Aug 2017 01:46:29 +0200
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: 
References: 
Message-ID: 

Hi,

Is a new EC type really needed? Cannot this be done with
collections.ChainMap?

2017-08-12 0:37 GMT+02:00 Yury Selivanov :

> Hi,
>
> This is a new PEP to implement Execution Contexts in Python.
>
> The PEP is in-flight to python.org, and in the meanwhile can
> be read on GitHub:
>
> https://github.com/python/peps/blob/master/pep-0550.rst
>
> (it contains a few diagrams and charts, so please read it there.)
>
> Thank you!
> Yury

-- 
Antoine Rozo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
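For reference, a sketch of the kind of design the question implies:
each task or generator would get a ``new_child()`` of the current
chain and write only to its own front map, leaving the parent
untouched.  This is illustrative code, not a worked-out proposal:

    from collections import ChainMap

    ec = ChainMap({'precision': 100})

    def isolated(ec):
        local = ec.new_child()      # writes go to the child map only
        local['precision'] = 50
        return local['precision'], ec['precision']

    print(isolated(ec))   # (50, 100) -- the parent context is unchanged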
From alberto at metapensiero.it  Fri Aug 11 19:46:50 2017
From: alberto at metapensiero.it (Alberto Berti)
Date: Sat, 12 Aug 2017 01:46:50 +0200
Subject: [Python-ideas] Towards harmony with JavaScript?
References: <87shgykpcl.fsf@ender.lizardnet> <87k229lxya.fsf@ender.lizardnet>
 <87fucxlubt.fsf@ender.lizardnet>
Message-ID: <87bmnlllat.fsf@ender.lizardnet>

>>>>> "Carl" == Carl Smith writes:

Carl> Python is not a good fit for the browser, in part, because of
Carl> the syntax. JavaScript has issues, but its syntax is better
Carl> suited to creating GUIs in the browser.

Just so?

Carl> For example, in browsers everything revolves around a single
Carl> threaded event loop, so you have a lot of callbacks and event
Carl> handlers,

You can write applications full of callbacks using libraries like
Twisted or even asyncio, and you can build entire applications
involving ajax and such without callbacks, as JS got async/await too
in ES8.

event handlers are written more or less the same in Python or
Javascript

Carl> which makes function expressions really useful, but Python
Carl> doesn't have expressions that contain blocks, because of
Carl> significant indentation.

yes, i agree that the difference between a lambda and an anonymous
function is very significant on the way you may think to write your
code.

Carl> As a result, ordinary JS, like this...

Carl> $(function(){ $("spam").click(function(){ alert("spam clicked") }) });

I don't think you mean this is real JS application code :-)

Carl> ...ends up looking like this...

Carl> def on_ready():
Carl>     def click_handler(): alert("spam clicked")
Carl>     jQuery("spam").click(click_handler)
Carl> jQuery(on_ready)

or just

jQuery(lambda: jQuery("spam").click(lambda: alert("spam clicked")))

Carl> JS semantics means JS libraries, which have APIs that assume JS
Carl> syntax. Python library developers make heavy use of language
Carl> specific features to define elegant, Pythonic APIs, which is a
Carl> big part of what makes the language so nice to use.

language specific features... like?

From yselivanov.ml at gmail.com  Fri Aug 11 20:12:34 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Fri, 11 Aug 2017 20:12:34 -0400
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: 
References: 
Message-ID: 

[duplicating my reply cc-ing python-ideas]

> Is a new EC type really needed? Cannot this be done with
> collections.ChainMap?

No, not really.

ChainMap will have O(N) lookup performance, where N is the number of
contexts you have in the chain.  This will degrade the performance of
lookups, which isn't acceptable for some potential EC users like
decimal/numpy/etc.  Inventing heuristics to manage the chain size is
harder than making an immutable dict (which is easy to reason about.)

Chaining contexts will also force them to reference each other,
creating cycles that the GC won't be able to break.

Besides just performance considerations, with a ChainMap design of
contexts it's not possible to properly isolate state changes inside
of generators or coroutines/tasks as it's done in the PEP.

All in all, I don't think that chaining can solve the problem.  It
will likely lead to a more complicated solution in the end (this was
my initial approach FWIW).

Yury
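The lookup cost is easy to demonstrate: a key that lives at the bottom
of a ChainMap gets slower to find as the chain grows.  A rough
micro-benchmark (numbers will vary by machine):

    import timeit
    from collections import ChainMap

    flat = {'key': 1}
    deep = ChainMap({'key': 1})
    for _ in range(100):
        deep = deep.new_child({})     # 100 empty child contexts on top

    print(timeit.timeit(lambda: flat['key']))   # plain dict lookup
    print(timeit.timeit(lambda: deep['key']))   # walks ~100 maps first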
From carl.input at gmail.com  Fri Aug 11 20:15:38 2017
From: carl.input at gmail.com (Carl Smith)
Date: Sat, 12 Aug 2017 01:15:38 +0100
Subject: [Python-ideas] Towards harmony with JavaScript?
In-Reply-To: <87bmnlllat.fsf@ender.lizardnet>
References: <87shgykpcl.fsf@ender.lizardnet> <87k229lxya.fsf@ender.lizardnet>
 <87fucxlubt.fsf@ender.lizardnet> <87bmnlllat.fsf@ender.lizardnet>
Message-ID: 

Using lambdas doesn't solve the problem. I just kept the example short,
but had I used more than one expression in each function, you'd be back
to square one. You took advantage of the brevity of the example, but
it's not realistic.

There are lots of language specific features that library authors use,
like operator overloading, ABCs etc...

Python is a great language, and I always opt for it when it's an
option, but I've used it to write front-end code, and it sucks.

-- Carl Smith
carl.input at gmail.com

On 12 August 2017 at 00:46, Alberto Berti wrote:

> >>>>> "Carl" == Carl Smith writes:
>
> Carl> Python is not a good fit for the browser, in part, because of
> Carl> the syntax. JavaScript has issues, but its syntax is better
> Carl> suited to creating GUIs in the browser.
>
> Just so?
>
> Carl> For example, in browsers everything revolves around a single
> Carl> threaded event loop, so you have a lot of callbacks and event
> Carl> handlers,
>
> You can write applications full of callbacks using libraries like
> Twisted or even asyncio, and you can build entire applications
> involving ajax and such without callbacks, as JS got async/await too
> in ES8.
>
> event handlers are written more or less the same in Python or
> Javascript
>
> Carl> which makes function expressions really useful, but Python
> Carl> doesn't have expressions that contain blocks, because of
> Carl> significant indentation.
>
> yes, i agree that the difference between a lambda and an anonymous
> function is very significant on the way you may think to write your
> code.
>
> Carl> As a result, ordinary JS, like this...
>
> Carl> $(function(){ $("spam").click(function(){ alert("spam clicked") }) });
>
> I don't think you mean this is real JS application code :-)
>
> Carl> ...ends up looking like this...
>
> Carl> def on_ready():
> Carl>     def click_handler(): alert("spam clicked")
> Carl>     jQuery("spam").click(click_handler)
> Carl> jQuery(on_ready)
>
> or just
>
> jQuery(lambda: jQuery("spam").click(lambda: alert("spam clicked")))
>
> Carl> JS semantics means JS libraries, which have APIs that assume JS
> Carl> syntax. Python library developers make heavy use of language
> Carl> specific features to define elegant, Pythonic APIs, which is a
> Carl> big part of what makes the language so nice to use.
>
> language specific features... like?
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rosuav at gmail.com  Fri Aug 11 20:18:51 2017
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 12 Aug 2017 10:18:51 +1000
Subject: [Python-ideas] Towards harmony with JavaScript?
In-Reply-To: 
References: <87shgykpcl.fsf@ender.lizardnet> <87k229lxya.fsf@ender.lizardnet>
 <87fucxlubt.fsf@ender.lizardnet> <87bmnlllat.fsf@ender.lizardnet>
Message-ID: 

On Sat, Aug 12, 2017 at 10:15 AM, Carl Smith wrote:
> Using lambdas doesn't solve the problem. I just kept the example short,
> but had I used more than one expression in each function, you'd be back
> to square one. You took advantage of the brevity of the example, but
> it's not realistic.
>
> There are lots of language specific features that library authors use,
> like operator overloading, ABCs etc...
>
> Python is a great language, and I always opt for it when it's an
> option, but I've used it to write front-end code, and it sucks.

Code to its strengths. A few well-written function decorators will
solve your problems elegantly. If you start with JS code and then try
to port it, of course that won't look good - but if you start with
idiomatic Python, it looks great.

ChrisA
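For illustration, one shape such decorators can take -- a tiny handler
registry. The ``handlers`` dict, the ``on`` decorator, and the event
names are invented for this example; they are not a real browser
bridge or library API:

    handlers = {}

    def on(event):
        """Register the decorated function as the handler for *event*."""
        def register(func):
            handlers[event] = func
            return func
        return register

    @on('ready')
    def setup():
        print('document ready')

    @on('spam.click')
    def spam_clicked():
        print('spam clicked')

    # the event loop / bridge would then dispatch:
    handlers['ready']()
    handlers['spam.click']()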
From alberto at metapensiero.it  Fri Aug 11 21:27:23 2017
From: alberto at metapensiero.it (Alberto Berti)
Date: Sat, 12 Aug 2017 03:27:23 +0200
Subject: [Python-ideas] Towards harmony with JavaScript?
References: <87shgykpcl.fsf@ender.lizardnet> <87k229lxya.fsf@ender.lizardnet>
 <87fucxlubt.fsf@ender.lizardnet> <87bmnlllat.fsf@ender.lizardnet>
Message-ID: <877ey9lgn8.fsf@ender.lizardnet>

>>>>> "Carl" == Carl Smith writes:

Carl> Using lambdas doesn't solve the problem. I just kept the example
Carl> short, but had I used more than one expression in each function,
Carl> you'd be back to square one. You took advantage of the brevity
Carl> of the example, but it's not realistic.

I already told you that it wasn't real application code, it was your
example by the way.

Carl> There are lots of language specific features that library
Carl> authors use, like operator overloading, ABCs etc...

Those are features that I do not consider core Python, and probably
they have similar, already available implementations in some
javascript libraries. I'm not stating that JS hasn't its limitations,
we are all well aware of that. JavaScripthon just reduces the effort
of recontextualizing your mind when jumping between python and js
code, and in doing that it just solves some of the more evident
shortcomings of JS for you. But maybe it's just my impression, i've
done it for me anyway ;-) . It produces JS so uncluttered that it even
allows redistributing just the transpiled JS sources when necessary.

As I said before, it's not a reimplementation of Python's standard
library in JS: there are plenty of libraries in JS that cover the same
areas of Python's standard library, and more and more that deal with
things related to manipulating the dom and browsers. I do not intend
to replace those, even because sooner or later you will have to use
them (I'm talking about libraries like react, angular and so on)...
there's no point for me in trying to build your own "python in the
browser" ecosystem.

Carl> Python is a great language, and I always opt for it when it's an
Carl> option, but I've used it to write front-end code, and it sucks.

What have you used?

From steve at pearwood.info  Fri Aug 11 22:03:43 2017
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 12 Aug 2017 12:03:43 +1000
Subject: [Python-ideas] Towards harmony with JavaScript?
In-Reply-To: 
References: 
Message-ID: <20170812020343.GM7395@ando.pearwood.info>

Hi Jason, and welcome!

On Fri, Aug 11, 2017 at 04:57:46PM +0200, Jason H wrote:

> Before I don my firesuit, I'd like to say that I much prefer python
> and I rail on JS whenever I can. However these days it is quite common
> to be doing work in both Python and Javascript. Harmonizing the two
> would help JS developers pick up the language as well as people like
> me that are stuck working in JS as well.

I must say you're a brave man for coming here with a plan which is
going to be read as "Let's make Python worse!".

Have you considered going to the Javascript forums and suggesting that
they harmonise their language to be more like Python? After all:

- Python came first (1991 versus 1995);

- Python is already one of the stated influences on Javascript;

- whereas most of Javascript's influence on Python (the language) can
  be summed up as "whew, we avoided making that silly mistake!"
  (the world of web frameworks may be more kind to JS);

- according to the "popularity" links you give (and others!) Python is
  more popular than Javascript.
(the world of web frameworks may be more kind to JS); - according to the "popularity" links you give (and others!) Python is more popular than Javascript. > There are several annoyances that if removed, would go a long way. > 1. Object literals: JS: {a:1} vs Python: {'a':1} > Making my fingers dance on ' or " is not a good use of keystrokes, and it decreases readability. I disagree -- it *increases* readability, because I can always tell the difference between a literal string key and a name. I don't have to try to do a whole-program analysis of the entire application in my head to work out whether {a: 1} refers to a name or the literal string 'a'. {address: "123 Main Street"} *always* refers to the variable address, while: {"address": "123 Main Street"} *always* refers to the literal string "address". Or worse... if you're suggesting that Python should make the backwards-incompatible change that {address: ...} should always be the literal string key, thus breaking millions of working programs, just to save two keystrokes, sorry, that isn't going to happen. If you wish to avoid typing quotes, and your keys are valid identifiers, you can use the dict constructor with keyword arguments: dict(address="123 Main Street") > 2. Join: JS: [].join(s) vs Python: s.join([]) > I've read the justification for putting join on a string, and it > makes sense. But I think we should put it on the list too. And tuples, and dicts, and sets, and deques, and iterators, and generators, and strings, and bytes, and bytearrays, and arrays, and every other iterable type, including people's custom ones. No thank you. We don't need that enormous amount of code duplication and documentation bloat just for the sake of OOP syntactic purity. Or worse... we add it *just* to lists, not other iterables, and then we're confused why values.join(sep) sometimes works and sometimes fails. The obvious fix is clearly: if isinstance(values, list): values.join(sep) else: sep.join(values) but that has an unnecessary isinstance check and can be re-written as: sep.join(values) guaranteed to work for any well-behaved iterable, regardless of whether it is a list or not. > 3. Allow C/C++/JS style comments: JS:[ //, /* ] vs Python # > This one is pretty self-explanatory. What's not self-explanatory is why on earth you would want to type two characters // instead of one # ? Besides, // is already valid syntax in Python. Consider: result = x // 2 + y // -2 > but I don't know the repercussions of all of that. I think the above > could be implemented without breaking anything. It would break lots. -- Steve From jelle.zijlstra at gmail.com Fri Aug 11 23:46:12 2017 From: jelle.zijlstra at gmail.com (Jelle Zijlstra) Date: Fri, 11 Aug 2017 20:46:12 -0700 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: This is exciting and I'm happy that you're addressing this problem. We've solved a similar problem in our asynchronous programming framework, asynq. Our solution (implemented at https://github.com/quora/asynq/blob/master/asynq/contexts.py) is similar to that in PEP 521: we enhance the context manager protocol with pause/resume methods instead of using an enhanced form of thread-local state. Some of our use cases can't be implemented using this PEP; notably, we use a timing context that times how long an asynchronous function takes by repeatedly pausing and resuming the timer. However, this timing context adds significant overhead because we have to call the pause/resume methods so often. 
Overall, your approach is almost certainly more performant. 2017-08-11 15:37 GMT-07:00 Yury Selivanov : > Hi, > > This is a new PEP to implement Execution Contexts in Python. > > The PEP is in-flight to python.org, and in the meanwhile can > be read on GitHub: > > https://github.com/python/peps/blob/master/pep-0550.rst > > (it contains a few diagrams and charts, so please read it there.) > > Thank you! > Yury > > > PEP: 550 > Title: Execution Context > Version: $Revision$ > Last-Modified: $Date$ > Author: Yury Selivanov > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 11-Aug-2017 > Python-Version: 3.7 > Post-History: 11-Aug-2017 > > > Abstract > ======== > > This PEP proposes a new mechanism to manage execution state--the > logical environment in which a function, a thread, a generator, > or a coroutine executes in. > > A few examples of where having a reliable state storage is required: > > * Context managers like decimal contexts, ``numpy.errstate``, > and ``warnings.catch_warnings``; > > * Storing request-related data such as security tokens and request > data in web applications; > > * Profiling, tracing, and logging in complex and large code bases. > > The usual solution for storing state is to use a Thread-local Storage > (TLS), implemented in the standard library as ``threading.local()``. > Unfortunately, TLS does not work for isolating state of generators or > asynchronous code because such code shares a single thread. > > > Rationale > ========= > > Traditionally a Thread-local Storage (TLS) is used for storing the > state. However, the major flaw of using the TLS is that it works only > for multi-threaded code. It is not possible to reliably contain the > state within a generator or a coroutine. For example, consider > the following generator:: > > def calculate(precision, ...): > with decimal.localcontext() as ctx: > # Set the precision for decimal calculations > # inside this block > ctx.prec = precision > > yield calculate_something() > yield calculate_something_else() > > Decimal context is using a TLS to store the state, and because TLS is > not aware of generators, the state can leak. The above code will > not work correctly, if a user iterates over the ``calculate()`` > generator with different precisions in parallel:: > > g1 = calculate(100) > g2 = calculate(50) > > items = list(zip(g1, g2)) > > # items[0] will be a tuple of: > # first value from g1 calculated with 100 precision, > # first value from g2 calculated with 50 precision. > # > # items[1] will be a tuple of: > # second value from g1 calculated with 50 precision, > # second value from g2 calculated with 50 precision. > > An even scarier example would be using decimals to represent money > in an async/await application: decimal calculations can suddenly > lose precision in the middle of processing a request. Currently, > bugs like this are extremely hard to find and fix. > > Another common need for web applications is to have access to the > current request object, or security context, or, simply, the request > URL for logging or submitting performance tracing data:: > > async def handle_http_request(request): > context.current_http_request = request > > await ... > # Invoke your framework code, render templates, > # make DB queries, etc, and use the global > # 'current_http_request' in that code. > > # This isn't currently possible to do reliably > # in asyncio out of the box. 
>
> These examples are just a few out of many where a reliable way to
> store context data is absolutely needed.
>
> The inability to use TLS for asynchronous code has led to a
> proliferation of ad-hoc solutions, each supported only by code that
> was explicitly adapted to work with them.
>
> The current status quo is that any library, including the standard
> library, that uses a TLS will likely not work as expected in
> asynchronous code or with generators (see [3]_ as an example issue.)
>
> Some languages that have coroutines or generators recommend manually
> passing a ``context`` object to every function, see [1]_
> describing the pattern for Go.  This approach, however, has limited
> use for Python, where we have a huge ecosystem that was built to work
> with a TLS-like context.  Moreover, passing the context explicitly
> does not work at all for libraries like ``decimal`` or ``numpy``,
> which use operator overloading.
>
> The .NET runtime, which has support for async/await, has a generic
> solution to this problem, called ``ExecutionContext`` (see [2]_).
> On the surface, working with it is very similar to working with a
> TLS, but the former explicitly supports asynchronous code.
>
>
> Goals
> =====
>
> The goal of this PEP is to provide a more reliable alternative to
> ``threading.local()``.  It should be explicitly designed to work with
> the Python execution model, equally supporting threads, generators,
> and coroutines.
>
> An acceptable solution for Python should meet the following
> requirements:
>
> * Transparent support for code executing in threads, coroutines,
>   and generators with an easy-to-use API.
>
> * Negligible impact on the performance of the existing code or the
>   code that will be using the new mechanism.
>
> * Fast C API for packages like ``decimal`` and ``numpy``.
>
> Explicit is still better than implicit, hence the new APIs should
> only be used when there is no option to pass the state explicitly.
>
> With this PEP implemented, it should be possible to update a context
> manager like the one below::
>
>     _local = threading.local()
>
>     @contextmanager
>     def context(x):
>         old_x = getattr(_local, 'x', None)
>         _local.x = x
>         try:
>             yield
>         finally:
>             _local.x = old_x
>
> to a more robust version that can be reliably used in generators
> and async/await code, with a simple transformation::
>
>     @contextmanager
>     def context(x):
>         old_x = get_execution_context_item('x')
>         set_execution_context_item('x', x)
>         try:
>             yield
>         finally:
>             set_execution_context_item('x', old_x)
>
>
> Specification
> =============
>
> This proposal introduces a new concept called Execution Context (EC),
> along with a set of Python APIs and C APIs to interact with it.
>
> EC is implemented using an immutable mapping.  Every modification
> of the mapping produces a new copy of it.  To illustrate what this
> means, let's compare it to how we work with tuples in Python::
>
>     a0 = ()
>     a1 = a0 + (1,)
>     a2 = a1 + (2,)
>
>     # a0 is an empty tuple
>     # a1 is (1,)
>     # a2 is (1, 2)
>
> Manipulating an EC object would be similar::
>
>     a0 = EC()
>     a1 = a0.set('foo', 'bar')
>     a2 = a1.set('spam', 'ham')
>
>     # a0 is an empty mapping
>     # a1 is {'foo': 'bar'}
>     # a2 is {'foo': 'bar', 'spam': 'ham'}
>
> In CPython, every thread that can execute Python code has a
> corresponding ``PyThreadState`` object.  It encapsulates important
> runtime information like a pointer to the current frame, and is
> used extensively by the ceval loop.
We add a new field to > ``PyThreadState``, called ``exec_context``, which points to the > current EC object. > > We also introduce a set of APIs to work with Execution Context. > In this section we will only cover two functions that are needed to > explain how Execution Context works. See the full list of new APIs > in the `New APIs`_ section. > > * ``sys.get_execution_context_item(key, default=None)``: lookup > ``key`` in the EC of the executing thread. If not found, > return ``default``. > > * ``sys.set_execution_context_item(key, value)``: get the > current EC of the executing thread. Add a ``key``/``value`` > item to it, which will produce a new EC object. Set the > new object as the current one for the executing thread. > In pseudo-code:: > > tstate = PyThreadState_GET() > ec = tstate.exec_context > ec2 = ec.set(key, value) > tstate.exec_context = ec2 > > Note, that some important implementation details and optimizations > are omitted here, and will be covered in later sections of this PEP. > > Now let's see how Execution Contexts work with regular multi-threaded > code, generators, and coroutines. > > > Regular & Multithreaded Code > ---------------------------- > > For regular Python code, EC behaves just like a thread-local. Any > modification of the EC object produces a new one, which is immediately > set as the current one for the thread state. > > .. figure:: pep-0550/functions.png > :align: center > :width: 90% > > Figure 1. Execution Context flow in a thread. > > As Figure 1 illustrates, if a function calls > ``set_execution_context_item()``, the modification of the execution > context will be visible to all subsequent calls and to the caller:: > > def set_foo(): > set_execution_context_item('foo', 'spam') > > set_execution_context_item('foo', 'bar') > print(get_execution_context_item('foo')) > > set_foo() > print(get_execution_context_item('foo')) > > # will print: > # bar > # spam > > > Coroutines > ---------- > > Python :pep:`492` coroutines are used to implement cooperative > multitasking. For a Python end-user they are similar to threads, > especially when it comes to sharing resources or modifying > the global state. > > An event loop is needed to schedule coroutines. Coroutines that > are explicitly scheduled by the user are usually called Tasks. > When a coroutine is scheduled, it can schedule other coroutines using > an ``await`` expression. In async/await world, awaiting a coroutine > can be viewed as a different calling convention: Tasks are similar to > threads, and awaiting on coroutines within a Task is similar to > calling functions within a thread. > > By drawing a parallel between regular multithreaded code and > async/await, it becomes apparent that any modification of the > execution context within one Task should be visible to all coroutines > scheduled within it. Any execution context modifications, however, > must not be visible to other Tasks executing within the same thread. > > To achieve this, a small set of modifications to the coroutine object > is needed: > > * When a coroutine object is instantiated, it saves a reference to > the current execution context object to its ``cr_execution_context`` > attribute. > > * Coroutine's ``.send()`` and ``.throw()`` methods are modified as > follows (in pseudo-C):: > > if coro->cr_isolated_execution_context: > # Save a reference to the current execution context > old_context = tstate->execution_context > > # Set our saved execution context as the current > # for the current thread. 
> tstate->execution_context = coro->cr_execution_context > > try: > # Perform the actual `Coroutine.send()` or > # `Coroutine.throw()` call. > return coro->send(...) > finally: > # Save a reference to the updated execution_context. > # We will need it later, when `.send()` or `.throw()` > # are called again. > coro->cr_execution_context = tstate->execution_context > > # Restore thread's execution context to what it was before > # invoking this coroutine. > tstate->execution_context = old_context > else: > # Perform the actual `Coroutine.send()` or > # `Coroutine.throw()` call. > return coro->send(...) > > * ``cr_isolated_execution_context`` is a new attribute on coroutine > objects. Set to ``True`` by default, it makes any execution context > modifications performed by coroutine to stay visible only to that > coroutine. > > When Python interpreter sees an ``await`` instruction, it flips > ``cr_isolated_execution_context`` to ``False`` for the coroutine > that is about to be awaited. This makes any changes to execution > context made by nested coroutine calls within a Task to be visible > throughout the Task. > > Because the top-level coroutine (Task) cannot be scheduled with > ``await`` (in asyncio you need to call ``loop.create_task()`` or > ``asyncio.ensure_future()`` to schedule a Task), all execution > context modifications are guaranteed to stay within the Task. > > * We always work with ``tstate->exec_context``. We use > ``coro->cr_execution_context`` only to store coroutine's execution > context when it is not executing. > > Figure 2 below illustrates how execution context mutations work with > coroutines. > > .. figure:: pep-0550/coroutines.png > :align: center > :width: 90% > > Figure 2. Execution Context flow in coroutines. > > In the above diagram: > > * When "coro1" is created, it saves a reference to the current > execution context "2". > > * If it makes any change to the context, it will have its own > execution context branch "2.1". > > * When it awaits on "coro2", any subsequent changes it does to > the execution context are visible to "coro1", but not outside > of it. > > In code:: > > async def inner_foo(): > print('inner_foo:', get_execution_context_item('key')) > set_execution_context_item('key', 2) > > async def foo(): > print('foo:', get_execution_context_item('key')) > > set_execution_context_item('key', 1) > await inner_foo() > > print('foo:', get_execution_context_item('key')) > > > set_execution_context_item('key', 'spam') > print('main:', get_execution_context_item('key')) > > asyncio.get_event_loop().run_until_complete(foo()) > > print('main:', get_execution_context_item('key')) > > which will output:: > > main: spam > foo: spam > inner_foo: 1 > foo: 2 > main: spam > > Generator-based coroutines (generators decorated with > ``types.coroutine`` or ``asyncio.coroutine``) behave exactly as > native coroutines with regards to execution context management: > their ``yield from`` expression is semantically equivalent to > ``await``. > > > Generators > ---------- > > Generators in Python, while similar to Coroutines, are used in a > fundamentally different way. They are producers of data, and > they use ``yield`` expression to suspend/resume their execution. > > A crucial difference between ``await coro`` and ``yield value`` is > that the former expression guarantees that the ``coro`` will be > executed to the end, while the latter is producing ``value`` and > suspending the generator until it gets iterated again. 
>
> Generators share 99% of their implementation with coroutines, and
> thus have similar new attributes ``gi_execution_context`` and
> ``gi_isolated_execution_context``.  Similar to coroutines, generators
> save a reference to the current execution context when they are
> instantiated.  They have the same implementation of the ``.send()``
> and ``.throw()`` methods.
>
> The only difference is that
> ``gi_isolated_execution_context`` is always set to ``True``, and
> is never modified by the interpreter.  The ``yield from o`` expression
> in regular generators that are not decorated with ``types.coroutine``
> is semantically equivalent to ``for v in o: yield v``.
>
> .. figure:: pep-0550/generators.png
>    :align: center
>    :width: 90%
>
>    Figure 3.  Execution Context flow in a generator.
>
> In the above diagram:
>
> * When "gen1" is created, it saves a reference to the current
>   execution context "2".
>
> * If it makes any change to the context, it will have its own
>   execution context branch "2.1".
>
> * When "gen2" is created, it saves a reference to the current
>   execution context for it -- "2.1".
>
> * Any subsequent execution context updates in "gen2" will only
>   be visible to "gen2".
>
> * Likewise, any context changes that "gen1" makes after it
>   created "gen2" will not be visible to "gen2".
>
> In code::
>
>     def inner_foo():
>         for i in range(3):
>             print('inner_foo:', get_execution_context_item('key'))
>             set_execution_context_item('key', i)
>             yield i
>
>
>     def foo():
>         set_execution_context_item('key', 'spam')
>         print('foo:', get_execution_context_item('key'))
>
>         inner = inner_foo()
>
>         while True:
>             val = next(inner, None)
>             if val is None:
>                 break
>             yield val
>             print('foo:', get_execution_context_item('key'))
>
>     set_execution_context_item('key', 'ham')
>     print('main:', get_execution_context_item('key'))
>
>     list(foo())
>
>     print('main:', get_execution_context_item('key'))
>
> which will output::
>
>     main: ham
>     foo: spam
>     inner_foo: spam
>     foo: spam
>     inner_foo: 0
>     foo: spam
>     inner_foo: 1
>     foo: spam
>     main: ham
>
> As we see, any modification of the execution context in a generator
> is visible only to the generator itself.
>
> There is one use-case where it is desired for generators to affect
> the surrounding execution context: the ``contextlib.contextmanager``
> decorator.  To make the following work::
>
>     @contextmanager
>     def context(x):
>         old_x = get_execution_context_item('x')
>         set_execution_context_item('x', x)
>         try:
>             yield
>         finally:
>             set_execution_context_item('x', old_x)
>
> we modified ``contextmanager`` to flip the
> ``gi_isolated_execution_context`` flag to ``False`` on its generator.
>
>
> Greenlets
> ---------
>
> Greenlet is an alternative implementation of cooperative
> scheduling for Python.  Although the greenlet package is not part of
> CPython, popular frameworks like gevent rely on it, and it is
> important that greenlet can be modified to support execution
> contexts.
>
> In a nutshell, the greenlet design is very similar to the design of
> generators.  The main difference is that for generators, the stack
> is managed by the Python interpreter.  Greenlet works outside of the
> Python interpreter, and manually saves some ``PyThreadState``
> fields and pushes/pops the C-stack.  Since Execution Context is
> implemented on top of ``PyThreadState``, it's easy to add
> transparent support of it to greenlet.
>
> New APIs
> ========
>
> Even though this PEP adds a number of new APIs, please keep in mind
> that most Python users will likely only ever use two of them:
> ``sys.get_execution_context_item()`` and
> ``sys.set_execution_context_item()``.
>
>
> Python
> ------
>
> 1. ``sys.get_execution_context_item(key, default=None)``: look up
>    ``key`` in the current Execution Context.  If not found,
>    return ``default``.
>
> 2. ``sys.set_execution_context_item(key, value)``: set a
>    ``key``/``value`` item for the current Execution Context.
>    If ``value`` is ``None``, the item will be removed.
>
> 3. ``sys.get_execution_context()``: return the current Execution
>    Context object: ``sys.ExecutionContext``.
>
> 4. ``sys.set_execution_context(ec)``: set the passed
>    ``sys.ExecutionContext`` instance as the current one for the
>    current thread.
>
> 5. ``sys.ExecutionContext`` object.
>
>    Implementation detail: ``sys.ExecutionContext`` wraps a low-level
>    ``PyExecContextData`` object.  ``sys.ExecutionContext`` has a
>    mutable mapping API, abstracting away the real immutable
>    ``PyExecContextData``.
>
>    * ``ExecutionContext()``: construct a new, empty execution
>      context.
>
>    * ``ec.run(func, *args)`` method: run ``func(*args)`` in the
>      ``ec`` execution context.
>
>    * ``ec[key]``: look up ``key`` in the ``ec`` context.
>
>    * ``ec[key] = value``: assign a ``key``/``value`` item to the
>      ``ec``.
>
>    * ``ec.get()``, ``ec.items()``, ``ec.values()``, ``ec.keys()``, and
>      ``ec.copy()`` are similar to those of a ``dict`` object.
>
>
> C API
> -----
>
> The C API is different from the Python one because it operates
> directly on the low-level immutable ``PyExecContextData`` object.
>
> 1. New ``PyThreadState->exec_context`` field, pointing to a
>    ``PyExecContextData`` object.
>
> 2. ``PyThreadState_SetExecContextItem`` and
>    ``PyThreadState_GetExecContextItem``: similar to
>    ``sys.set_execution_context_item()`` and
>    ``sys.get_execution_context_item()``.
>
> 3. ``PyThreadState_GetExecContext``: similar to
>    ``sys.get_execution_context()``.  Always returns a
>    ``PyExecContextData`` object.  If ``PyThreadState->exec_context``
>    is ``NULL``, a new and empty one will be created and assigned
>    to ``PyThreadState->exec_context``.
>
> 4. ``PyThreadState_SetExecContext``: similar to
>    ``sys.set_execution_context()``.
>
> 5. ``PyExecContext_New``: create a new, empty ``PyExecContextData``
>    object.
>
> 6. ``PyExecContext_SetItem`` and ``PyExecContext_GetItem``.
>
> The exact layout of ``PyExecContextData`` is private, which allows
> switching it to a different implementation later.  More on that
> in the `Implementation Details`_ section.
>
>
> Modifications in Standard Library
> =================================
>
> * ``contextlib.contextmanager`` was updated to flip the new
>   ``gi_isolated_execution_context`` attribute on the generator.
>
> * The ``asyncio.events.Handle`` object now captures the current
>   execution context when it is created, and uses the saved
>   execution context to run the callback (with the
>   ``ExecutionContext.run()`` method.)  This makes
>   ``loop.call_soon()`` run callbacks in the execution context
>   they were scheduled in.
>
> No modifications in ``asyncio.Task`` or ``asyncio.Future`` were
> necessary.
>
> Some standard library modules like ``warnings`` and ``decimal``
> can be updated to use the new execution contexts.  This will be
> considered in separate issues if this PEP is accepted.
>
>
> Backwards Compatibility
> =======================
>
> This proposal preserves 100% backwards compatibility.
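To make the proposed Python-level API concrete, a short hypothetical
snippet (none of these functions exist yet; the behaviour shown simply
follows the specification quoted above):

    import sys

    def handler():
        # Lookups fall back to the provided default when the key
        # is not present in the current execution context.
        print(sys.get_execution_context_item('request_id', 'missing'))

    sys.set_execution_context_item('request_id', 42)
    handler()                    # prints: 42

    ec = sys.ExecutionContext()  # construct a new, empty EC
    ec.run(handler)              # prints: missing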
>
> Performance
> ===========
>
> Implementation Details
> ----------------------
>
> The new ``PyExecContextData`` object wraps a ``dict`` object.
> Any modification requires creating a shallow copy of the dict.
>
> While working on the reference implementation of this PEP, we were
> able to optimize the ``dict.copy()`` operation **5.5x**, see [4]_
> for details.
>
> .. figure:: pep-0550/dict_copy.png
>    :align: center
>    :width: 100%
>
>    Figure 4.
>
> Figure 4 shows that the performance of an immutable dict implemented
> with shallow copying is expectedly O(n) for the ``set()`` operation.
> However, this is tolerable until the dict has more than 100 items
> (1 ``set()`` takes about a microsecond.)
>
> Judging by the number of modules that need EC in the Standard
> Library, it is likely that real-world Python applications will use
> significantly fewer than 100 execution context variables.
>
> The important point is that the cost of accessing a key in
> Execution Context is always O(1).
>
> If the ``set()`` operation performance is a major concern, we discuss
> alternative approaches that have O(1) or close ``set()`` performance
> in the `Alternative Immutable Dict Implementation`_, `Faster C API`_,
> and `Copy-on-write Execution Context`_ sections.
>
>
> Generators and Coroutines
> -------------------------
>
> Using a microbenchmark for generators and coroutines from :pep:`492`
> ([12]_), it was possible to observe a 0.5 to 1% performance
> degradation.
>
> asyncio echoserver microbenchmarks from the uvloop project [13]_
> showed a 1-1.5% performance degradation for asyncio code.
>
> asyncpg benchmarks [14]_, which execute more code and are closer to a
> real-world application, did not exhibit any noticeable performance
> change.
>
>
> Overall Performance Impact
> --------------------------
>
> The total number of changed lines in the ceval loop is 2 -- in the
> ``YIELD_FROM`` opcode implementation.  Only the performance of
> generators and coroutines can be affected by the proposal.
>
> This was confirmed by running the Python Performance Benchmark Suite
> [15]_, which demonstrated that there is no difference between
> the 3.7 master branch and this PEP's reference implementation branch
> (full benchmark results can be found here [16]_.)
>
>
> Design Considerations
> =====================
>
> Alternative Immutable Dict Implementation
> -----------------------------------------
>
> Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT)
> to implement high-performance immutable collections [5]_, [6]_.
>
> Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N)
> performance for both ``set()`` and ``get()`` operations, which will
> be essentially O(1) for relatively small mappings in EC.
>
> To assess if HAMT can be used for Execution Context, we implemented
> it in CPython [7]_.
>
> .. figure:: pep-0550/hamt_vs_dict.png
>    :align: center
>    :width: 100%
>
>    Figure 5.  Benchmark code can be found here: [9]_.
>
> Figure 5 shows that HAMT indeed displays O(1) performance for all
> benchmarked dictionary sizes.  For dictionaries with fewer than 100
> items, HAMT is a bit slower than Python dict/shallow copy.
>
> .. figure:: pep-0550/lookup_hamt.png
>    :align: center
>    :width: 100%
>
>    Figure 6.  Benchmark code can be found here: [10]_.
>
> Figure 6 shows a comparison of lookup costs between Python dict
> and an HAMT immutable mapping.  HAMT lookup time is 30-40% worse
> than Python dict lookups on average, which is a very good result,
> considering how well Python dicts are optimized.
> > Note, that according to [8]_, HAMT design can be further improved. > > The bottom line is that the current approach with implementing > an immutable mapping with shallow-copying dict will likely perform > adequately in real-life applications. The HAMT solution is more > future proof, however. > > The proposed API is designed in such a way that the underlying > implementation of the mapping can be changed completely without > affecting the Execution Context `Specification`_, which allows > us to switch to HAMT at some point if necessary. > > > Copy-on-write Execution Context > ------------------------------- > > The implementation of Execution Context in .NET is different from > this PEP. .NET uses copy-on-write mechanism and a regular mutable > mapping. > > One way to implement this in CPython would be to have two new > fields in ``PyThreadState``: > > * ``exec_context`` pointing to the current Execution Context mapping; > * ``exec_context_copy_on_write`` flag, set to ``0`` initially. > > The idea is that whenever we are modifying the EC, the copy-on-write > flag is checked, and if it is set to ``1``, the EC is copied. > > Modifications to Coroutine and Generator ``.send()`` and ``.throw()`` > methods described in the `Coroutines`_ section will be almost the > same, except that in addition to the ``gi_execution_context`` they > will have a ``gi_exec_context_copy_on_write`` flag. When a coroutine > or a generator starts, the flag will be set to ``1``. This will > ensure that any modification of the EC performed within a coroutine > or a generator will be isolated. > > This approach has one advantage: > > * For Execution Context that contains a large number of items, > copy-on-write is a more efficient solution than the shallow-copy > dict approach. > > However, we believe that copy-on-write disadvantages are more > important to consider: > > * Copy-on-write behaviour for generators and coroutines makes > EC semantics less predictable. > > With immutable EC approach, generators and coroutines always > execute in the EC that was current at the moment of their > creation. Any modifications to the outer EC while a generator > or a coroutine is executing are not visible to them:: > > def generator(): > yield 1 > print(get_execution_context_item('key')) > yield 2 > > set_execution_context_item('key', 'spam') > gen = iter(generator()) > next(gen) > set_execution_context_item('key', 'ham') > next(gen) > > The above script will always print 'spam' with immutable EC. > > With a copy-on-write approach, the above script will print 'ham'. > Now, consider that ``generator()`` was refactored to call some > library function, that uses Execution Context:: > > def generator(): > yield 1 > some_function_that_uses_decimal_context() > print(get_execution_context_item('key')) > yield 2 > > Now, the script will print 'spam', because > ``some_function_that_uses_decimal_context`` forced the EC to copy, > and ``set_execution_context_item('key', 'ham')`` line did not > affect the ``generator()`` code after all. > > * Similarly to the previous point, ``sys.ExecutionContext.run()`` > method will also become less predictable, as > ``sys.get_execution_context()`` would still return a reference to > the current mutable EC. > > We can't modify ``sys.get_execution_context()`` to return a shallow > copy of the current EC, because this would seriously harm > performance of ``asyncio.call_soon()`` and similar places, where > it is important to propagate the Execution Context. 
> * Even though copy-on-write means the execution context object has
>   to be shallow-copied less frequently, copying will still take place
>   in coroutines and generators.  In that case, the HAMT approach will
>   perform better for medium to large execution contexts.
>
> All in all, we believe that the copy-on-write approach introduces
> very subtle corner cases that could lead to bugs that are
> exceptionally hard to discover and fix.
>
> The immutable EC solution, in comparison, is always predictable and
> easy to reason about.  Therefore we believe that any slight
> performance gain that the copy-on-write solution might offer is not
> worth it.
>
>
> Faster C API
> ------------
>
> Packages like numpy and standard library modules like decimal need
> to frequently query the global state for some local context
> configuration.  It is important that the APIs they use are as
> fast as possible.
>
> The proposed ``PyThreadState_SetExecContextItem`` and
> ``PyThreadState_GetExecContextItem`` functions need to get the
> current thread state with ``PyThreadState_GET()`` (fast) and then
> perform a hash lookup (relatively slow).  We can eliminate the hash
> lookup by adding three additional C API functions:
>
> * ``Py_ssize_t PyExecContext_RequestIndex(char *key_name)``:
>   a function similar to the existing ``_PyEval_RequestCodeExtraIndex``
>   introduced in :pep:`523`.  The idea is to request a unique index
>   that can later be used to look up context items.
>
>   The ``key_name`` can later be used by ``sys.ExecutionContext`` to
>   introspect items added with this API.
>
> * ``PyThreadState_SetExecContextIndexedItem(Py_ssize_t index, PyObject *val)``
>   and ``PyThreadState_GetExecContextIndexedItem(Py_ssize_t index)``
>   to request an item by its index, avoiding the cost of a hash lookup.
>
>
> Why does setting a key to None remove the item?
> ------------------------------------------------
>
> Consider a context manager::
>
>     @contextmanager
>     def context(x):
>         old_x = get_execution_context_item('x')
>         set_execution_context_item('x', x)
>         try:
>             yield
>         finally:
>             set_execution_context_item('x', old_x)
>
> With the ``set_execution_context_item(key, None)`` call removing the
> ``key``, the user doesn't need to write additional code to remove
> the ``key`` if it wasn't in the execution context already.
>
> An alternative design with a ``del_execution_context_item()`` method
> would look like the following::
>
>     @contextmanager
>     def context(x):
>         not_there = object()
>         old_x = get_execution_context_item('x', not_there)
>         set_execution_context_item('x', x)
>         try:
>             yield
>         finally:
>             if old_x is not_there:
>                 del_execution_context_item('x')
>             else:
>                 set_execution_context_item('x', old_x)
>
>
> Can we fix ``PyThreadState_GetDict()``?
> ---------------------------------------
>
> ``PyThreadState_GetDict`` is a TLS, and some of its existing users
> might depend on it being just a TLS.  Changing its behaviour to follow
> the Execution Context semantics would break backwards compatibility.
>
>
> PEP 521
> -------
>
> :pep:`521` proposes an alternative solution to the problem:
> enhance the Context Manager Protocol with two new methods,
> ``__suspend__`` and ``__resume__``.  To make it compatible with
> async/await, the Asynchronous Context Manager Protocol will also
> need to be extended with ``__asuspend__`` and ``__aresume__``.
>
> This makes it possible to implement context managers like decimal
> context and ``numpy.errstate`` for generators and coroutines.
> > The following code:: > > class Context: > > def __enter__(self): > self.old_x = get_execution_context_item('x') > set_execution_context_item('x', 'something') > > def __exit__(self, *err): > set_execution_context_item('x', self.old_x) > > would become this:: > > class Context: > > def __enter__(self): > self.old_x = get_execution_context_item('x') > set_execution_context_item('x', 'something') > > def __suspend__(self): > set_execution_context_item('x', self.old_x) > > def __resume__(self): > set_execution_context_item('x', 'something') > > def __exit__(self, *err): > set_execution_context_item('x', self.old_x) > > Besides complicating the protocol, the implementation will likely > negatively impact performance of coroutines, generators, and any code > that uses context managers, and will notably complicate the > interpreter implementation. It also does not solve the leaking state > problem for greenlet/gevent. > > :pep:`521` also does not provide any mechanism to propagate state > in a local context, like storing a request object in an HTTP request > handler to have better logging. > > > Can Execution Context be implemented outside of CPython? > -------------------------------------------------------- > > Because async/await code needs an event loop to run it, an EC-like > solution can be implemented in a limited way for coroutines. > > Generators, on the other hand, do not have an event loop or > trampoline, making it impossible to intercept their ``yield`` points > outside of the Python interpreter. > > > Reference Implementation > ======================== > > The reference implementation can be found here: [11]_. > > > References > ========== > > .. [1] https://blog.golang.org/context > > .. [2] https://msdn.microsoft.com/en-us/library/system.threading. > executioncontext.aspx > > .. [3] https://github.com/numpy/numpy/issues/9444 > > .. [4] http://bugs.python.org/issue31179 > > .. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie > > .. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures- > persistenthashmap-part-ii.html > > .. [7] https://github.com/1st1/cpython/tree/hamt > > .. [8] https://michael.steindorfer.name/publications/oopsla15.pdf > > .. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd > > .. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e > > .. [11] https://github.com/1st1/cpython/tree/pep550 > > .. [12] https://www.python.org/dev/peps/pep-0492/#async-await > > .. [13] https://github.com/MagicStack/uvloop/blob/master/examples/ > bench/echoserver.py > > .. [14] https://github.com/MagicStack/pgbench > > .. [15] https://github.com/python/performance > > .. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c > > > Copyright > ========= > > This document has been placed in the public domain. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Sat Aug 12 00:16:45 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 12 Aug 2017 00:16:45 -0400 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: > This is exciting and I'm happy that you're addressing this problem. Thank you! 
> Some of our use cases can't be implemented using this PEP; notably, we use a timing context that times how long an asynchronous function takes by repeatedly pausing and resuming the timer. Measuring performance of coroutines is a bit different kind of problem. With PEP 550 you will be able to decouple context management from collecting performance data. That would allow you to subclass asyncio.Task (let's call it InstrumentedTask) and implement all extra tracing functionality on it (by overriding its _send method for example). Then you could set a custom task factory that would use InstrumentedTask only for a fraction of requests. That would make it possible to collect performance metrics even in production (my 2c). Yury From ericsnowcurrently at gmail.com Sat Aug 12 01:02:18 2017 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 11 Aug 2017 23:02:18 -0600 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: On Aug 11, 2017 16:38, "Yury Selivanov" wrote: Hi, This is a new PEP to implement Execution Contexts in Python. Nice! I've had something like this on the back burner for a while as it helps solve some problems with encapsulating the import state (e.g. PEP 408). -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Aug 12 01:17:34 2017 From: guido at python.org (Guido van Rossum) Date: Fri, 11 Aug 2017 22:17:34 -0700 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: I may have missed this (I've just skimmed the doc), but what's the rationale for making the EC an *immutable* mapping? It's impressive that you managed to create a faster immutable dict, but why does the use case need one? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Aug 12 01:33:14 2017 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 11 Aug 2017 22:33:14 -0700 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: On Fri, Aug 11, 2017 at 10:17 PM, Guido van Rossum wrote: > I may have missed this (I've just skimmed the doc), but what's the rationale > for making the EC an *immutable* mapping? It's impressive that you managed > to create a faster immutable dict, but why does the use case need one? In this proposal, you have lots and lots of semantically distinct ECs. Potentially every stack frame has its own (at least in async code). So instead of copying the EC every time they create a new one, they want to copy it when it's written to. This is a win if writes are relatively rare compared to the creation of ECs. You could probably optimize it a bit more by checking the refcnt before writing, and skipping the copy if it's exactly 1. But even simpler is to just always copy and throw away the old version. -n -- Nathaniel J. Smith -- https://vorpus.org From yselivanov.ml at gmail.com Sat Aug 12 01:41:06 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 12 Aug 2017 01:41:06 -0400 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: [replying to the list] > I may have missed this (I've just skimmed the doc), but what's the rationale for making the EC an *immutable* mapping? 
It's possible to implement Execution Context with a mutable mapping and
copy-on-write (as it's done in .NET).  This is one of the approaches that
I tried, and I discovered that it causes a bunch of subtle
inconsistencies in contexts for generators and coroutines.  I've tried
to cover this here:
https://www.python.org/dev/peps/pep-0550/#copy-on-write-execution-context

All in all, I believe that the immutable mapping approach gives the most
predictable and easy-to-reason-about model.  If its performance on a
large number of items in the EC is a concern, I'll be happy to implement
it using HAMT (also covered in the PEP).

Yury

From yselivanov.ml at gmail.com  Sat Aug 12 01:43:33 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sat, 12 Aug 2017 01:43:33 -0400
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: References: Message-ID:

> On Fri, Aug 11, 2017 at 10:17 PM, Guido van Rossum wrote:
> > I may have missed this (I've just skimmed the doc), but what's the rationale
> > for making the EC an *immutable* mapping? It's impressive that you managed
> > to create a faster immutable dict, but why does the use case need one?

> In this proposal, you have lots and lots of semantically distinct ECs.
> Potentially every stack frame has its own (at least in async code). So
> instead of copying the EC every time they create a new one, they want
> to copy it when it's written to. This is a win if writes are
> relatively rare compared to the creation of ECs.

Correct.  If we decide to use HAMT, the ratio of writes/reads becomes
less important though.

Yury

From yselivanov.ml at gmail.com  Sat Aug 12 01:45:10 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sat, 12 Aug 2017 01:45:10 -0400
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: References: Message-ID:

Thanks Eric!

PEP 408 -- Standard library __preview__ package?

Yury

From njs at pobox.com  Sat Aug 12 03:54:03 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 12 Aug 2017 00:54:03 -0700
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: References: Message-ID:

Hi Yury,

This is really cool.  Some notes on a first read:

1. Excellent work on optimizing dict, that seems valuable independent
of the rest of the details here.

2. The text doesn't mention async generators at all.  I assume they
also have an agi_isolated_execution_context flag that can be set, to
enable @asynccontextmanager?

2a. Speaking of which I wonder if it's possible for async_generator to
emulate this flag... I don't know if this matters -- at this point the
main reason to use async_generator is for code that wants to support
PyPy.  If PyPy gains native async generator support before CPython 3.7
comes out then async_generator may be entirely irrelevant before
PEP 550 matters.  But right now async_generator is still quite handy...

2b. BTW, the contextmanager trick is quite nice -- I actually noticed
last week that PEP 521 had a problem here, but didn't think of a
solution :-).

3. You're right that numpy is *very* performance sensitive about
accessing the context -- the errstate object is needed extremely
frequently, even on trivial operations like adding two scalars, so a
dict lookup is very noticeable.  (Imagine adding a dict lookup to
float.__add__.)  Right now, the errstate object gets stored in the
threadstate dict, and then there are some dubious-looking hacks
involving a global (not thread-local) counter to let us skip the lookup
entirely if we think that no errstate object has been set.
Really what we ought to be doing (currently, in a non PEP 550 world) is storing the errstate in a __thread variable -- it'd certainly be worth it. Adopting PEP 550 would definitely be easier if we knew that it wasn't ruling out that level of optimization. 4. I'm worried that all of your examples use string keys. One of the great things about threading.local objects is that each one is a new namespace, which is a honking great idea -- here it prevents accidental collisions between unrelated libraries. And while it's possible to implement threading.local in terms of the threadstate dict (that's how they work now!), it requires some extremely finicky code to get the memory management right: https://github.com/python/cpython/blob/dadca480c5b7c5cf425d423316cd695bc5db3023/Modules/_threadmodule.c#L558-L595 It seems like you're imagining that this API will be used directly by user code? Is that true? ...Are you sure that's a good idea? Are we just assuming that not many keys will be used and the keys will generally be immortal anyway, so leaking entries is OK? Maybe this is nit-picking, but this is hooking into the language semantics in such a deep way that I sorta feel like it would be bad to end up with something where we can never get garbage collection right. The suggested index-based API for super fast C lookup also has this problem, but that would be such a low-level API -- and not part of the language definition -- that the right answer is probably just to document that there's no way to unallocate indices so any given C library should only allocate, like... 1 of them. Maybe provide an explicit API to release an index, if we really want to get fancy. 5. Is there some performance-related reason that the API for getting/setting isn't just sys.get_execution_context()[...] = ...? Or even sys.execution_context[...]? 5a. Speaking of which I'm not a big fan of the None-means-delete behavior. Not only does Python have a nice standard way to describe all the mapping operations without such hacks, but you're actually implementing that whole interface anyway. Why not use it? 6. Should Thread.start inherit the execution context from the spawning thread? 7. Compatibility: it does sort of break 3rd party contextmanager implementations (contextlib2, asyncio_extras's acontextmanager, trio's internal acontextmanager, ...). This is extremely minor though. 8. You discuss how this works for asyncio and gevent. Have you looked at how it will interact with tornado's context handling system? Can they use this? It's the most important extant context implementation I can think of (aside from thread local storage itself). 9. OK, my big question, about semantics. The PEP's design is based on the assumption that all context-local state is scalar-like, and contexts split but never join. But there are some cases where this isn't true, in particular for values that have "stack-like" semantics. These are terms I just made up, but let me give some examples. Python's sys.exc_info is one. Another I ran into recently is for trio's cancel scopes. So basically the background is, in trio you can wrap a context manager around any arbitrary chunk of code and then set a timeout or explicitly cancel that code. It's called a "cancel scope". These are fully nestable. Full details here: https://trio.readthedocs.io/en/latest/reference-core.html#cancellation-and-timeouts Currently, the implementation involves keeping a stack of cancel scopes in Task-local storage. 
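Roughly, the scheme looks like this (illustrative names only, this is
not trio's actual code):

    import threading

    _local = threading.local()  # in trio this is per-Task, not per-thread

    class CancelScope:
        def __enter__(self):
            stack = getattr(_local, 'cancel_stack', None)
            if stack is None:
                stack = _local.cancel_stack = []
            stack.append(self)
            return self

        def __exit__(self, *exc):
            popped = _local.cancel_stack.pop()
            # If a frame yields while a scope is on the stack, this
            # pop can happen out of order -- see the example below:
            assert popped is self, "cancel scope stack corrupted"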
This works fine for regular async code because when we switch Tasks, we also switch the cancel scope stack. But of course it falls apart for generators/async generators: async def agen(): with fail_after(10): # 10 second timeout for finishing this block await some_blocking_operation() yield await another_blocking_operation() async def caller(): with fail_after(20): ag = agen() await ag.__anext__() # now that cancel scope is on the stack, even though we're not # inside the context manager! this will not end well. await some_blocking_operation() # this might get cancelled when it shouldn't # even if it doesn't, we'll crash here when exiting the context manager # because we try to pop a cancel scope that isn't at the top of the stack So I was thinking about whether I could implement this using PEP 550. It requires some cleverness, but I could switch to representing the stack as a singly-linked list, and then snapshot it and pass it back to the coroutine runner every time I yield. That would fix the case above. But, I think there's another case that's kind of a showstopper. async def agen(): await some_blocking_operation() yield async def caller(): ag = agen() # context is captured here with fail_after(10): await ag.__anext__() Currently this case works correctly: the timeout is applied to the __anext__ call, as you'd expect. But with PEP 550, it wouldn't work: the generator's timeouts would all be fixed when it was instantiated, and we wouldn't be able to detect that the second call has a timeout imposed on it. So that's a pretty nasty footgun. Any time you have code that's supposed to have a timeout applied, but in fact has no timeout applied, then that's a really serious bug -- it can lead to hangs, trivial DoS, pagers going off, etc. Another problem is code like: async def caller(): with fail_after(10): ag = agen() # then exit the scope Can we clean up the cancel scope? (e.g., remove it from the global priority queue that tracks timeouts?) Normally yes, that's what __exit__ blocks are for, letting you know deterministically that an object can be cleaned up. But here it got captured by the async generator. I really don't want to have to rely on the GC, because on PyPy it means that we could leak an unbounded number of cancel scopes for a finite but unbounded number of time, and all those extra entries in the global timeout priority queue aren't free. (And sys.exc_info has had buggy behavior in analogous situations.) So, I'm wondering if you (or anyone) have any ideas how to fix this :-). Technically, PEP 521 is powerful enough to do it, but in practice the performance would be catastrophically bad. It's one thing to have some extra cost to yielding out of an np.errstate block, those are rare and yielding out of them is rare. But cancel scopes are different: essentially all code in trio runs inside one or more of them, so every coroutine suspend/resume would have to call all those suspend/resume hooks up and down the stack. OTOH PEP 550 is fast, but AFAICT its semantics are wrong for this use case. The basic invariant I want is: if at any given moment you stop and take a backtrace, and then look at the syntactic surroundings of each line in the backtrace and write down a list of all the 'with' blocks that the code *looks* like it's inside, then context lookups should give the same result as they would if you simply entered all of those with blocks in order. Generators make it tricky to maintain this invariant, because a generator frame's backtrace changes every time you call next(). 
But those are the semantics that make the most sense to me, and seem
least surprising in practice.  These are also IIUC the semantics that
exc_info is supposed to follow (though historically the interaction of
exc_info and generators has had lots of bugs, not sure if that's been
fixed or not).

...and now that I've written that down, I sort of feel like that might
be what you want for all the other sorts of context object too?  Like,
here's a convoluted example:

    def gen():
        a = decimal.Decimal("1.111")
        b = decimal.Decimal("2.222")
        print(a + b)
        yield
        print(a + b)

    def caller():
        # let's pretend this context manager exists,
        # the actual API is more complicated
        with decimal_context_precision(3):
            g = gen()
            with decimal_context_precision(2):
                next(g)
            with decimal_context_precision(1):
                next(g)

Currently, this will print "3.3 3", because when the generator is
resumed it inherits the context of the resuming site.  With PEP 550, it
would print "3.33 3.33" (or maybe "3.3 3.3"? it's not totally clear
from the text), because it inherits the context when the generator is
created and then ignores the calling context.  It's hard to get strong
intuitions, but I feel like the current behavior is actually more
sensible -- each time the generator gets resumed, the next bit of code
runs in the context of whoever called next(), and the generator is just
passively inheriting context, so ... that makes sense.

OTOH of course if you change the generator code to:

    def gen():
        a = decimal.Decimal("1.111")
        b = decimal.Decimal("2.222")
        with decimal_context_precision(4):
            print(a + b)
            yield
            print(a + b)

then it should print "3.333 3.333", because the generator is overriding
the caller -- now when we resume the frame we're re-entering the
decimal_context_precision(4) block, so it should take priority.

So ... maybe all context variables are "stack-like"?

-n

--
Nathaniel J. Smith -- https://vorpus.org

From stefan at bytereef.org  Sat Aug 12 06:33:39 2017
From: stefan at bytereef.org (Stefan Krah)
Date: Sat, 12 Aug 2017 12:33:39 +0200
Subject: [Python-ideas] New PEP 550: Execution Context
Message-ID: <20170812103339.GA2735@bytereef.org>

Yury Selivanov wrote:
> This is a new PEP to implement Execution Contexts in Python.

The idea is of course great!

A couple of issues for decimal:

> Moreover, passing the context explicitly does not work at all for
> libraries like ``decimal`` or ``numpy``, which use operator overloading.

Instead of "with localcontext() ...", each coroutine can create a new
Context() and use its methods, without any loss of functionality.  All
one loses is the inline operator syntax sugar.

I'm aware you know all this, but the entire decimal paragraph sounds a
bit as if this option did not exist.

> Fast C API for packages like ``decimal`` and ``numpy``.

_decimal relies on caching the most recently used thread-local context,
which gives a speedup of about 25% for inline operators:

https://github.com/python/cpython/blob/master/Modules/_decimal/_decimal.c#L1639

Can this speed be achieved with the execution contexts?  IOW, can the
lookup of an execution context be as fast as PyThreadState_GET()?

Stefan Krah

From alberto at metapensiero.it  Sat Aug 12 07:04:45 2017
From: alberto at metapensiero.it (Alberto Berti)
Date: Sat, 12 Aug 2017 13:04:45 +0200
Subject: [Python-ideas] Towards harmony with JavaScript?
References: <87shgykpcl.fsf@ender.lizardnet>
 <87k229lxya.fsf@ender.lizardnet> <87fucxlubt.fsf@ender.lizardnet>
Message-ID: <87tw1djbci.fsf@ender.lizardnet>

>>>>> "Chris" == Chris Angelico writes:

Chris> On Sat, Aug 12, 2017 at 6:31 AM, Alberto Berti wrote:
>> As of now, I do nothing. As I said, the goal of the tool is not to
>> shield you from JS, for this reason it's not meant for beginners (in
>> either JS or Python). You always manipulate JS objects, but it allows
>> you to be naive about all that plethora of JS idiosyncrasies (from a
>> Python PoV, at least) that you have to think about when you frequently
>> switch from Python to JS.

Chris> Do you "retain most of Python language semantics", or do you "always
Chris> manipulate JS objects"? As shown in a previous post, there are some
Chris> subtle and very dangerous semantic differences between the languages.
Chris> You can't have it both ways.

That's right, you can't have it both ways.  That's the difficult
decision to make, because as you add more and more Python APIs to those
supported, you'll probably end up creating your own "Python island in
JS", where you need to transform the objects you manipulate from/to JS
in the functions that are called by external JS code (either manually
or automatically).  And on the other hand, if you don't add any Pythonic
API, you will end up with ugly Python code that yes, is valid Python
code, but is nothing you would like to see.

JavaScripthon was and is an experiment to see how much of the "Pythonic
way of expressing algorithms" can be retained while adding as little
"runtime" as possible.  That's the reason why it targets ES6+
JavaScript: the "points of contact" between the two languages are much
greater in number.

As an example, let's take the following simple code:

    def test():
        a = 'foo'
        d = {a: 'bar'}
        return d[a]

one can naively translate it to:

    function test() {
        var a, d;
        a = 'foo';
        d = {a: 'bar'};
        return d[a];
    }

but it returns 'bar' in Python and undefined in JS.  Even if it's just a
simple case expressed in a four-line function, it's one of those things
that can slip through when coding in both languages at the same time (at
least for me).

So I asked myself if it was worthwhile to have a tool that:

* allows me to use Python syntax to write some amount of JS code.  I'm
  more accustomed to Python syntax and I like it more.  It's generally
  more terse and has fewer distractions (like variable declarations and
  line terminations);

* fixes as many of these things as possible automatically, without my
  having to precisely remember that this is a "corner case" in JS that
  must be handled with care (so it reduces the "context-switching"
  effort);

* produces good-looking JS code that's still possible to read and
  follow without much trouble.

How many "corner cases" like this are there in JS?  In my coding
experience, "thanks" to the fact that JS is much less "harmonious" than
Python (my opinion), I've encountered many of them, and there are also
many simple Python coding habits that are translatable in a simple way.

So what does the tool do in this case?

    $ pj -s -
    def test():
        a = 'foo'
        d = {a: 'bar'}
        return d[a]

    function test() {
        var a, d;
        a = "foo";
        d = {[a]: "bar"};
        return d[a];
    }

It turns out that ES6 has a special notation for what JS calls
"computed property names": keys in object literals that aren't literal
strings.  Does it evaluate the way a Python developer expects when run?
Let's see:

    $ pj -s - -e
    def test():
        a = 'foo'
        d = {a: 'bar'}
        return d[a]

    test()
    bar

From alberto at metapensiero.it  Sat Aug 12 07:41:46 2017
From: alberto at metapensiero.it (Alberto Berti)
Date: Sat, 12 Aug 2017 13:41:46 +0200
Subject: [Python-ideas] Towards harmony with JavaScript?
References: <87shgykpcl.fsf@ender.lizardnet>
 <87k229lxya.fsf@ender.lizardnet> <87fucxlubt.fsf@ender.lizardnet>
 <87bmnlllat.fsf@ender.lizardnet>
Message-ID: <87poc1j9mt.fsf@ender.lizardnet>

>>>>> "Carl" == Carl Smith writes:

Carl> Using lambdas doesn't solve the problem. I just kept the example short, but
Carl> had I used more than one expression in each function, you'd be back to
Carl> square one. You took advantage of the brevity of the example, but it's not
Carl> realistic.

Let me elaborate more on this... yes, I took "advantage" of the brevity
of your example, but there's another side to it.

In my JS coding I usually avoid non-trivial anonymous functions in real
applications.  The reason is that if an error happens inside an
anonymous function (maybe the last one in a series of anonymous
functions), the stack trace of that error will end up with references
like "in anonymous function at line xy of 'foo.js'", and that doesn't
allow me to get a first idea of what the code was doing when the error
was thrown.  That's why I don't like them, and why I don't have a great
opinion of large codebases making extensive use of them.

It also appears to me that the trend in some (relevant) parts of the JS
community is to refrain from using them when possible, in favor of a
more structured approach to coding that resembles class-based
componentization, like in React.

cheers,

Alberto

From ncoghlan at gmail.com  Sat Aug 12 10:12:04 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 13 Aug 2017 00:12:04 +1000
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: References: Message-ID:

On 12 August 2017 at 08:37, Yury Selivanov wrote:
> Hi,
>
> This is a new PEP to implement Execution Contexts in Python.
>
> The PEP is in-flight to python.org, and in the meanwhile can
> be read on GitHub:
>
> https://github.com/python/peps/blob/master/pep-0550.rst
>
> (it contains a few diagrams and charts, so please read it there.)

The fully rendered version is also up now:
https://www.python.org/dev/peps/pep-0550/

Thanks for this!  The general approach looks good to me, so I just have
some questions about specifics of the API:

1. Are you sure you want to expose the CoW type to pure Python code?

The draft API looks fairly error-prone to me, as I'm not sure of the
intended differences in behaviour between the following:

    @contextmanager
    def context(x):
        old_x = sys.get_execution_context_item('x')
        sys.set_execution_context_item('x', x)
        try:
            yield
        finally:
            sys.set_execution_context_item('x', old_x)

    @contextmanager
    def context(x):
        old_x = sys.get_execution_context().get('x')
        sys.get_execution_context()['x'] = x
        try:
            yield
        finally:
            sys.get_execution_context()['x'] = old_x

    @contextmanager
    def context(x):
        ec = sys.get_execution_context()
        old_x = ec.get('x')
        ec['x'] = x
        try:
            yield
        finally:
            ec['x'] = old_x

It seems to me that everything would be a lot safer if the *only*
Python level API was a live dynamic view that completely hid the
copy-on-write behaviour behind an "ExecutionContextProxy" type, such
that the last two examples were functionally equivalent to each other
and to the current PEP's get/set functions (rendering the latter
redundant, and allowing it to be dropped from the PEP).
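Something along these lines, sketched on top of the PEP's proposed
get/set primitives purely to show the shape of the idea:

    import sys
    from collections.abc import MutableMapping

    _MISSING = object()

    class ExecutionContextProxy(MutableMapping):
        # Every read and write goes through the *current* execution
        # context, so the proxy is always live and never holds a
        # stale snapshot.
        def __getitem__(self, key):
            value = sys.get_execution_context_item(key, _MISSING)
            if value is _MISSING:
                raise KeyError(key)
            return value

        def __setitem__(self, key, value):
            sys.set_execution_context_item(key, value)

        def __delitem__(self, key):
            # the PEP spells removal as setting the key to None
            sys.set_execution_context_item(key, None)

        def __iter__(self):
            return iter(sys.get_execution_context().keys())

        def __len__(self):
            return len(list(sys.get_execution_context().keys()))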
If Python code wanted a snapshot of the current state, it would need to
call sys.get_execution_context().copy(), which would give it a plain
dictionary containing a shallow copy of the execution context at that
particular point in time.

If there's a genuine need to expose the raw copy-on-write machinery to
Python level code (e.g. for asyncio's benefit), then that could be more
clearly marked as "here be dragons" territory that most folks aren't
going to want to touch (e.g. "sys.get_raw_execution_context()").

2. Do we need an ag_isolated_execution_context for asynchronous
generators?  (Modify this question as needed for the answer to the next
question)

3. It bothers me that *_execution_context points to an actual execution
context, while *_isolated_execution_context is a boolean.  With names
that similar I'd expect them to point to the same kind of object.

Would it work to adjust that setting to say that rather than being an
"isolated/not isolated" boolean, we instead made it a cr_back reverse
pointer to the awaiting coroutine (akin to f_back in the frame stack),
such that we had a doubly-linked list that defined the coroutine call
stacks via their cr_await and cr_back attributes?

If we did that, we'd have:

Top-level Task: cr_back -> NULL (C) or None (Python)
Awaited coroutine: cr_back -> coroutine that awaited this one (which
would in turn have a cr_await reference back to here)

coroutine.send()/throw() would then save and restore the execution
context around the call if cr_back was NULL/None (equivalent to
isolated==True in the current PEP), and leave it alone otherwise
(equivalent to isolated==False).

For generators, gi_back would normally be NULL/None (since we don't
typically couple regular generators to a single managing object), but
could be set appropriately by types.coroutine when the generator-based
coroutine is awaited, and by contextlib.contextmanager before starting
the underlying generator.

(It may even make sense to break the naming symmetry for that
attribute, and call it something like "gi_owner", since generators
don't form a clean await-based logical call chain the way native
coroutines do).

Cheers, Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com  Sat Aug 12 10:20:28 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 13 Aug 2017 00:20:28 +1000
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: References: Message-ID:

On 12 August 2017 at 15:45, Yury Selivanov wrote:
> Thanks Eric!
>
> PEP 408 -- Standard library __preview__ package?

Typo in the PEP number: PEP 406, which was an ultimately failed attempt
to get away from the reliance on process globals to manage the import
system by encapsulating the top level state as an "Import Engine":
https://www.python.org/dev/peps/pep-0406/

We still like the idea in principle (hence the Withdrawn status rather
than being Rejected), but someone needs to find time to take a run at
designing a new version of it atop the cleaner PEP 451 import plugin
API (hence why the *specific* proposal in PEP 406 has been withdrawn).

Cheers, Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com  Sat Aug 12 12:22:31 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 13 Aug 2017 02:22:31 +1000
Subject: [Python-ideas] Towards harmony with JavaScript?
In-Reply-To: 
References: <87shgykpcl.fsf@ender.lizardnet> <87k229lxya.fsf@ender.lizardnet>
Message-ID: 

On 12 August 2017 at 06:10, Chris Barker wrote:
>> Taking this off the list as it's no longer on topic.
>
> not totally -- I'm going to add my thoughts:
>
> 1) If you want a smoother transition between server-side Python and
> in-browser code, maybe you're better off using one of the "python in the
> browser" solutions -- there are at least a few viable ones.

More experimentally, there's also toga's "web" backend (which allows you to take an application you developed with the primary intention of running it as a rich client application on mobile or desktop devices, and instead publish it as a Django web application with a JavaScript frontend).

Essentially, the relationship we see between Python and JavaScript is similar to the one that exists between Python and C/C++/Rust/Go/etc, just on the side that sits between the Python code and the GUI, rather than between the Python code and the compute & storage systems.

As such, there are various libraries and transpilers that are designed to handle writing the JavaScript *for* you (bokeh, toga, JavaScripthon, etc), and the emergence of WASM as a frontend equivalent to machine code on the backend is only going to make the similarities in those dynamics more pronounced.

In that vein, it's highly *un*likely we'd add any redundant constructs to Python purely to make it easier for JS developers to use JS idioms in Python instead of Pythonic ones, but JavaScript *is* one of the languages we look at for syntactic consistency when considering potential new additions to Python.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From guido at python.org  Sat Aug 12 12:22:53 2017
From: guido at python.org (Guido van Rossum)
Date: Sat, 12 Aug 2017 09:22:53 -0700
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: 
References: 
Message-ID: 

Thanks for the explanation. Can you make sure this is explained in the PEP?

On Aug 11, 2017 10:43 PM, "Yury Selivanov" wrote:

> > On Fri, Aug 11, 2017 at 10:17 PM, Guido van Rossum wrote:
> > > I may have missed this (I've just skimmed the doc), but what's the
> > > rationale for making the EC an *immutable* mapping? It's impressive
> > > that you managed to create a faster immutable dict, but why does the
> > > use case need one?
> >
> > In this proposal, you have lots and lots of semantically distinct ECs.
> > Potentially every stack frame has its own (at least in async code). So
> > instead of copying the EC every time they create a new one, they want
> > to copy it when it's written to. This is a win if writes are
> > relatively rare compared to the creation of ECs.
>
> Correct. If we decide to use HAMT, the ratio of writes/reads becomes
> less important though.
>
> Yury

From ncoghlan at gmail.com  Sat Aug 12 13:09:54 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 13 Aug 2017 03:09:54 +1000
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: 
References: 
Message-ID: 

On 12 August 2017 at 17:54, Nathaniel Smith wrote:
> ...and now that I've written that down, I sort of feel like that might
> be what you want for all the other sorts of context object too? Like,
> here's a convoluted example:
>
>     def gen():
>         a = decimal.Decimal("1.111")
>         b = decimal.Decimal("2.222")
>         print(a + b)
>         yield
>         print(a + b)
>
>     def caller():
>         # let's pretend this context manager exists, the actual API is
>         # more complicated
>         with decimal_context_precision(3):
>             g = gen()
>         with decimal_context_precision(2):
>             next(g)
>         with decimal_context_precision(1):
>             next(g)
>
> Currently, this will print "3.3 3", because when the generator is
> resumed it inherits the context of the resuming site. With PEP 550, it
> would print "3.33 3.33" (or maybe "3.3 3.3"? it's not totally clear
> from the text), because it inherits the context when the generator is
> created and then ignores the calling context. It's hard to get strong
> intuitions, but I feel like the current behavior is actually more
> sensible -- each time the generator gets resumed, the next bit of code
> runs in the context of whoever called next(), and the generator is
> just passively inheriting context, so ... that makes sense.

Now that you raise this point, I think it means that generators need to retain their current context inheritance behaviour, simply for backwards compatibility purposes. This means that the case we need to enable is the one where the generator *doesn't* dynamically adjust its execution context to match that of the calling function.

One way that could work (using the cr_back/gi_back convention I suggested):

- generators start with gi_back not set
- if gi_back is NULL/None, gi.send() and gi.throw() set it to the calling frame for the duration of the synchronous call and *don't* adjust the execution context (i.e. the inverse of coroutine behaviour)
- if gi_back is already set, then gi.send() and gi.throw() *do* save and restore the execution context around synchronous calls in to the generator frame

To create an autonomous generator (i.e. one that didn't dynamically update its execution context), you'd use a decorator like:

    def autonomous_generator(gf):
        @functools.wraps(gf)
        def wrapper(*args, **kwds):
            gi = gf(*args, **kwds)
            gi.gi_back = gi.gi_frame
            return gi
        return wrapper

Asynchronous generators would then work like synchronous generators: ag_back would be NULL/None by default, and dynamically set for the duration of each __anext__ call. If you wanted to create an autonomous one, you'd make its back reference a circular reference to itself to disable the implicit dynamic updates.

When I put it in those terms though, I think the cr_back/gi_back/ag_back idea should actually be orthogonal to the "revert_context" flag (so you can record the link back to the caller even when maintaining an autonomous context).

Given that, you'd have the following initial states for "revert context" (currently called "isolated context" in the PEP):

* unawaited coroutines: true (same as PEP)
* awaited coroutines: false (same as PEP)
* generators (both sync & async): false (opposite of current PEP)
* autonomous generators: true (set "gi_revert_context" or "ag_revert_context" explicitly)

Open question: whether having "yield" inside a with statement implies the creation of an autonomous generator (synchronous or otherwise), or whether you'd need a decorator to get your context management right in such cases.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From carl.input at gmail.com  Sat Aug 12 13:44:41 2017
From: carl.input at gmail.com (Carl Smith)
Date: Sat, 12 Aug 2017 18:44:41 +0100
Subject: [Python-ideas] Towards harmony with JavaScript?
In-Reply-To: 
References: <87shgykpcl.fsf@ender.lizardnet> <87k229lxya.fsf@ender.lizardnet>
Message-ID: 

Alberto, CoffeeScript is a popular language that is widely considered to represent JavaScript's best bits, and it only has anonymous functions, so there's a large part of the JS community that disagrees with you there.

Browsers actually do identify anonymous functions, based on the variable/property names that reference them, so the following function would be identified as `square` in tracebacks:

    let square = function(x) { return x * x };

In any case, passing anonymous functions to higher order functions is commonplace in real-world JS. Chris may be right about using decorators as a Pythonic alternative [I haven't really considered that properly to be honest], but you can't just tell people not to do something that they see as elegant and idiomatic.

Best
-- Carl Smith
carl.input at gmail.com

On 12 August 2017 at 17:22, Nick Coghlan wrote:

[..]

From yselivanov.ml at gmail.com  Sat Aug 12 13:53:30 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sat, 12 Aug 2017 13:53:30 -0400
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: 
References: 
Message-ID: 

Nick, Nathaniel,

I'll be replying in full to your emails when I have time to do some experiments.
Now I just want to address one point that I think is important:

On Sat, Aug 12, 2017 at 1:09 PM, Nick Coghlan wrote:
> On 12 August 2017 at 17:54, Nathaniel Smith wrote:
>> ...and now that I've written that down, I sort of feel like that might
>> be what you want for all the other sorts of context object too? Like,
>> here's a convoluted example:
>>
>>     def gen():
>>         a = decimal.Decimal("1.111")
>>         b = decimal.Decimal("2.222")
>>         print(a + b)
>>         yield
>>         print(a + b)
>>
>>     def caller():
>>         # let's pretend this context manager exists, the actual API is
>>         # more complicated
>>         with decimal_context_precision(3):
>>             g = gen()
>>         with decimal_context_precision(2):
>>             next(g)
>>         with decimal_context_precision(1):
>>             next(g)
>>
>> Currently, this will print "3.3 3", because when the generator is
>> resumed it inherits the context of the resuming site. With PEP 550, it
>> would print "3.33 3.33" (or maybe "3.3 3.3"? it's not totally clear
>> from the text), because it inherits the context when the generator is
>> created and then ignores the calling context. It's hard to get strong
>> intuitions, but I feel like the current behavior is actually more
>> sensible -- each time the generator gets resumed, the next bit of code
>> runs in the context of whoever called next(), and the generator is
>> just passively inheriting context, so ... that makes sense.
>
> Now that you raise this point, I think it means that generators need
> to retain their current context inheritance behaviour, simply for
> backwards compatibility purposes. This means that the case we need to
> enable is the one where the generator *doesn't* dynamically adjust its
> execution context to match that of the calling function.

Nobody *intentionally* iterates a generator manually in different decimal contexts (or any other contexts). This is an extremely error-prone thing to do, because one refactoring of the generator -- rearranging yields -- would wreck your custom iteration/context logic. I don't think that any real code relies on this, and I don't think that we are breaking backwards compatibility here in any way. How many users actually need this?

If someone does need this, it's possible to flip `gi_isolated_execution_context` to `False` (as contextmanager does now) and get this behaviour. This might be needed for frameworks like Tornado which support coroutines via generators without 'yield from', but I'll have to verify this.

What I'm saying here is that any sort of context leaking *into* or *out of* a generator *while* it is iterating will likely cause only bugs or undefined behaviour. Take a look at the precision example in the Rationale section of the PEP.

Most of the time generators are created and iterated in the same spot; you rarely create generator closures.

One way the behaviour could be changed, however, is to capture the execution context when the generator is first iterated (as opposed to when it's instantiated), but I don't think it makes any real difference.

Another idea: in one of my initial PEP implementations, I exposed gen.gi_execution_context (same for coroutines) to Python as a read/write attribute. That allowed to (a) get the execution context out of a generator (for introspection or other purposes); (b) inject an execution context for event loops; for instance asyncio.Task could do that for some purpose. Maybe this would be useful for someone who wants to mess with generators and contexts.

[..]
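A toy illustration of that injection idea, since gi_execution_context doesn't exist today -- a wrapper class simulates the attribute, and a plain dict stands in for the EC (all names here are hypothetical):

    _current_ec = {}

    class ContextualGenerator:
        def __init__(self, genfunc, *args):
            self.gi_execution_context = {}   # simulated per-generator EC
            self._gen = genfunc(*args)

        def send(self, value):
            global _current_ec
            saved = _current_ec
            _current_ec = self.gi_execution_context   # activate our EC
            try:
                return self._gen.send(value)
            finally:
                # capture any changes, then restore the caller's EC
                self.gi_execution_context = _current_ec
                _current_ec = saved

    def gen():
        while True:
            yield _current_ec.get('key')

    g = ContextualGenerator(gen)
    g.send(None)                                   # prime the generator
    g.gi_execution_context = {'key': 'injected'}   # "event loop" injects an EC
    print(g.send(None))                            # prints: injected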
> def autonomous_generator(gf):
>     @functools.wraps(gf)
>     def wrapper(*args, **kwds):
>         gi = gf(*args, **kwds)
>         gi.gi_back = gi.gi_frame
>         return gi
>     return wrapper

Nick, I still have to fully grasp the idea of `gi_back`, but one quick thing: I specifically designed the PEP to avoid touching frames. The current design only needs TLS and a little help from the interpreter/core objects adjusting that TLS. It should be very straightforward to implement the PEP in any interpreter (with JIT or without) or compilers like Cython.

[..]

> Given that, you'd have the following initial states for "revert
> context" (currently called "isolated context" in the PEP):
>
> * unawaited coroutines: true (same as PEP)
> * awaited coroutines: false (same as PEP)
> * generators (both sync & async): false (opposite of current PEP)
> * autonomous generators: true (set "gi_revert_context" or
>   "ag_revert_context" explicitly)

If generators do not isolate their context, then the example in the Rationale section will not work as expected (or am I missing something?). Fixing generators' state leak was one of the main goals of the PEP.

Yury

From rymg19 at gmail.com  Sat Aug 12 14:28:14 2017
From: rymg19 at gmail.com (rymg19 at gmail.com)
Date: Sat, 12 Aug 2017 11:28:14 -0700
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: 
Message-ID: 

So, I'm hardly an expert when it comes to things like this, but there are two things about this that don't seem right to me. (Also, I'd love to respond inline, but that's kind of difficult from a mobile phone.)

The first is how set/get_execution_context_item take strings. Inevitably, people are going to do things like:

    CONTEXT_ITEM_NAME = 'foo-bar'
    ...
    sys.set_execution_context_item(CONTEXT_ITEM_NAME, 'stuff')

IMO it would be nicer if there could be a key object used instead, e.g.

    my_key = sys.execution_context_key('name-here-for-debugging-purposes')
    sys.set_execution_context_item(my_key, 'stuff')

The advantage here would be no need for string constants and no potential naming conflicts (the string passed to the key creator would be used just for debugging, kind of like Thread names).

Second thing is this:

    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)

If this would be done frequently, a context manager would be a *lot* more Pythonic, e.g.:

    with sys.temp_change_execution_context('x', new_x):
        # ...

-- 
Ryan
Yoko Shimomura, ryo (supercell/EGOIST), Hiroyuki Sawano >> everyone else
http://refi64.com

On Aug 11, 2017 at 5:38 PM, Yury Selivanov wrote:

Hi,

This is a new PEP to implement Execution Contexts in Python.

The PEP is in-flight to python.org, and in the meanwhile can be read on GitHub:

https://github.com/python/peps/blob/master/pep-0550.rst

(it contains a few diagrams and charts, so please read it there.)

Thank you!
Yury


PEP: 550
Title: Execution Context
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2017
Python-Version: 3.7
Post-History: 11-Aug-2017


Abstract
========

This PEP proposes a new mechanism to manage execution state--the logical environment in which a function, a thread, a generator, or a coroutine executes in.
A few examples of where having a reliable state storage is required:

* Context managers like decimal contexts, ``numpy.errstate``, and ``warnings.catch_warnings``;

* Storing request-related data such as security tokens and request data in web applications;

* Profiling, tracing, and logging in complex and large code bases.

The usual solution for storing state is to use a Thread-local Storage (TLS), implemented in the standard library as ``threading.local()``. Unfortunately, TLS does not work for isolating state of generators or asynchronous code because such code shares a single thread.


Rationale
=========

Traditionally a Thread-local Storage (TLS) is used for storing the state. However, the major flaw of using the TLS is that it works only for multi-threaded code. It is not possible to reliably contain the state within a generator or a coroutine. For example, consider the following generator::

    def calculate(precision, ...):
        with decimal.localcontext() as ctx:
            # Set the precision for decimal calculations
            # inside this block
            ctx.prec = precision

            yield calculate_something()
            yield calculate_something_else()

Decimal context is using a TLS to store the state, and because TLS is not aware of generators, the state can leak. The above code will not work correctly if a user iterates over the ``calculate()`` generator with different precisions in parallel::

    g1 = calculate(100)
    g2 = calculate(50)

    items = list(zip(g1, g2))

    # items[0] will be a tuple of:
    #   first value from g1 calculated with 100 precision,
    #   first value from g2 calculated with 50 precision.
    #
    # items[1] will be a tuple of:
    #   second value from g1 calculated with 50 precision,
    #   second value from g2 calculated with 50 precision.

An even scarier example would be using decimals to represent money in an async/await application: decimal calculations can suddenly lose precision in the middle of processing a request. Currently, bugs like this are extremely hard to find and fix.

Another common need for web applications is to have access to the current request object, or security context, or, simply, the request URL for logging or submitting performance tracing data::

    async def handle_http_request(request):
        context.current_http_request = request

        await ...
        # Invoke your framework code, render templates,
        # make DB queries, etc, and use the global
        # 'current_http_request' in that code.

        # This isn't currently possible to do reliably
        # in asyncio out of the box.

These examples are just a few out of many, where a reliable way to store context data is absolutely needed.

The inability to use TLS for asynchronous code has led to the proliferation of ad-hoc solutions, limited to be supported only by code that was explicitly enabled to work with them. The current status quo is that any library, including the standard library, that uses a TLS, will likely not work as expected in asynchronous code or with generators (see [3]_ as an example issue.)

Some languages that have coroutines or generators recommend manually passing a ``context`` object to every function, see [1]_ describing the pattern for Go. This approach, however, has limited use for Python, where we have a huge ecosystem that was built to work with a TLS-like context. Moreover, passing the context explicitly does not work at all for libraries like ``decimal`` or ``numpy``, which use operator overloading.

.NET runtime, which has support for async/await, has a generic solution for this problem, called ``ExecutionContext`` (see [2]_).
On the surface, working with it is very similar to working with a TLS, but the former explicitly supports asynchronous code.


Goals
=====

The goal of this PEP is to provide a more reliable alternative to ``threading.local()``. It should be explicitly designed to work with the Python execution model, equally supporting threads, generators, and coroutines.

An acceptable solution for Python should meet the following requirements:

* Transparent support for code executing in threads, coroutines, and generators with an easy to use API.

* Negligible impact on the performance of the existing code or the code that will be using the new mechanism.

* Fast C API for packages like ``decimal`` and ``numpy``.

Explicit is still better than implicit, hence the new APIs should only be used when there is no option to pass the state explicitly.

With this PEP implemented, it should be possible to update a context manager like the below::

    _local = threading.local()

    @contextmanager
    def context(x):
        old_x = getattr(_local, 'x', None)
        _local.x = x
        try:
            yield
        finally:
            _local.x = old_x

to a more robust version that can be reliably used in generators and async/await code, with a simple transformation::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)


Specification
=============

This proposal introduces a new concept called Execution Context (EC), along with a set of Python APIs and C APIs to interact with it.

EC is implemented using an immutable mapping. Every modification of the mapping produces a new copy of it. To illustrate what it means let's compare it to how we work with tuples in Python::

    a0 = ()
    a1 = a0 + (1,)
    a2 = a1 + (2,)

    # a0 is an empty tuple
    # a1 is (1,)
    # a2 is (1, 2)

Manipulating an EC object would be similar::

    a0 = EC()
    a1 = a0.set('foo', 'bar')
    a2 = a1.set('spam', 'ham')

    # a0 is an empty mapping
    # a1 is {'foo': 'bar'}
    # a2 is {'foo': 'bar', 'spam': 'ham'}

In CPython, every thread that can execute Python code has a corresponding ``PyThreadState`` object. It encapsulates important runtime information like a pointer to the current frame, and is being used by the ceval loop extensively. We add a new field to ``PyThreadState``, called ``exec_context``, which points to the current EC object.

We also introduce a set of APIs to work with Execution Context. In this section we will only cover two functions that are needed to explain how Execution Context works. See the full list of new APIs in the `New APIs`_ section.

* ``sys.get_execution_context_item(key, default=None)``: lookup ``key`` in the EC of the executing thread. If not found, return ``default``.

* ``sys.set_execution_context_item(key, value)``: get the current EC of the executing thread. Add a ``key``/``value`` item to it, which will produce a new EC object. Set the new object as the current one for the executing thread. In pseudo-code::

      tstate = PyThreadState_GET()

      ec = tstate.exec_context
      ec2 = ec.set(key, value)
      tstate.exec_context = ec2

Note, that some important implementation details and optimizations are omitted here, and will be covered in later sections of this PEP.

Now let's see how Execution Contexts work with regular multi-threaded code, generators, and coroutines.


Regular & Multithreaded Code
----------------------------

For regular Python code, EC behaves just like a thread-local. Any modification of the EC object produces a new one, which is immediately set as the current one for the thread state.
.. figure:: pep-0550/functions.png
   :align: center
   :width: 90%

   Figure 1.  Execution Context flow in a thread.

As Figure 1 illustrates, if a function calls ``set_execution_context_item()``, the modification of the execution context will be visible to all subsequent calls and to the caller::

    def set_foo():
        set_execution_context_item('foo', 'spam')

    set_execution_context_item('foo', 'bar')
    print(get_execution_context_item('foo'))

    set_foo()
    print(get_execution_context_item('foo'))

    # will print:
    #   bar
    #   spam


Coroutines
----------

Python :pep:`492` coroutines are used to implement cooperative multitasking. For a Python end-user they are similar to threads, especially when it comes to sharing resources or modifying the global state.

An event loop is needed to schedule coroutines. Coroutines that are explicitly scheduled by the user are usually called Tasks. When a coroutine is scheduled, it can schedule other coroutines using an ``await`` expression.

In async/await world, awaiting a coroutine can be viewed as a different calling convention: Tasks are similar to threads, and awaiting on coroutines within a Task is similar to calling functions within a thread.

By drawing a parallel between regular multithreaded code and async/await, it becomes apparent that any modification of the execution context within one Task should be visible to all coroutines scheduled within it. Any execution context modifications, however, must not be visible to other Tasks executing within the same thread.

To achieve this, a small set of modifications to the coroutine object is needed:

* When a coroutine object is instantiated, it saves a reference to the current execution context object to its ``cr_execution_context`` attribute.

* Coroutine's ``.send()`` and ``.throw()`` methods are modified as follows (in pseudo-C)::

      if coro->cr_isolated_execution_context:
          # Save a reference to the current execution context
          old_context = tstate->execution_context

          # Set our saved execution context as the current
          # for the current thread.
          tstate->execution_context = coro->cr_execution_context

          try:
              # Perform the actual `Coroutine.send()` or
              # `Coroutine.throw()` call.
              return coro->send(...)
          finally:
              # Save a reference to the updated execution_context.
              # We will need it later, when `.send()` or `.throw()`
              # are called again.
              coro->cr_execution_context = tstate->execution_context

              # Restore thread's execution context to what it was before
              # invoking this coroutine.
              tstate->execution_context = old_context
      else:
          # Perform the actual `Coroutine.send()` or
          # `Coroutine.throw()` call.
          return coro->send(...)

* ``cr_isolated_execution_context`` is a new attribute on coroutine objects. Set to ``True`` by default, it makes any execution context modifications performed by a coroutine stay visible only to that coroutine.

  When the Python interpreter sees an ``await`` instruction, it flips ``cr_isolated_execution_context`` to ``False`` for the coroutine that is about to be awaited. This makes any changes to the execution context made by nested coroutine calls within a Task visible throughout the Task.

  Because the top-level coroutine (Task) cannot be scheduled with ``await`` (in asyncio you need to call ``loop.create_task()`` or ``asyncio.ensure_future()`` to schedule a Task), all execution context modifications are guaranteed to stay within the Task.

* We always work with ``tstate->exec_context``. We use ``coro->cr_execution_context`` only to store the coroutine's execution context when it is not executing.
Figure 2 below illustrates how execution context mutations work with coroutines.

.. figure:: pep-0550/coroutines.png
   :align: center
   :width: 90%

   Figure 2.  Execution Context flow in coroutines.

In the above diagram:

* When "coro1" is created, it saves a reference to the current execution context "2".

* If it makes any change to the context, it will have its own execution context branch "2.1".

* When it awaits on "coro2", any subsequent changes it does to the execution context are visible to "coro1", but not outside of it.

In code::

    async def inner_foo():
        print('inner_foo:', get_execution_context_item('key'))
        set_execution_context_item('key', 2)

    async def foo():
        print('foo:', get_execution_context_item('key'))

        set_execution_context_item('key', 1)
        await inner_foo()

        print('foo:', get_execution_context_item('key'))

    set_execution_context_item('key', 'spam')
    print('main:', get_execution_context_item('key'))

    asyncio.get_event_loop().run_until_complete(foo())
    print('main:', get_execution_context_item('key'))

which will output::

    main: spam
    foo: spam
    inner_foo: 1
    foo: 2
    main: spam

Generator-based coroutines (generators decorated with ``types.coroutine`` or ``asyncio.coroutine``) behave exactly as native coroutines with regards to execution context management: their ``yield from`` expression is semantically equivalent to ``await``.


Generators
----------

Generators in Python, while similar to Coroutines, are used in a fundamentally different way. They are producers of data, and they use the ``yield`` expression to suspend/resume their execution.

A crucial difference between ``await coro`` and ``yield value`` is that the former expression guarantees that the ``coro`` will be executed to the end, while the latter is producing ``value`` and suspending the generator until it gets iterated again.

Generators share 99% of their implementation with coroutines, and thus have similar new attributes ``gi_execution_context`` and ``gi_isolated_execution_context``. Similar to coroutines, generators save a reference to the current execution context when they are instantiated. They have the same implementation of the ``.send()`` and ``.throw()`` methods.

The only difference is that ``gi_isolated_execution_context`` is always set to ``True``, and is never modified by the interpreter. The ``yield from o`` expression in regular generators that are not decorated with ``types.coroutine`` is semantically equivalent to ``for v in o: yield v``.

.. figure:: pep-0550/generators.png
   :align: center
   :width: 90%

   Figure 3.  Execution Context flow in a generator.

In the above diagram:

* When "gen1" is created, it saves a reference to the current execution context "2".

* If it makes any change to the context, it will have its own execution context branch "2.1".

* When "gen2" is created, it saves a reference to the current execution context for it -- "2.1".

* Any subsequent execution context updates in "gen2" will only be visible to "gen2".

* Likewise, any context changes that "gen1" will do after it created "gen2" will not be visible to "gen2".
In code::

    def inner_foo():
        for i in range(3):
            print('inner_foo:', get_execution_context_item('key'))
            set_execution_context_item('key', i)
            yield i

    def foo():
        set_execution_context_item('key', 'spam')
        print('foo:', get_execution_context_item('key'))

        inner = inner_foo()

        while True:
            val = next(inner, None)
            if val is None:
                break
            yield val
            print('foo:', get_execution_context_item('key'))

    set_execution_context_item('key', 'ham')
    print('main:', get_execution_context_item('key'))

    list(foo())

    print('main:', get_execution_context_item('key'))

which will output::

    main: ham
    foo: spam
    inner_foo: spam
    foo: spam
    inner_foo: 0
    foo: spam
    inner_foo: 1
    foo: spam
    main: ham

As we see, any modification of the execution context in a generator is visible only to the generator itself.

There is one use-case where it is desired for generators to affect the surrounding execution context: the ``contextlib.contextmanager`` decorator. To make the following work::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)

we modified ``contextmanager`` to flip the ``gi_isolated_execution_context`` flag to ``False`` on its generator.


Greenlets
---------

Greenlet is an alternative implementation of cooperative scheduling for Python. Although the greenlet package is not part of CPython, popular frameworks like gevent rely on it, and it is important that greenlet can be modified to support execution contexts.

In a nutshell, greenlet design is very similar to the design of generators. The main difference is that for generators, the stack is managed by the Python interpreter. Greenlet works outside of the Python interpreter, and manually saves some ``PyThreadState`` fields and pushes/pops the C-stack. Since Execution Context is implemented on top of ``PyThreadState``, it's easy to add transparent support of it to greenlet.


New APIs
========

Even though this PEP adds a number of new APIs, please keep in mind that most Python users will likely ever use only two of them: ``sys.get_execution_context_item()`` and ``sys.set_execution_context_item()``.


Python
------

1. ``sys.get_execution_context_item(key, default=None)``: lookup ``key`` for the current Execution Context. If not found, return ``default``.

2. ``sys.set_execution_context_item(key, value)``: set a ``key``/``value`` item for the current Execution Context. If ``value`` is ``None``, the item will be removed.

3. ``sys.get_execution_context()``: return the current Execution Context object: ``sys.ExecutionContext``.

4. ``sys.set_execution_context(ec)``: set the passed ``sys.ExecutionContext`` instance as a current one for the current thread.

5. ``sys.ExecutionContext`` object.

   Implementation detail: ``sys.ExecutionContext`` wraps a low-level ``PyExecContextData`` object. ``sys.ExecutionContext`` has a mutable mapping API, abstracting away the real immutable ``PyExecContextData``.

   * ``ExecutionContext()``: construct a new, empty, execution context.

   * ``ec.run(func, *args)`` method: run ``func(*args)`` in the ``ec`` execution context.

   * ``ec[key]``: lookup ``key`` in the ``ec`` context.

   * ``ec[key] = value``: assign a ``key``/``value`` item to the ``ec``.

   * ``ec.get()``, ``ec.items()``, ``ec.values()``, ``ec.keys()``, and ``ec.copy()`` are similar to that of the ``dict`` object.


C API
-----

The C API is different from the Python one because it operates directly on the low-level immutable ``PyExecContextData`` object.
1. New ``PyThreadState->exec_context`` field, pointing to a ``PyExecContextData`` object.

2. ``PyThreadState_SetExecContextItem`` and ``PyThreadState_GetExecContextItem``, similar to ``sys.set_execution_context_item()`` and ``sys.get_execution_context_item()``.

3. ``PyThreadState_GetExecContext``: similar to ``sys.get_execution_context()``. Always returns a ``PyExecContextData`` object. If ``PyThreadState->exec_context`` is ``NULL``, a new and empty one will be created and assigned to ``PyThreadState->exec_context``.

4. ``PyThreadState_SetExecContext``: similar to ``sys.set_execution_context()``.

5. ``PyExecContext_New``: create a new empty ``PyExecContextData`` object.

6. ``PyExecContext_SetItem`` and ``PyExecContext_GetItem``.

The exact layout of ``PyExecContextData`` is private, which allows switching it to a different implementation later. More on that in the `Implementation Details`_ section.


Modifications in Standard Library
=================================

* ``contextlib.contextmanager`` was updated to flip the new ``gi_isolated_execution_context`` attribute on the generator.

* The ``asyncio.events.Handle`` object now captures the current execution context when it is created, and uses the saved execution context to run the callback (with the ``ExecutionContext.run()`` method.) This makes ``loop.call_soon()`` run callbacks in the execution context they were scheduled in.

  No modifications in ``asyncio.Task`` or ``asyncio.Future`` were necessary.

Some standard library modules like ``warnings`` and ``decimal`` can be updated to use new execution contexts. This will be considered in separate issues if this PEP is accepted.


Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.


Performance
===========

Implementation Details
----------------------

The new ``PyExecContextData`` object wraps a ``dict`` object. Any modification requires creating a shallow copy of the dict.

While working on the reference implementation of this PEP, we were able to optimize the ``dict.copy()`` operation **5.5x**, see [4]_ for details.

.. figure:: pep-0550/dict_copy.png
   :align: center
   :width: 100%

   Figure 4.

Figure 4 shows that the performance of an immutable dict implemented with shallow copying is expectedly O(n) for the ``set()`` operation. However, this is tolerable until the dict has more than 100 items (1 ``set()`` takes about a microsecond.)

Judging by the number of modules that need EC in the Standard Library, it is likely that real world Python applications will use significantly less than 100 execution context variables.

The important point is that the cost of accessing a key in an Execution Context is always O(1).

If the ``set()`` operation performance is a major concern, we discuss alternative approaches that have O(1) or close ``set()`` performance in the `Alternative Immutable Dict Implementation`_, `Faster C API`_, and `Copy-on-write Execution Context`_ sections.


Generators and Coroutines
-------------------------

Using a microbenchmark for generators and coroutines from :pep:`492` ([12]_), it was possible to observe 0.5 to 1% performance degradation.

asyncio echoserver microbenchmarks from the uvloop project [13]_ showed 1-1.5% performance degradation for asyncio code.

asyncpg benchmarks [14]_, which execute more code and are closer to a real-world application, did not exhibit any noticeable performance change.


Overall Performance Impact
--------------------------

The total number of changed lines in the ceval loop is 2 -- in the ``YIELD_FROM`` opcode implementation.
Only performance of generators and coroutines can be affected by the proposal. This was confirmed by running the Python Performance Benchmark Suite [15]_, which demonstrated that there is no difference between the 3.7 master branch and this PEP's reference implementation branch (full benchmark results can be found here [16]_.)


Design Considerations
=====================

Alternative Immutable Dict Implementation
-----------------------------------------

Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT) to implement high performance immutable collections [5]_, [6]_.

Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N) performance for both ``set()`` and ``get()`` operations, which will be essentially O(1) for relatively small mappings in EC.

To assess if HAMT can be used for Execution Context, we implemented it in CPython [7]_.

.. figure:: pep-0550/hamt_vs_dict.png
   :align: center
   :width: 100%

   Figure 5.  Benchmark code can be found here: [9]_.

Figure 5 shows that HAMT indeed displays O(1) performance for all benchmarked dictionary sizes. For dictionaries with less than 100 items, HAMT is a bit slower than Python dict/shallow copy.

.. figure:: pep-0550/lookup_hamt.png
   :align: center
   :width: 100%

   Figure 6.  Benchmark code can be found here: [10]_.

Figure 6 shows a comparison of lookup costs between Python dict and an HAMT immutable mapping. HAMT lookup time is 30-40% worse than Python dict lookups on average, which is a very good result, considering how well Python dicts are optimized.

Note, that according to [8]_, the HAMT design can be further improved.

The bottom line is that the current approach of implementing an immutable mapping with a shallow-copied dict will likely perform adequately in real-life applications. The HAMT solution is more future proof, however.

The proposed API is designed in such a way that the underlying implementation of the mapping can be changed completely without affecting the Execution Context `Specification`_, which allows us to switch to HAMT at some point if necessary.


Copy-on-write Execution Context
-------------------------------

The implementation of Execution Context in .NET is different from this PEP. .NET uses a copy-on-write mechanism and a regular mutable mapping.

One way to implement this in CPython would be to have two new fields in ``PyThreadState``:

* ``exec_context`` pointing to the current Execution Context mapping;

* ``exec_context_copy_on_write`` flag, set to ``0`` initially.

The idea is that whenever we are modifying the EC, the copy-on-write flag is checked, and if it is set to ``1``, the EC is copied.

Modifications to the Coroutine and Generator ``.send()`` and ``.throw()`` methods described in the `Coroutines`_ section will be almost the same, except that in addition to the ``gi_execution_context`` they will have a ``gi_exec_context_copy_on_write`` flag. When a coroutine or a generator starts, the flag will be set to ``1``. This will ensure that any modification of the EC performed within a coroutine or a generator will be isolated.

This approach has one advantage:

* For an Execution Context that contains a large number of items, copy-on-write is a more efficient solution than the shallow-copy dict approach.

However, we believe that the copy-on-write disadvantages are more important to consider:

* Copy-on-write behaviour for generators and coroutines makes EC semantics less predictable.

  With the immutable EC approach, generators and coroutines always execute in the EC that was current at the moment of their creation.
  Any modifications to the outer EC while a generator or a coroutine is executing are not visible to them::

      def generator():
          yield 1
          print(get_execution_context_item('key'))
          yield 2

      set_execution_context_item('key', 'spam')
      gen = iter(generator())
      next(gen)

      set_execution_context_item('key', 'ham')
      next(gen)

  The above script will always print 'spam' with the immutable EC. With a copy-on-write approach, the above script will print 'ham'.

  Now, consider that ``generator()`` was refactored to call some library function that uses Execution Context::

      def generator():
          yield 1
          some_function_that_uses_decimal_context()
          print(get_execution_context_item('key'))
          yield 2

  Now, the script will print 'spam', because ``some_function_that_uses_decimal_context`` forced the EC to copy, and the ``set_execution_context_item('key', 'ham')`` line did not affect the ``generator()`` code after all.

* Similarly to the previous point, the ``sys.ExecutionContext.run()`` method will also become less predictable, as ``sys.get_execution_context()`` would still return a reference to the current mutable EC.

  We can't modify ``sys.get_execution_context()`` to return a shallow copy of the current EC, because this would seriously harm performance of ``asyncio.call_soon()`` and similar places, where it is important to propagate the Execution Context.

* Even though copy-on-write requires shallow-copying the execution context object less frequently, copying will still take place in coroutines and generators. In which case, the HAMT approach will perform better for medium to large sized execution contexts.

All in all, we believe that the copy-on-write approach introduces very subtle corner cases that could lead to bugs that are exceptionally hard to discover and fix. The immutable EC solution in comparison is always predictable and easy to reason about. Therefore we believe that any slight performance gain that the copy-on-write solution might offer is not worth it.


Faster C API
------------

Packages like numpy and standard library modules like decimal need to frequently query the global state for some local context configuration. It is important that the APIs they use are as fast as possible.

The proposed ``PyThreadState_SetExecContextItem`` and ``PyThreadState_GetExecContextItem`` functions need to get the current thread state with ``PyThreadState_GET()`` (fast) and then perform a hash lookup (relatively slow). We can eliminate the hash lookup by adding three additional C API functions:

* ``Py_ssize_t PyExecContext_RequestIndex(char *key_name)``: a function similar to the existing ``_PyEval_RequestCodeExtraIndex`` introduced in :pep:`523`. The idea is to request a unique index that can later be used to lookup context items.

  The ``key_name`` can later be used by ``sys.ExecutionContext`` to introspect items added with this API.

* ``PyThreadState_SetExecContextIndexedItem(Py_ssize_t index, PyObject *val)`` and ``PyThreadState_GetExecContextIndexedItem(Py_ssize_t index)`` to request an item by its index, avoiding the cost of hash lookup.


Why does setting a key to None remove the item?
-----------------------------------------------

Consider a context manager::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)

With the ``set_execution_context_item(key, None)`` call removing the ``key``, the user doesn't need to write additional code to remove the ``key`` if it wasn't in the execution context already.
An alternative design with a ``del_execution_context_item()`` method would look like the following::

    @contextmanager
    def context(x):
        not_there = object()
        old_x = get_execution_context_item('x', not_there)
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            if old_x is not_there:
                del_execution_context_item('x')
            else:
                set_execution_context_item('x', old_x)


Can we fix ``PyThreadState_GetDict()``?
---------------------------------------

``PyThreadState_GetDict`` is a TLS, and some of its existing users might depend on it being just a TLS. Changing its behaviour to follow the Execution Context semantics would break backwards compatibility.


PEP 521
-------

:pep:`521` proposes an alternative solution to the problem: enhance the Context Manager Protocol with two new methods: ``__suspend__`` and ``__resume__``. To make it compatible with async/await, the Asynchronous Context Manager Protocol will also need to be extended with ``__asuspend__`` and ``__aresume__``.

This allows implementing context managers like decimal context and ``numpy.errstate`` for generators and coroutines.

The following code::

    class Context:
        def __enter__(self):
            self.old_x = get_execution_context_item('x')
            set_execution_context_item('x', 'something')

        def __exit__(self, *err):
            set_execution_context_item('x', self.old_x)

would become this::

    class Context:
        def __enter__(self):
            self.old_x = get_execution_context_item('x')
            set_execution_context_item('x', 'something')

        def __suspend__(self):
            set_execution_context_item('x', self.old_x)

        def __resume__(self):
            set_execution_context_item('x', 'something')

        def __exit__(self, *err):
            set_execution_context_item('x', self.old_x)

Besides complicating the protocol, the implementation will likely negatively impact performance of coroutines, generators, and any code that uses context managers, and will notably complicate the interpreter implementation. It also does not solve the leaking state problem for greenlet/gevent.

:pep:`521` also does not provide any mechanism to propagate state in a local context, like storing a request object in an HTTP request handler to have better logging.


Can Execution Context be implemented outside of CPython?
--------------------------------------------------------

Because async/await code needs an event loop to run it, an EC-like solution can be implemented in a limited way for coroutines.

Generators, on the other hand, do not have an event loop or trampoline, making it impossible to intercept their ``yield`` points outside of the Python interpreter.


Reference Implementation
========================

The reference implementation can be found here: [11]_.


References
==========

.. [1] https://blog.golang.org/context

.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx

.. [3] https://github.com/numpy/numpy/issues/9444

.. [4] http://bugs.python.org/issue31179

.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie

.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html

.. [7] https://github.com/1st1/cpython/tree/hamt

.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf

.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd

.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e

.. [11] https://github.com/1st1/cpython/tree/pep550

.. [12] https://www.python.org/dev/peps/pep-0492/#async-await

.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py

.. [14] https://github.com/MagicStack/pgbench
.. [15] https://github.com/python/performance

.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c


Copyright
=========

This document has been placed in the public domain.

From yselivanov.ml at gmail.com  Sat Aug 12 14:55:30 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sat, 12 Aug 2017 14:55:30 -0400
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: 
References: 
Message-ID: 

Sure, I'll do.

Yury

From yselivanov.ml at gmail.com  Sat Aug 12 15:00:09 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sat, 12 Aug 2017 15:00:09 -0400
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: 
References: 
Message-ID: 

On Sat, Aug 12, 2017 at 2:28 PM, rymg19 at gmail.com wrote:
> So, I'm hardly an expert when it comes to things like this, but there are
> two things about this that don't seem right to me. (Also, I'd love to
> respond inline, but that's kind of difficult from a mobile phone.)
>
> The first is how set/get_execution_context_item take strings. Inevitably,
> people are going to do things like:

Yes, it accepts any hashable Python object as a key.

>     CONTEXT_ITEM_NAME = 'foo-bar'
>     ...
>     sys.set_execution_context_item(CONTEXT_ITEM_NAME, 'stuff')
>
> IMO it would be nicer if there could be a key object used instead, e.g.
>
>     my_key = sys.execution_context_key('name-here-for-debugging-purposes')
>     sys.set_execution_context_item(my_key, 'stuff')

I thought about this, and decided that this is something that can be easily designed on top of the PEP and put into the 'contextlib' module. In practice, this issue can be entirely addressed in the documentation, asking users to prefix their keys with their library/framework/program name.

> The advantage here would be no need for string constants and no potential
> naming conflicts (the string passed to the key creator would be used just
> for debugging, kind of like Thread names).
>
> Second thing is this:
>
>     def context(x):
>         old_x = get_execution_context_item('x')
>         set_execution_context_item('x', x)
>         try:
>             yield
>         finally:
>             set_execution_context_item('x', old_x)
>
> If this would be done frequently, a context manager would be a *lot* more
> Pythonic, e.g.:
>
>     with sys.temp_change_execution_context('x', new_x):
>         # ...

Yes, this is a neat idea and I think we can add such a helper to contextlib.

I want to focus the PEP 550 API on correctness, minimalism, and performance. Nice APIs can then be easily developed on top of it later.

Yury

From yselivanov.ml at gmail.com  Sat Aug 12 16:58:12 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sat, 12 Aug 2017 16:58:12 -0400
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: 
References: 
Message-ID: 

Nathaniel, Nick,

I'll reply only to point 9 in this email to split this thread into manageable sub-threads. I'll cover other points in later emails.

On Sat, Aug 12, 2017 at 3:54 AM, Nathaniel Smith wrote:
> 9. OK, my big question, about semantics.
FWIW it took me a good hour to fully understand what you are doing with "fail_after" and what you want from PEP 550, and the actual associated problems with generators :)

> The PEP's design is based on the assumption that all context-local
> state is scalar-like, and contexts split but never join. But there are
> some cases where this isn't true, in particular for values that have
> "stack-like" semantics. These are terms I just made up, but let me
> give some examples. Python's sys.exc_info is one. Another I ran into
> recently is for trio's cancel scopes.

As you yourself show below, it's easy to implement stacks with the proposed EC spec. A linked list will work well enough.

> So basically the background is, in trio you can wrap a context manager
> around any arbitrary chunk of code and then set a timeout or
> explicitly cancel that code. It's called a "cancel scope". These are
> fully nestable. Full details here:
> https://trio.readthedocs.io/en/latest/reference-core.html#cancellation-and-timeouts
>
> Currently, the implementation involves keeping a stack of cancel
> scopes in Task-local storage. This works fine for regular async code
> because when we switch Tasks, we also switch the cancel scope stack.
> But of course it falls apart for generators/async generators:
>
>     async def agen():
>         with fail_after(10):  # 10 second timeout for finishing this block
>             await some_blocking_operation()
>             yield
>             await another_blocking_operation()
>
>     async def caller():
>         with fail_after(20):
>             ag = agen()
>             await ag.__anext__()
>             # now that cancel scope is on the stack, even though we're not
>             # inside the context manager! this will not end well.
>             await some_blocking_operation()  # this might get cancelled
>             # when it shouldn't
>         # even if it doesn't, we'll crash here when exiting the context
>         # manager because we try to pop a cancel scope that isn't at the
>         # top of the stack

Right. So the task always knows the EC at the point of "yield". It can then get the latest timeout from it and act accordingly if that yield did not resume in time. This should work.

> That would fix the case above. But, I think there's another case that's
> kind of a showstopper.
>
>     async def agen():
>         await some_blocking_operation()
>         yield
>
>     async def caller():
>         ag = agen()  # context is captured here
>         with fail_after(10):
>             await ag.__anext__()
>
> Currently this case works correctly: the timeout is applied to the
> __anext__ call, as you'd expect. But with PEP 550, it wouldn't work:
> the generator's timeouts would all be fixed when it was instantiated,
> and we wouldn't be able to detect that the second call has a timeout
> imposed on it. So that's a pretty nasty footgun. Any time you have
> code that's supposed to have a timeout applied, but in fact has no
> timeout applied, then that's a really serious bug -- it can lead to
> hangs, trivial DoS, pagers going off, etc.

As I tried to explain in my last email, I generally don't believe that people would do this partial iteration with timeouts or other contexts around it. The only use case I can come up with so far is implementing some sort of receiver using an AG, and then "listening" on it through "__anext__" calls.
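Coming back to the linked-list point above, a rough sketch of how a cancel-scope stack with "stack-like" semantics could be stored in an EC -- a plain dict simulates the EC here, and tuples serve as immutable list nodes (every name is illustrative only):

    ec = {'cancel_scopes': None}         # None == empty stack

    def push_scope(scope):
        # (value, tail) node; the old list is never mutated, so any
        # snapshot taken earlier still sees its own, shorter stack.
        ec['cancel_scopes'] = (scope, ec['cancel_scopes'])

    def pop_scope():
        head, tail = ec['cancel_scopes']
        ec['cancel_scopes'] = tail
        return head

    push_scope('fail_after(20)')
    snapshot = dict(ec)                  # what a yield would hand back
    push_scope('fail_after(10)')
    print(pop_scope())                   # -> fail_after(10)
    print(snapshot['cancel_scopes'][0])  # snapshot still sees fail_after(20)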
But the case is interesting nevertheless, and maybe we can fix it without relaxing any guarantees of the PEP.

The idea that I have is to allow linking of ExecutionContext (this is similar in a way to what Nick proposed, but has a stricter semantics):

1. The internal ExecutionContext object will have a new "back" attribute.

2. For regular code and coroutines everything that is already in the PEP will stay the same.

3. For generators and asynchronous generators, when a generator is created, an empty ExecutionContext will be created for it, with its "back" attribute pointing to the current EC.

4. The lookup function will be adjusted to check the "EC.back" if the key is not found in the current EC.

5. The max level of the "back" chain will be 1.

6. When a generator is created inside another generator, it will inherit another generator's EC. Because contexts are immutable this should be OK.

7. When a coroutine is created inside an EC with a "back" link, it will merge EC and EC.back in one new EC. Merge can be done very efficiently for HAMT mappings, which I believe we will end up using for this anyways (an O(log32 N) operation).

An illustration of what it will allow:

    def gen():
        yield
        with context(key='spam'):
            yield
        yield

    g = gen()

    context(key=1)
    g.send(None)
    # The code around the first yield will see "key=1"

    context(key=2)
    g.send(None)
    # The code around the second yield will see "key=spam"

    context(key=3)
    g.send(None)
    # The code around the third yield will see "key=3"

Essentially, it makes generators "transparent" to outside context changes, but OTOH fully isolates their local context changes from the outside world. This should solve the "fail_after" over a generator case.

Nathaniel and Nick, what do you think?

Yury

From pfreixes at gmail.com  Sat Aug 12 17:03:16 2017
From: pfreixes at gmail.com (Pau Freixes)
Date: Sat, 12 Aug 2017 23:03:16 +0200
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: 
References: 
Message-ID: 

Good work Yury, going all in on one mechanism will help to avoid increasing the differences between the async and sync worlds in Python.

I really like the idea of the immutable dicts: it makes it easy to inherit the context between tasks/threads/whatever without putting consistency at risk when there are key collisions.

I've just taken a look at the asyncio modifications. Correct me if I'm wrong, but the handler strategy has a side effect: the work done to save and restore the context will be done twice in some situations. It would happen when the callback is in charge of executing a task step, once by the run-in-context method and again by the coroutine. Is that correct?

El 12/08/2017 00:38, "Yury Selivanov" escribió:

[..]
A few examples of where having reliable state storage is required:

* Context managers like decimal contexts, ``numpy.errstate``,
  and ``warnings.catch_warnings``;

* Storing request-related data such as security tokens and request
  data in web applications;

* Profiling, tracing, and logging in complex and large code bases.

The usual solution for storing state is to use Thread-local Storage
(TLS), implemented in the standard library as ``threading.local()``.
Unfortunately, TLS does not work for isolating state of generators or
asynchronous code, because such code shares a single thread.


Rationale
=========

Traditionally, Thread-local Storage (TLS) is used for storing state.
However, the major flaw of using TLS is that it works only for
multi-threaded code. It is not possible to reliably contain state
within a generator or a coroutine. For example, consider the following
generator::

    def calculate(precision, ...):
        with decimal.localcontext() as ctx:
            # Set the precision for decimal calculations
            # inside this block
            ctx.prec = precision

            yield calculate_something()
            yield calculate_something_else()

Decimal context uses TLS to store its state, and because TLS is not
aware of generators, the state can leak. The above code will not work
correctly if a user iterates over the ``calculate()`` generator with
different precisions in parallel::

    g1 = calculate(100)
    g2 = calculate(50)

    items = list(zip(g1, g2))

    # items[0] will be a tuple of:
    #   first value from g1 calculated with 100 precision,
    #   first value from g2 calculated with 50 precision.
    #
    # items[1] will be a tuple of:
    #   second value from g1 calculated with 50 precision,
    #   second value from g2 calculated with 50 precision.

An even scarier example would be using decimals to represent money in
an async/await application: decimal calculations can suddenly lose
precision in the middle of processing a request. Currently, bugs like
this are extremely hard to find and fix.

Another common need for web applications is to have access to the
current request object, or security context, or, simply, the request
URL for logging or submitting performance tracing data::

    async def handle_http_request(request):
        context.current_http_request = request

        await ...
        # Invoke your framework code, render templates,
        # make DB queries, etc, and use the global
        # 'current_http_request' in that code.

        # This isn't currently possible to do reliably
        # in asyncio out of the box.

These examples are just a few out of many where a reliable way to
store context data is absolutely needed.

The inability to use TLS for asynchronous code has led to a
proliferation of ad-hoc solutions, which are limited to being supported
only by code that was explicitly enabled to work with them.

The current status quo is that any library, including the standard
library, that uses TLS will likely not work as expected in asynchronous
code or with generators (see [3]_ as an example issue.)

Some languages that have coroutines or generators recommend manually
passing a ``context`` object to every function; see [1]_ describing the
pattern for Go. This approach, however, has limited use for Python,
where we have a huge ecosystem that was built to work with a TLS-like
context. Moreover, passing the context explicitly does not work at all
for libraries like ``decimal`` or ``numpy``, which use operator
overloading.

The .NET runtime, which has support for async/await, has a generic
solution to this problem, called ``ExecutionContext`` (see [2]_). On
the surface, working with it is very similar to working with a TLS, but
the former explicitly supports asynchronous code.
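The decimal example above depends on the internals of the ``decimal``
module; the same leak can be demonstrated today in a few self-contained,
runnable lines using ``threading.local()`` directly::

    import threading

    _local = threading.local()

    def calculate(tag):
        _local.tag = tag      # set "context" for this generator
        yield _local.tag      # fine on the first iteration...
        yield _local.tag      # ...but by now another generator on the
                              # same thread may have overwritten the value

    g1 = calculate('g1')
    g2 = calculate('g2')
    print(list(zip(g1, g2)))
    # prints [('g1', 'g2'), ('g2', 'g2')] --
    # g1's second value leaked from g2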
Goals
=====

The goal of this PEP is to provide a more reliable alternative to
``threading.local()``. It should be explicitly designed to work with
the Python execution model, equally supporting threads, generators,
and coroutines.

An acceptable solution for Python should meet the following
requirements:

* Transparent support for code executing in threads, coroutines,
  and generators, with an easy-to-use API.

* Negligible impact on the performance of the existing code or the
  code that will be using the new mechanism.

* Fast C API for packages like ``decimal`` and ``numpy``.

Explicit is still better than implicit, hence the new APIs should only
be used when there is no option to pass the state explicitly.

With this PEP implemented, it should be possible to update a context
manager like the one below::

    _local = threading.local()

    @contextmanager
    def context(x):
        old_x = getattr(_local, 'x', None)
        _local.x = x
        try:
            yield
        finally:
            _local.x = old_x

to a more robust version that can be reliably used in generators
and async/await code, with a simple transformation::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)


Specification
=============

This proposal introduces a new concept called Execution Context (EC),
along with a set of Python APIs and C APIs to interact with it.

EC is implemented using an immutable mapping. Every modification of
the mapping produces a new copy of it. To illustrate what this means,
let's compare it to how we work with tuples in Python::

    a0 = ()
    a1 = a0 + (1,)
    a2 = a1 + (2,)

    # a0 is an empty tuple
    # a1 is (1,)
    # a2 is (1, 2)

Manipulating an EC object would be similar::

    a0 = EC()
    a1 = a0.set('foo', 'bar')
    a2 = a1.set('spam', 'ham')

    # a0 is an empty mapping
    # a1 is {'foo': 'bar'}
    # a2 is {'foo': 'bar', 'spam': 'ham'}

In CPython, every thread that can execute Python code has a
corresponding ``PyThreadState`` object. It encapsulates important
runtime information like a pointer to the current frame, and is used
by the ceval loop extensively. We add a new field to ``PyThreadState``,
called ``exec_context``, which points to the current EC object.

We also introduce a set of APIs to work with Execution Context. In
this section we will only cover the two functions that are needed to
explain how Execution Context works. See the full list of new APIs in
the `New APIs`_ section.

* ``sys.get_execution_context_item(key, default=None)``: look up
  ``key`` in the EC of the executing thread. If not found, return
  ``default``.

* ``sys.set_execution_context_item(key, value)``: get the current EC
  of the executing thread. Add a ``key``/``value`` item to it, which
  will produce a new EC object. Set the new object as the current one
  for the executing thread. In pseudo-code::

      tstate = PyThreadState_GET()
      ec = tstate.exec_context
      ec2 = ec.set(key, value)
      tstate.exec_context = ec2

Note that some important implementation details and optimizations are
omitted here, and will be covered in later sections of this PEP.
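The tuple analogy above can be turned into a tiny runnable model of the
baseline implementation (a dict that is shallow-copied on every write).
This is only an illustration of the semantics, not the proposed API or
the C-level ``PyExecContextData``::

    class ImmutableDict:
        """Toy model of the PEP's immutable mapping."""

        __slots__ = ('_d',)

        def __init__(self, d=()):
            self._d = dict(d)

        def set(self, key, value):
            # Every write copies the underlying dict: O(n) writes,
            # O(1) reads, and old versions are never mutated.
            new = ImmutableDict(self._d)
            new._d[key] = value
            return new

        def get(self, key, default=None):
            return self._d.get(key, default)

    a0 = ImmutableDict()
    a1 = a0.set('foo', 'bar')
    a2 = a1.set('spam', 'ham')

    assert a0.get('foo') is None
    assert a2.get('foo') == 'bar' and a2.get('spam') == 'ham'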
Now let's see how Execution Contexts work with regular multi-threaded
code, generators, and coroutines.


Regular & Multithreaded Code
----------------------------

For regular Python code, EC behaves just like a thread-local. Any
modification of the EC object produces a new one, which is immediately
set as the current one for the thread state.

.. figure:: pep-0550/functions.png
   :align: center
   :width: 90%

   Figure 1. Execution Context flow in a thread.

As Figure 1 illustrates, if a function calls
``set_execution_context_item()``, the modification of the execution
context will be visible to all subsequent calls and to the caller::

    def set_foo():
        set_execution_context_item('foo', 'spam')

    set_execution_context_item('foo', 'bar')
    print(get_execution_context_item('foo'))

    set_foo()
    print(get_execution_context_item('foo'))

    # will print:
    #   bar
    #   spam


Coroutines
----------

Python :pep:`492` coroutines are used to implement cooperative
multitasking. For a Python end-user they are similar to threads,
especially when it comes to sharing resources or modifying the global
state.

An event loop is needed to schedule coroutines. Coroutines that are
explicitly scheduled by the user are usually called Tasks. When a
coroutine is scheduled, it can schedule other coroutines using an
``await`` expression. In the async/await world, awaiting a coroutine
can be viewed as a different calling convention: Tasks are similar to
threads, and awaiting on coroutines within a Task is similar to calling
functions within a thread.

By drawing a parallel between regular multithreaded code and
async/await, it becomes apparent that any modification of the execution
context within one Task should be visible to all coroutines scheduled
within it. Any execution context modifications, however, must not be
visible to other Tasks executing within the same thread.

To achieve this, a small set of modifications to the coroutine object
is needed:

* When a coroutine object is instantiated, it saves a reference to the
  current execution context object to its ``cr_execution_context``
  attribute.

* The coroutine's ``.send()`` and ``.throw()`` methods are modified as
  follows (in pseudo-C)::

      if coro->cr_isolated_execution_context:
          # Save a reference to the current execution context
          old_context = tstate->execution_context

          # Set our saved execution context as the current
          # for the current thread.
          tstate->execution_context = coro->cr_execution_context

          try:
              # Perform the actual `Coroutine.send()` or
              # `Coroutine.throw()` call.
              return coro->send(...)
          finally:
              # Save a reference to the updated execution_context.
              # We will need it later, when `.send()` or `.throw()`
              # are called again.
              coro->cr_execution_context = tstate->execution_context

              # Restore thread's execution context to what it was before
              # invoking this coroutine.
              tstate->execution_context = old_context
      else:
          # Perform the actual `Coroutine.send()` or
          # `Coroutine.throw()` call.
          return coro->send(...)

* ``cr_isolated_execution_context`` is a new attribute on coroutine
  objects. Set to ``True`` by default, it makes any execution context
  modifications performed by the coroutine stay visible only to that
  coroutine.

  When the Python interpreter sees an ``await`` instruction, it flips
  ``cr_isolated_execution_context`` to ``False`` for the coroutine that
  is about to be awaited. This makes any changes to the execution
  context made by nested coroutine calls within a Task visible
  throughout the Task.

  Because the top-level coroutine (Task) cannot be scheduled with
  ``await`` (in asyncio you need to call ``loop.create_task()`` or
  ``asyncio.ensure_future()`` to schedule a Task), all execution
  context modifications are guaranteed to stay within the Task.

* We always work with ``tstate->exec_context``. We use
  ``coro->cr_execution_context`` only to store the coroutine's
  execution context when it is not executing.
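To make the pseudo-C above concrete, here is a rough pure-Python
emulation of the same save/restore dance, with a module global standing
in for ``tstate->execution_context``. The real mechanism lives inside
the interpreter, so treat this strictly as an illustration::

    _current_ec = {}   # stand-in for tstate->execution_context; never mutated

    def ec_get(key, default=None):
        return _current_ec.get(key, default)

    def ec_set(key, value):
        global _current_ec
        _current_ec = {**_current_ec, key: value}   # "immutable" update

    class IsolatedStep:
        """Drive a generator so its EC changes stay private to it."""

        def __init__(self, gen):
            self.gen = gen
            self.ec = _current_ec      # captured at creation time

        def send(self, value=None):
            global _current_ec
            saved = _current_ec        # "old_context" in the pseudo-C
            _current_ec = self.ec
            try:
                return self.gen.send(value)
            finally:
                self.ec = _current_ec  # keep updates for the next step
                _current_ec = saved    # restore the caller's EC

    def gen():
        ec_set('key', 'inner')
        yield ec_get('key')

    step = IsolatedStep(gen())
    ec_set('key', 'outer')
    print(step.send())     # prints: inner
    print(ec_get('key'))   # prints: outer -- the caller's EC is untouched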
Figure 2 below illustrates how execution context mutations work with
coroutines.

.. figure:: pep-0550/coroutines.png
   :align: center
   :width: 90%

   Figure 2. Execution Context flow in coroutines.

In the above diagram:

* When "coro1" is created, it saves a reference to the current
  execution context "2".

* If it makes any change to the context, it will have its own
  execution context branch "2.1".

* When it awaits on "coro2", any subsequent changes it does to the
  execution context are visible to "coro1", but not outside of it.

In code::

    async def inner_foo():
        print('inner_foo:', get_execution_context_item('key'))
        set_execution_context_item('key', 2)

    async def foo():
        print('foo:', get_execution_context_item('key'))

        set_execution_context_item('key', 1)
        await inner_foo()

        print('foo:', get_execution_context_item('key'))

    set_execution_context_item('key', 'spam')
    print('main:', get_execution_context_item('key'))

    asyncio.get_event_loop().run_until_complete(foo())

    print('main:', get_execution_context_item('key'))

which will output::

    main: spam
    foo: spam
    inner_foo: 1
    foo: 2
    main: spam

Generator-based coroutines (generators decorated with
``types.coroutine`` or ``asyncio.coroutine``) behave exactly as native
coroutines with regards to execution context management: their
``yield from`` expression is semantically equivalent to ``await``.


Generators
----------

Generators in Python, while similar to coroutines, are used in a
fundamentally different way. They are producers of data, and they use
the ``yield`` expression to suspend/resume their execution.

A crucial difference between ``await coro`` and ``yield value`` is
that the former expression guarantees that the ``coro`` will be
executed to the end, while the latter produces ``value`` and suspends
the generator until it gets iterated again.

Generators share 99% of their implementation with coroutines, and thus
have similar new attributes ``gi_execution_context`` and
``gi_isolated_execution_context``. Similar to coroutines, generators
save a reference to the current execution context when they are
instantiated. They have the same implementation of the ``.send()`` and
``.throw()`` methods.

The only difference is that ``gi_isolated_execution_context`` is
always set to ``True``, and is never modified by the interpreter. The
``yield from o`` expression in regular generators that are not
decorated with ``types.coroutine`` is semantically equivalent to
``for v in o: yield v``.

.. figure:: pep-0550/generators.png
   :align: center
   :width: 90%

   Figure 3. Execution Context flow in a generator.

In the above diagram:

* When "gen1" is created, it saves a reference to the current
  execution context "2".

* If it makes any change to the context, it will have its own
  execution context branch "2.1".

* When "gen2" is created, it saves a reference to the current
  execution context for it -- "2.1".

* Any subsequent execution context updates in "gen2" will only be
  visible to "gen2".

* Likewise, any context changes that "gen1" will do after it created
  "gen2" will not be visible to "gen2".
In code::

    def inner_foo():
        for i in range(3):
            print('inner_foo:', get_execution_context_item('key'))
            set_execution_context_item('key', i)
            yield i

    def foo():
        set_execution_context_item('key', 'spam')
        print('foo:', get_execution_context_item('key'))

        inner = inner_foo()

        while True:
            val = next(inner, None)
            if val is None:
                break
            yield val
            print('foo:', get_execution_context_item('key'))

    set_execution_context_item('key', 'ham')
    print('main:', get_execution_context_item('key'))

    list(foo())

    print('main:', get_execution_context_item('key'))

which will output::

    main: ham
    foo: spam
    inner_foo: spam
    foo: spam
    inner_foo: 0
    foo: spam
    inner_foo: 1
    foo: spam
    main: ham

As we see, any modification of the execution context in a generator is
visible only to the generator itself.

There is one use case where it is desired for generators to affect the
surrounding execution context: the ``contextlib.contextmanager``
decorator. To make the following work::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)

we modified ``contextmanager`` to flip the
``gi_isolated_execution_context`` flag to ``False`` on its generator.


Greenlets
---------

Greenlet is an alternative implementation of cooperative scheduling
for Python. Although the greenlet package is not part of CPython,
popular frameworks like gevent rely on it, and it is important that
greenlet can be modified to support execution contexts.

In a nutshell, greenlet's design is very similar to the design of
generators. The main difference is that for generators, the stack is
managed by the Python interpreter. Greenlet works outside of the
Python interpreter, and manually saves some ``PyThreadState`` fields
and pushes/pops the C-stack. Since Execution Context is implemented on
top of ``PyThreadState``, it's easy to add transparent support for it
to greenlet.


New APIs
========

Even though this PEP adds a number of new APIs, please keep in mind
that most Python users will likely only ever use two of them:
``sys.get_execution_context_item()`` and
``sys.set_execution_context_item()``.


Python
------

1. ``sys.get_execution_context_item(key, default=None)``: look up
   ``key`` for the current Execution Context. If not found, return
   ``default``.

2. ``sys.set_execution_context_item(key, value)``: set a
   ``key``/``value`` item for the current Execution Context. If
   ``value`` is ``None``, the item will be removed.

3. ``sys.get_execution_context()``: return the current Execution
   Context object: ``sys.ExecutionContext``.

4. ``sys.set_execution_context(ec)``: set the passed
   ``sys.ExecutionContext`` instance as the current one for the
   current thread.

5. ``sys.ExecutionContext`` object (a usage sketch follows this list).

   Implementation detail: ``sys.ExecutionContext`` wraps a low-level
   ``PyExecContextData`` object. ``sys.ExecutionContext`` has a
   mutable mapping API, abstracting away the real immutable
   ``PyExecContextData``.

   * ``ExecutionContext()``: construct a new, empty, execution
     context.

   * ``ec.run(func, *args)`` method: run ``func(*args)`` in the ``ec``
     execution context.

   * ``ec[key]``: look up ``key`` in the ``ec`` context.

   * ``ec[key] = value``: assign a ``key``/``value`` item to the
     ``ec``.

   * ``ec.get()``, ``ec.items()``, ``ec.values()``, ``ec.keys()``, and
     ``ec.copy()`` are similar to those of the ``dict`` object.
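A short usage sketch of the ``sys.ExecutionContext`` object described
in item 5. All names are the PEP's proposed API, so this does not run
on any released Python, and the assumption that the calling thread's
own EC is unaffected by ``ec.run()`` follows from the immutability
guarantees above::

    import sys

    def report():
        print(sys.get_execution_context_item('user'))

    ec = sys.ExecutionContext()   # a new, empty execution context
    ec['user'] = 'alice'          # mutable-mapping facade over the
                                  # immutable low-level PyExecContextData

    ec.run(report)                # prints: alice
    report()                      # prints: None -- the thread's own EC
                                  # is not affected by ec.run()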
C API
-----

The C API is different from the Python one because it operates directly
on the low-level immutable ``PyExecContextData`` object.

1. A new ``PyThreadState->exec_context`` field, pointing to a
   ``PyExecContextData`` object.

2. ``PyThreadState_SetExecContextItem`` and
   ``PyThreadState_GetExecContextItem``: similar to
   ``sys.set_execution_context_item()`` and
   ``sys.get_execution_context_item()``.

3. ``PyThreadState_GetExecContext``: similar to
   ``sys.get_execution_context()``. Always returns a
   ``PyExecContextData`` object. If ``PyThreadState->exec_context`` is
   ``NULL``, a new and empty one will be created and assigned to
   ``PyThreadState->exec_context``.

4. ``PyThreadState_SetExecContext``: similar to
   ``sys.set_execution_context()``.

5. ``PyExecContext_New``: create a new empty ``PyExecContextData``
   object.

6. ``PyExecContext_SetItem`` and ``PyExecContext_GetItem``.

The exact layout of ``PyExecContextData`` is private, which allows
switching it to a different implementation later. More on that in the
`Implementation Details`_ section.


Modifications in Standard Library
=================================

* ``contextlib.contextmanager`` was updated to flip the new
  ``gi_isolated_execution_context`` attribute on the generator.

* The ``asyncio.events.Handle`` object now captures the current
  execution context when it is created, and uses the saved execution
  context to run the callback (with the ``ExecutionContext.run()``
  method.) This makes ``loop.call_soon()`` run callbacks in the
  execution context they were scheduled in (see the sketch after this
  section).

  No modifications in ``asyncio.Task`` or ``asyncio.Future`` were
  necessary.

Some standard library modules like ``warnings`` and ``decimal`` can be
updated to use the new execution contexts. This will be considered in
separate issues if this PEP is accepted.
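Here is a hedged sketch of what the ``asyncio.events.Handle`` change
described above might look like, assuming the PEP's
``sys.get_execution_context()`` and ``ExecutionContext.run()`` APIs.
It is a simplified stand-in, not the actual asyncio patch::

    import sys

    class Handle:
        """Simplified stand-in for asyncio.events.Handle."""

        def __init__(self, callback, args, loop):
            self._callback = callback
            self._args = args
            self._loop = loop
            # Capture the EC that is current when the callback is
            # scheduled (e.g. by loop.call_soon()).
            self._ec = sys.get_execution_context()

        def _run(self):
            # Run the callback in the captured EC, so it sees the
            # context it was scheduled in, not the loop's context.
            self._ec.run(self._callback, *self._args)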
Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.


Performance
===========

Implementation Details
----------------------

The new ``PyExecContextData`` object wraps a ``dict`` object. Any
modification requires creating a shallow copy of the dict.

While working on the reference implementation of this PEP, we were
able to optimize the ``dict.copy()`` operation **5.5x**, see [4]_ for
details.

.. figure:: pep-0550/dict_copy.png
   :align: center
   :width: 100%

   Figure 4.

Figure 4 shows that the performance of an immutable dict implemented
with shallow copying is expectedly O(n) for the ``set()`` operation.
However, this is tolerable until the dict has more than 100 items (one
``set()`` takes about a microsecond.)

Judging by the number of modules that need EC in the Standard Library,
it is likely that real-world Python applications will use significantly
fewer than 100 execution context variables.

The important point is that the cost of accessing a key in an
Execution Context is always O(1).

If the ``set()`` operation performance is a major concern, we discuss
alternative approaches that have O(1) or close ``set()`` performance
in the `Alternative Immutable Dict Implementation`_, `Faster C API`_,
and `Copy-on-write Execution Context`_ sections.


Generators and Coroutines
-------------------------

Using a microbenchmark for generators and coroutines from :pep:`492`
([12]_), it was possible to observe 0.5 to 1% performance degradation.

asyncio echoserver microbenchmarks from the uvloop project [13]_
showed 1-1.5% performance degradation for asyncio code.

asyncpg benchmarks [14]_, which execute more code and are closer to a
real-world application, did not exhibit any noticeable performance
change.


Overall Performance Impact
--------------------------

The total number of changed lines in the ceval loop is 2 -- in the
``YIELD_FROM`` opcode implementation. Only the performance of
generators and coroutines can be affected by the proposal.

This was confirmed by running the Python Performance Benchmark Suite
[15]_, which demonstrated that there is no difference between the 3.7
master branch and this PEP's reference implementation branch (full
benchmark results can be found here [16]_.)
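The O(n) ``set()`` behaviour from Figure 4 is easy to reproduce with a
few lines of today's Python, using a dict merge as a stand-in for the
copy-then-insert step (absolute numbers will of course vary by machine
and interpreter version)::

    import timeit

    for n in (10, 100, 1000):
        d = {i: i for i in range(n)}
        # One simulated EC.set(): copy the mapping, then add one key.
        t = timeit.timeit(lambda: {**d, 'key': 'value'}, number=100_000)
        print(f'{n:5d} items: {t / 100_000 * 1e9:9.1f} ns per set()')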
Design Considerations
=====================

Alternative Immutable Dict Implementation
-----------------------------------------

Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT) to
implement high performance immutable collections [5]_, [6]_.

Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N)
performance for both ``set()`` and ``get()`` operations, which will be
essentially O(1) for relatively small mappings in EC.

To assess if HAMT can be used for Execution Context, we implemented it
in CPython [7]_.

.. figure:: pep-0550/hamt_vs_dict.png
   :align: center
   :width: 100%

   Figure 5. Benchmark code can be found here: [9]_.

Figure 5 shows that HAMT indeed displays O(1) performance for all
benchmarked dictionary sizes. For dictionaries with less than 100
items, HAMT is a bit slower than Python dict/shallow copy.

.. figure:: pep-0550/lookup_hamt.png
   :align: center
   :width: 100%

   Figure 6. Benchmark code can be found here: [10]_.

Figure 6 shows a comparison of lookup costs between Python dict and an
HAMT immutable mapping. HAMT lookup time is 30-40% worse than Python
dict lookups on average, which is a very good result, considering how
well Python dicts are optimized.

Note that, according to [8]_, the HAMT design can be further improved.

The bottom line is that the current approach of implementing an
immutable mapping with a shallow-copying dict will likely perform
adequately in real-life applications. The HAMT solution is more future
proof, however.

The proposed API is designed in such a way that the underlying
implementation of the mapping can be changed completely without
affecting the Execution Context `Specification`_, which allows us to
switch to HAMT at some point if necessary.


Copy-on-write Execution Context
-------------------------------

The implementation of Execution Context in .NET is different from this
PEP. .NET uses a copy-on-write mechanism and a regular mutable
mapping.

One way to implement this in CPython would be to have two new fields
in ``PyThreadState``:

* ``exec_context`` pointing to the current Execution Context mapping;
* ``exec_context_copy_on_write`` flag, set to ``0`` initially.

The idea is that whenever we are modifying the EC, the copy-on-write
flag is checked, and if it is set to ``1``, the EC is copied.

Modifications to the Coroutine and Generator ``.send()`` and
``.throw()`` methods described in the `Coroutines`_ section will be
almost the same, except that in addition to the
``gi_execution_context`` they will have a
``gi_exec_context_copy_on_write`` flag. When a coroutine or a
generator starts, the flag will be set to ``1``. This will ensure that
any modification of the EC performed within a coroutine or a generator
will be isolated.

This approach has one advantage:

* For an Execution Context that contains a large number of items,
  copy-on-write is a more efficient solution than the shallow-copy
  dict approach.

However, we believe that the copy-on-write disadvantages are more
important to consider:

* Copy-on-write behaviour for generators and coroutines makes EC
  semantics less predictable.

  With the immutable EC approach, generators and coroutines always
  execute in the EC that was current at the moment of their creation.
  Any modifications to the outer EC while a generator or a coroutine
  is executing are not visible to them::

      def generator():
          yield 1
          print(get_execution_context_item('key'))
          yield 2

      set_execution_context_item('key', 'spam')
      gen = iter(generator())
      next(gen)
      set_execution_context_item('key', 'ham')
      next(gen)

  The above script will always print 'spam' with the immutable EC.

  With a copy-on-write approach, the above script will print 'ham'.
  Now, consider that ``generator()`` was refactored to call some
  library function that uses the Execution Context::

      def generator():
          yield 1
          some_function_that_uses_decimal_context()
          print(get_execution_context_item('key'))
          yield 2

  Now, the script will print 'spam', because
  ``some_function_that_uses_decimal_context`` forced the EC to copy,
  and the ``set_execution_context_item('key', 'ham')`` line did not
  affect the ``generator()`` code after all.

* Similarly to the previous point, the ``sys.ExecutionContext.run()``
  method will also become less predictable, as
  ``sys.get_execution_context()`` would still return a reference to
  the current mutable EC.

  We can't modify ``sys.get_execution_context()`` to return a shallow
  copy of the current EC, because this would seriously harm the
  performance of ``asyncio.call_soon()`` and similar places, where it
  is important to propagate the Execution Context.

* Even though copy-on-write requires shallow-copying the execution
  context object less frequently, copying will still take place in
  coroutines and generators. In that case, the HAMT approach will
  perform better for medium to large sized execution contexts.

All in all, we believe that the copy-on-write approach introduces very
subtle corner cases that could lead to bugs that are exceptionally
hard to discover and fix. The immutable EC solution in comparison is
always predictable and easy to reason about. Therefore we believe that
any slight performance gain that the copy-on-write solution might
offer is not worth it.


Faster C API
------------

Packages like numpy and standard library modules like decimal need to
frequently query the global state for some local context
configuration. It is important that the APIs they use are as fast as
possible.

The proposed ``PyThreadState_SetExecContextItem`` and
``PyThreadState_GetExecContextItem`` functions need to get the current
thread state with ``PyThreadState_GET()`` (fast) and then perform a
hash lookup (relatively slow). We can eliminate the hash lookup by
adding three additional C API functions:

* ``Py_ssize_t PyExecContext_RequestIndex(char *key_name)``: a
  function similar to the existing ``_PyEval_RequestCodeExtraIndex``
  introduced in :pep:`523`. The idea is to request a unique index that
  can later be used to look up context items.

  The ``key_name`` can later be used by ``sys.ExecutionContext`` to
  introspect items added with this API.

* ``PyThreadState_SetExecContextIndexedItem(Py_ssize_t index, PyObject *val)``
  and ``PyThreadState_GetExecContextIndexedItem(Py_ssize_t index)``
  to request an item by its index, avoiding the cost of hash lookup.


Why does setting a key to None remove the item?
-----------------------------------------------

Consider a context manager::

    @contextmanager
    def context(x):
        old_x = get_execution_context_item('x')
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            set_execution_context_item('x', old_x)

With the ``set_execution_context_item(key, None)`` call removing the
``key``, the user doesn't need to write additional code to remove the
``key`` if it wasn't in the execution context already.
An alternative design with a ``del_execution_context_item()`` method
would look like the following::

    @contextmanager
    def context(x):
        not_there = object()
        old_x = get_execution_context_item('x', not_there)
        set_execution_context_item('x', x)
        try:
            yield
        finally:
            if old_x is not_there:
                del_execution_context_item('x')
            else:
                set_execution_context_item('x', old_x)


Can we fix ``PyThreadState_GetDict()``?
---------------------------------------

``PyThreadState_GetDict`` is a TLS, and some of its existing users
might depend on it being just a TLS. Changing its behaviour to follow
the Execution Context semantics would break backwards compatibility.


PEP 521
-------

:pep:`521` proposes an alternative solution to the problem: enhance
the Context Manager Protocol with two new methods, ``__suspend__`` and
``__resume__``. To make it compatible with async/await, the
Asynchronous Context Manager Protocol would also need to be extended
with ``__asuspend__`` and ``__aresume__``.

This makes it possible to implement context managers like decimal
context and ``numpy.errstate`` for generators and coroutines.

The following code::

    class Context:

        def __enter__(self):
            self.old_x = get_execution_context_item('x')
            set_execution_context_item('x', 'something')

        def __exit__(self, *err):
            set_execution_context_item('x', self.old_x)

would become this::

    class Context:

        def __enter__(self):
            self.old_x = get_execution_context_item('x')
            set_execution_context_item('x', 'something')

        def __suspend__(self):
            set_execution_context_item('x', self.old_x)

        def __resume__(self):
            set_execution_context_item('x', 'something')

        def __exit__(self, *err):
            set_execution_context_item('x', self.old_x)

Besides complicating the protocol, the implementation will likely
negatively impact the performance of coroutines, generators, and any
code that uses context managers, and will notably complicate the
interpreter implementation. It also does not solve the leaking-state
problem for greenlet/gevent.

:pep:`521` also does not provide any mechanism to propagate state in a
local context, like storing a request object in an HTTP request
handler to have better logging.


Can Execution Context be implemented outside of CPython?
--------------------------------------------------------

Because async/await code needs an event loop to run it, an EC-like
solution can be implemented in a limited way for coroutines.
Generators, on the other hand, do not have an event loop or
trampoline, making it impossible to intercept their ``yield`` points
outside of the Python interpreter.


Reference Implementation
========================

The reference implementation can be found here: [11]_.


References
==========

.. [1] https://blog.golang.org/context
.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx
.. [3] https://github.com/numpy/numpy/issues/9444
.. [4] http://bugs.python.org/issue31179
.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie
.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html
.. [7] https://github.com/1st1/cpython/tree/hamt
.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf
.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd
.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e
.. [11] https://github.com/1st1/cpython/tree/pep550
.. [12] https://www.python.org/dev/peps/pep-0492/#async-await
.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py
.. [14] https://github.com/MagicStack/pgbench
.. [15] https://github.com/python/performance
.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c


Copyright
=========

This document has been placed in the public domain.

_______________________________________________
Python-ideas mailing list
Python-ideas at python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/

From njs at pobox.com Sat Aug 12 19:35:44 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 12 Aug 2017 16:35:44 -0700
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To:
References:
Message-ID:

I had an idea for an alternative API that exposes the same
functionality/semantics as the current draft, but that might have some
advantages. It would look like:

    # a "context item" is an object that holds a context-sensitive value
    # each call to create_context_item creates a new one
    ci = sys.create_context_item()

    # Set the value of this item in the current context
    ci.set(value)

    # Get the value of this item in the current context
    value = ci.get()
    value = ci.get(default)

    # To support async libraries, we need some way to capture the whole
    # context.  But an opaque token representing "all context item
    # values" is enough.
    state_token = sys.current_context_state_token()
    sys.set_context_state_token(state_token)
    coro.cr_state_token = state_token
    # etc.

The advantages are:

- Eliminates the current PEP's issues with namespace collision; every
  context item is automatically distinct from all others.

- Eliminates the need for the None-means-del hack.

- Lets the interpreter hide the details of garbage collecting context
  values.

- Allows for more implementation flexibility. This could be
  implemented directly on top of Yury's current prototype. But it
  could also, for example, be implemented by storing the context
  values in a flat array, where each context item is assigned an index
  when it's allocated. In the current draft this is suggested as a
  possible extension for particularly performance-sensitive users, but
  this way we'd have the option of making everything fast without
  changing or extending the API.

As precedent, this is basically the API that low-level thread-local
storage implementations use; see e.g. pthread_key_create,
pthread_getspecific, pthread_setspecific. (And the
allocate-an-index-in-a-table is the implementation that fast
thread-local storage implementations use too.)

-n
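As a concrete illustration of the API proposed above, the PEP's
decimal-style context manager could be written without string keys.
``sys.create_context_item`` and the ``ci.set()``/``ci.get()`` methods
are exactly the names proposed in this message; nothing here exists
yet, so this is a sketch, not working code:

    import sys
    from contextlib import contextmanager

    # One distinct item per context variable -- no key collisions possible.
    _x = sys.create_context_item()

    @contextmanager
    def context(x):
        old_x = _x.get(None)   # remember the value in the current context
        _x.set(x)
        try:
            yield
        finally:
            _x.set(old_x)      # restore the previous value directly;
                               # no None-means-del trick required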
On Fri, Aug 11, 2017 at 3:37 PM, Yury Selivanov wrote:
> [The full PEP 550 draft is quoted here; most of the quote is snipped
> as a duplicate of the posting above.]
> > :pep:`521` also does not provide any mechanism to propagate state > in a local context, like storing a request object in an HTTP request > handler to have better logging. > > > Can Execution Context be implemented outside of CPython? > -------------------------------------------------------- > > Because async/await code needs an event loop to run it, an EC-like > solution can be implemented in a limited way for coroutines. > > Generators, on the other hand, do not have an event loop or > trampoline, making it impossible to intercept their ``yield`` points > outside of the Python interpreter. > > > Reference Implementation > ======================== > > The reference implementation can be found here: [11]_. > > > References > ========== > > .. [1] https://blog.golang.org/context > > .. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx > > .. [3] https://github.com/numpy/numpy/issues/9444 > > .. [4] http://bugs.python.org/issue31179 > > .. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie > > .. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html > > .. [7] https://github.com/1st1/cpython/tree/hamt > > .. [8] https://michael.steindorfer.name/publications/oopsla15.pdf > > .. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd > > .. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e > > .. [11] https://github.com/1st1/cpython/tree/pep550 > > .. [12] https://www.python.org/dev/peps/pep-0492/#async-await > > .. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py > > .. [14] https://github.com/MagicStack/pgbench > > .. [15] https://github.com/python/performance > > .. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c > > > Copyright > ========= > > This document has been placed in the public domain. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Nathaniel J. Smith -- https://vorpus.org From yselivanov.ml at gmail.com Sat Aug 12 21:27:16 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 12 Aug 2017 21:27:16 -0400 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: Yes, I considered this idea myself, but ultimately rejected it because: 1. Current solution makes it easy to introspect things. Get the current EC and print it out. Although the context item idea could be extended to `sys.create_context_item('description')` to allow that. 2. What if we want to pickle the EC? If all items in it are pickleable, it's possible to dump the EC, send it over the network, and re-use in some other process. It's not something I want to consider in the PEP right now, but it's something that the current design theoretically allows. AFAIU, `ci = sys.create_context_item()` context item wouldn't be possible to pickle/unpickle correctly, no? Some more comments: On Sat, Aug 12, 2017 at 7:35 PM, Nathaniel Smith wrote: [..] > The advantages are: > - Eliminates the current PEP's issues with namespace collision; every > context item is automatically distinct from all others. TBH I think that the collision issue is slightly exaggerated. > - Eliminates the need for the None-means-del hack. I consider Execution Context to be an API, not a collection. 
It's an important distinction; if you view it that way, deletion on
None doesn't look that esoteric.

> - Lets the interpreter hide the details of garbage collecting context values.

I'm not sure I understand how the current PEP design is bad from the
GC standpoint. Or how this proposal can be different, FWIW.

> - Allows for more implementation flexibility. This could be
> implemented directly on top of Yury's current prototype. But it could
> also, for example, be implemented by storing the context values in a
> flat array, where each context item is assigned an index when it's
> allocated.

You still want to have this optimization only for *some* keys. So I
think a separate API is still needed.

Yury

From ncoghlan at gmail.com  Sat Aug 12 22:09:47 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 13 Aug 2017 12:09:47 +1000
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To:
References:
Message-ID:

On 13 August 2017 at 03:53, Yury Selivanov wrote:
> On Sat, Aug 12, 2017 at 1:09 PM, Nick Coghlan wrote:
>> Now that you raise this point, I think it means that generators need
>> to retain their current context inheritance behaviour, simply for
>> backwards compatibility purposes. This means that the case we need to
>> enable is the one where the generator *doesn't* dynamically adjust its
>> execution context to match that of the calling function.
>
> Nobody *intentionally* iterates a generator manually in different
> decimal contexts (or any other contexts). This is an extremely
> error-prone thing to do, because one refactoring of the generator --
> rearranging yields -- would wreck your custom iteration/context logic.
> I don't think that any real code relies on this, and I don't think
> that we are breaking backwards compatibility here in any way. How many
> users care about this?

I think this is a reasonable stance for the PEP to take, but the
hidden execution state around the "isolated or not" behaviour still
bothers me.

In some ways it reminds me of the way function parameters work: the
bound parameters are effectively a *shallow* copy of the passed
arguments, so callers can decide whether or not they want the callee
to be able to modify them based on the arguments' mutability (or lack
thereof).

The execution context proposal uses copy-on-write semantics for
runtime efficiency, but it's essentially the same shallow copy concept
applied to __next__(), send() and throw() operations (and perhaps
__anext__(), asend(), and athrow() - I haven't wrapped my head around
the implications for async generators and context managers yet).

That similarity makes me wonder whether the "isolated or not"
behaviour could be moved from the object being executed and directly
into the key/value pairs themselves based on whether or not the values
were mutable, as that's the way function calls work: if the argument
is immutable, the callee *can't* change it, while if it's mutable, the
callee can mutate it, but it still can't rebind it to refer to a
different object.

The way I'd see that working with an always-reverted copy-on-write
execution context:

1. If a parent context wants child contexts to be able to make
changes, then it should put a *mutable* object in the context (e.g. a
list or class instance)
2. If a parent context *does not* want child contexts to be able to
make changes, then it should put an *immutable* object in the context
(e.g. a tuple or number)
3. If a child context *wants* to share a context key with its parent,
then it should *mutate* it in place
4.
If a child context *does not* want to share a context key with its parent, then it should *rebind* it to a different object That way, instead of reverted-or-not-reverted being an all-or-nothing interpreter level decision, it can be made on a key-by-key basis by choosing whether or not to use a mutable value. To make that a little less abstract, consider a concrete example like setting a "my_web_framework.request" key: 1. The step of *setting* the key will *not* be shared with the parent context, as that modifies the underlying copy-on-write namespace, and will hence be reverted when control is passed back to the parent 2. Any *mutation* of the request object *will* be shared, since mutating the value doesn't have any effect on the copy-on-write namespace Nathaniel's example of wanting stack-like behaviour could be modeled using tuples as values: when the child context appends to the tuple, it will necessarily have to create a new tuple and rebind the corresponding key, causing the changes to be invisible to the parent context. The contextlib.contextmanager use case could then be modeled as a *separate* method that skipped the save/revert context management step (e.g. "send_with_shared_context", "throw_with_shared_context") > If someone does need this, it's possible to flip > `gi_isolated_execution_context` to `False` (as contextmanager does > now) and get this behaviour. This might be needed for frameworks like > Tornado which support coroutines via generators without 'yield from', > but I'll have to verify this. Working through this above, I think the key points that bother me about the stateful revert-or-not setting is that whether or not context reversion is desirable depends mainly on two things: - the specific key in question (indicated by mutable vs immutable values) - the intent of the code in the parent context (which could be indicated by calling different methods) It *doesn't* seem to be an inherent property of a given generator or coroutine, except insofar as there's a correlation between the code that creates generators & coroutines and the code that subsequently invokes them. > Another idea: in one of my initial PEP implementations, I exposed > gen.gi_execution_context (same for coroutines) to python as read/write > attribute. That allowed to > > (a) get the execution context out of generator (for introspection or > other purposes); > > (b) inject execution context for event loops; for instance > asyncio.Task could do that for some purpose. > > Maybe this would be useful for someone who wants to mess with > generators and contexts. Yeah, this would be useful, and could potentially avoid the need to expose a parallel set of "*_with_shared_context" methods - instead, contextlib.contextmanager could just invoke the underlying generator with an isolated context, and then set the parent context to the generator's one if it changed. > [..] >> >> def autonomous_generator(gf): >> @functools.wraps(gf) >> def wrapper(*args, **kwds): >> gi = genfunc(*args, **kwds) >> gi.gi_back = gi.gi_frame >> return gi >> return wrapper > > Nick, I still have to fully grasp the idea of `gi_back`, but one quick > thing: I specifically designed the PEP to avoid touching frames. The > current design only needs TLS and a little help from the > interpreter/core objects adjusting that TLS. It should be very > straightforward to implement the PEP in any interpreter (with JIT or > without) or compilers like Cython. 
I think you can just ignore that idea for now, as I've convinced myself it's orthogonal to the question of how we handle execution contexts. > [..] >> Given that, you'd have the following initial states for "revert >> context" (currently called "isolated context" in the PEP): >> >> * unawaited coroutines: true (same as PEP) >> * awaited coroutines: false (same as PEP) >> * generators (both sync & async): false (opposite of current PEP) >> * autonomous generators: true (set "gi_revert_context" or >> "ag_revert_context" explicitly) > > If generators do not isolate their context, then the example in the > Rationale section will not work as expected (or am I missing > something?). Fixing generators state leak was one of the main goals of > the PEP. Agreed - see above :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From njs at pobox.com Sat Aug 12 22:15:48 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 12 Aug 2017 19:15:48 -0700 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: On Sat, Aug 12, 2017 at 6:27 PM, Yury Selivanov wrote: > Yes, I considered this idea myself, but ultimately rejected it because: > > 1. Current solution makes it easy to introspect things. Get the > current EC and print it out. Although the context item idea could be > extended to `sys.create_context_item('description')` to allow that. My first draft actually had the description argument :-). But then I deleted it on the grounds that there's also no way to introspect a list of all threading.local objects, and no-one seems to be bothered by that, so why should we bother here. Obviously it'd be trivial to add though, yeah; I don't really care either way. > 2. What if we want to pickle the EC? If all items in it are > pickleable, it's possible to dump the EC, send it over the network, > and re-use in some other process. It's not something I want to > consider in the PEP right now, but it's something that the current > design theoretically allows. AFAIU, `ci = sys.create_context_item()` > context item wouldn't be possible to pickle/unpickle correctly, no? That's true. In this API, supporting pickling would require some kind of opt-in on the part of EC users. But... pickling would actually need to be opt-in anyway. Remember, the set of all EC items is a piece of global shared state; we expect new entries to appear when random 3rd party libraries are imported. So we have no idea what is in there or what it's being used for. Blindly pickling the whole context will lead to bugs (when code unexpectedly ends up with context that wasn't designed to go across processes) and crashes (there's no guarantee that all the objects are even pickleable). If we do decide we want to support this in the future then we could add a generic opt-in mechanism something like: MY_CI = sys.create_context_item(__name__, "MY_CI", pickleable=True) But I'm not sure that it even make sense to have a global flag enabling pickle. Probably it's better to have separate flags to opt-in to different libraries that might want to pickle in different situations for different reasons: pickleable-by-dask, pickleable-by-curio.run_in_process, ... And that's doable without any special interpreter support. E.g. you could have curio.Local(pickle=True) coordinate with curio.run_in_process. > Some more comments: > > On Sat, Aug 12, 2017 at 7:35 PM, Nathaniel Smith wrote: > [..] 
>> The advantages are:
>> - Eliminates the current PEP's issues with namespace collision; every
>> context item is automatically distinct from all others.
>
> TBH I think that the collision issue is slightly exaggerated.
>
>> - Eliminates the need for the None-means-del hack.
>
> I consider Execution Context to be an API, not a collection. It's an
> important distinction; if you view it that way, deletion on None
> doesn't look that esoteric.

Deletion on None is still a special case that API users need to
remember, and it's a small footgun that you can't just take an
arbitrary Python object and round-trip it through the context.
Obviously these are both APIs and they can do anything that makes
sense, but all else being equal I prefer APIs that have fewer special
cases :-).

>> - Lets the interpreter hide the details of garbage collecting
>> context values.
>
> I'm not sure I understand how the current PEP design is bad from the
> GC standpoint. Or how this proposal can be different, FWIW.

When the ContextItem object becomes unreachable and is collected, then
the interpreter knows that all of the values associated with it in
different contexts are also unreachable and can be collected.

I mentioned this in my email yesterday -- look at the hoops
threading.local jumps through to avoid breaking garbage collection.

This is closely related to the previous point, actually -- AFAICT the
only reason why it *really* matters that None deletes the item is that
you need to be able to delete to free the item from the dictionary,
which only matters if you want to dynamically allocate keys and then
throw them away again. In the ContextItem approach, there's no need to
manually delete the entry; you can just drop your reference to the
ContextItem and let the garbage collector take care of it.

>> - Allows for more implementation flexibility. This could be
>> implemented directly on top of Yury's current prototype. But it could
>> also, for example, be implemented by storing the context values in a
>> flat array, where each context item is assigned an index when it's
>> allocated.
>
> You still want to have this optimization only for *some* keys. So I
> think a separate API is still needed.

Wait, why is it a requirement that some keys be slow? That seems like
a weird requirement :-).

-n

--
Nathaniel J. Smith -- https://vorpus.org

From kevinjacobconway at gmail.com  Sat Aug 12 22:26:40 2017
From: kevinjacobconway at gmail.com (Kevin Conway)
Date: Sun, 13 Aug 2017 02:26:40 +0000
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To:
References:
Message-ID:

As far as providing a thread-local-like surrogate for coroutine-based
systems in Python, we had to solve this for Twisted with
https://bitbucket.org/hipchat/txlocal. Because of the way the Twisted
threadpooling works, we also had to make a context system that was both
coroutine- and thread-safe at the same time.

We have a similar setup for asyncio, but it seems we haven't
open-sourced it. I'll ask around for it if this group feels that an
asyncio example would be beneficial. We implemented both of these in
plain-old Python so they should be compatible beyond CPython.

It's been over a year since I was directly involved with either of
these projects, but added memory and CPU consumption were stats we
watched closely, and we found a negligible increase in both as we
rolled out async context.
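For readers unfamiliar with the txlocal approach Kevin describes, the
general pattern can be sketched in a few lines of plain Python: keep a
mapping from the currently running asyncio Task to its context values,
and copy that mapping whenever a new Task is scheduled. All names below
are hypothetical; this is a rough sketch of the pattern, not the actual
hipchat/txlocal code::

    import asyncio

    _contexts = {}  # hypothetical: maps each Task to its context dict

    def set_item(key, value):
        task = asyncio.Task.current_task()
        _contexts.setdefault(task, {})[key] = value

    def get_item(key, default=None):
        task = asyncio.Task.current_task()
        return _contexts.get(task, {}).get(key, default)

    def spawn(coro):
        # Copy the parent task's context into the child task at
        # scheduling time, so later changes do not leak either way.
        parent = asyncio.Task.current_task()
        child = asyncio.ensure_future(coro)
        _contexts[child] = dict(_contexts.get(parent, {}))
        # Drop the mapping when the task finishes to avoid leaks.
        child.add_done_callback(lambda t: _contexts.pop(t, None))
        return child

The copy-at-spawn step is the part that a plain TLS cannot express, and
it is essentially what PEP 550 proposes to move into the interpreter.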
On Sat, Aug 12, 2017 at 9:16 PM Nathaniel Smith wrote: > On Sat, Aug 12, 2017 at 6:27 PM, Yury Selivanov > wrote: > > Yes, I considered this idea myself, but ultimately rejected it because: > > > > 1. Current solution makes it easy to introspect things. Get the > > current EC and print it out. Although the context item idea could be > > extended to `sys.create_context_item('description')` to allow that. > > My first draft actually had the description argument :-). But then I > deleted it on the grounds that there's also no way to introspect a > list of all threading.local objects, and no-one seems to be bothered > by that, so why should we bother here. Obviously it'd be trivial to > add though, yeah; I don't really care either way. > > > 2. What if we want to pickle the EC? If all items in it are > > pickleable, it's possible to dump the EC, send it over the network, > > and re-use in some other process. It's not something I want to > > consider in the PEP right now, but it's something that the current > > design theoretically allows. AFAIU, `ci = sys.create_context_item()` > > context item wouldn't be possible to pickle/unpickle correctly, no? > > That's true. In this API, supporting pickling would require some kind > of opt-in on the part of EC users. > > But... pickling would actually need to be opt-in anyway. Remember, the > set of all EC items is a piece of global shared state; we expect new > entries to appear when random 3rd party libraries are imported. So we > have no idea what is in there or what it's being used for. Blindly > pickling the whole context will lead to bugs (when code unexpectedly > ends up with context that wasn't designed to go across processes) and > crashes (there's no guarantee that all the objects are even > pickleable). > > If we do decide we want to support this in the future then we could > add a generic opt-in mechanism something like: > > MY_CI = sys.create_context_item(__name__, "MY_CI", pickleable=True) > > But I'm not sure that it even make sense to have a global flag > enabling pickle. Probably it's better to have separate flags to opt-in > to different libraries that might want to pickle in different > situations for different reasons: pickleable-by-dask, > pickleable-by-curio.run_in_process, ... And that's doable without any > special interpreter support. E.g. you could have > curio.Local(pickle=True) coordinate with curio.run_in_process. > > > Some more comments: > > > > On Sat, Aug 12, 2017 at 7:35 PM, Nathaniel Smith wrote: > > [..] > >> The advantages are: > >> - Eliminates the current PEP's issues with namespace collision; every > >> context item is automatically distinct from all others. > > > > TBH I think that the collision issue is slightly exaggerated. > > > >> - Eliminates the need for the None-means-del hack. > > > > I consider Execution Context to be an API, not a collection. It's an > > important distinction, If you view it that way, deletion on None is > > doesn't look that esoteric. > > Deletion on None is still a special case that API users need to > remember, and it's a small footgun that you can't just take an > arbitrary Python object and round-trip it through the context. > Obviously these are both APIs and they can do anything that makes > sense, but all else being equal I prefer APIs that have fewer special > cases :-). > > >> - Lets the interpreter hide the details of garbage collecting context > values. > > > > I'm not sure I understand how the current PEP design is bad from the > > GC standpoint. 
Or how this proposal can be different, FWIW. > > When the ContextItem object becomes unreachable and is collected, then > the interpreter knows that all of the values associated with it in > different contexts are also unreachable and can be collected. > > I mentioned this in my email yesterday -- look at the hoops > threading.local jumps through to avoid breaking garbage collection. > > This is closely related to the previous point, actually -- AFAICT the > only reason why it *really* matters that None deletes the item is that > you need to be able to delete to free the item from the dictionary, > which only matters if you want to dynamically allocate keys and then > throw them away again. In the ContextItem approach, there's no need to > manually delete the entry, you can just drop your reference to the > ContextItem and the the garbage collector take care of it. > > >> - Allows for more implementation flexibility. This could be > >> implemented directly on top of Yury's current prototype. But it could > >> also, for example, be implemented by storing the context values in a > >> flat array, where each context item is assigned an index when it's > >> allocated. > > > > You still want to have this optimization only for *some* keys. So I > > think a separate API is still needed. > > Wait, why is it a requirement that some keys be slow? That seems like > weird requirement :-). > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Aug 12 22:56:20 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 13 Aug 2017 12:56:20 +1000 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: On 13 August 2017 at 11:27, Yury Selivanov wrote: > Yes, I considered this idea myself, but ultimately rejected it because: > > 1. Current solution makes it easy to introspect things. Get the > current EC and print it out. Although the context item idea could be > extended to `sys.create_context_item('description')` to allow that. I think the TLS/TSS precedent means we should seriously consider the ContextItem + ContextStateToken approach for the core low level API. We also have a long history of pain and quirks arising from the locals() builtin being defined as returning a mapping even though function locals are managed as a linear array, so if we can avoid that for the execution context, it will likely be beneficial for both end users (due to less quirky runtime behaviour, especially across implementations) and language implementation developers (due to a reduced need to make something behave like an ordinary mapping when it really isn't). If we decide we want a separate context introspection API (akin to inspect.getcouroutinelocals() and inspect.getgeneratorlocals()), then an otherwise opaque ContextStateToken would be sufficient to enable that. Even if we don't need it for any other reason, having such an API available would be desirable for the regression test suite. 
For example, if context items are hashable, we could have the following arrangement: # Create new context items sys.create_context_item(name) # Opaque token for the current execution context sys.get_context_token() # Switch the current execution context to the given one sys.set_context(context_token) # Snapshot mapping context items to their values in given context sys.get_context_items(context_token) As Nathaniel suggestion, getting/setting/deleting individual items in the current context would be implemented as methods on the ContextItem objects, allowing the return value of "get_context_items" to be a plain dictionary, rather than a special type that directly supported updates to the underlying context. > 2. What if we want to pickle the EC? If all items in it are > pickleable, it's possible to dump the EC, send it over the network, > and re-use in some other process. It's not something I want to > consider in the PEP right now, but it's something that the current > design theoretically allows. AFAIU, `ci = sys.create_context_item()` > context item wouldn't be possible to pickle/unpickle correctly, no? As Nathaniel notes, cooperative partial pickling will be possible regardless of how the low level API works, and starting with a simpler low level API still doesn't rule out adding features like this at a later date. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From yselivanov.ml at gmail.com Sat Aug 12 23:17:07 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 12 Aug 2017 23:17:07 -0400 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: [replying to list] On Sat, Aug 12, 2017 at 10:56 PM, Nick Coghlan wrote: > On 13 August 2017 at 11:27, Yury Selivanov wrote: >> Yes, I considered this idea myself, but ultimately rejected it because: >> >> 1. Current solution makes it easy to introspect things. Get the >> current EC and print it out. Although the context item idea could be >> extended to `sys.create_context_item('description')` to allow that. > > I think the TLS/TSS precedent means we should seriously consider the > ContextItem + ContextStateToken approach for the core low level API. I actually like the idea and am fully open to it. I'm also curious if it's possible to adapt the flat-array/fast access ideas that Nathaniel mentioned. Yury From ncoghlan at gmail.com Sun Aug 13 00:05:06 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 13 Aug 2017 14:05:06 +1000 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: On 13 August 2017 at 12:15, Nathaniel Smith wrote: > On Sat, Aug 12, 2017 at 6:27 PM, Yury Selivanov wrote: >> Yes, I considered this idea myself, but ultimately rejected it because: >> >> 1. Current solution makes it easy to introspect things. Get the >> current EC and print it out. Although the context item idea could be >> extended to `sys.create_context_item('description')` to allow that. > > My first draft actually had the description argument :-). But then I > deleted it on the grounds that there's also no way to introspect a > list of all threading.local objects, and no-one seems to be bothered > by that, so why should we bother here. 
In the TLS/TSS case, we have the design constraint of wanting to use the platform provided TLS/TSS implementation when available, and standard C APIs generally aren't designed to support rich runtime introspection from regular C code - instead, they expect the debugger, compiler, and standard library to be co-developed such that the debugger knows how to figure out where the latter two have put things at runtime. > Obviously it'd be trivial to > add though, yeah; I don't really care either way. As noted in my other email, I like the idea of making the context dependent state introspection API clearly distinct from the core context dependent state management API. That way the API implementation can focus on using the most efficient data structures for the purpose, rather than being limited to the most efficient data structures that can readily export a Python-style mapping interface. The latter can then be provided purely for introspection purposes. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From njs at pobox.com Sun Aug 13 00:14:38 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 12 Aug 2017 21:14:38 -0700 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: On Sat, Aug 12, 2017 at 9:05 PM, Nick Coghlan wrote: > On 13 August 2017 at 12:15, Nathaniel Smith wrote: >> On Sat, Aug 12, 2017 at 6:27 PM, Yury Selivanov wrote: >>> Yes, I considered this idea myself, but ultimately rejected it because: >>> >>> 1. Current solution makes it easy to introspect things. Get the >>> current EC and print it out. Although the context item idea could be >>> extended to `sys.create_context_item('description')` to allow that. >> >> My first draft actually had the description argument :-). But then I >> deleted it on the grounds that there's also no way to introspect a >> list of all threading.local objects, and no-one seems to be bothered >> by that, so why should we bother here. > > In the TLS/TSS case, we have the design constraint of wanting to use > the platform provided TLS/TSS implementation when available, and > standard C APIs generally aren't designed to support rich runtime > introspection from regular C code - instead, they expect the debugger, > compiler, and standard library to be co-developed such that the > debugger knows how to figure out where the latter two have put things > at runtime. Excellent point. >> Obviously it'd be trivial to >> add though, yeah; I don't really care either way. > > As noted in my other email, I like the idea of making the context > dependent state introspection API clearly distinct from the core > context dependent state management API. > > That way the API implementation can focus on using the most efficient > data structures for the purpose, rather than being limited to the most > efficient data structures that can readily export a Python-style > mapping interface. The latter can then be provided purely for > introspection purposes. Also an excellent point :-). -n -- Nathaniel J. Smith -- https://vorpus.org From yselivanov.ml at gmail.com Sun Aug 13 02:15:27 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sun, 13 Aug 2017 02:15:27 -0400 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: On Sat, Aug 12, 2017 at 10:56 PM, Nick Coghlan wrote: [..] 
> As Nathaniel suggestion, getting/setting/deleting individual items in > the current context would be implemented as methods on the ContextItem > objects, allowing the return value of "get_context_items" to be a > plain dictionary, rather than a special type that directly supported > updates to the underlying context. The current PEP 550 design returns a "snapshot" of the current EC with sys.get_execution_context(). I.e. if you do ec = sys.get_execution_context() ec['a'] = 'b' # sys.get_execution_context_item('a') will return None You did get a snapshot and you modified it -- but your modifications are not visible anywhere. You can run a function in that modified EC with `ec.run(function)` and that function will see that new 'a' key, but that's it. There's no "magical" updates to the underlying context. Yury From yselivanov.ml at gmail.com Sun Aug 13 02:16:09 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sun, 13 Aug 2017 02:16:09 -0400 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: On Sat, Aug 12, 2017 at 10:12 AM, Nick Coghlan wrote: [..] > > 1. Are you sure you want to expose the CoW type to pure Python code? Ultimately, why not? The execution context object you get with sys.get_execution_context() is yours to change. Any change to it won't be propagated anywhere, unless you execute something in that context with ExecutionContext.run or set it as a current one. > > The draft API looks fairly error prone to me, as I'm not sure of the > intended differences in behaviour between the following: > > @contextmanager > def context(x): > old_x = sys.get_execution_context_item('x') > sys.set_execution_context_item('x', x) > try: > yield > finally: > sys.set_execution_context_item('x', old_x) > > @contextmanager > def context(x): > old_x = sys.get_execution_context().get('x') > sys.get_execution_context()['x'] = x > try: > yield > finally: > sys.get_execution_context()['x'] = old_x This one (the second example) won't do anything. > > @contextmanager > def context(x): > ec = sys.get_execution_context() > old_x = ec.get('x') > ec['x'] = x > try: > yield > finally: > ec['x'] = old_x This one (the third one) won't do anything either. You can do this: ec = sys.get_execution_context() ec['x'] = x ec.run(my_function) or `sys.set_execution_context(ec)` > > It seems to me that everything would be a lot safer if the *only* > Python level API was a live dynamic view that completely hid the > copy-on-write behaviour behind an "ExecutionContextProxy" type, such > that the last two examples were functionally equivalent to each other > and to the current PEP's get/set functions (rendering the latter > redundant, and allowing it to be dropped from the PEP). So there's no copy-on-write exposed to Python actually. What I am thinking about, though, is that we might not need the sys.set_execution_context() function. If you want to run something with a modified or empty execution context, do it through ExecutionContext.run method. > 2. Do we need an ag_isolated_execution_context for asynchronous > generators? (Modify this question as needed for the answer to the next > question) Yes, we'll need it for contextlib.asynccontextmanager at least. > > 3. It bothers me that *_execution_context points to an actual > execution context, while *_isolated_execution_context is a boolean. > With names that similar I'd expect them to point to the same kind of > object. I think we touched upon this in a parallel thread. 
But I think we can rename "gi_isolated_execution_context" to "gi_execution_context_isolated" or something more readable/obvious. Yury From jonathan at slenders.be Sun Aug 13 05:58:03 2017 From: jonathan at slenders.be (Jonathan Slenders) Date: Sun, 13 Aug 2017 11:58:03 +0200 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: For what it's worth, as part of prompt_toolkit 2.0, I implemented something very similar to Nathaniel's idea some time ago. It works pretty well, but I don't have a strong opinion against an alternative implementation. - The active context is stored as a monotonically increasing integer. - For each local, the actual values are stored in a dictionary that maps the context ID to the value. (Could cause a GC issue - I'm not sure.) - Every time when an executor is started, I have to wrap the callable in a context manager that applies the current context to that thread. - When a new 'Future' is created, I grab the context ID and apply it to the callbacks when the result is set. https://github.com/jonathanslenders/python-prompt-toolkit/blob/5c9ceb42ad9422a3c6a218a939843bdd2cc76f16/prompt_toolkit/eventloop/context.py https://github.com/jonathanslenders/python-prompt-toolkit/blob/5c9ceb42ad9422a3c6a218a939843bdd2cc76f16/prompt_toolkit/eventloop/future.py FYI: In my case, I did not want to pass the currently active "Application" object around all of the code. But when I started supporting telnet, multiple applications could be alive at once, each with a different I/O backend. Therefore the active application needed to be stored in a kind of executing context. When PEP550 gets approved I'll probably make this compatible. It should at least be possible to run prompt_toolkit on the asyncio event loop. Jonathan 2017-08-13 1:35 GMT+02:00 Nathaniel Smith : > I had an idea for an alternative API that exposes the same > functionality/semantics as the current draft, but that might have some > advantages. It would look like: > > # a "context item" is an object that holds a context-sensitive value > # each call to create_context_item creates a new one > ci = sys.create_context_item() > > # Set the value of this item in the current context > ci.set(value) > > # Get the value of this item in the current context > value = ci.get() > value = ci.get(default) > > # To support async libraries, we need some way to capture the whole context > # But an opaque token representing "all context item values" is enough > state_token = sys.current_context_state_token() > sys.set_context_state_token(state_token) > coro.cr_state_token = state_token > # etc. > > The advantages are: > - Eliminates the current PEP's issues with namespace collision; every > context item is automatically distinct from all others. > - Eliminates the need for the None-means-del hack. > - Lets the interpreter hide the details of garbage collecting context > values. > - Allows for more implementation flexibility. This could be > implemented directly on top of Yury's current prototype. But it could > also, for example, be implemented by storing the context values in a > flat array, where each context item is assigned an index when it's > allocated. In the current draft this is suggested as a possible > extension for particularly performance-sensitive users, but this way > we'd have the option of making everything fast without changing or > extending the API. > > As precedent, this is basically the API that low-level thread-local > storage implementations use; see e.g. 
pthread_key_create, > pthread_getspecific, pthread_setspecific. (And the > allocate-an-index-in-a-table is the implementation that fast > thread-local storage implementations use too.) > > -n > > On Fri, Aug 11, 2017 at 3:37 PM, Yury Selivanov > wrote: > > Hi, > > > > This is a new PEP to implement Execution Contexts in Python. > > > > The PEP is in-flight to python.org, and in the meanwhile can > > be read on GitHub: > > > > https://github.com/python/peps/blob/master/pep-0550.rst > > > > (it contains a few diagrams and charts, so please read it there.) > > > > Thank you! > > Yury > > > > > > PEP: 550 > > Title: Execution Context > > Version: $Revision$ > > Last-Modified: $Date$ > > Author: Yury Selivanov > > Status: Draft > > Type: Standards Track > > Content-Type: text/x-rst > > Created: 11-Aug-2017 > > Python-Version: 3.7 > > Post-History: 11-Aug-2017 > > > > > > Abstract > > ======== > > > > This PEP proposes a new mechanism to manage execution state--the > > logical environment in which a function, a thread, a generator, > > or a coroutine executes in. > > > > A few examples of where having a reliable state storage is required: > > > > * Context managers like decimal contexts, ``numpy.errstate``, > > and ``warnings.catch_warnings``; > > > > * Storing request-related data such as security tokens and request > > data in web applications; > > > > * Profiling, tracing, and logging in complex and large code bases. > > > > The usual solution for storing state is to use a Thread-local Storage > > (TLS), implemented in the standard library as ``threading.local()``. > > Unfortunately, TLS does not work for isolating state of generators or > > asynchronous code because such code shares a single thread. > > > > > > Rationale > > ========= > > > > Traditionally a Thread-local Storage (TLS) is used for storing the > > state. However, the major flaw of using the TLS is that it works only > > for multi-threaded code. It is not possible to reliably contain the > > state within a generator or a coroutine. For example, consider > > the following generator:: > > > > def calculate(precision, ...): > > with decimal.localcontext() as ctx: > > # Set the precision for decimal calculations > > # inside this block > > ctx.prec = precision > > > > yield calculate_something() > > yield calculate_something_else() > > > > Decimal context is using a TLS to store the state, and because TLS is > > not aware of generators, the state can leak. The above code will > > not work correctly, if a user iterates over the ``calculate()`` > > generator with different precisions in parallel:: > > > > g1 = calculate(100) > > g2 = calculate(50) > > > > items = list(zip(g1, g2)) > > > > # items[0] will be a tuple of: > > # first value from g1 calculated with 100 precision, > > # first value from g2 calculated with 50 precision. > > # > > # items[1] will be a tuple of: > > # second value from g1 calculated with 50 precision, > > # second value from g2 calculated with 50 precision. > > > > An even scarier example would be using decimals to represent money > > in an async/await application: decimal calculations can suddenly > > lose precision in the middle of processing a request. Currently, > > bugs like this are extremely hard to find and fix. 
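The state leak described above is easy to reproduce with the real
``decimal`` module; the following is a small runnable variant of the
``calculate()`` example (the digits shown assume CPython's default
decimal context)::

    import decimal

    def calculate(precision):
        with decimal.localcontext() as ctx:
            ctx.prec = precision
            yield decimal.Decimal(1) / decimal.Decimal(3)
            yield decimal.Decimal(1) / decimal.Decimal(7)

    g1 = calculate(12)
    g2 = calculate(4)

    print(next(g1))   # 0.333333333333 -- 12 digits, as requested
    print(next(g2))   # 0.3333         -- 4 digits, as requested
    print(next(g1))   # 0.1429 (!)     -- g1 now runs with g2's precision

Because ``decimal.localcontext()`` stores its state in a TLS, the
context set by ``g2`` is still active when ``g1`` is resumed, and the
second value from ``g1`` silently loses precision.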
> > > > Another common need for web applications is to have access to the > > current request object, or security context, or, simply, the request > > URL for logging or submitting performance tracing data:: > > > > async def handle_http_request(request): > > context.current_http_request = request > > > > await ... > > # Invoke your framework code, render templates, > > # make DB queries, etc, and use the global > > # 'current_http_request' in that code. > > > > # This isn't currently possible to do reliably > > # in asyncio out of the box. > > > > These examples are just a few out of many, where a reliable way to > > store context data is absolutely needed. > > > > The inability to use TLS for asynchronous code has lead to > > proliferation of ad-hoc solutions, limited to be supported only by > > code that was explicitly enabled to work with them. > > > > Current status quo is that any library, including the standard > > library, that uses a TLS, will likely not work as expected in > > asynchronous code or with generators (see [3]_ as an example issue.) > > > > Some languages that have coroutines or generators recommend to > > manually pass a ``context`` object to every function, see [1]_ > > describing the pattern for Go. This approach, however, has limited > > use for Python, where we have a huge ecosystem that was built to work > > with a TLS-like context. Moreover, passing the context explicitly > > does not work at all for libraries like ``decimal`` or ``numpy``, > > which use operator overloading. > > > > .NET runtime, which has support for async/await, has a generic > > solution of this problem, called ``ExecutionContext`` (see [2]_). > > On the surface, working with it is very similar to working with a TLS, > > but the former explicitly supports asynchronous code. > > > > > > Goals > > ===== > > > > The goal of this PEP is to provide a more reliable alternative to > > ``threading.local()``. It should be explicitly designed to work with > > Python execution model, equally supporting threads, generators, and > > coroutines. > > > > An acceptable solution for Python should meet the following > > requirements: > > > > * Transparent support for code executing in threads, coroutines, > > and generators with an easy to use API. > > > > * Negligible impact on the performance of the existing code or the > > code that will be using the new mechanism. > > > > * Fast C API for packages like ``decimal`` and ``numpy``. > > > > Explicit is still better than implicit, hence the new APIs should only > > be used when there is no option to pass the state explicitly. > > > > With this PEP implemented, it should be possible to update a context > > manager like the below:: > > > > _local = threading.local() > > > > @contextmanager > > def context(x): > > old_x = getattr(_local, 'x', None) > > _local.x = x > > try: > > yield > > finally: > > _local.x = old_x > > > > to a more robust version that can be reliably used in generators > > and async/await code, with a simple transformation:: > > > > @contextmanager > > def context(x): > > old_x = get_execution_context_item('x') > > set_execution_context_item('x', x) > > try: > > yield > > finally: > > set_execution_context_item('x', old_x) > > > > > > Specification > > ============= > > > > This proposal introduces a new concept called Execution Context (EC), > > along with a set of Python APIs and C APIs to interact with it. > > > > EC is implemented using an immutable mapping. Every modification > > of the mapping produces a new copy of it. 
To illustrate what it > > means let's compare it to how we work with tuples in Python:: > > > > a0 = () > > a1 = a0 + (1,) > > a2 = a1 + (2,) > > > > # a0 is an empty tuple > > # a1 is (1,) > > # a2 is (1, 2) > > > > Manipulating an EC object would be similar:: > > > > a0 = EC() > > a1 = a0.set('foo', 'bar') > > a2 = a1.set('spam', 'ham') > > > > # a0 is an empty mapping > > # a1 is {'foo': 'bar'} > > # a2 is {'foo': 'bar', 'spam': 'ham'} > > > > In CPython, every thread that can execute Python code has a > > corresponding ``PyThreadState`` object. It encapsulates important > > runtime information like a pointer to the current frame, and is > > being used by the ceval loop extensively. We add a new field to > > ``PyThreadState``, called ``exec_context``, which points to the > > current EC object. > > > > We also introduce a set of APIs to work with Execution Context. > > In this section we will only cover two functions that are needed to > > explain how Execution Context works. See the full list of new APIs > > in the `New APIs`_ section. > > > > * ``sys.get_execution_context_item(key, default=None)``: lookup > > ``key`` in the EC of the executing thread. If not found, > > return ``default``. > > > > * ``sys.set_execution_context_item(key, value)``: get the > > current EC of the executing thread. Add a ``key``/``value`` > > item to it, which will produce a new EC object. Set the > > new object as the current one for the executing thread. > > In pseudo-code:: > > > > tstate = PyThreadState_GET() > > ec = tstate.exec_context > > ec2 = ec.set(key, value) > > tstate.exec_context = ec2 > > > > Note, that some important implementation details and optimizations > > are omitted here, and will be covered in later sections of this PEP. > > > > Now let's see how Execution Contexts work with regular multi-threaded > > code, generators, and coroutines. > > > > > > Regular & Multithreaded Code > > ---------------------------- > > > > For regular Python code, EC behaves just like a thread-local. Any > > modification of the EC object produces a new one, which is immediately > > set as the current one for the thread state. > > > > .. figure:: pep-0550/functions.png > > :align: center > > :width: 90% > > > > Figure 1. Execution Context flow in a thread. > > > > As Figure 1 illustrates, if a function calls > > ``set_execution_context_item()``, the modification of the execution > > context will be visible to all subsequent calls and to the caller:: > > > > def set_foo(): > > set_execution_context_item('foo', 'spam') > > > > set_execution_context_item('foo', 'bar') > > print(get_execution_context_item('foo')) > > > > set_foo() > > print(get_execution_context_item('foo')) > > > > # will print: > > # bar > > # spam > > > > > > Coroutines > > ---------- > > > > Python :pep:`492` coroutines are used to implement cooperative > > multitasking. For a Python end-user they are similar to threads, > > especially when it comes to sharing resources or modifying > > the global state. > > > > An event loop is needed to schedule coroutines. Coroutines that > > are explicitly scheduled by the user are usually called Tasks. > > When a coroutine is scheduled, it can schedule other coroutines using > > an ``await`` expression. In async/await world, awaiting a coroutine > > can be viewed as a different calling convention: Tasks are similar to > > threads, and awaiting on coroutines within a Task is similar to > > calling functions within a thread. 
> > > > By drawing a parallel between regular multithreaded code and > > async/await, it becomes apparent that any modification of the > > execution context within one Task should be visible to all coroutines > > scheduled within it. Any execution context modifications, however, > > must not be visible to other Tasks executing within the same thread. > > > > To achieve this, a small set of modifications to the coroutine object > > is needed: > > > > * When a coroutine object is instantiated, it saves a reference to > > the current execution context object to its ``cr_execution_context`` > > attribute. > > > > * Coroutine's ``.send()`` and ``.throw()`` methods are modified as > > follows (in pseudo-C):: > > > > if coro->cr_isolated_execution_context: > > # Save a reference to the current execution context > > old_context = tstate->execution_context > > > > # Set our saved execution context as the current > > # for the current thread. > > tstate->execution_context = coro->cr_execution_context > > > > try: > > # Perform the actual `Coroutine.send()` or > > # `Coroutine.throw()` call. > > return coro->send(...) > > finally: > > # Save a reference to the updated execution_context. > > # We will need it later, when `.send()` or `.throw()` > > # are called again. > > coro->cr_execution_context = tstate->execution_context > > > > # Restore thread's execution context to what it was before > > # invoking this coroutine. > > tstate->execution_context = old_context > > else: > > # Perform the actual `Coroutine.send()` or > > # `Coroutine.throw()` call. > > return coro->send(...) > > > > * ``cr_isolated_execution_context`` is a new attribute on coroutine > > objects. Set to ``True`` by default, it makes any execution context > > modifications performed by coroutine to stay visible only to that > > coroutine. > > > > When Python interpreter sees an ``await`` instruction, it flips > > ``cr_isolated_execution_context`` to ``False`` for the coroutine > > that is about to be awaited. This makes any changes to execution > > context made by nested coroutine calls within a Task to be visible > > throughout the Task. > > > > Because the top-level coroutine (Task) cannot be scheduled with > > ``await`` (in asyncio you need to call ``loop.create_task()`` or > > ``asyncio.ensure_future()`` to schedule a Task), all execution > > context modifications are guaranteed to stay within the Task. > > > > * We always work with ``tstate->exec_context``. We use > > ``coro->cr_execution_context`` only to store coroutine's execution > > context when it is not executing. > > > > Figure 2 below illustrates how execution context mutations work with > > coroutines. > > > > .. figure:: pep-0550/coroutines.png > > :align: center > > :width: 90% > > > > Figure 2. Execution Context flow in coroutines. > > > > In the above diagram: > > > > * When "coro1" is created, it saves a reference to the current > > execution context "2". > > > > * If it makes any change to the context, it will have its own > > execution context branch "2.1". > > > > * When it awaits on "coro2", any subsequent changes it does to > > the execution context are visible to "coro1", but not outside > > of it. 
> > In code::
> >
> >     async def inner_foo():
> >         print('inner_foo:', get_execution_context_item('key'))
> >         set_execution_context_item('key', 2)
> >
> >     async def foo():
> >         print('foo:', get_execution_context_item('key'))
> >
> >         set_execution_context_item('key', 1)
> >         await inner_foo()
> >
> >         print('foo:', get_execution_context_item('key'))
> >
> >     set_execution_context_item('key', 'spam')
> >     print('main:', get_execution_context_item('key'))
> >
> >     asyncio.get_event_loop().run_until_complete(foo())
> >
> >     print('main:', get_execution_context_item('key'))
> >
> > which will output::
> >
> >     main: spam
> >     foo: spam
> >     inner_foo: 1
> >     foo: 2
> >     main: spam
> >
> > Generator-based coroutines (generators decorated with
> > ``types.coroutine`` or ``asyncio.coroutine``) behave exactly like
> > native coroutines with regards to execution context management:
> > their ``yield from`` expression is semantically equivalent to
> > ``await``.
> >
> >
> > Generators
> > ----------
> >
> > Generators in Python, while similar to coroutines, are used in a
> > fundamentally different way.  They are producers of data, and they
> > use the ``yield`` expression to suspend/resume their execution.
> >
> > A crucial difference between ``await coro`` and ``yield value`` is
> > that the former expression guarantees that ``coro`` will be
> > executed to the end, while the latter produces ``value`` and
> > suspends the generator until it gets iterated again.
> >
> > Generators share 99% of their implementation with coroutines, and
> > thus have similar new attributes ``gi_execution_context`` and
> > ``gi_isolated_execution_context``.  Like coroutines, generators
> > save a reference to the current execution context when they are
> > instantiated.  They have the same implementation of the ``.send()``
> > and ``.throw()`` methods.
> >
> > The only difference is that ``gi_isolated_execution_context`` is
> > always set to ``True`` and is never modified by the interpreter.
> > The ``yield from o`` expression in regular generators that are not
> > decorated with ``types.coroutine`` is semantically equivalent to
> > ``for v in o: yield v``.
> >
> > .. figure:: pep-0550/generators.png
> >    :align: center
> >    :width: 90%
> >
> >    Figure 3.  Execution Context flow in a generator.
> >
> > In the above diagram:
> >
> > * When "gen1" is created, it saves a reference to the current
> >   execution context "2".
> >
> > * If it makes any change to the context, it will have its own
> >   execution context branch "2.1".
> >
> > * When "gen2" is created, it saves a reference to the current
> >   execution context for it -- "2.1".
> >
> > * Any subsequent execution context updates in "gen2" will only
> >   be visible to "gen2".
> >
> > * Likewise, any context changes that "gen1" makes after it
> >   created "gen2" will not be visible to "gen2".
> > In code::
> >
> >     def inner_foo():
> >         for i in range(3):
> >             print('inner_foo:', get_execution_context_item('key'))
> >             set_execution_context_item('key', i)
> >             yield i
> >
> >     def foo():
> >         set_execution_context_item('key', 'spam')
> >         print('foo:', get_execution_context_item('key'))
> >
> >         inner = inner_foo()
> >
> >         while True:
> >             val = next(inner, None)
> >             if val is None:
> >                 break
> >             yield val
> >             print('foo:', get_execution_context_item('key'))
> >
> >     set_execution_context_item('key', 'ham')
> >     print('main:', get_execution_context_item('key'))
> >
> >     list(foo())
> >
> >     print('main:', get_execution_context_item('key'))
> >
> > which will output::
> >
> >     main: ham
> >     foo: spam
> >     inner_foo: spam
> >     foo: spam
> >     inner_foo: 0
> >     foo: spam
> >     inner_foo: 1
> >     foo: spam
> >     main: ham
> >
> > As we see, any modification of the execution context in a
> > generator is visible only to the generator itself.
> >
> > There is one use case where it is desirable for generators to
> > affect the surrounding execution context: the
> > ``contextlib.contextmanager`` decorator.  To make the following
> > work::
> >
> >     @contextmanager
> >     def context(x):
> >         old_x = get_execution_context_item('x')
> >         set_execution_context_item('x', x)
> >         try:
> >             yield
> >         finally:
> >             set_execution_context_item('x', old_x)
> >
> > we modified ``contextmanager`` to flip the
> > ``gi_isolated_execution_context`` flag to ``False`` on its
> > generator.
> >
> >
> > Greenlets
> > ---------
> >
> > Greenlet is an alternative implementation of cooperative
> > scheduling for Python.  Although the greenlet package is not part
> > of CPython, popular frameworks like gevent rely on it, and it is
> > important that greenlet can be modified to support execution
> > contexts.
> >
> > In a nutshell, the design of greenlet is very similar to that of
> > generators.  The main difference is that for generators, the stack
> > is managed by the Python interpreter.  Greenlet works outside of
> > the Python interpreter, and manually saves some ``PyThreadState``
> > fields and pushes/pops the C-stack.  Since the Execution Context
> > is implemented on top of ``PyThreadState``, it's easy to add
> > transparent support for it to greenlet.
> >
> >
> > New APIs
> > ========
> >
> > Even though this PEP adds a number of new APIs, please keep in
> > mind that most Python users will likely only ever use two of them:
> > ``sys.get_execution_context_item()`` and
> > ``sys.set_execution_context_item()``.
> >
> >
> > Python
> > ------
> >
> > 1. ``sys.get_execution_context_item(key, default=None)``: look up
> >    ``key`` in the current Execution Context.  If not found,
> >    return ``default``.
> >
> > 2. ``sys.set_execution_context_item(key, value)``: set a
> >    ``key``/``value`` item in the current Execution Context.
> >    If ``value`` is ``None``, the item will be removed.
> >
> > 3. ``sys.get_execution_context()``: return the current Execution
> >    Context object: ``sys.ExecutionContext``.
> >
> > 4. ``sys.set_execution_context(ec)``: set the passed
> >    ``sys.ExecutionContext`` instance as the current one for the
> >    current thread.
> >
> > 5. ``sys.ExecutionContext`` object.
> >
> >    Implementation detail: ``sys.ExecutionContext`` wraps a
> >    low-level ``PyExecContextData`` object.  ``sys.ExecutionContext``
> >    has a mutable mapping API, abstracting away the real immutable
> >    ``PyExecContextData``.
> >
> >    * ``ExecutionContext()``: construct a new, empty execution
> >      context.
> >
> >    * ``ec.run(func, *args)`` method: run ``func(*args)`` in the
> >      ``ec`` execution context.
> >
> >    * ``ec[key]``: look up ``key`` in the ``ec`` context.
> >
> >    * ``ec[key] = value``: assign a ``key``/``value`` item to the
> >      ``ec``.
> >
> >    * ``ec.get()``, ``ec.items()``, ``ec.values()``, ``ec.keys()``,
> >      and ``ec.copy()`` are similar to those of the ``dict`` object.
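> >
> > As a sketch of how these pieces fit together (hypothetical usage
> > of the proposed APIs; this assumes ``ec.run()`` returns the
> > function's result)::
> >
> >     import sys
> >
> >     def who():
> >         return sys.get_execution_context_item('user', 'anonymous')
> >
> >     sys.set_execution_context_item('user', 'alice')
> >
> >     ec = sys.get_execution_context()  # mutable snapshot wrapper
> >     ec['user'] = 'bob'                # modifies the snapshot only
> >
> >     print(who())        # 'alice' -- the thread's current EC intact
> >     print(ec.run(who))  # 'bob'   -- runs inside the 'ec' context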
> >
> >
> > C API
> > -----
> >
> > The C API is different from the Python one because it operates
> > directly on the low-level immutable ``PyExecContextData`` object.
> >
> > 1. A new ``PyThreadState->exec_context`` field, pointing to a
> >    ``PyExecContextData`` object.
> >
> > 2. ``PyThreadState_SetExecContextItem`` and
> >    ``PyThreadState_GetExecContextItem``, similar to
> >    ``sys.set_execution_context_item()`` and
> >    ``sys.get_execution_context_item()``.
> >
> > 3. ``PyThreadState_GetExecContext``: similar to
> >    ``sys.get_execution_context()``.  Always returns a
> >    ``PyExecContextData`` object.  If ``PyThreadState->exec_context``
> >    is ``NULL``, a new and empty one will be created and assigned
> >    to ``PyThreadState->exec_context``.
> >
> > 4. ``PyThreadState_SetExecContext``: similar to
> >    ``sys.set_execution_context()``.
> >
> > 5. ``PyExecContext_New``: create a new empty ``PyExecContextData``
> >    object.
> >
> > 6. ``PyExecContext_SetItem`` and ``PyExecContext_GetItem``.
> >
> > The exact layout of ``PyExecContextData`` is private, which allows
> > us to switch it to a different implementation later.  More on that
> > in the `Implementation Details`_ section.
> >
> >
> > Modifications in Standard Library
> > =================================
> >
> > * ``contextlib.contextmanager`` was updated to flip the new
> >   ``gi_isolated_execution_context`` attribute on the generator.
> >
> > * The ``asyncio.events.Handle`` object now captures the current
> >   execution context when it is created, and uses the saved
> >   execution context to run the callback (with the
> >   ``ExecutionContext.run()`` method).  This makes
> >   ``loop.call_soon()`` run callbacks in the execution context
> >   in which they were scheduled.
> >
> > No modifications in ``asyncio.Task`` or ``asyncio.Future`` were
> > necessary.
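> >
> > A heavily simplified sketch of the ``Handle`` change described
> > above (the real class also deals with cancellation and error
> > handling)::
> >
> >     class Handle:
> >         def __init__(self, callback, args, loop):
> >             self._callback = callback
> >             self._args = args
> >             self._loop = loop
> >             # Capture the EC at scheduling time (loop.call_soon()).
> >             self._exec_context = sys.get_execution_context()
> >
> >         def _run(self):
> >             # Run the callback in the captured EC, not in whatever
> >             # EC the event loop happens to be in when it fires.
> >             self._exec_context.run(self._callback, *self._args)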
> >
> > Some standard library modules like ``warnings`` and ``decimal``
> > can be updated to use the new execution contexts.  This will be
> > considered in separate issues if this PEP is accepted.
> >
> >
> > Backwards Compatibility
> > =======================
> >
> > This proposal preserves 100% backwards compatibility.
> >
> >
> > Performance
> > ===========
> >
> > Implementation Details
> > ----------------------
> >
> > The new ``PyExecContextData`` object wraps a ``dict`` object.
> > Any modification requires creating a shallow copy of the dict.
> >
> > While working on the reference implementation of this PEP, we were
> > able to optimize the ``dict.copy()`` operation **5.5x**, see [4]_
> > for details.
> >
> > .. figure:: pep-0550/dict_copy.png
> >    :align: center
> >    :width: 100%
> >
> >    Figure 4.
> >
> > Figure 4 shows that the performance of an immutable dict
> > implemented with shallow copying is, as expected, O(n) for the
> > ``set()`` operation.  However, this is tolerable until the dict
> > has more than 100 items (1 ``set()`` takes about a microsecond.)
> >
> > Judging by the number of modules that need the EC in the Standard
> > Library, it is likely that real-world Python applications will use
> > significantly fewer than 100 execution context variables.
> >
> > The important point is that the cost of accessing a key in the
> > Execution Context is always O(1).
> >
> > If the performance of the ``set()`` operation is a major concern,
> > we discuss alternative approaches that have O(1) or close to O(1)
> > ``set()`` performance in the `Alternative Immutable Dict
> > Implementation`_, `Faster C API`_, and `Copy-on-write Execution
> > Context`_ sections.
> >
> >
> > Generators and Coroutines
> > -------------------------
> >
> > Using a microbenchmark for generators and coroutines from
> > :pep:`492` ([12]_), it was possible to observe a 0.5 to 1%
> > performance degradation.
> >
> > asyncio echoserver microbenchmarks from the uvloop project [13]_
> > showed a 1-1.5% performance degradation for asyncio code.
> >
> > asyncpg benchmarks [14]_, which execute more code and are closer
> > to a real-world application, did not exhibit any noticeable
> > performance change.
> >
> >
> > Overall Performance Impact
> > --------------------------
> >
> > The total number of changed lines in the ceval loop is 2 -- in the
> > ``YIELD_FROM`` opcode implementation.  Only the performance of
> > generators and coroutines can be affected by the proposal.
> >
> > This was confirmed by running the Python Performance Benchmark
> > Suite [15]_, which demonstrated that there is no difference
> > between the 3.7 master branch and this PEP's reference
> > implementation branch (full benchmark results can be found
> > here [16]_.)
> >
> >
> > Design Considerations
> > =====================
> >
> > Alternative Immutable Dict Implementation
> > -----------------------------------------
> >
> > Languages like Clojure and Scala use Hash Array Mapped Tries
> > (HAMT) to implement high performance immutable collections
> > [5]_, [6]_.
> >
> > Immutable mappings implemented with HAMT have
> > O(log\ :sub:`32`\ N) performance for both ``set()`` and ``get()``
> > operations, which is essentially O(1) for the relatively small
> > mappings in the EC.
> >
> > To assess if HAMT can be used for the Execution Context, we
> > implemented it in CPython [7]_.
> >
> > .. figure:: pep-0550/hamt_vs_dict.png
> >    :align: center
> >    :width: 100%
> >
> >    Figure 5.  Benchmark code can be found here: [9]_.
> >
> > Figure 5 shows that HAMT indeed displays O(1) performance for all
> > benchmarked dictionary sizes.  For dictionaries with fewer than
> > 100 items, HAMT is a bit slower than the Python dict/shallow-copy
> > approach.
> >
> > .. figure:: pep-0550/lookup_hamt.png
> >    :align: center
> >    :width: 100%
> >
> >    Figure 6.  Benchmark code can be found here: [10]_.
> >
> > Figure 6 shows a comparison of lookup costs between the Python
> > dict and an HAMT immutable mapping.  HAMT lookup time is 30-40%
> > worse than Python dict lookups on average, which is a very good
> > result, considering how well Python dicts are optimized.
> >
> > Note that, according to [8]_, the HAMT design can be further
> > improved.
> >
> > The bottom line is that the current approach of implementing an
> > immutable mapping with a shallow-copied dict will likely perform
> > adequately in real-life applications.  The HAMT solution is more
> > future proof, however.
> >
> > The proposed API is designed in such a way that the underlying
> > implementation of the mapping can be changed completely without
> > affecting the Execution Context `Specification`_, which allows
> > us to switch to HAMT at some point if necessary.
> >
> >
> > Copy-on-write Execution Context
> > -------------------------------
> >
> > The implementation of the Execution Context in .NET is different
> > from this PEP.  .NET uses a copy-on-write mechanism and a regular
> > mutable mapping.
> >
> > One way to implement this in CPython would be to have two new
> > fields in ``PyThreadState``:
> >
> > * ``exec_context``, pointing to the current Execution Context
> >   mapping;
> > * an ``exec_context_copy_on_write`` flag, set to ``0`` initially.
> >
> > The idea is that whenever we are modifying the EC, the
> > copy-on-write flag is checked, and if it is set to ``1``, the EC
> > is copied.
> >
> > Modifications to the Coroutine and Generator ``.send()`` and
> > ``.throw()`` methods described in the `Coroutines`_ section will
> > be almost the same, except that in addition to
> > ``gi_execution_context`` they will have a
> > ``gi_exec_context_copy_on_write`` flag.  When a coroutine or a
> > generator starts, the flag will be set to ``1``.  This ensures
> > that any modification of the EC performed within a coroutine or a
> > generator will be isolated.
> >
> > This approach has one advantage:
> >
> > * For an Execution Context that contains a large number of items,
> >   copy-on-write is a more efficient solution than the shallow-copy
> >   dict approach.
> >
> > However, we believe that the copy-on-write disadvantages are more
> > important to consider:
> >
> > * Copy-on-write behaviour for generators and coroutines makes
> >   EC semantics less predictable.
> >
> >   With the immutable EC approach, generators and coroutines always
> >   execute in the EC that was current at the moment of their
> >   creation.  Any modifications to the outer EC while a generator
> >   or a coroutine is executing are not visible to them::
> >
> >       def generator():
> >           yield 1
> >           print(get_execution_context_item('key'))
> >           yield 2
> >
> >       set_execution_context_item('key', 'spam')
> >       gen = iter(generator())
> >       next(gen)
> >       set_execution_context_item('key', 'ham')
> >       next(gen)
> >
> >   The above script will always print 'spam' with the immutable EC.
> >
> >   With a copy-on-write approach, the above script will print
> >   'ham'.  Now, consider that ``generator()`` was refactored to
> >   call some library function that uses the Execution Context::
> >
> >       def generator():
> >           yield 1
> >           some_function_that_uses_decimal_context()
> >           print(get_execution_context_item('key'))
> >           yield 2
> >
> >   Now the script will print 'spam', because
> >   ``some_function_that_uses_decimal_context`` forced the EC to
> >   copy, and the ``set_execution_context_item('key', 'ham')`` line
> >   did not affect the ``generator()`` code after all.
> >
> > * Similarly to the previous point, the
> >   ``sys.ExecutionContext.run()`` method will also become less
> >   predictable, as ``sys.get_execution_context()`` would still
> >   return a reference to the current mutable EC.
> >
> >   We can't modify ``sys.get_execution_context()`` to return a
> >   shallow copy of the current EC, because this would seriously
> >   harm the performance of ``asyncio.call_soon()`` and similar
> >   places, where it is important to propagate the Execution
> >   Context.
> >
> > * Even though copy-on-write requires shallow copying the execution
> >   context object less frequently, copying will still take place
> >   in coroutines and generators, in which case the HAMT approach
> >   will perform better for medium to large sized execution
> >   contexts.
> >
> > All in all, we believe that the copy-on-write approach introduces
> > very subtle corner cases that could lead to bugs that are
> > exceptionally hard to discover and fix.
> >
> > The immutable EC solution, in comparison, is always predictable
> > and easy to reason about.
> > Therefore we believe that any slight performance gain that the
> > copy-on-write solution might offer is not worth it.
> >
> >
> > Faster C API
> > ------------
> >
> > Packages like numpy and standard library modules like decimal need
> > to frequently query the global state for some local context
> > configuration.  It is important that the APIs they use are as fast
> > as possible.
> >
> > The proposed ``PyThreadState_SetExecContextItem`` and
> > ``PyThreadState_GetExecContextItem`` functions need to get the
> > current thread state with ``PyThreadState_GET()`` (fast) and then
> > perform a hash lookup (relatively slow).  We can eliminate the
> > hash lookup by adding three additional C API functions:
> >
> > * ``Py_ssize_t PyExecContext_RequestIndex(char *key_name)``:
> >   a function similar to the existing
> >   ``_PyEval_RequestCodeExtraIndex`` introduced in :pep:`523`.  The
> >   idea is to request a unique index that can later be used to look
> >   up context items.
> >
> >   The ``key_name`` can later be used by ``sys.ExecutionContext``
> >   to introspect items added with this API.
> >
> > * ``PyThreadState_SetExecContextIndexedItem(Py_ssize_t index,
> >   PyObject *val)`` and
> >   ``PyThreadState_GetExecContextIndexedItem(Py_ssize_t index)``
> >   to set and request an item by its index, avoiding the cost of a
> >   hash lookup.
> >
> >
> > Why does setting a key to None remove the item?
> > -----------------------------------------------
> >
> > Consider a context manager::
> >
> >     @contextmanager
> >     def context(x):
> >         old_x = get_execution_context_item('x')
> >         set_execution_context_item('x', x)
> >         try:
> >             yield
> >         finally:
> >             set_execution_context_item('x', old_x)
> >
> > With the ``set_execution_context_item(key, None)`` call removing
> > the ``key``, the user doesn't need to write additional code to
> > remove the ``key`` if it wasn't in the execution context already.
> >
> > An alternative design with a ``del_execution_context_item()``
> > method would look like the following::
> >
> >     @contextmanager
> >     def context(x):
> >         not_there = object()
> >         old_x = get_execution_context_item('x', not_there)
> >         set_execution_context_item('x', x)
> >         try:
> >             yield
> >         finally:
> >             if old_x is not_there:
> >                 del_execution_context_item('x')
> >             else:
> >                 set_execution_context_item('x', old_x)
> >
> >
> > Can we fix ``PyThreadState_GetDict()``?
> > ---------------------------------------
> >
> > ``PyThreadState_GetDict`` is a TLS, and some of its existing users
> > might depend on it being just a TLS.  Changing its behaviour to
> > follow the Execution Context semantics would break backwards
> > compatibility.
> >
> >
> > PEP 521
> > -------
> >
> > :pep:`521` proposes an alternative solution to the problem:
> > enhance the Context Manager Protocol with two new methods,
> > ``__suspend__`` and ``__resume__``.  To make it compatible with
> > async/await, the Asynchronous Context Manager Protocol will also
> > need to be extended with ``__asuspend__`` and ``__aresume__``.
> >
> > This makes it possible to implement context managers like the
> > decimal context and ``numpy.errstate`` for generators and
> > coroutines.
> > The following code::
> >
> >     class Context:
> >
> >         def __enter__(self):
> >             self.old_x = get_execution_context_item('x')
> >             set_execution_context_item('x', 'something')
> >
> >         def __exit__(self, *err):
> >             set_execution_context_item('x', self.old_x)
> >
> > would become this::
> >
> >     class Context:
> >
> >         def __enter__(self):
> >             self.old_x = get_execution_context_item('x')
> >             set_execution_context_item('x', 'something')
> >
> >         def __suspend__(self):
> >             set_execution_context_item('x', self.old_x)
> >
> >         def __resume__(self):
> >             set_execution_context_item('x', 'something')
> >
> >         def __exit__(self, *err):
> >             set_execution_context_item('x', self.old_x)
> >
> > Besides complicating the protocol, the implementation will likely
> > negatively impact the performance of coroutines, generators, and
> > any code that uses context managers, and will notably complicate
> > the interpreter implementation.  It also does not solve the
> > leaking-state problem for greenlet/gevent.
> >
> > :pep:`521` also does not provide any mechanism to propagate state
> > in a local context, like storing a request object in an HTTP
> > request handler to have better logging.
> >
> >
> > Can Execution Context be implemented outside of CPython?
> > ---------------------------------------------------------
> >
> > Because async/await code needs an event loop to run it, an EC-like
> > solution can be implemented in a limited way for coroutines.
> >
> > Generators, on the other hand, do not have an event loop or
> > trampoline, making it impossible to intercept their ``yield``
> > points outside of the Python interpreter.
> >
> >
> > Reference Implementation
> > ========================
> >
> > The reference implementation can be found here: [11]_.
> >
> >
> > References
> > ==========
> >
> > .. [1] https://blog.golang.org/context
> >
> > .. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx
> >
> > .. [3] https://github.com/numpy/numpy/issues/9444
> >
> > .. [4] http://bugs.python.org/issue31179
> >
> > .. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie
> >
> > .. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html
> >
> > .. [7] https://github.com/1st1/cpython/tree/hamt
> >
> > .. [8] https://michael.steindorfer.name/publications/oopsla15.pdf
> >
> > .. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd
> >
> > .. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e
> >
> > .. [11] https://github.com/1st1/cpython/tree/pep550
> >
> > .. [12] https://www.python.org/dev/peps/pep-0492/#async-await
> >
> > .. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py
> >
> > .. [14] https://github.com/MagicStack/pgbench
> >
> > .. [15] https://github.com/python/performance
> >
> > .. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c
> >
> >
> > Copyright
> > =========
> >
> > This document has been placed in the public domain.
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at python.org
> > https://mail.python.org/mailman/listinfo/python-ideas
> > Code of Conduct: http://python.org/psf/codeofconduct/
>
> --
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From pfreixes at gmail.com  Sun Aug 13 06:18:12 2017
From: pfreixes at gmail.com (Pau Freixes)
Date: Sun, 13 Aug 2017 12:18:12 +0200
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To:
References:
Message-ID:

Finally got an almost decent internet connection.

Looking at the changes related to this PEP, I can confirm that the
context will be saved twice on every "task switch" in an asyncio
environment: once by the run-in-context call made by the Handle [1],
and immediately afterwards by the ``send()`` call [2] of the coroutine
that belongs to that Task.

As far as I understand, there is no use of the context in the asyncio
layer itself, at least nowadays.  Saving the context at the moment a
Task step is scheduled is, at first sight, useless and might have a
performance impact.

Don't you think that this case, which happens a lot, might somehow be
optimized?  Am I missing something?

[1] https://github.com/1st1/cpython/blob/pep550/Lib/asyncio/events.py#L124
[2] https://github.com/1st1/cpython/blob/pep550/Lib/asyncio/tasks.py#L176
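
Schematically (simplified pseudo-code; names are approximate, see the
linked lines for the real code):

    # One Task step under the pep550 branch, simplified:

    class Handle:
        def _run(self):
            # 1st EC save/restore: the callback (Task._step) runs
            # inside the EC captured at loop.call_soon() time.
            self._exec_context.run(self._callback, *self._args)

    class Task:
        def _step(self, exc=None):
            # 2nd EC save/restore: coro.send() swaps the EC again,
            # per the PEP's "Coroutines" machinery.
            self._coro.send(None)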

On Sat, Aug 12, 2017 at 11:03 PM, Pau Freixes wrote:
> Good work Yury, going all in one will help to not increase the
> differences between the async and the sync worlds in Python.
>
> I really like the idea of the immutable dicts: it makes it easy to
> inherit the context between tasks/threads/whatever without putting
> its consistency at risk if there are further key collisions.
>
> I've just taken a look at the asyncio modifications.  Correct me if
> I'm wrong, but the handler strategy has a side effect: the work done
> to save and restore the context will be done twice in some
> situations.  It would happen when the callback is in charge of
> executing a task step -- once by the run-in-context method and once
> by the coroutine.  Is that correct?
>
> El 12/08/2017 00:38, "Yury Selivanov" escribió:
>
> Hi,
>
> This is a new PEP to implement Execution Contexts in Python.
>
> The PEP is in-flight to python.org, and in the meanwhile can
> be read on GitHub:
>
> https://github.com/python/peps/blob/master/pep-0550.rst
>
> (it contains a few diagrams and charts, so please read it there.)
>
> Thank you!
> Yury
>
> [The full text of PEP 550 was quoted here; it is identical to the
> copy quoted earlier in this thread and has been snipped.]
> --------------------------------------------------------
>
> Because async/await code needs an event loop to run it, an EC-like solution can be implemented in a limited way for coroutines.
>
> Generators, on the other hand, do not have an event loop or trampoline, making it impossible to intercept their ``yield`` points outside of the Python interpreter.
>
>
> Reference Implementation
> ========================
>
> The reference implementation can be found here: [11]_.
>
>
> References
> ==========
>
> .. [1] https://blog.golang.org/context
>
> .. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx
>
> .. [3] https://github.com/numpy/numpy/issues/9444
>
> .. [4] http://bugs.python.org/issue31179
>
> .. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie
>
> .. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html
>
> .. [7] https://github.com/1st1/cpython/tree/hamt
>
> .. [8] https://michael.steindorfer.name/publications/oopsla15.pdf
>
> .. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd
>
> .. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e
>
> .. [11] https://github.com/1st1/cpython/tree/pep550
>
> .. [12] https://www.python.org/dev/peps/pep-0492/#async-await
>
> .. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py
>
> .. [14] https://github.com/MagicStack/pgbench
>
> .. [15] https://github.com/python/performance
>
> .. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

--
--pau

From yselivanov.ml at gmail.com  Sun Aug 13 12:06:59 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sun, 13 Aug 2017 12:06:59 -0400
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: References: Message-ID:

Hi Jonathan,

Thanks for the feedback. I'll update the PEP to use Nathaniel's idea of `sys.get_context_key`. It will be a pretty similar API to what you currently have in prompt_toolkit.

Yury

From yselivanov.ml at gmail.com  Sun Aug 13 12:10:49 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sun, 13 Aug 2017 12:10:49 -0400
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: References: Message-ID:

Hi Pau,

Re string key collisions -- I decided to update the PEP to follow Nathaniel's suggestion to use a get_context_key API, which will eliminate this problem entirely.

Re call_soon in asyncio.Task -- yes, it does use ec.run() to invoke coroutine.send(). However, this has almost no visible effect, as ExecutionContext.run() is a very cheap operation (think 1-2 function calls). It's possible to add a new keyword arg to call_soon like "ignore_execution_context" to eliminate even this small overhead, but this is something we can easily do later.

Yury

From yselivanov.ml at gmail.com  Sun Aug 13 12:14:53 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sun, 13 Aug 2017 12:14:53 -0400
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: <20170812103339.GA2735@bytereef.org> References: <20170812103339.GA2735@bytereef.org> Message-ID:

>> This is a new PEP to implement Execution Contexts in Python.

> The idea is of course great!

Thanks!
> A couple of issues for decimal: > >> Moreover, passing the context explicitly does not work at all for >> libraries like ``decimal`` or ``numpy``, which use operator overloading. > > Instead of "with localcontext() ...", each coroutine can create a new > Context() and use its methods, without any loss of functionality. > > All one loses is the inline operator syntax sugar. > > I'm aware you know all this, but the entire decimal paragraph sounds a bit > as if this option did not exist. The problem is that almost everybody does use the Decimal type directly, as overloaded operators make it so convenient. It's not apparent that using the decimal this way has a dangerous flaw. > >> Fast C API for packages like ``decimal`` and ``numpy``. > > _decimal relies on caching the most recently used thread-local context, > which gives a speedup of about 25% for inline operators: > > https://github.com/python/cpython/blob/master/Modules/_decimal/_decimal.c#L1639 I've seen that, it's a clever trick! With the current PEP 550 semantics it's possible to replicate this trick, you just store a reference to the latest EC in your decimal context for cache invalidation. Because ECs are immutable, it's a safe thing to do. Yury From yselivanov.ml at gmail.com Sun Aug 13 12:33:51 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sun, 13 Aug 2017 12:33:51 -0400 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: On Sat, Aug 12, 2017 at 10:09 PM, Nick Coghlan wrote: > On 13 August 2017 at 03:53, Yury Selivanov wrote: >> On Sat, Aug 12, 2017 at 1:09 PM, Nick Coghlan wrote: >>> Now that you raise this point, I think it means that generators need >>> to retain their current context inheritance behaviour, simply for >>> backwards compatibility purposes. This means that the case we need to >>> enable is the one where the generator *doesn't* dynamically adjust its >>> execution context to match that of the calling function. >> >> Nobody *intentionally* iterates a generator manually in different >> decimal contexts (or any other contexts). This is an extremely error >> prone thing to do, because one refactoring of generator -- rearranging >> yields -- would wreck your custom iteration/context logic. I don't >> think that any real code relies on this, and I don't think that we are >> breaking backwards compatibility here in any way. How many users need >> about this? > > I think this is a reasonable stance for the PEP to take, but the > hidden execution state around the "isolated or not" behaviour still > bothers me. > > In some ways it reminds me of the way function parameters work: the > bound parameters are effectively a *shallow* copy of the passed > arguments, so callers can decide whether or not they want the callee > to be able to modify them based on the arguments' mutability (or lack > thereof). Mutable default values for function arguments is one of the most confusing things to its users. I've seen numerous threads on StackOverflow/Reddit with people complaining about it. > That similarity makes me wonder whether the "isolated or not" > behaviour could be moved from the object being executed and directly > into the key/value pairs themselves based on whether or not the values > were mutable, as that's the way function calls work: if the argument > is immutable, the callee *can't* change it, while if it's mutable, the > callee can mutate it, but it still can't rebind it to refer to a > different object. 
I'm afraid that if we design the EC to behave differently for mutable and immutable values, it will be even harder for end users to understand.

> 1. If a parent context wants child contexts to be able to make changes, then it should put a *mutable* object in the context (e.g. a list or class instance)
> 2. If a parent context *does not* want child contexts to be able to make changes, then it should put an *immutable* object in the context (e.g. a tuple or number)
> 3. If a child context *wants* to share a context key with its parent, then it should *mutate* it in place
> 4. If a child context *does not* want to share a context key with its parent, then it should *rebind* it to a different object

It's possible to store mutable values even with the current PEP 550 API. The issue Nathaniel has with it is that he actually wants the API to behave exactly as it does, so that he can implement his timeout logic -- but there's a corner case where isolating generator state at the moment of creation doesn't work in his favor.

FWIW I believe that I now have a complete solution for the generator.send() problem that will make it possible for Nathaniel to implement his Trio APIs. The functional PoC is here:

    https://github.com/1st1/cpython/tree/pep550_gen

The key change is to make generators and asynchronous generators:

1. Have their own empty execution context when created. It will be used for whatever local modifications they do to it, ensuring that their state never escapes to the outside world (gi_isolated_execution_context flag is still here for contextmanager).

2. ExecutionContext has a new internal pointer called ec_back. In the Generator.send/throw method, ec_back is dynamically set to the current execution context.

3. This makes it possible for generators to see any outside changes in the execution context *and* have their own, where they can make *local* changes.

So (pseudo-code):

    def gen():
        print('1', context)
        yield
        print('2', context)
        with context(spam='ham'):
            yield
            print('3', context)
        yield
        print('4', context)
        yield

    g = gen()

    context(foo=1, spam='bar')
    next(g)

    context(foo=2)
    next(g)

    context(foo=3)
    next(g)

    context(foo=4)
    next(g)

will print:

    1 {foo=1, spam=bar}
    2 {foo=2, spam=bar}
    3 {foo=3, spam=ham}
    4 {foo=4, spam=bar}

There are some downsides to the approach, mainly from the performance standpoint, but in a common case they will be negligible, if detectable at all.

Yury

From yselivanov.ml at gmail.com  Sun Aug 13 12:57:20 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sun, 13 Aug 2017 12:57:20 -0400
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: References: Message-ID:

[replying to the list]

On Sun, Aug 13, 2017 at 6:14 AM, Nick Coghlan wrote:
> On 13 August 2017 at 16:01, Yury Selivanov wrote:
>> On Sat, Aug 12, 2017 at 10:56 PM, Nick Coghlan wrote:
>> [..]
>>> As Nathaniel suggested, getting/setting/deleting individual items in the current context would be implemented as methods on the ContextItem objects, allowing the return value of "get_context_items" to be a plain dictionary, rather than a special type that directly supported updates to the underlying context.
>>
>> The current PEP 550 design returns a "snapshot" of the current EC with sys.get_execution_context().
>>
>> I.e. if you do
>>
>> ec = sys.get_execution_context()
>> ec['a'] = 'b'
>>
>> # sys.get_execution_context_item('a') will return None
>>
>> You did get a snapshot and you modified it -- but your modifications
>> are not visible anywhere.
You can run a function in that modified EC >> with `ec.run(function)` and that function will see that new 'a' key, >> but that's it. There's no "magical" updates to the underlying context. > > In that case, I think "get_execution_context()" is quite misleading as > a name, and is going to be prone to exactly the confusion we currently > have with the mapping returned by locals(), which is that regardless > of whether writes to it affect the target namespace or not, it's going > to be surprising in at least some situations. > > So despite being initially in favour of exposing a mapping-like API at > the Python level, I'm now coming around to Armin Ronacher's point of > view: the copy-on-write semantics for the active context are > sufficiently different from any other mapping type in Python that we > should just avoid the use of __setitem__ and __delitem__ as syntactic > sugar entirely. I agree. I'll be redesigning the PEP to use the following API (please ignore the naming peculiarities, there are so many proposals at this point that I'll just stick to something I have in my head): 1. sys.new_execution_context_key('description') -> sys.ContextItem (or maybe we should just expose the sys.ContextItem type and let people instantiate it?) A key (or "token") to use with the execution context. Besides eliminating the names collision issue, it'll also have a slightly better performance, because its __hash__ method will always return a constant. (Strings cache their __hash__, but other types don't). 2. ContextItem.has(), ContextItem.get(), ContextItem.set(), ContextItem.delete() -- pretty self-explanatory. 3. sys.get_active_context() -> sys.ExecutionContext -- an immutable object, has no methods to modify the context. 3a. sys.ExecutionContext.run(callable, *args) -- run a callable(*args) in some execution context. 3b. sys.ExecutionContext.items() -- an iterator of ContextItem -> value for introspection and debugging purposes. 4. No sys.set_execution_context() method. At this point I'm not sure it's a good idea to allow users to change the current execution context to something else entirely. For use cases like enabling concurrent.futures to run your function within the current EC, you just use the sys.get_active_context()/ExecutionContext.run combination. If anything, we can add this function later. > Instead, we'd lay out the essential primitive operations that *only* > the interpreter can provide and define procedural interfaces for > those, and if anyone wanted to build a higher level object-oriented > interface on top of those primitives, they'd be free to do so, with > the procedural API acting as the abstraction layer that decouples "how > interpreters actually implement it" (e.g. copy-on-write mappings) from > "how libraries and frameworks model it for their own use" (e.g. rich > application context objects). That way, each interpreter would also be > free to define their *internal* object model in whichever way made the > most sense for them, rather than enshrining a point-in-time snaphot of > CPython's preferred implementation model as part of the language > definition. I agree. I like that this idea gives us more flexibility with the exact implementation strategy. [..] > The essential capabilities for active context manipulation would then be: > > - get_active_context_token() > - set_active_context(context_token) As I mentioned above, at this point I'm not entirely sure that we even need "set_active_context". 
The only useful thing for it that I can imagine is creating a decorator that isolates any changes of the context, but the only usecase for this I see is unittests. But even for unittests, a better solution is to use a decorator that detects keys that were added but not deleted during the test (leaks). > - implicitly saving and reverting the active context around various operations Usually we need to save/revert one particular context item, not the whole context. > - accessing the active context id for suspended coroutines and > generators (so parent contexts can opt-in to seeing changes made in > child contexts) Yes, this might be useful, let's keep it. > > Running commands in a particular context *wouldn't* be a primitive > operation given those building blocks, since you can implement that > for yourself using the above primitives: > > def run_in_context(target_context_token, func, *args, **kwds): > old_context_token = get_active_context_token() > set_active_context(target_context_token) > try: > func(*args, **kwds) > finally: > set_active_context(old_context_token) I'd still prefer to implement this as part of the spec. There are some tricks that I want to use to make ExecutionContext.run() much faster than a pure Python version. This is a highly performance critical part of the PEP -- call_soon in asyncio is a VERY frequent thing. Besides, having ExecutionContext.run eliminates the need to sys.set_active_context() -- again, we need to discuss this, but I see less and less utility for it now. > > The public manipulation API here would be deliberately based on opaque > tokens to make it clear that creating and mutating execution contexts > is entirely within the realm of the interpreter implementation, and > user level code can only control *which* execution context is active > in the current thread, not create arbitrary new execution contexts of > its own (at least, not without writing a CPython-specific C > extension). > > For manipulation of values within the active context, looking at other > comparable APIs, I think the main prior art within the language would > be: > > 1. threading.local(), which uses the descriptor protocol to handle > arbitrary attributes > 2. Cell variable references in function `__closure__` attributes, > which also uses the descriptor protocol by way of the "cell_contents" > attribute > > In 3.7, those two examples are being brought closer by way of > `cell_contents` becoming a read/write attribute: > > >>> def f(i): > ... def g(): > ... nonlocal i > ... return i > ... return g > ... > >>> g = f(0) > >>> g() > 0 > >>> cell = g.__closure__[0] > >>> cell.cell_contents > 0 > >>> cell.cell_contents = 5 > >>> g() > 5 > >>> del cell.cell_contents > >>> g() > Traceback (most recent call last): > ... > NameError: free variable 'i' referenced before assignment in enclosing scope > >>> cell.cell_contents = 0 > >>> g() > 0 > > This is very similar to the way manipulation of entries within a > thread local namespace works, but with each cell containing exactly > one attribute. > > For context items, I agree with Nathaniel that the cell-style > one-value-per-item approach is likely to be the way to go. To > emphasise that changes to that attribute only affect the *active* > context, I think "active_value" would be a good name: > > >>> request_id = > sys.create_context_item("my_web_framework.request_id", "Request > identifier for my_web_framework") > >>> request_id.active_value > Traceback (most recent call last): > ... 
> RuntimeError: Context item "my_web_framework.request" not set in context
>
> >>> request_id.active_value = "12345"
> >>> request_id.active_value
> '12345'

I myself prefer a functional API to __getattr__. I don't like the "del local.x" syntax. I don't think we are forced to follow the threading.local() API here, are we?

Yury

> Finally, given opaque context tokens, and context items that worked like closure cells (only accessing the active context rather than lexically scoped variables), the one introspection primitive the *interpreter* would need to provide is either:
>
> 1. Given a context token, return a mapping from context items to their defined values in the given context
> 2. A way to get a listing of the context items defined in the active context
>
> Since either of those can be defined in terms of the other, my own preference goes to the first one, since using it to implement the second alternative just requires a simple `sys.get_active_context_token()` call, while implementing the first one in terms of the second one requires a helper like `run_in_context()` above to manipulate the active context in the current thread.
>
> The first one also makes it fairly straightforward to *diff* a given context against the active one - get the mappings for both contexts, check which keys they have in common, compare the values for the common keys, and then report on
>
> - keys that appear in one context but not the other
> - values which differ between them for common keys
> - (optionally) values which are the same for common keys
>
> Cheers,
> Nick.

From twshere at outlook.com  Sun Aug 13 08:49:45 2017
From: twshere at outlook.com (=?iso-2022-jp?B?GyRCMiZAaxsoQiA/?=)
Date: Sun, 13 Aug 2017 12:49:45 +0000
Subject: [Python-ideas] How do you think about these language extensions?
Message-ID:

Hi all,

I've just finished a language extension for CPython 3.6.x to support some additional grammars like Pattern Matching, and it's compatible with CPython. I'm looking for constructive advice, and I wonder if you will be interested in this one (the project address is https://github.com/thautwarm/flowpython).

Some examples here:

    # where syntax
    from math import pi
    r = 1   # the radius
    h = 10  # the height
    S = (2*S_top + S_side) where:
        S_top = pi*r**2
        S_side = C * h where:
            C = 2*pi*r

    # lambda & curry
    lambda x: lambda y: lambda z: ret where:
        ret = x+y
        ret -= z

    .x -> .y -> .z -> ret where:
        ret = x+y
        ret -= z

    as-with x def as y def as z def ret where:
        ret = x+y
        ret -= z

    # arrow transform (to avoid endless parentheses and try to be more readable)
    >> range(5) -> map(.x->x+2, _) -> list(_)
    >> [2,3,4,5,6]

    # pattern matching
    # "condic" is used as the keyword to avoid conflicts with the standard
    # library and third-party packages; "switch" and "match" both lead to
    # conflicts.
    condic+(type) 1:
        case a:int => assert a == 1 and type(a) == 1
        [>] case 0 => assert 1 > 0
        [is not] case 1 => assert 1 is not 1
        otherwise => print("nothing")

    condic+() [1,2,3]:
        case (a,*b)->b:list => sum(b)
        +[] case [] => print('empty list')
        +[==] case (a,b):(1,2) => print("the list is [1,2]")

The grammars with more details and examples can be found in https://github.com/thautwarm/flowpython/wiki

Does it interest you? If so, you can try it if you have CPython 3.6.x.
    pip install flowpython
    python -m flowpython -m enable/disable

Here is an example using flowpython, which gives the permutations of a sequence.

    from copy import deepcopy

    permutations = .seq -> seq_seq where:
        condic+[] seq:
            case (a, )  => seq_seq = [a,]
            case (a, b) => seq_seq = [[a,b],[b,a]]
            case (a,*b) =>
                seq_seq = permutations(b) -> map(.x -> insertAll(x, a), _) -> sum(_, []) where:
                    insertAll = . x, a -> ret where:
                        ret = [ deepcopy(x) -> _.insert(i, a) or _ for i in (len(x) -> range(_+1)) ]

Once ``permutations`` is defined, try this code in the console:

    >> range(3) -> permutations(_)
    >> [[0, 1, 2], [1, 0, 2], [1, 2, 0], [0, 2, 1], [2, 0, 1], [2, 1, 0]]

Does it seem interesting?

Thanks,
Thautwarm

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From yselivanov.ml at gmail.com  Sun Aug 13 14:44:24 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sun, 13 Aug 2017 14:44:24 -0400
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: References: Message-ID:

I'll start a new thread to discuss if we want this specific semantics change soon (with some updates).

Yury

From njs at pobox.com  Sun Aug 13 15:14:07 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 13 Aug 2017 12:14:07 -0700
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: References: Message-ID:

On Sun, Aug 13, 2017 at 9:57 AM, Yury Selivanov wrote:
> 2. ContextItem.has(), ContextItem.get(), ContextItem.set(),
> ContextItem.delete() -- pretty self-explanatory.

It might make sense to simplify even further and declare that context items are initialized to None to start, and the only operations are set() and get(). And then get() can't fail, b/c there is no "value missing" state.

-n

--
Nathaniel J. Smith -- https://vorpus.org

From twshere at outlook.com  Sun Aug 13 15:46:57 2017
From: twshere at outlook.com (=?gb2312?B?zfXQ+yDV1A==?=)
Date: Sun, 13 Aug 2017 19:46:57 +0000
Subject: [Python-ideas] Python-ideas Digest, Vol 129, Issue 44
In-Reply-To: References: Message-ID:

Thank you for your consideration.

Get Outlook for Android

From: python-ideas-request at python.org
Sent: Monday, August 14, 03:14
Subject: Python-ideas Digest, Vol 129, Issue 44
To: python-ideas at python.org
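For reference, the simplification Nathaniel proposes above reduces a context item's surface to two methods. A minimal sketch of the intended semantics, written against the proposed (not yet existing) ``sys.new_context_item`` API, with illustrative values only:

    ci = sys.new_context_item(description='request')

    assert ci.get() is None      # items start out as None; get() never fails
    ci.set('GET /index')
    assert ci.get() == 'GET /index'

    # There is deliberately no has()/delete(): "not set" and "set to None"
    # are the same state, so ci.set(None) acts as removal.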
From yselivanov.ml at gmail.com  Sun Aug 13 16:54:23 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Sun, 13 Aug 2017 16:54:23 -0400
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: References: Message-ID:

On Sun, Aug 13, 2017 at 3:14 PM, Nathaniel Smith wrote:
> On Sun, Aug 13, 2017 at 9:57 AM, Yury Selivanov wrote:
>> 2. ContextItem.has(), ContextItem.get(), ContextItem.set(),
>> ContextItem.delete() -- pretty self-explanatory.
>
> It might make sense to simplify even further and declare that context items are initialized to None to start, and the only operations are set() and get(). And then get() can't fail, b/c there is no "value missing" state.

I like this idea! It aligns with what I wanted to do in PEP 550 initially, but without the awkwardness of "delete on None". Will add this to the PEP.

Yury

From ncoghlan at gmail.com  Mon Aug 14 04:10:01 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 14 Aug 2017 18:10:01 +1000
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: References: Message-ID:

On 14 August 2017 at 02:33, Yury Selivanov wrote:
> On Sat, Aug 12, 2017 at 10:09 PM, Nick Coghlan wrote:
>> That similarity makes me wonder whether the "isolated or not" behaviour could be moved from the object being executed and directly into the key/value pairs themselves based on whether or not the values were mutable, as that's the way function calls work: if the argument is immutable, the callee *can't* change it, while if it's mutable, the callee can mutate it, but it still can't rebind it to refer to a different object.
>
> I'm afraid that if we design the EC to behave differently for mutable and immutable values, it will be even harder for end users to understand.

There's nothing to design, as storing a list (or other mutable object) in an EC will necessarily be the same as storing one in a tuple: the fact you acquired the reference via an immutable container will do *nothing* to keep you from mutating the referenced object.

And for use cases like web requests, that's exactly the behaviour we want - changing the active web request is an EC level operation, but making changes to the state of the currently active request (e.g. in a middleware processor) won't require anything special.

[I'm going to snip the rest of the post, as it sounds pretty reasonable to me, and my questions about the interaction between sys.set_execution_context() and ec_back go away if sys.set_execution_context() doesn't exist as you're currently proposing]

> (gi_isolated_execution_context flag is still here for contextmanager).

This hidden flag variable on the types managing suspendable frames is still the piece of the proposal that strikes me as being the most potentially problematic, as it at least doubles the number of flows of control that need to be tested.

Essentially what we're aiming to model is:

1. Performing operations in a way that modifies the active execution context
2. Performing them in a way that saves & restores the execution context
For synchronous calls, this distinction is straightforward:

- plain calls may alter the active execution context via state mutation
- use ec.run() to save/restore the execution context around the operation

(The ec_back idea means we may also need an "ec.run()" variant that sets ec_back appropriately before making the call - for example, "ec.run()" could set ec_back, while a separate "ec.run_isolated()" could skip setting it. Alternatively, full isolation could be the default, and "ec.run_shared()" would set ec_back. If we go with the latter option, then "ec_shared" might be a better attribute name than "ec_back")

A function can be marked as always having its own private context using a decorator like so:

    def private_context(f):
        @functools.wraps(f)
        def wrapper(*args, **kwds):
            ec = sys.get_active_context()
            return ec.run(f, *args, **kwds)
        return wrapper

For next/send/throw and anext/asend/athrow, however, the proposal is to bake the save/restore into the *target objects*, rather than having to request it explicitly in the way those objects get called.

This means that unless we apply some implicit decorator magic to the affected slot definitions, there's now going to be a major behavioural difference between:

    some_state = sys.new_context_item()

    def local_state_changer(x):
        for i in range(x):
            some_state.set(i)
            yield i

    class ParentStateChanger:
        def __init__(self, x):
            self._itr = iter(range(x))
        def __iter__(self):
            return self
        def __next__(self):
            x = next(self._itr)
            some_state.set(x)
            return x

The latter would need the equivalent of `@private_context` on the `__next__` method definition to get the behaviour that generators would have by default (and similarly for __anext__ and asynchronous generators).

I haven't fully thought through the implications of this problem yet, but some initial unordered thoughts:

- implicit method decorators are always suspicious, but skipping them in this case feels like we'd be setting up developers of custom iterators for really subtle context management bugs
- contextlib's own helper classes would be fine, since they define __enter__ & __exit__, which wouldn't be affected by this
- for lru_cache, we rely on `__wrapped__` to get access to the underlying function without caching applied. Might it make sense to do something similar for these implicitly context-restoring methods? If so, should we use a dedicated name so that additional wrapper layers don't overwrite it?

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From guido at python.org  Mon Aug 14 12:56:33 2017
From: guido at python.org (Guido van Rossum)
Date: Mon, 14 Aug 2017 09:56:33 -0700
Subject: [Python-ideas] New PEP 550: Execution Context
In-Reply-To: References: Message-ID:

Could someone (perhaps in a new thread?) summarize the current proposal, with some examples of how typical use cases would look? This is an important topic but the discussion is way too voluminous for me to follow while I'm on vacation with my family, and the PEP spends too many words on motivation and not enough on crisply explaining how the proposed feature works (what state is stored where, how it's accessed, and how it's manipulated behind the scenes).

--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From yselivanov.ml at gmail.com Mon Aug 14 14:02:11 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 14 Aug 2017 14:02:11 -0400 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: On Mon, Aug 14, 2017 at 12:56 PM, Guido van Rossum wrote: > Could someone (perhaps in a new thread?) summarize the current proposal, > with some examples of how typical use cases would look? This is an important > topic but the discussion is way too voluminous for me to follow while I'm on > vacation with my family, and the PEP spends too many words on motivation and > not enough on crisply explaining how the proposed feature works (what state > is stored where how it's accessed, and how it's manipulated behind the > scenes). I'm working on it. Will start a new thread today. Yury From barry at python.org Mon Aug 14 14:09:41 2017 From: barry at python.org (Barry Warsaw) Date: Mon, 14 Aug 2017 11:09:41 -0700 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: Yury Selivanov wrote: > This is a new PEP to implement Execution Contexts in Python. It dawns on me that I might be able to use ECs to do a better job of implementing flufl.i18n's translation contexts. I think this is another example of what the PEP's abstract describes as "Context managers like decimal contexts, numpy.errstate, and warnings.catch_warnings;" The _ object maintains a stack of the language codes being used, and you can push a new code onto the stack (typically using `with` so they get automatically popped when exiting). The use case for this is translating say a notification to multiple recipients in the same request, one who speaks French, one who speaks German, and another that speaks English. The problem is that _ is usually a global in a typical application, so in an async environment, if one request is translating to 'fr', another might be translating to 'de', or even a deferred context (e.g. because you want to mark a string but not translate it until some later use). While I haven't used it in an async environment yet, the current approach probably doesn't work very well, or at all. I'd probably start by recommending a separate _ object in each thread, but that's less convenient to use in practice. It seems like it would be better to either attach an _ object to each EC, or to implement the stack of codes in the EC and let the global _ access that stack. It feels a lot like `let` in lisp, but without the implicit addition of the contextual keys into the local namespace. E.g. in a PEP 550 world, you'd have to explicitly retrieve the key/values from the EC rather than have them magically appear in the local namespace, the former of course being the Pythonic way to do it. Cheers, -Barry From yselivanov.ml at gmail.com Mon Aug 14 15:17:51 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 14 Aug 2017 15:17:51 -0400 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: Hi Barry, Yes, i18n is another use-case for execution context, and ec should be a perfect fit for it. Yury From yselivanov.ml at gmail.com Mon Aug 14 15:25:43 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 14 Aug 2017 15:25:43 -0400 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: Nick, you nailed it with your example. In short: current PEP 550 defines Execution Context in such a way, that generators and iterators will interact differently with it. 
That means that it won't be possible to refactor an iterator class to a generator and that's not acceptable. I'll be rewriting the whole specification section of the PEP today. Yury From ncoghlan at gmail.com Tue Aug 15 06:49:53 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 15 Aug 2017 20:49:53 +1000 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: On 15 August 2017 at 05:25, Yury Selivanov wrote: > Nick, you nailed it with your example. > > In short: current PEP 550 defines Execution Context in such a way, > that generators and iterators will interact differently with it. That > means that it won't be possible to refactor an iterator class to a > generator and that's not acceptable. > > I'll be rewriting the whole specification section of the PEP today. Trying to summarise something I thought of this morning regarding ec_back and implicitly isolating iterator contexts: With the notion of generators running with their own private context by default, that means the state needed to call __next__ on the generator is as follows: - current thread EC - generator's private EC (stored on the generator) - the generator's __next__ method This means that if the EC manipulation were to live in the next() builtin rather than in the individual __next__() methods, then this can be made a general context isolation protocol: - provide a `sys.create_execution_context()` interface - set `__private_context__` on your iterable if you want `next()` to use `ec.run()` (and update __private_context__ afterwards) - set `__private_context__ = None` if you want `next()` to just call `obj.__next__()` directly - generators have __private_context__ set by default, but wrappers like contextlib.contextmanager can clear it That would also suggest that ec.run() will need to return a 2-tuple: def run(self, f: Callable, *args, **kwds) -> Tuple[Any, ExecutionContext]: """Run the given function in this execution context Returns a 2-tuple containing the function result and the execution context that was active when the function returned. """ That way next(itr) will be able to update itr.__private_context__ appropriately if it was initially set and the call changes the active context. We could then give send(), throw() and their asynchronous counterparts the builtin+protocol method treatment, and put the EC manipulation in their builtins as well. Anyway, potentially a useful option to consider as you work on revising the proposal - I'll refrain from further comments until you have an updated draft available :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From yselivanov.ml at gmail.com Tue Aug 15 10:39:27 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 15 Aug 2017 10:39:27 -0400 Subject: [Python-ideas] New PEP 550: Execution Context In-Reply-To: References: Message-ID: Hi Nick, Thanks for writing this! You reminded me that it's crucial to have an ability to fully recreate generator behaviour in an iterator. Besides this being a requirement for a complete EC model, it is something that compilers like Cython absolutely need. I'm still working on a rewrite (which is now a completely different PEP), will probably finish it today. Yury From yselivanov.ml at gmail.com Tue Aug 15 19:55:45 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 15 Aug 2017 19:55:45 -0400 Subject: [Python-ideas] PEP 550 v2 Message-ID: Hi, Here's the PEP 550 version 2. 
Thanks to a very active and insightful discussion here on Python-ideas, we've discovered a number of problems with the first version of the PEP. This version is a complete rewrite (only Abstract, Rationale, and Goals sections were not updated).

The updated PEP is live on python.org: https://www.python.org/dev/peps/pep-0550/

There is no reference implementation at this point, but I'm confident that this version of the spec will have the same extremely low runtime overhead as the first version. Thanks to the new ContextItem design, accessing values in the context is even faster now.

Thank you!


PEP: 550
Title: Execution Context
Version: $Revision$
Last-Modified: $Date$
Author: Yury Selivanov
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2017
Python-Version: 3.7
Post-History: 11-Aug-2017, 15-Aug-2017


Abstract
========

This PEP proposes a new mechanism to manage execution state--the logical environment in which a function, a thread, a generator, or a coroutine executes.

A few examples of where reliable state storage is required:

* Context managers like decimal contexts, ``numpy.errstate``, and ``warnings.catch_warnings``;

* Storing request-related data such as security tokens and request data in web applications, implementing i18n;

* Profiling, tracing, and logging in complex and large code bases.

The usual solution for storing state is to use Thread-local Storage (TLS), implemented in the standard library as ``threading.local()``. Unfortunately, TLS does not work for the purpose of state isolation for generators or asynchronous code, because such code executes concurrently in a single thread.


Rationale
=========

Traditionally, Thread-local Storage (TLS) is used for storing state. However, the major flaw of TLS is that it works only for multi-threaded code. It is not possible to reliably contain state within a generator or a coroutine. For example, consider the following generator::

    def calculate(precision, ...):
        with decimal.localcontext() as ctx:
            # Set the precision for decimal calculations
            # inside this block
            ctx.prec = precision

            yield calculate_something()
            yield calculate_something_else()

The decimal context uses TLS to store its state, and because TLS is not aware of generators, the state can leak. If a user iterates over the ``calculate()`` generator with different precisions one by one using a ``zip()`` built-in, the above code will not work correctly. For example::

    g1 = calculate(precision=100)
    g2 = calculate(precision=50)

    items = list(zip(g1, g2))

    # items[0] will be a tuple of:
    #   first value from g1 calculated with 100 precision,
    #   first value from g2 calculated with 50 precision.
    #
    # items[1] will be a tuple of:
    #   second value from g1 calculated with 50 precision (!!!),
    #   second value from g2 calculated with 50 precision.
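The leak is easy to reproduce on current CPython; this self-contained variant of the above (with smaller, illustrative precisions) prints a 4-digit second value from ``g1``::

    import decimal

    def calculate(precision):
        with decimal.localcontext() as ctx:
            ctx.prec = precision
            yield decimal.Decimal(1) / 3
            yield decimal.Decimal(1) / 3

    g1 = calculate(precision=10)
    g2 = calculate(precision=4)
    print(list(zip(g1, g2)))

    # The second value produced by g1 comes out with 4 significant
    # digits instead of 10: g2's localcontext() was still active in
    # the thread-local decimal state when g1 resumed.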
An even scarier example would be using decimals to represent money in an async/await application: decimal calculations can suddenly lose precision in the middle of processing a request. Currently, bugs like this are extremely hard to find and fix.

Another common need for web applications is to have access to the current request object, or security context, or, simply, the request URL for logging or submitting performance tracing data::

    async def handle_http_request(request):
        context.current_http_request = request

        await ...
        # Invoke your framework code, render templates,
        # make DB queries, etc, and use the global
        # 'current_http_request' in that code.

        # This isn't currently possible to do reliably
        # in asyncio out of the box.

These examples are just a few out of many where a reliable way to store context data is absolutely needed.

The inability to use TLS for asynchronous code has led to a proliferation of ad-hoc solutions, which are limited in scope and do not support all required use cases.

The current status quo is that any library, including the standard library, that uses TLS will likely not work as expected in asynchronous code or with generators (see [3]_ as an example issue.)

Some languages that have coroutines or generators recommend manually passing a ``context`` object to every function; see [1]_, which describes the pattern for Go. This approach, however, has limited use for Python, where we have a huge ecosystem that was built to work with a TLS-like context. Moreover, passing the context explicitly does not work at all for libraries like ``decimal`` or ``numpy``, which use operator overloading.

The .NET runtime, which has support for async/await, has a generic solution to this problem, called ``ExecutionContext`` (see [2]_). On the surface, working with it is very similar to working with a TLS, but the former explicitly supports asynchronous code.


Goals
=====

The goal of this PEP is to provide a more reliable alternative to ``threading.local()``. It should be explicitly designed to work with the Python execution model, equally supporting threads, generators, and coroutines.

An acceptable solution for Python should meet the following requirements:

* Transparent support for code executing in threads, coroutines, and generators with an easy-to-use API.

* Negligible impact on the performance of the existing code or the code that will be using the new mechanism.

* Fast C API for packages like ``decimal`` and ``numpy``.

Explicit is still better than implicit, hence the new APIs should only be used when there is no acceptable way of passing the state explicitly.


Specification
=============

Execution Context is a mechanism for storing and accessing data specific to a logical thread of execution. We consider OS threads, generators, and chains of coroutines (such as ``asyncio.Task``) to be variants of a logical thread.

In this specification, we will use the following terminology:

* **Local Context**, or LC, is a key/value mapping that stores the context of a logical thread.

* **Execution Context**, or EC, is an OS-thread-specific dynamic stack of Local Contexts.

* **Context Item**, or CI, is an object used to set and get values from the Execution Context.

Please note that throughout the specification we use simple pseudo-code to illustrate how the EC machinery works. The actual algorithms and data structures that we will use to implement the PEP are discussed in the `Implementation Strategy`_ section.


Context Item Object
-------------------

The ``sys.new_context_item(description)`` function creates a new ``ContextItem`` object. The ``description`` parameter is a ``str``, explaining the nature of the context key for introspection and debugging purposes.

``ContextItem`` objects have the following methods and attributes:

* ``.description``: read-only description;

* ``.set(o)`` method: set the value to ``o`` for the context item in the execution context.

* ``.get()`` method: return the current EC value for the context item. Context items are initialized with ``None`` when created, so this method call never fails.
The below is an example of how context items can be used:: my_context = sys.new_context_item(description='mylib.context') my_context.set('spam') # Later, to access the value of my_context: print(my_context.get()) Thread State and Multi-threaded code ------------------------------------ Execution Context is implemented on top of Thread-local Storage. For every thread there is a separate stack of Local Contexts -- mappings of ``ContextItem`` objects to their values in the LC. New threads always start with an empty EC. For CPython:: PyThreadState: execution_context: ExecutionContext([ LocalContext({ci1: val1, ci2: val2, ...}), ... ]) The ``ContextItem.get()`` and ``.set()`` methods are defined as follows (in pseudo-code):: class ContextItem: def get(self): tstate = PyThreadState_Get() for local_context in reversed(tstate.execution_context): if self in local_context: return local_context[self] def set(self, value): tstate = PyThreadState_Get() if not tstate.execution_context: tstate.execution_context = [LocalContext()] tstate.execution_context[-1][self] = value With the semantics defined so far, the Execution Context can already be used as an alternative to ``threading.local()``:: def print_foo(): print(ci.get() or 'nothing') ci = sys.new_context_item(description='test') ci.set('foo') # Will print "foo": print_foo() # Will print "nothing": threading.Thread(target=print_foo).start() Manual Context Management ------------------------- Execution Context is generally managed by the Python interpreter, but sometimes it is desirable for the user to take the control over it. A few examples when this is needed: * running a computation in ``concurrent.futures.ThreadPoolExecutor`` with the current EC; * reimplementing generators with iterators (more on that later); * managing contexts in asynchronous frameworks (implement proper EC support in ``asyncio.Task`` and ``asyncio.loop.call_soon``.) For these purposes we add a set of new APIs (they will be used in later sections of this specification): * ``sys.new_local_context()``: create an empty ``LocalContext`` object. * ``sys.new_execution_context()``: create an empty ``ExecutionContext`` object. * Both ``LocalContext`` and ``ExecutionContext`` objects are opaque to Python code, and there are no APIs to modify them. * ``sys.get_execution_context()`` function. The function returns a copy of the current EC: an ``ExecutionContext`` instance. The runtime complexity of the actual implementation of this function can be O(1), but for the purposes of this section it is equivalent to:: def get_execution_context(): tstate = PyThreadState_Get() return copy(tstate.execution_context) * ``sys.run_with_execution_context(ec: ExecutionContext, func, *args, **kwargs)`` runs ``func(*args, **kwargs)`` in the provided execution context:: def run_with_execution_context(ec, func, *args, **kwargs): tstate = PyThreadState_Get() old_ec = tstate.execution_context tstate.execution_context = ExecutionContext( ec.local_contexts + [LocalContext()] ) try: return func(*args, **kwargs) finally: tstate.execution_context = old_ec Any changes to Local Context by ``func`` will be ignored. 
This allows to reuse one ``ExecutionContext`` object for multiple invocations of different functions, without them being able to affect each other's environment:: ci = sys.new_context_item('example') ci.set('spam') def func(): print(ci.get()) ci.set('ham') ec = sys.get_execution_context() sys.run_with_execution_context(ec, func) sys.run_with_execution_context(ec, func) # Will print: # spam # spam * ``sys.run_with_local_context(lc: LocalContext, func, *args, **kwargs)`` runs ``func(*args, **kwargs)`` in the current execution context using the specified local context. Any changes that ``func`` does to the local context will be persisted in ``lc``. This behaviour is different from the ``run_with_execution_context()`` function, which always creates a new throw-away local context. In pseudo-code:: def run_with_local_context(lc, func, *args, **kwargs): tstate = PyThreadState_Get() old_ec = tstate.execution_context tstate.execution_context = ExecutionContext( old_ec.local_contexts + [lc] ) try: return func(*args, **kwargs) finally: tstate.execution_context = old_ec Using the previous example:: ci = sys.new_context_item('example') ci.set('spam') def func(): print(ci.get()) ci.set('ham') ec = sys.get_execution_context() lc = sys.new_local_context() sys.run_with_local_context(lc, func) sys.run_with_local_context(lc, func) # Will print: # spam # ham As an example, let's make a subclass of ``concurrent.futures.ThreadPoolExecutor`` that preserves the execution context for scheduled functions:: class Executor(concurrent.futures.ThreadPoolExecutor): def submit(self, fn, *args, **kwargs): context = sys.get_execution_context() fn = functools.partial( sys.run_with_execution_context, context, fn, *args, **kwargs) return super().submit(fn) EC Semantics for Coroutines --------------------------- Python :pep:`492` coroutines are used to implement cooperative multitasking. For a Python end-user they are similar to threads, especially when it comes to sharing resources or modifying the global state. An event loop is needed to schedule coroutines. Coroutines that are explicitly scheduled by the user are usually called Tasks. When a coroutine is scheduled, it can schedule other coroutines using an ``await`` expression. In async/await world, awaiting a coroutine is equivalent to a regular function call in synchronous code. Thus, Tasks are similar to threads. By drawing a parallel between regular multithreaded code and async/await, it becomes apparent that any modification of the execution context within one Task should be visible to all coroutines scheduled within it. Any execution context modifications, however, must not be visible to other Tasks executing within the same OS thread. Coroutine Object Modifications ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To achieve this, a small set of modifications to the coroutine object is needed: * New ``cr_local_context`` attribute. This attribute is readable and writable for Python code. * When a coroutine object is instantiated, its ``cr_local_context`` is initialized with an empty Local Context. * Coroutine's ``.send()`` and ``.throw()`` methods are modified as follows (in pseudo-C):: if coro.cr_local_context is not None: tstate = PyThreadState_Get() tstate.execution_context.push(coro.cr_local_context) try: # Perform the actual `Coroutine.send()` or # `Coroutine.throw()` call. return coro.send(...) finally: coro.cr_local_context = tstate.execution_context.pop() else: # Perform the actual `Coroutine.send()` or # `Coroutine.throw()` call. return coro.send(...) 
* When the Python interpreter sees an ``await`` instruction, it inspects the ``cr_local_context`` attribute of the coroutine that is about to be awaited. For ``await coro``:

  * If ``coro.cr_local_context`` is an empty ``LocalContext`` object that ``coro`` was created with, the interpreter will set ``coro.cr_local_context`` to ``None``.

  * If ``coro.cr_local_context`` was modified by Python code, the interpreter will leave it as is.

This makes any changes to the execution context made by nested coroutine calls within a Task visible throughout the Task::

    ci = sys.new_context_item('example')

    async def nested():
        ci.set('nested')

    async def main():
        ci.set('main')
        print('before:', ci.get())
        await nested()
        print('after:', ci.get())

    # Will print:
    #   before: main
    #   after: nested

Essentially, coroutines work with Execution Context items similarly to threads, and the ``await`` expression acts like a function call.

This mechanism also works for ``yield from`` in generators decorated with ``@types.coroutine`` or ``@asyncio.coroutine``, which are called "generator-based coroutines" according to :pep:`492`, and should be fully compatible with native async/await coroutines.


Tasks
^^^^^

In asynchronous frameworks like asyncio, coroutines are run by an event loop, and need to be explicitly scheduled (in asyncio coroutines are run by ``asyncio.Task``.)

With the currently defined semantics, the interpreter makes coroutines linked by an ``await`` expression share the same Local Context.

The interpreter, however, is not aware of the Task concept, and cannot help with ensuring that new Tasks started in coroutines use the correct EC::

    current_request = sys.new_context_item(description='request')

    async def child():
        print('current request:', repr(current_request.get()))

    async def handle_request(request):
        current_request.set(request)
        event_loop.create_task(child())

    run(top_coro())

    # Will print:
    #   current request: None

To enable correct Execution Context propagation into Tasks, the asynchronous framework needs to assist the interpreter:

* When ``create_task`` is called, it should capture the current execution context with ``sys.get_execution_context()`` and save it on the Task object.

* When the Task object runs its coroutine object, it should execute ``.send()`` and ``.throw()`` methods within the captured execution context, using the ``sys.run_with_execution_context()`` function.

With help from the asynchronous framework, the above snippet will run correctly, and the ``child()`` coroutine will be able to access the current request object through the ``current_request`` Context Item.


Event Loop Callbacks
^^^^^^^^^^^^^^^^^^^^

Similarly to Tasks, functions like asyncio's ``loop.call_soon()`` should capture the current execution context with ``sys.get_execution_context()`` and execute callbacks within it with ``sys.run_with_execution_context()``.

This way the following code will work::

    current_request = sys.new_context_item(description='request')

    def log():
        request = current_request.get()
        print(request)

    async def request_handler(request):
        current_request.set(request)
        get_event_loop().call_soon(log)
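To make that requirement concrete, here is a rough sketch of a helper an event loop could use, built only from the proposed ``sys`` functions above; the helper's name and the omitted error handling are assumptions of this illustration::

    def call_soon_with_ec(loop, callback, *args):
        # Capture the caller's execution context now, and re-enter it
        # when the loop invokes the callback later.
        ec = sys.get_execution_context()
        return loop.call_soon(
            sys.run_with_execution_context, ec, callback, *args)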
A crucial difference between ``await coro`` and ``yield value`` is
that the former expression guarantees that ``coro`` will be
executed fully, while the latter produces ``value`` and
suspends the generator until it gets iterated again.

Generators, similarly to coroutines, have a ``gi_local_context``
attribute, which is set to an empty Local Context when created.

Contrary to coroutines though, the ``yield from o`` expression in
generators (that are not generator-based coroutines) is semantically
equivalent to ``for v in o: yield v``, therefore the interpreter does
not attempt to control their ``gi_local_context``.


EC Semantics for Generators
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Every generator object has its own Local Context that stores
only its own local modifications of the context.  When a generator
is being iterated, its local context will be put on the EC stack
of the current thread.  This means that the generator will be able
to access items from the surrounding context::

    local = sys.new_context_item("local")
    glob = sys.new_context_item("global")

    def generator():
        local.set('inside gen:')
        while True:
            print(local.get(), glob.get())
            yield

    g = generator()

    local.set('hello')
    glob.set('spam')
    next(g)

    local.set('world')
    glob.set('ham')
    next(g)

    # Will print:
    # inside gen: spam
    # inside gen: ham

Any changes to the EC in nested generators are invisible to the outer
generator::

    local = sys.new_context_item("local")

    def inner_gen():
        local.set('spam')
        yield

    def outer_gen():
        local.set('ham')
        yield from inner_gen()
        print(local.get())

    list(outer_gen())

    # Will print:
    # ham


Running generators without LC
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Similarly to coroutines, generators with ``gi_local_context``
set to ``None`` simply use the outer Local Context.

The ``@contextlib.contextmanager`` decorator uses this mechanism to
allow its generator to affect the EC::

    item = sys.new_context_item('test')

    @contextmanager
    def context(x):
        old = item.get()
        item.set(x)
        try:
            yield
        finally:
            item.set(old)

    with context('spam'):

        with context('ham'):
            print(1, item.get())

        print(2, item.get())

    # Will print:
    # 1 ham
    # 2 spam


Implementing Generators with Iterators
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Execution Context API makes it possible to fully replicate the EC
behaviour imposed on generators with a regular Python iterator
class::

    class Gen:

        def __init__(self):
            self.local_context = sys.new_local_context()

        def __iter__(self):
            return self

        def __next__(self):
            return sys.run_with_local_context(
                self.local_context, self._next_impl)

        def _next_impl(self):
            # Actual __next__ implementation.
            ...


Asynchronous Generators
-----------------------

Asynchronous Generators (AG) interact with the Execution Context
similarly to regular generators.

They have an ``ag_local_context`` attribute, which, similarly to
regular generators, can be set to ``None`` to make them use the outer
Local Context.  This is used by the new
``contextlib.asynccontextmanager`` decorator.

The EC support of the ``await`` expression is implemented using the
same approach as in coroutines, see the `Coroutine Object
Modifications`_ section.


Greenlets
---------

Greenlet is an alternative implementation of cooperative
scheduling for Python.  Although the greenlet package is not part of
CPython, popular frameworks like gevent rely on it, and it is
important that greenlet can be modified to support execution
contexts.

In a nutshell, greenlet design is very similar to the design of
generators.  The main difference is that for generators, the stack
is managed by the Python interpreter.
Greenlet works outside of the Python interpreter, and manually saves
some ``PyThreadState`` fields and pushes/pops the C-stack.  Thus the
``greenlet`` package can be easily updated to use the new low-level
`C API`_ to enable full support of EC.


New APIs
========

Python
------

The Python APIs were designed to completely hide the internal
implementation details, but at the same time provide enough control
over EC and LC to re-implement all of Python built-in objects
in pure Python.

1. ``sys.new_context_item(description='...')``: create a
   ``ContextItem`` object used to access/set values in EC.

2. ``ContextItem``:

   * ``.description``: read-only attribute.
   * ``.get()``: return the current value for the item.
   * ``.set(o)``: set the current value in the EC for the item.

3. ``sys.get_execution_context()``: return the current
   ``ExecutionContext``.

4. ``sys.new_execution_context()``: create a new empty
   ``ExecutionContext``.

5. ``sys.new_local_context()``: create a new empty ``LocalContext``.

6. ``sys.run_with_execution_context(ec: ExecutionContext,
   func, *args, **kwargs)``.

7. ``sys.run_with_local_context(lc: LocalContext,
   func, *args, **kwargs)``.


C API
-----

1. ``PyContextItem * PyContext_NewItem(char *desc)``: create a
   ``PyContextItem`` object.

2. ``PyObject * PyContext_GetItem(PyContextItem *)``: get the
   current value for the context item.

3. ``int PyContext_SetItem(PyContextItem *, PyObject *)``: set
   the current value for the context item.

4. ``PyLocalContext * PyLocalContext_New()``: create a new empty
   ``PyLocalContext``.

5. ``PyExecutionContext * PyExecutionContext_New()``: create a new
   empty ``PyExecutionContext``.

6. ``PyExecutionContext * PyExecutionContext_Get()``: get the
   EC for the active thread state.

7. ``int PyExecutionContext_Set(PyExecutionContext *)``: set the
   passed EC object as the current for the active thread state.

8. ``int PyExecutionContext_SetWithLocalContext(PyExecutionContext *,
   PyLocalContext *)``: makes it possible to implement the
   ``sys.run_with_local_context`` Python API.


Implementation Strategy
=======================

LocalContext is a Weak Key Mapping
----------------------------------

Using a weak key mapping for the ``LocalContext`` implementation
enables the following properties with regards to garbage
collection:

* ``ContextItem`` objects are strongly-referenced only from the
  application code, not from any of the Execution Context
  machinery or the values they point to.  This means that there
  are no reference cycles that could extend their lifespan
  longer than necessary, or prevent their garbage collection.

* Values put in the Execution Context are guaranteed to be kept
  alive while there is a ``ContextItem`` key referencing them in
  the thread.

* If a ``ContextItem`` is garbage collected, all of its values will
  be removed from all contexts, allowing them to be GCed if needed.

* If a thread has ended its execution, its thread state will be
  cleaned up along with its ``ExecutionContext``, cleaning
  up all values bound to all Context Items in the thread.


ContextItem.get() Cache
-----------------------

We can add three new fields to the ``PyThreadState`` and
``PyInterpreterState`` structs:

* ``uint64_t PyThreadState->unique_id``: a globally unique
  thread state identifier (we can add a counter to
  ``PyInterpreterState`` and increment it when a new thread state is
  created.)

* ``uint64_t PyInterpreterState->context_item_deallocs``: every time
  a ``ContextItem`` is GCed, all Execution Contexts in all threads
  will lose track of it.  ``context_item_deallocs`` will simply
  count all ``ContextItem`` deallocations.

* ``uint64_t PyThreadState->execution_context_ver``: every time
  a new item is set, or an existing item is updated, or the stack
  of execution contexts is changed in the thread, we increment this
  counter.

These three fields allow implementing a fast cache path in
``ContextItem.get()``, in pseudo-code::

    class ContextItem:

        def get(self):
            tstate = PyThreadState_Get()

            if (self.last_tstate_id == tstate.unique_id and
                    self.last_ver == tstate.execution_context_ver and
                    self.last_deallocs ==
                        tstate.interp.context_item_deallocs):
                return self.last_value

            value = None
            for mapping in reversed(tstate.execution_context):
                if self in mapping:
                    value = mapping[self]
                    break

            self.last_value = value
            self.last_tstate_id = tstate.unique_id
            self.last_ver = tstate.execution_context_ver
            self.last_deallocs = tstate.interp.context_item_deallocs

            return value

This is similar to the trick that the decimal C implementation uses
for caching the current decimal context, and will have the same
performance characteristics, but available to all
Execution Context users.


Approach #1: Use a dict for LocalContext
----------------------------------------

The straightforward way of implementing the proposed EC
mechanisms is to create a ``WeakKeyDict`` on top of the Python
``dict`` type.

To implement the ``ExecutionContext`` type we can use a Python
``list`` (or a custom stack implementation with some
pre-allocation optimizations).

This approach will have the following runtime complexity:

* O(M) for ``ContextItem.get()``, where ``M`` is the number of
  Local Contexts in the stack.

  It is important to note that ``ContextItem.get()`` will implement
  a cache making the operation O(1) for packages like ``decimal``
  and ``numpy``.

* O(1) for ``ContextItem.set()``.

* O(N) for ``sys.get_execution_context()``, where ``N`` is the
  total number of items in the current **execution** context.


Approach #2: Use HAMT for LocalContext
--------------------------------------

Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT)
to implement high performance immutable collections [5]_, [6]_.

Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N)
performance for ``set()``, ``get()``, and ``merge()`` operations,
which is essentially O(1) for relatively small mappings
(read about HAMT performance in CPython in the
`Appendix: HAMT Performance`_ section.)

In this approach we use the same design of the ``ExecutionContext``
as in Approach #1, but with an HAMT backed weak key Local Context
implementation.  With that we will have the following runtime
complexity:

* O(M * log\ :sub:`32`\ N) for ``ContextItem.get()``,
  where ``M`` is the number of Local Contexts in the stack,
  and ``N`` is the number of items in the EC.  The operation will
  essentially be O(M), because execution contexts are normally not
  expected to have more than a few dozen items.

  (``ContextItem.get()`` will have the same caching mechanism as in
  Approach #1.)

* O(log\ :sub:`32`\ N) for ``ContextItem.set()``, where ``N`` is the
  number of items in the current **local** context.  This will
  essentially be an O(1) operation most of the time.

* O(log\ :sub:`32`\ N) for ``sys.get_execution_context()``, where
  ``N`` is the total number of items in the current **execution**
  context.

Essentially, using HAMT for Local Contexts instead of Python dicts
brings the complexity of ``sys.get_execution_context()`` down
from O(N) to O(log\ :sub:`32`\ N) because of the more efficient
merge algorithm.
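
To make the difference concrete, here is a toy pure-Python model of
why immutability matters for ``sys.get_execution_context()``.  It is
illustrative only: a plain ``dict`` copy stands in for the
O(log\ :sub:`32`\ N) HAMT update, and weak references are ignored::

    class ImmutableLC:
        """Toy stand-in for an HAMT-backed Local Context."""

        def __init__(self, data=None):
            self._data = dict(data or {})

        def set(self, key, value):
            # Persistent-style "mutation": return a *new* mapping;
            # the old one is never changed, so existing snapshots
            # stay valid.  (O(N) with a dict copy here; O(log32 N)
            # with a real HAMT.)
            new_data = dict(self._data)
            new_data[key] = value
            return ImmutableLC(new_data)

        def get(self, key, default=None):
            return self._data.get(key, default)

    def get_execution_context(lc_stack):
        # With mutable dict-based LCs (Approach #1), every mapping
        # would have to be copied -- O(N) in the total number of
        # items.  With immutable LCs it is enough to copy the stack
        # itself -- O(M) -- or, as in Approach #3 below, simply to
        # reference its head node -- O(1).
        return list(lc_stack)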
Approach #3: Use HAMT and Immutable Linked List
-----------------------------------------------

We can make an alternative ``ExecutionContext`` design by using
a linked list.  Each ``LocalContext`` in the ``ExecutionContext``
object will be wrapped in a linked-list node.

``LocalContext`` objects will use the HAMT backed weak key
implementation described in Approach #2.

Every modification to the current ``LocalContext`` will produce a
new version of it, which will be wrapped in a **new linked list
node**.  Essentially this means that ``ExecutionContext`` is an
immutable forest of ``LocalContext`` objects, and can be safely
copied by reference in ``sys.get_execution_context()`` (eliminating
the expensive "merge" operation.)

With this approach, ``sys.get_execution_context()`` will be an
**O(1) operation**.


Summary
-------

We believe that approach #3 enables an efficient and complete
Execution Context implementation, with excellent runtime performance.

`ContextItem.get() Cache`_ enables fast retrieval of context items
for performance critical libraries like decimal and numpy.

A fast ``sys.get_execution_context()`` enables efficient management
of execution contexts in asynchronous libraries like asyncio.


Design Considerations
=====================

Can we fix ``PyThreadState_GetDict()``?
---------------------------------------

``PyThreadState_GetDict`` is a TLS, and some of its existing users
might depend on it being just a TLS.  Changing its behaviour to follow
the Execution Context semantics would break backwards compatibility.


PEP 521
-------

:pep:`521` proposes an alternative solution to the problem:
enhance the Context Manager Protocol with two new methods,
``__suspend__`` and ``__resume__``.  To make it compatible with
async/await, the Asynchronous Context Manager Protocol will also need
to be extended with ``__asuspend__`` and ``__aresume__``.

This makes it possible to implement context managers like decimal
context and ``numpy.errstate`` for generators and coroutines.

The following code::

    class Context:

        def __enter__(self):
            self.old_x = get_execution_context_item('x')
            set_execution_context_item('x', 'something')

        def __exit__(self, *err):
            set_execution_context_item('x', self.old_x)

would become this::

    local = threading.local()

    class Context:

        def __enter__(self):
            self.old_x = getattr(local, 'x', None)
            local.x = 'something'

        def __suspend__(self):
            local.x = self.old_x

        def __resume__(self):
            local.x = 'something'

        def __exit__(self, *err):
            local.x = self.old_x

Besides complicating the protocol, the implementation will likely
negatively impact performance of coroutines, generators, and any code
that uses context managers, and will notably complicate the
interpreter implementation.

:pep:`521` also does not provide any mechanism to propagate state
in a local context, like storing a request object in an HTTP request
handler to have better logging.  Nor does it solve the leaking state
problem for greenlet/gevent.


Can Execution Context be implemented outside of CPython?
--------------------------------------------------------

Because async/await code needs an event loop to run it, an EC-like
solution can be implemented in a limited way for coroutines.

Generators, on the other hand, do not have an event loop or
trampoline, making it impossible to intercept their ``yield`` points
outside of the Python interpreter.


Backwards Compatibility
=======================

This proposal preserves 100% backwards compatibility.
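
For illustration, porting existing ``threading.local()`` code to the
proposed APIs is mechanical.  The sketch below uses only the functions
specified in this PEP; the ``_tls``/``_precision`` names and the
``'myapp.precision'`` description are made up for the example::

    # Existing TLS-based code -- unaffected by this PEP:
    import threading

    _tls = threading.local()

    def set_precision(p):
        _tls.precision = p

    def get_precision():
        return getattr(_tls, 'precision', None)

    # The same state on top of the Execution Context; unlike the
    # TLS version, it is correctly isolated between generators,
    # coroutines, and Tasks:
    import sys

    _precision = sys.new_context_item(description='myapp.precision')

    def set_precision(p):
        _precision.set(p)

    def get_precision():
        return _precision.get()    # None if never set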
Appendix: HAMT Performance
==========================

To assess if HAMT can be used for Execution Context, we implemented
it in CPython [7]_.

.. figure:: pep-0550-hamt_vs_dict.png
   :align: center
   :width: 100%

   Figure 1.  Benchmark code can be found here: [9]_.

Figure 1 shows that HAMT indeed displays O(1) performance for all
benchmarked dictionary sizes.  For dictionaries with less than 100
items, HAMT is a bit slower than Python dict/shallow copy.

.. figure:: pep-0550-lookup_hamt.png
   :align: center
   :width: 100%

   Figure 2.  Benchmark code can be found here: [10]_.

Figure 2 shows a comparison of lookup costs between Python dict
and an HAMT immutable mapping.  HAMT lookup time is 30-40% worse
than Python dict lookups on average, which is a very good result,
considering how well Python dicts are optimized.

Note that, according to [8]_, the HAMT design can be further improved.


Acknowledgments
===============

I thank Elvis Pranskevichus and Victor Petrovykh for countless
discussions around the topic and PEP proofreading and edits.

Thanks to Nathaniel Smith for proposing the ``ContextItem`` design
[17]_ [18]_, for pushing the PEP towards a more complete design, and
for coming up with the idea of having a stack of contexts in the
thread state.

Thanks to Nick Coghlan for numerous suggestions and ideas on the
mailing list, and for coming up with a case that caused the complete
rewrite of the initial PEP version [19]_.


References
==========

.. [1] https://blog.golang.org/context

.. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx

.. [3] https://github.com/numpy/numpy/issues/9444

.. [4] http://bugs.python.org/issue31179

.. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie

.. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html

.. [7] https://github.com/1st1/cpython/tree/hamt

.. [8] https://michael.steindorfer.name/publications/oopsla15.pdf

.. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd

.. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e

.. [11] https://github.com/1st1/cpython/tree/pep550

.. [12] https://www.python.org/dev/peps/pep-0492/#async-await

.. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py

.. [14] https://github.com/MagicStack/pgbench

.. [15] https://github.com/python/performance

.. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c

.. [17] https://mail.python.org/pipermail/python-ideas/2017-August/046752.html

.. [18] https://mail.python.org/pipermail/python-ideas/2017-August/046772.html

.. [19] https://mail.python.org/pipermail/python-ideas/2017-August/046780.html


Copyright
=========

This document has been placed in the public domain.

From arj.python at gmail.com  Wed Aug 16 00:41:56 2017
From: arj.python at gmail.com (Abdur-Rahmaan Janhangeer)
Date: Wed, 16 Aug 2017 08:41:56 +0400
Subject: [Python-ideas] DOM syntax guide
In-Reply-To:
References:
Message-ID:

greetings all,

i like python a lot and would like to use it everywhere ... up to on the
web (not django type). For python js-compiled versions (for makers) can
you provide some syntax guidelines for dom access ?

Abdur-Rahmaan Janhangeer,
Mauritius
abdurrahmaanjanhangeer.wordpress.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From jelle.zijlstra at gmail.com Wed Aug 16 02:53:04 2017 From: jelle.zijlstra at gmail.com (Jelle Zijlstra) Date: Wed, 16 Aug 2017 08:53:04 +0200 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: 2017-08-16 1:55 GMT+02:00 Yury Selivanov : > Hi, > > Here's the PEP 550 version 2. Thanks to a very active and insightful > discussion here on Python-ideas, we've discovered a number of > problems with the first version of the PEP. This version is a complete > rewrite (only Abstract, Rationale, and Goals sections were not updated). > > The updated PEP is live on python.org: > https://www.python.org/dev/peps/pep-0550/ > > There is no reference implementation at this point, but I'm confident > that this version of the spec will have the same extremely low > runtime overhead as the first version. Thanks to the new ContextItem > design, accessing values in the context is even faster now. > > Thank you! > > > PEP: 550 > Title: Execution Context > Version: $Revision$ > Last-Modified: $Date$ > Author: Yury Selivanov > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 11-Aug-2017 > Python-Version: 3.7 > Post-History: 11-Aug-2017, 15-Aug-2017 > > > Abstract > ======== > > This PEP proposes a new mechanism to manage execution state--the > logical environment in which a function, a thread, a generator, > or a coroutine executes in. > > A few examples of where having a reliable state storage is required: > > * Context managers like decimal contexts, ``numpy.errstate``, > and ``warnings.catch_warnings``; > > * Storing request-related data such as security tokens and request > data in web applications, implementing i18n; > > * Profiling, tracing, and logging in complex and large code bases. > > The usual solution for storing state is to use a Thread-local Storage > (TLS), implemented in the standard library as ``threading.local()``. > Unfortunately, TLS does not work for the purpose of state isolation > for generators or asynchronous code, because such code executes > concurrently in a single thread. > > > Rationale > ========= > > Traditionally, a Thread-local Storage (TLS) is used for storing the > state. However, the major flaw of using the TLS is that it works only > for multi-threaded code. It is not possible to reliably contain the > state within a generator or a coroutine. For example, consider > the following generator:: > > def calculate(precision, ...): > with decimal.localcontext() as ctx: > # Set the precision for decimal calculations > # inside this block > ctx.prec = precision > > yield calculate_something() > yield calculate_something_else() > > Decimal context is using a TLS to store the state, and because TLS is > not aware of generators, the state can leak. If a user iterates over > the ``calculate()`` generator with different precisions one by one > using a ``zip()`` built-in, the above code will not work correctly. > For example:: > > g1 = calculate(precision=100) > g2 = calculate(precision=50) > > items = list(zip(g1, g2)) > > # items[0] will be a tuple of: > # first value from g1 calculated with 100 precision, > # first value from g2 calculated with 50 precision. > # > # items[1] will be a tuple of: > # second value from g1 calculated with 50 precision (!!!), > # second value from g2 calculated with 50 precision. > > An even scarier example would be using decimals to represent money > in an async/await application: decimal calculations can suddenly > lose precision in the middle of processing a request. 
Currently, > bugs like this are extremely hard to find and fix. > > Another common need for web applications is to have access to the > current request object, or security context, or, simply, the request > URL for logging or submitting performance tracing data:: > > async def handle_http_request(request): > context.current_http_request = request > > await ... > # Invoke your framework code, render templates, > # make DB queries, etc, and use the global > # 'current_http_request' in that code. > > # This isn't currently possible to do reliably > # in asyncio out of the box. > > These examples are just a few out of many, where a reliable way to > store context data is absolutely needed. > > The inability to use TLS for asynchronous code has lead to > proliferation of ad-hoc solutions, which are limited in scope and > do not support all required use cases. > > Current status quo is that any library, including the standard > library, that uses a TLS, will likely not work as expected in > asynchronous code or with generators (see [3]_ as an example issue.) > > Some languages that have coroutines or generators recommend to > manually pass a ``context`` object to every function, see [1]_ > describing the pattern for Go. This approach, however, has limited > use for Python, where we have a huge ecosystem that was built to work > with a TLS-like context. Moreover, passing the context explicitly > does not work at all for libraries like ``decimal`` or ``numpy``, > which use operator overloading. > > .NET runtime, which has support for async/await, has a generic > solution of this problem, called ``ExecutionContext`` (see [2]_). > On the surface, working with it is very similar to working with a TLS, > but the former explicitly supports asynchronous code. > > > Goals > ===== > > The goal of this PEP is to provide a more reliable alternative to > ``threading.local()``. It should be explicitly designed to work with > Python execution model, equally supporting threads, generators, and > coroutines. > > An acceptable solution for Python should meet the following > requirements: > > * Transparent support for code executing in threads, coroutines, > and generators with an easy to use API. > > * Negligible impact on the performance of the existing code or the > code that will be using the new mechanism. > > * Fast C API for packages like ``decimal`` and ``numpy``. > > Explicit is still better than implicit, hence the new APIs should only > be used when there is no acceptable way of passing the state > explicitly. > > > Specification > ============= > > Execution Context is a mechanism of storing and accessing data specific > to a logical thread of execution. We consider OS threads, > generators, and chains of coroutines (such as ``asyncio.Task``) > to be variants of a logical thread. > > In this specification, we will use the following terminology: > > * **Local Context**, or LC, is a key/value mapping that stores the > context of a logical thread. > > * **Execution Context**, or EC, is an OS-thread-specific dynamic > stack of Local Contexts. > > * **Context Item**, or CI, is an object used to set and get values > from the Execution Context. > > Please note that throughout the specification we use simple > pseudo-code to illustrate how the EC machinery works. The actual > algorithms and data structures that we will use to implement the PEP > are discussed in the `Implementation Strategy`_ section. 
> > > Context Item Object > ------------------- > > The ``sys.new_context_item(description)`` function creates a > new ``ContextItem`` object. The ``description`` parameter is a > ``str``, explaining the nature of the context key for introspection > and debugging purposes. > > ``ContextItem`` objects have the following methods and attributes: > > * ``.description``: read-only description; > > * ``.set(o)`` method: set the value to ``o`` for the context item > in the execution context. > > * ``.get()`` method: return the current EC value for the context item. > Context items are initialized with ``None`` when created, so > this method call never fails. > > The below is an example of how context items can be used:: > > my_context = sys.new_context_item(description='mylib.context') > my_context.set('spam') > Minor suggestion: Could we allow something like `sys.set_new_context_item(description='mylib.context', initial_value='spam')`? That would make it easier for type checkers to infer the type of a ContextItem, and it would save a line of code in the common case. With this modification, the type of new_context_item would be @overload def new_context_item(*, description: str, initial_value: T) -> ContextItem[T]: ... @overload def new_context_item(*, description: str) -> ContextItem[Any]: ... If we only allow the second variant, type checkers would need some sort of special casing to figure out that after .set(), .get() will return the same type. > # Later, to access the value of my_context: > print(my_context.get()) > > > Thread State and Multi-threaded code > ------------------------------------ > > Execution Context is implemented on top of Thread-local Storage. > For every thread there is a separate stack of Local Contexts -- > mappings of ``ContextItem`` objects to their values in the LC. > New threads always start with an empty EC. > > For CPython:: > > PyThreadState: > execution_context: ExecutionContext([ > LocalContext({ci1: val1, ci2: val2, ...}), > ... > ]) > > The ``ContextItem.get()`` and ``.set()`` methods are defined as > follows (in pseudo-code):: > > class ContextItem: > > def get(self): > tstate = PyThreadState_Get() > > for local_context in reversed(tstate.execution_context): > if self in local_context: > return local_context[self] > > def set(self, value): > tstate = PyThreadState_Get() > > if not tstate.execution_context: > tstate.execution_context = [LocalContext()] > > tstate.execution_context[-1][self] = value > > With the semantics defined so far, the Execution Context can already > be used as an alternative to ``threading.local()``:: > > def print_foo(): > print(ci.get() or 'nothing') > > ci = sys.new_context_item(description='test') > ci.set('foo') > > # Will print "foo": > print_foo() > > # Will print "nothing": > threading.Thread(target=print_foo).start() > > > Manual Context Management > ------------------------- > > Execution Context is generally managed by the Python interpreter, > but sometimes it is desirable for the user to take the control > over it. A few examples when this is needed: > > * running a computation in ``concurrent.futures.ThreadPoolExecutor`` > with the current EC; > > * reimplementing generators with iterators (more on that later); > > * managing contexts in asynchronous frameworks (implement proper > EC support in ``asyncio.Task`` and ``asyncio.loop.call_soon``.) 
> > For these purposes we add a set of new APIs (they will be used in > later sections of this specification): > > * ``sys.new_local_context()``: create an empty ``LocalContext`` > object. > > * ``sys.new_execution_context()``: create an empty > ``ExecutionContext`` object. > > * Both ``LocalContext`` and ``ExecutionContext`` objects are opaque > to Python code, and there are no APIs to modify them. > > * ``sys.get_execution_context()`` function. The function returns a > copy of the current EC: an ``ExecutionContext`` instance. > > The runtime complexity of the actual implementation of this function > can be O(1), but for the purposes of this section it is equivalent > to:: > > def get_execution_context(): > tstate = PyThreadState_Get() > return copy(tstate.execution_context) > > * ``sys.run_with_execution_context(ec: ExecutionContext, func, *args, > **kwargs)`` runs ``func(*args, **kwargs)`` in the provided execution > context:: > > def run_with_execution_context(ec, func, *args, **kwargs): > tstate = PyThreadState_Get() > > old_ec = tstate.execution_context > > tstate.execution_context = ExecutionContext( > ec.local_contexts + [LocalContext()] > ) > > try: > return func(*args, **kwargs) > finally: > tstate.execution_context = old_ec > > Any changes to Local Context by ``func`` will be ignored. > This allows to reuse one ``ExecutionContext`` object for multiple > invocations of different functions, without them being able to > affect each other's environment:: > > ci = sys.new_context_item('example') > ci.set('spam') > > def func(): > print(ci.get()) > ci.set('ham') > > ec = sys.get_execution_context() > > sys.run_with_execution_context(ec, func) > sys.run_with_execution_context(ec, func) > > # Will print: > # spam > # spam > > * ``sys.run_with_local_context(lc: LocalContext, func, *args, > **kwargs)`` runs ``func(*args, **kwargs)`` in the current execution > context using the specified local context. > > Any changes that ``func`` does to the local context will be > persisted in ``lc``. This behaviour is different from the > ``run_with_execution_context()`` function, which always creates > a new throw-away local context. > > In pseudo-code:: > > def run_with_local_context(lc, func, *args, **kwargs): > tstate = PyThreadState_Get() > > old_ec = tstate.execution_context > > tstate.execution_context = ExecutionContext( > old_ec.local_contexts + [lc] > ) > > try: > return func(*args, **kwargs) > finally: > tstate.execution_context = old_ec > > Using the previous example:: > > ci = sys.new_context_item('example') > ci.set('spam') > > def func(): > print(ci.get()) > ci.set('ham') > > ec = sys.get_execution_context() > lc = sys.new_local_context() > > sys.run_with_local_context(lc, func) > sys.run_with_local_context(lc, func) > > # Will print: > # spam > # ham > > As an example, let's make a subclass of > ``concurrent.futures.ThreadPoolExecutor`` that preserves the execution > context for scheduled functions:: > > class Executor(concurrent.futures.ThreadPoolExecutor): > > def submit(self, fn, *args, **kwargs): > context = sys.get_execution_context() > > fn = functools.partial( > sys.run_with_execution_context, context, > fn, *args, **kwargs) > > return super().submit(fn) > > > EC Semantics for Coroutines > --------------------------- > > Python :pep:`492` coroutines are used to implement cooperative > multitasking. For a Python end-user they are similar to threads, > especially when it comes to sharing resources or modifying > the global state. > > An event loop is needed to schedule coroutines. 
Coroutines that > are explicitly scheduled by the user are usually called Tasks. > When a coroutine is scheduled, it can schedule other coroutines using > an ``await`` expression. In async/await world, awaiting a coroutine > is equivalent to a regular function call in synchronous code. Thus, > Tasks are similar to threads. > > By drawing a parallel between regular multithreaded code and > async/await, it becomes apparent that any modification of the > execution context within one Task should be visible to all coroutines > scheduled within it. Any execution context modifications, however, > must not be visible to other Tasks executing within the same OS > thread. > > > Coroutine Object Modifications > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > To achieve this, a small set of modifications to the coroutine object > is needed: > > * New ``cr_local_context`` attribute. This attribute is readable > and writable for Python code. > > * When a coroutine object is instantiated, its ``cr_local_context`` > is initialized with an empty Local Context. > > * Coroutine's ``.send()`` and ``.throw()`` methods are modified as > follows (in pseudo-C):: > > if coro.cr_local_context is not None: > tstate = PyThreadState_Get() > > tstate.execution_context.push(coro.cr_local_context) > > try: > # Perform the actual `Coroutine.send()` or > # `Coroutine.throw()` call. > return coro.send(...) > finally: > coro.cr_local_context = tstate.execution_context.pop() > else: > # Perform the actual `Coroutine.send()` or > # `Coroutine.throw()` call. > return coro.send(...) > > * When Python interpreter sees an ``await`` instruction, it inspects > the ``cr_local_context`` attribute of the coroutine that is about > to be awaited. For ``await coro``: > > * If ``coro.cr_local_context`` is an empty ``LocalContext`` object > that ``coro`` was created with, the interpreter will set > ``coro.cr_local_context`` to ``None``. > > * If ``coro.cr_local_context`` was modified by Python code, the > interpreter will leave it as is. > > This makes any changes to execution context made by nested coroutine > calls within a Task to be visible throughout the Task:: > > ci = sys.new_context_item('example') > > async def nested(): > ci.set('nested') > > asynd def main(): > ci.set('main') > print('before:', ci.get()) > await nested() > print('after:', ci.get()) > > # Will print: > # before: main > # after: nested > > Essentially, coroutines work with Execution Context items similarly > to threads, and ``await`` expression acts like a function call. > > This mechanism also works for ``yield from`` in generators decorated > with ``@types.coroutine`` or ``@asyncio.coroutine``, which are > called "generator-based coroutines" according to :pep:`492`, > and should be fully compatible with native async/await coroutines. > > > Tasks > ^^^^^ > > In asynchronous frameworks like asyncio, coroutines are run by > an event loop, and need to be explicitly scheduled (in asyncio > coroutines are run by ``asyncio.Task``.) > > With the currently defined semantics, the interpreter makes > coroutines linked by an ``await`` expression share the same > Local Context. 
> > The interpreter, however, is not aware of the Task concept, and > cannot help with ensuring that new Tasks started in coroutines, > use the correct EC:: > > current_request = sys.new_context_item(description='request') > > async def child(): > print('current request:', repr(current_request.get())) > > async def handle_request(request): > current_request.set(request) > event_loop.create_task(child) > > run(top_coro()) > > # Will print: > # current_request: None > > To enable correct Execution Context propagation into Tasks, the > asynchronous framework needs to assist the interpreter: > > * When ``create_task`` is called, it should capture the current > execution context with ``sys.get_execution_context()`` and save it > on the Task object. > > * When the Task object runs its coroutine object, it should execute > ``.send()`` and ``.throw()`` methods within the captured > execution context, using the ``sys.run_with_execution_context()`` > function. > > With help from the asynchronous framework, the above snippet will > run correctly, and the ``child()`` coroutine will be able to access > the current request object through the ``current_request`` > Context Item. > > > Event Loop Callbacks > ^^^^^^^^^^^^^^^^^^^^ > > Similarly to Tasks, functions like asyncio's ``loop.call_soon()`` > should capture the current execution context with > ``sys.get_execution_context()`` and execute callbacks > within it with ``sys.run_with_execution_context()``. > > This way the following code will work:: > > current_request = sys.new_context_item(description='request') > > def log(): > request = current_request.get() > print(request) > > async def request_handler(request): > current_request.set(request) > get_event_loop.call_soon(log) > > > Generators > ---------- > > Generators in Python, while similar to Coroutines, are used in a > fundamentally different way. They are producers of data, and > they use ``yield`` expression to suspend/resume their execution. > > A crucial difference between ``await coro`` and ``yield value`` is > that the former expression guarantees that the ``coro`` will be > executed fully, while the latter is producing ``value`` and > suspending the generator until it gets iterated again. > > Generators, similarly to coroutines, have a ``gi_local_context`` > attribute, which is set to an empty Local Context when created. > > Contrary to coroutines though, ``yield from o`` expression in > generators (that are not generator-based coroutines) is semantically > equivalent to ``for v in o: yield v``, therefore the interpreter does > not attempt to control their ``gi_local_context``. > > > EC Semantics for Generators > ^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Every generator object has its own Local Context that stores > only its own local modifications of the context. When a generator > is being iterated, its local context will be put in the EC stack > of the current thread. 
This means that the generator will be able > to see access items from the surrounding context:: > > local = sys.new_context_item("local") > global = sys.new_context_item("global") > > def generator(): > local.set('inside gen:') > while True: > print(local.get(), global.get()) > yield > > g = gen() > > local.set('hello') > global.set('spam') > next(g) > > local.set('world') > global.set('ham') > next(g) > > # Will print: > # inside gen: spam > # inside gen: ham > > Any changes to the EC in nested generators are invisible to the outer > generator:: > > local = sys.new_context_item("local") > > def inner_gen(): > local.set('spam') > yield > > def outer_gen(): > local.set('ham') > yield from gen() > print(local.get()) > > list(outer_gen()) > > # Will print: > # ham > > > Running generators without LC > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Similarly to coroutines, generators with ``gi_local_context`` > set to ``None`` simply use the outer Local Context. > > The ``@contextlib.contextmanager`` decorator uses this mechanism to > allow its generator to affect the EC:: > > item = sys.new_context_item('test') > > @contextmanager > def context(x): > old = item.get() > item.set('x') > try: > yield > finally: > item.set(old) > > with context('spam'): > > with context('ham'): > print(1, item.get()) > > print(2, item.get()) > > # Will print: > # 1 ham > # 2 spam > > > Implementing Generators with Iterators > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > The Execution Context API allows to fully replicate EC behaviour > imposed on generators with a regular Python iterator class:: > > class Gen: > > def __init__(self): > self.local_context = sys.new_local_context() > > def __iter__(self): > return self > > def __next__(self): > return sys.run_with_local_context( > self.local_context, self._next_impl) > > def _next_impl(self): > # Actual __next__ implementation. > ... > > > Asynchronous Generators > ----------------------- > > Asynchronous Generators (AG) interact with the Execution Context > similarly to regular generators. > > They have an ``ag_local_context`` attribute, which, similarly to > regular generators, can be set to ``None`` to make them use the outer > Local Context. This is used by the new > ``contextlib.asynccontextmanager`` decorator. > > The EC support of ``await`` expression is implemented using the same > approach as in coroutines, see the `Coroutine Object Modifications`_ > section. > > > Greenlets > --------- > > Greenlet is an alternative implementation of cooperative > scheduling for Python. Although greenlet package is not part of > CPython, popular frameworks like gevent rely on it, and it is > important that greenlet can be modified to support execution > contexts. > > In a nutshell, greenlet design is very similar to design of > generators. The main difference is that for generators, the stack > is managed by the Python interpreter. Greenlet works outside of the > Python interpreter, and manually saves some ``PyThreadState`` > fields and pushes/pops the C-stack. Thus the ``greenlet`` package > can be easily updated to use the new low-level `C API`_ to enable > full support of EC. > > > New APIs > ======== > > Python > ------ > > Python APIs were designed to completely hide the internal > implementation details, but at the same time provide enough control > over EC and LC to re-implement all of Python built-in objects > in pure Python. > > 1. ``sys.new_context_item(description='...')``: create a > ``ContextItem`` object used to access/set values in EC. > > 2. 
``ContextItem``: > > * ``.description``: read-only attribute. > * ``.get()``: return the current value for the item. > * ``.set(o)``: set the current value in the EC for the item. > > 3. ``sys.get_execution_context()``: return the current > ``ExecutionContext``. > > 4. ``sys.new_execution_context()``: create a new empty > ``ExecutionContext``. > > 5. ``sys.new_local_context()``: create a new empty ``LocalContext``. > > 6. ``sys.run_with_execution_context(ec: ExecutionContext, > func, *args, **kwargs)``. > > 7. ``sys.run_with_local_context(lc:LocalContext, > func, *args, **kwargs)``. > > > C API > ----- > > 1. ``PyContextItem * PyContext_NewItem(char *desc)``: create a > ``PyContextItem`` object. > > 2. ``PyObject * PyContext_GetItem(PyContextItem *)``: get the > current value for the context item. > > 3. ``int PyContext_SetItem(PyContextItem *, PyObject *)``: set > the current value for the context item. > > 4. ``PyLocalContext * PyLocalContext_New()``: create a new empty > ``PyLocalContext``. > > 5. ``PyLocalContext * PyExecutionContext_New()``: create a new empty > ``PyExecutionContext``. > > 6. ``PyExecutionContext * PyExecutionContext_Get()``: get the > EC for the active thread state. > > 7. ``int PyExecutionContext_Set(PyExecutionContext *)``: set the > passed EC object as the current for the active thread state. > > 8. ``int PyExecutionContext_SetWithLocalContext(PyExecutionContext *, > PyLocalContext *)``: allows to implement > ``sys.run_with_local_context`` Python API. > > > Implementation Strategy > ======================= > > LocalContext is a Weak Key Mapping > ---------------------------------- > > Using a weak key mapping for ``LocalContext`` implementation > enables the following properties with regards to garbage > collection: > > * ``ContextItem`` objects are strongly-referenced only from the > application code, not from any of the Execution Context > machinery or values they point to. This means that there > are no reference cycles that could extend their lifespan > longer than necessary, or prevent their garbage collection. > > * Values put in the Execution Context are guaranteed to be kept > alive while there is a ``ContextItem`` key referencing them in > the thread. > > * If a ``ContextItem`` is garbage collected, all of its values will > be removed from all contexts, allowing them to be GCed if needed. > > * If a thread has ended its execution, its thread state will be > cleaned up along with its ``ExecutionContext``, cleaning > up all values bound to all Context Items in the thread. > > > ContextItem.get() Cache > ----------------------- > > We can add three new fields to ``PyThreadState`` and > ``PyInterpreterState`` structs: > > * ``uint64_t PyThreadState->unique_id``: a globally unique > thread state identifier (we can add a counter to > ``PyInterpreterState`` and increment it when a new thread state is > created.) > > * ``uint64_t PyInterpreterState->context_item_deallocs``: every time > a ``ContextItem`` is GCed, all Execution Contexts in all threads > will lose track of it. ``context_item_deallocs`` will simply > count all ``ContextItem`` deallocations. > > * ``uint64_t PyThreadState->execution_context_ver``: every time > a new item is set, or an existing item is updated, or the stack > of execution contexts is changed in the thread, we increment this > counter. 
> > The above two fields allow implementing a fast cache path in > ``ContextItem.get()``, in pseudo-code:: > > class ContextItem: > > def get(self): > tstate = PyThreadState_Get() > > if (self.last_tstate_id == tstate.unique_id and > self.last_ver == tstate.execution_context_ver > self.last_deallocs == > tstate.iterp.context_item_deallocs): > return self.last_value > > value = None > for mapping in reversed(tstate.execution_context): > if self in mapping: > value = mapping[self] > break > > self.last_value = value > self.last_tstate_id = tstate.unique_id > self.last_ver = tstate.execution_context_ver > self.last_deallocs = tstate.interp.context_item_deallocs > > return value > > This is similar to the trick that decimal C implementation uses > for caching the current decimal context, and will have the same > performance characteristics, but available to all > Execution Context users. > > > Approach #1: Use a dict for LocalContext > ---------------------------------------- > > The straightforward way of implementing the proposed EC > mechanisms is to create a ``WeakKeyDict`` on top of Python > ``dict`` type. > > To implement the ``ExecutionContext`` type we can use Python > ``list`` (or a custom stack implementation with some > pre-allocation optimizations). > > This approach will have the following runtime complexity: > > * O(M) for ``ContextItem.get()``, where ``M`` is the number of > Local Contexts in the stack. > > It is important to note that ``ContextItem.get()`` will implement > a cache making the operation O(1) for packages like ``decimal`` > and ``numpy``. > > * O(1) for ``ContextItem.set()``. > > * O(N) for ``sys.get_execution_context()``, where ``N`` is the > total number of items in the current **execution** context. > > > Approach #2: Use HAMT for LocalContext > -------------------------------------- > > Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT) > to implement high performance immutable collections [5]_, [6]_. > > Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N) > performance for both ``set()``, ``get()``, and ``merge()`` operations, > which is essentially O(1) for relatively small mappings > (read about HAMT performance in CPython in the > `Appendix: HAMT Performance`_ section.) > > In this approach we use the same design of the ``ExecutionContext`` > as in Approach #1, but we will use HAMT backed weak key Local Context > implementation. With that we will have the following runtime > complexity: > > * O(M * log\ :sub:`32`\ N) for ``ContextItem.get()``, > where ``M`` is the number of Local Contexts in the stack, > and ``N`` is the number of items in the EC. The operation will > essentially be O(M), because execution contexts are normally not > expected to have more than a few dozen of items. > > (``ContextItem.get()`` will have the same caching mechanism as in > Approach #1.) > > * O(log\ :sub:`32`\ N) for ``ContextItem.set()`` where ``N`` is the > number of items in the current **local** context. This will > essentially be an O(1) operation most of the time. > > * O(log\ :sub:`32`\ N) for ``sys.get_execution_context()``, where > ``N`` is the total number of items in the current **execution** > context. > > Essentially, using HAMT for Local Contexts instead of Python dicts, > allows to bring down the complexity of ``sys.get_execution_context()`` > from O(N) to O(log\ :sub:`32`\ N) because of the more efficient > merge algorithm. 
> > > Approach #3: Use HAMT and Immutable Linked List > ----------------------------------------------- > > We can make an alternative ``ExecutionContext`` design by using > a linked list. Each ``LocalContext`` in the ``ExecutionContext`` > object will be wrapped in a linked-list node. > > ``LocalContext`` objects will use an HAMT backed weak key > implementation described in the Approach #2. > > Every modification to the current ``LocalContext`` will produce a > new version of it, which will be wrapped in a **new linked list > node**. Essentially this means, that ``ExecutionContext`` is an > immutable forest of ``LocalContext`` objects, and can be safely > copied by reference in ``sys.get_execution_context()`` (eliminating > the expensive "merge" operation.) > > With this approach, ``sys.get_execution_context()`` will be an > **O(1) operation**. > > > Summary > ------- > > We believe that approach #3 enables an efficient and complete > Execution Context implementation, with excellent runtime performance. > > `ContextItem.get() Cache`_ enables fast retrieval of context items > for performance critical libraries like decimal and numpy. > > Fast ``sys.get_execution_context()`` enables efficient management > of execution contexts in asynchronous libraries like asyncio. > > > Design Considerations > ===================== > > Can we fix ``PyThreadState_GetDict()``? > --------------------------------------- > > ``PyThreadState_GetDict`` is a TLS, and some of its existing users > might depend on it being just a TLS. Changing its behaviour to follow > the Execution Context semantics would break backwards compatibility. > > > PEP 521 > ------- > > :pep:`521` proposes an alternative solution to the problem: > enhance Context Manager Protocol with two new methods: ``__suspend__`` > and ``__resume__``. To make it compatible with async/await, > the Asynchronous Context Manager Protocol will also need to be > extended with ``__asuspend__`` and ``__aresume__``. > > This allows to implement context managers like decimal context and > ``numpy.errstate`` for generators and coroutines. > > The following code:: > > class Context: > > def __enter__(self): > self.old_x = get_execution_context_item('x') > set_execution_context_item('x', 'something') > > def __exit__(self, *err): > set_execution_context_item('x', self.old_x) > > would become this:: > > local = threading.local() > > class Context: > > def __enter__(self): > self.old_x = getattr(local, 'x', None) > local.x = 'something' > > def __suspend__(self): > local.x = self.old_x > > def __resume__(self): > local.x = 'something' > > def __exit__(self, *err): > local.x = self.old_x > > Besides complicating the protocol, the implementation will likely > negatively impact performance of coroutines, generators, and any code > that uses context managers, and will notably complicate the > interpreter implementation. > > :pep:`521` also does not provide any mechanism to propagate state > in a local context, like storing a request object in an HTTP request > handler to have better logging. Nor does it solve the leaking state > problem for greenlet/gevent. > > > Can Execution Context be implemented outside of CPython? > -------------------------------------------------------- > > Because async/await code needs an event loop to run it, an EC-like > solution can be implemented in a limited way for coroutines. 
> > Generators, on the other hand, do not have an event loop or > trampoline, making it impossible to intercept their ``yield`` points > outside of the Python interpreter. > > > Backwards Compatibility > ======================= > > This proposal preserves 100% backwards compatibility. > > > Appendix: HAMT Performance > ========================== > > To assess if HAMT can be used for Execution Context, we implemented > it in CPython [7]_. > > .. figure:: pep-0550-hamt_vs_dict.png > :align: center > :width: 100% > > Figure 1. Benchmark code can be found here: [9]_. > > Figure 1 shows that HAMT indeed displays O(1) performance for all > benchmarked dictionary sizes. For dictionaries with less than 100 > items, HAMT is a bit slower than Python dict/shallow copy. > > .. figure:: pep-0550-lookup_hamt.png > :align: center > :width: 100% > > Figure 2. Benchmark code can be found here: [10]_. > > Figure 2 shows comparison of lookup costs between Python dict > and an HAMT immutable mapping. HAMT lookup time is 30-40% worse > than Python dict lookups on average, which is a very good result, > considering how well Python dicts are optimized. > > Note, that according to [8]_, HAMT design can be further improved. > > > Acknowledgments > =============== > > I thank Elvis Pranskevichus and Victor Petrovykh for countless > discussions around the topic and PEP proof reading and edits. > > Thanks to Nathaniel Smith for proposing the ``ContextItem`` design > [17]_ [18]_, for pushing the PEP towards a more complete design, and > coming up with the idea of having a stack of contexts in the thread > state. > > Thanks to Nick Coghlan for numerous suggestions and ideas on the > mailing list, and for coming up with a case that cause the complete > rewrite of the initial PEP version [19]_. > > > References > ========== > > .. [1] https://blog.golang.org/context > > .. [2] https://msdn.microsoft.com/en-us/library/system.threading. > executioncontext.aspx > > .. [3] https://github.com/numpy/numpy/issues/9444 > > .. [4] http://bugs.python.org/issue31179 > > .. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie > > .. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures- > persistenthashmap-part-ii.html > > .. [7] https://github.com/1st1/cpython/tree/hamt > > .. [8] https://michael.steindorfer.name/publications/oopsla15.pdf > > .. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd > > .. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e > > .. [11] https://github.com/1st1/cpython/tree/pep550 > > .. [12] https://www.python.org/dev/peps/pep-0492/#async-await > > .. [13] https://github.com/MagicStack/uvloop/blob/master/examples/ > bench/echoserver.py > > .. [14] https://github.com/MagicStack/pgbench > > .. [15] https://github.com/python/performance > > .. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c > > .. [17] https://mail.python.org/pipermail/python-ideas/2017- > August/046752.html > > .. [18] https://mail.python.org/pipermail/python-ideas/2017- > August/046772.html > > .. [19] https://mail.python.org/pipermail/python-ideas/2017- > August/046780.html > > > Copyright > ========= > > This document has been placed in the public domain. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Wed Aug 16 03:18:23 2017 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 16 Aug 2017 00:18:23 -0700 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: On Tue, Aug 15, 2017 at 4:55 PM, Yury Selivanov wrote: > Hi, > > Here's the PEP 550 version 2. Awesome! Some of the changes from v1 to v2 might be a bit confusing -- in particular the thing where ExecutionContext is now a stack of LocalContext objects instead of just being a mapping. So here's the big picture as I understand it: In discussions on the mailing list and off-line, we realized that the main reason people use "thread locals" is to implement fake dynamic scoping. Of course, generators/async/await mean that currently it's impossible to *really* fake dynamic scoping in Python -- that's what PEP 550 is trying to fix. So PEP 550 v1 essentially added "generator locals" as a refinement of "thread locals". But... it turns out that "generator locals" aren't enough to properly implement dynamic scoping either! So the goal in PEP 550 v2 is to provide semantics strong enough to *really* get this right. I wrote up some notes on what I mean by dynamic scoping, and why neither thread-locals nor generator-locals can fake it: https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope.ipynb > Specification > ============= > > Execution Context is a mechanism of storing and accessing data specific > to a logical thread of execution. We consider OS threads, > generators, and chains of coroutines (such as ``asyncio.Task``) > to be variants of a logical thread. > > In this specification, we will use the following terminology: > > * **Local Context**, or LC, is a key/value mapping that stores the > context of a logical thread. If you're more familiar with dynamic scoping, then you can think of an LC as a single dynamic scope... > * **Execution Context**, or EC, is an OS-thread-specific dynamic > stack of Local Contexts. ...and an EC as a stack of scopes. Looking up a ContextItem in an EC proceeds by checking the first LC (innermost scope), then if it doesn't find what it's looking for it checks the second LC (the next-innermost scope), etc. > ``ContextItem`` objects have the following methods and attributes: > > * ``.description``: read-only description; > > * ``.set(o)`` method: set the value to ``o`` for the context item > in the execution context. > > * ``.get()`` method: return the current EC value for the context item. > Context items are initialized with ``None`` when created, so > this method call never fails. Two issues here, that both require some expansion of this API to reveal a *bit* more information about the EC structure. 1) For trio's cancel scope use case I described in the last, I actually need some way to read out all the values on the LocalContext stack. (It would also be helpful if there were some fast way to check the depth of the ExecutionContext stack -- or at least tell whether it's 1 deep or more-than-1 deep. I know that any cancel scopes that are in the bottommost LC will always be attached to the given Task, so I can set up the scope->task mapping once and re-use it indefinitely. OTOH for scopes that are stored in higher LCs, I have to check at every yield whether they're currently in effect. And I want to minimize the per-yield workload as much as possible.) 2) For classic decimal.localcontext context managers, the idea is still that you save/restore the value, so that you can nest multiple context managers without having to push/pop LCs all the time. 
But the above API is not actually sufficient to implement a proper save/restore, for a subtle reason: if you do ci.set(ci.get()) then you just (potentially) moved the value from a lower LC up to the top LC. Here's an example of a case where this can produce user-visible effects: https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope-on-top-of-pep-550-draft-2.py There are probably a bunch of options for fixing this. But basically we need some API that makes it possible to temporarily set a value in the top LC, and then restore that value to what it was before (either the previous value, or 'unset' to unshadow a value in a lower LC). One simple option would be to make the idiom be something like: @contextmanager def local_value(new_value): state = ci.get_local_state() ci.set(new_value) try: yield finally: ci.set_local_state(state) where 'state' is something like a tuple (ci in EC[-1], EC[-1].get(ci)). A downside with this is that it's a bit error-prone (very easy for an unwary user to accidentally use get/set instead of get_local_state/set_local_state). But I'm sure we can come up with something. > Manual Context Management > ------------------------- > > Execution Context is generally managed by the Python interpreter, > but sometimes it is desirable for the user to take the control > over it. A few examples when this is needed: > > * running a computation in ``concurrent.futures.ThreadPoolExecutor`` > with the current EC; > > * reimplementing generators with iterators (more on that later); > > * managing contexts in asynchronous frameworks (implement proper > EC support in ``asyncio.Task`` and ``asyncio.loop.call_soon``.) > > For these purposes we add a set of new APIs (they will be used in > later sections of this specification): > > * ``sys.new_local_context()``: create an empty ``LocalContext`` > object. > > * ``sys.new_execution_context()``: create an empty > ``ExecutionContext`` object. > > * Both ``LocalContext`` and ``ExecutionContext`` objects are opaque > to Python code, and there are no APIs to modify them. > > * ``sys.get_execution_context()`` function. The function returns a > copy of the current EC: an ``ExecutionContext`` instance. If there are enough of these functions then it might make sense to stick them in their own module instead of adding more stuff to sys. I guess worrying about that can wait until the API details are more firm though. > * If ``coro.cr_local_context`` is an empty ``LocalContext`` object > that ``coro`` was created with, the interpreter will set > ``coro.cr_local_context`` to ``None``. I like all the ideas in this section, but this specific point feels a bit weird. Coroutine objects need a second hidden field somewhere to keep track of whether the object they end up with is the same one they were created with? If I set cr_local_context to something else, and then set it back to the original value, does that trigger the magic await behavior or not? What if I take the initial LocalContext off of one coroutine and attach it to another, does that trigger the magic await behavior? Maybe it would make more sense to have two sentinel values: UNINITIALIZED and INHERIT? > To enable correct Execution Context propagation into Tasks, the > asynchronous framework needs to assist the interpreter: > > * When ``create_task`` is called, it should capture the current > execution context with ``sys.get_execution_context()`` and save it > on the Task object. 
I wonder if it would be useful to have an option to squash this execution context down into a single LocalContext, since we know we'll be using it for a while and once we've copied an ExecutionContext it becomes impossible to tell the difference between one that has lots of internal LocalContexts and one that doesn't. This could also be handy for trio/curio's semantics where they initialize a new task's context to be a shallow copy of the parent task: you could do new_task_coro.cr_local_context = get_current_context().squash() and then skip having to wrap every send() call in a run_in_context. > Generators > ---------- > > Generators in Python, while similar to Coroutines, are used in a > fundamentally different way. They are producers of data, and > they use ``yield`` expression to suspend/resume their execution. > > A crucial difference between ``await coro`` and ``yield value`` is > that the former expression guarantees that the ``coro`` will be > executed fully, while the latter is producing ``value`` and > suspending the generator until it gets iterated again. > > Generators, similarly to coroutines, have a ``gi_local_context`` > attribute, which is set to an empty Local Context when created. > > Contrary to coroutines though, ``yield from o`` expression in > generators (that are not generator-based coroutines) is semantically > equivalent to ``for v in o: yield v``, therefore the interpreter does > not attempt to control their ``gi_local_context``. Hmm. I assume you're simplifying for expository purposes, but 'yield from' isn't the same as 'for v in o: yield v'. In fact PEP 380 says: "Motivation: [...] a piece of code containing a yield cannot be factored out and put into a separate function in the same way as other code. [...] If yielding of values is the only concern, this can be performed without much difficulty using a loop such as 'for v in g: yield v'. However, if the subgenerator is to interact properly with the caller in the case of calls to send(), throw() and close(), things become considerably more difficult. As will be seen later, the necessary code is very complicated, and it is tricky to handle all the corner cases correctly." So it seems to me that the whole idea of 'yield from' is that it's supposed to handle all the tricky bits needed to guarantee that if you take some code out of a generator and refactor it into a subgenerator, then everything works the same as before. This suggests that 'yield from' should do the same magic as 'await', where by default the subgenerator shares the same LocalContext as the parent generator. (And as a bonus it makes things simpler if 'yield from' and 'await' work the same.) > Asynchronous Generators > ----------------------- > > Asynchronous Generators (AG) interact with the Execution Context > similarly to regular generators. > > They have an ``ag_local_context`` attribute, which, similarly to > regular generators, can be set to ``None`` to make them use the outer > Local Context. This is used by the new > ``contextlib.asynccontextmanager`` decorator. > > The EC support of ``await`` expression is implemented using the same > approach as in coroutines, see the `Coroutine Object Modifications`_ > section. You showed how to make an iterator that acts like a generator. Is it also possible to make an async iterator that acts like an async generator? It's not immediately obvious, because you need to make sure that the local context gets restored each time you re-enter the __anext__ generator. 
I think it's something like: class AIter: def __init__(self): self._local_context = ... # Note: intentionally not async def __anext__(self): coro = self._real_anext() coro.cr_local_context = self._local_context return coro async def _real_anext(self): ... Does that look right? > ContextItem.get() Cache > ----------------------- > > We can add three new fields to ``PyThreadState`` and > ``PyInterpreterState`` structs: > > * ``uint64_t PyThreadState->unique_id``: a globally unique > thread state identifier (we can add a counter to > ``PyInterpreterState`` and increment it when a new thread state is > created.) > > * ``uint64_t PyInterpreterState->context_item_deallocs``: every time > a ``ContextItem`` is GCed, all Execution Contexts in all threads > will lose track of it. ``context_item_deallocs`` will simply > count all ``ContextItem`` deallocations. > > * ``uint64_t PyThreadState->execution_context_ver``: every time > a new item is set, or an existing item is updated, or the stack > of execution contexts is changed in the thread, we increment this > counter. I think this can be refined further (and I don't understand context_item_deallocs -- maybe it's a mistake?). AFAICT the things that invalidate a ContextItem's cache are: 1) switching threadstates 2) popping or pushing a non-empty LocalContext off the current threadstate's ExecutionContext 3) calling ContextItem.set() on *that* context item So I'd suggest tracking the thread state id, a counter of how many non-empty LocalContexts have been pushed/popped on this thread state, and a *per ContextItem* counter of how many times set() has been called. > Backwards Compatibility > ======================= > > This proposal preserves 100% backwards compatibility. While this is mostly true in the strict sense, in practice this PEP is useless if existing thread-local users like decimal and numpy can't migrate to it without breaking backcompat. So maybe this section should discuss that? (For example, one constraint on the design is that we can't provide only a pure push/pop API, even though that's what would be most convenient context managers like decimal.localcontext or numpy.errstate, because we also need to provide some backcompat story for legacy functions like decimal.setcontext and numpy.seterr.) -n -- Nathaniel J. Smith -- https://vorpus.org From ncoghlan at gmail.com Wed Aug 16 04:07:59 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 16 Aug 2017 18:07:59 +1000 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: TLDR: I really like this version, and the tweaks I suggest below are just cosmetic. I figure if there are any major technical traps lurking, you'll find them as you work through updating the reference implementation. On 16 August 2017 at 09:55, Yury Selivanov wrote: > Context Item Object > ------------------- > > The ``sys.new_context_item(description)`` function creates a > new ``ContextItem`` object. The ``description`` parameter is a > ``str``, explaining the nature of the context key for introspection > and debugging purposes. > > ``ContextItem`` objects have the following methods and attributes: > > * ``.description``: read-only description; It may be worth having separate "name" and "description" attributes, similar to __name__ and __doc__ being separate on things like functions. That way, error messages can just show "name", while debuggers and other introspection tools can include a more detailed description. 
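To spell out the cache-invalidation refinement suggested earlier in
the thread (track the thread state id, an LC push/pop counter, and a
per-item set() counter), here is a rough pure-Python sketch -- the
names are hypothetical, and the real version would live in C on the
thread state:

    class ContextItem:
        def __init__(self):
            self._set_count = 0      # bumped by every .set() on this item
            self._cache_key = None   # (ts_id, lc_stack_version, set_count)
            self._cached_value = None

        def set(self, value, ec):
            self._set_count += 1
            ec.set(self, value)

        def get(self, ec):
            key = (ec.ts_id, ec.lc_stack_version, self._set_count)
            if key == self._cache_key:
                return self._cached_value   # fast path: nothing changed
            value = ec.lookup(self)         # slow path: walk the LC stack
            self._cache_key, self._cached_value = key, value
            return value

Here ec.ts_id stands for the globally unique thread state id, and
ec.lc_stack_version for a counter bumped whenever a non-empty LC is
pushed or popped on that thread state; each of the three events that
can invalidate the cache changes one component of the key.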
> Coroutine Object Modifications > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > To achieve this, a small set of modifications to the coroutine object > is needed: > > * New ``cr_local_context`` attribute. This attribute is readable > and writable for Python code. For ease of introspection, it's probably worth using a common `__local_context__` attribute name across all the different types that support one, and encouraging other object implementations to do the same. This isn't like cr_await and gi_yieldfrom, where we wanted to use different names because they refer to different kinds of objects. > Acknowledgments > =============== [snip] > Thanks to Nick Coghlan for numerous suggestions and ideas on the > mailing list, and for coming up with a case that cause the complete > rewrite of the initial PEP version [19]_. [snip] > .. [19] https://mail.python.org/pipermail/python-ideas/2017-August/046780.html The threading in pipermail makes it difficult to get from your reply back to my original comment, so it may be better to link directly to the latter: https://mail.python.org/pipermail/python-ideas/2017-August/046775.html And to be completely explicit about: I like your proposed approach of leaving it up to iterator developers to decide whether or not to run with a local context or not. If they don't manipulate any context items, it won't matter, and if they do, it's straightforward to add a suitable call to sys.run_in_local_context(). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From njs at pobox.com Wed Aug 16 04:37:58 2017 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 16 Aug 2017 01:37:58 -0700 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: On Tue, Aug 15, 2017 at 11:53 PM, Jelle Zijlstra wrote: > Minor suggestion: Could we allow something like > `sys.set_new_context_item(description='mylib.context', > initial_value='spam')`? That would make it easier for type checkers to infer > the type of a ContextItem, and it would save a line of code in the common > case. This is a really handy feature in general, actually! In fact all of asyncio's thread-locals define initial values (using a trick involving subclassing threading.local), and I recently added this feature to trio.TaskLocal as well just because it's so convenient. However, something that you realize almost immediately when trying to use this is that in many cases, what you actually want is an initial value *factory*. Like, if you write new_context_item(initial_value=[]) then you're going to have a bad time. So, should we support something like new_context_item(initializer=lambda: [])? The semantics are a little bit subtle. I guess it would be something like: if ci.get() goes to find the value and fails at all levels, then we call the factory function and assign its return value to the *deepest* LC, EC[0]. The idea being that we're pretending that the value was there all along in the outermost scope, you just didn't notice before now. > With this modification, the type of new_context_item would be > > @overload > def new_context_item(*, description: str, initial_value: T) -> > ContextItem[T]: ... > @overload > def new_context_item(*, description: str) -> ContextItem[Any]: ... > > If we only allow the second variant, type checkers would need some sort of > special casing to figure out that after .set(), .get() will return the same > type. I'm not super familiar with PEP 484. Would using a factory function instead of an initial value break this type inference? 
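For what it's worth, the lazy-initializer semantics sketched above are
easy to model on a toy list-of-dicts EC (hypothetical helper; `stack`
is a list of per-LC mappings, innermost last):

    _MISSING = object()   # sentinel: distinguishes "unset" from None

    def get_with_factory(stack, item, factory):
        for mapping in reversed(stack):        # innermost LC first
            value = mapping.get(item, _MISSING)
            if value is not _MISSING:
                return value
        value = factory()         # e.g. factory=list for a fresh []
        stack[0][item] = value    # materialize it in the deepest LC,
        return value              # as if it had been there all along

A factory avoids the shared-mutable-default trap of
new_context_item(initial_value=[]): each execution context that
misses the lookup gets its own fresh list instead of all of them
aliasing a single one.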
If you want to automatically infer that whatever type I use to initialize the value is the only type it can ever have, is there a way for users to easily override that? Like could I write something like my_ci: ContextItem[int, str] = new_context_item(initial_value=0) ? -n -- Nathaniel J. Smith -- https://vorpus.org From jelle.zijlstra at gmail.com Wed Aug 16 04:40:54 2017 From: jelle.zijlstra at gmail.com (Jelle Zijlstra) Date: Wed, 16 Aug 2017 10:40:54 +0200 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: 2017-08-16 10:37 GMT+02:00 Nathaniel Smith : > On Tue, Aug 15, 2017 at 11:53 PM, Jelle Zijlstra > wrote: > > Minor suggestion: Could we allow something like > > `sys.set_new_context_item(description='mylib.context', > > initial_value='spam')`? That would make it easier for type checkers to > infer > > the type of a ContextItem, and it would save a line of code in the common > > case. > > This is a really handy feature in general, actually! In fact all of > asyncio's thread-locals define initial values (using a trick involving > subclassing threading.local), and I recently added this feature to > trio.TaskLocal as well just because it's so convenient. > > However, something that you realize almost immediately when trying to > use this is that in many cases, what you actually want is an initial > value *factory*. Like, if you write new_context_item(initial_value=[]) > then you're going to have a bad time. So, should we support something > like new_context_item(initializer=lambda: [])? > > The semantics are a little bit subtle. I guess it would be something > like: if ci.get() goes to find the value and fails at all levels, then > we call the factory function and assign its return value to the > *deepest* LC, EC[0]. The idea being that we're pretending that the > value was there all along in the outermost scope, you just didn't > notice before now. > > > With this modification, the type of new_context_item would be > > > > @overload > > def new_context_item(*, description: str, initial_value: T) -> > > ContextItem[T]: ... > > @overload > > def new_context_item(*, description: str) -> ContextItem[Any]: ... > > > > If we only allow the second variant, type checkers would need some sort > of > > special casing to figure out that after .set(), .get() will return the > same > > type. > > I'm not super familiar with PEP 484. > > Would using a factory function instead of an initial value break this > type inference? > > If you want to automatically infer that whatever type I use to > initialize the value is the only type it can ever have, is there a way > for users to easily override that? Like could I write something like > > my_ci: ContextItem[int, str] = new_context_item(initial_value=0) > > It would be `ContextItem[Union[int, str]]`, but yes, that should work. > ? > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Aug 16 05:36:03 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 16 Aug 2017 19:36:03 +1000 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: On 16 August 2017 at 17:18, Nathaniel Smith wrote: [Yury wrote] >> For these purposes we add a set of new APIs (they will be used in >> later sections of this specification): >> >> * ``sys.new_local_context()``: create an empty ``LocalContext`` >> object. >> >> * ``sys.new_execution_context()``: create an empty >> ``ExecutionContext`` object. 
>> >> * Both ``LocalContext`` and ``ExecutionContext`` objects are opaque >> to Python code, and there are no APIs to modify them. >> >> * ``sys.get_execution_context()`` function. The function returns a >> copy of the current EC: an ``ExecutionContext`` instance. > > If there are enough of these functions then it might make sense to > stick them in their own module instead of adding more stuff to sys. I > guess worrying about that can wait until the API details are more firm > though. I'm actually wondering if it may be worth defining a _contextlib module (to export the interpreter level APIs to Python code), and making contextlib the official home of the user facing API. That we we can use contextlib2 to at least attempt to polyfill the coroutine parts of the proposal for 3.5+, even if the implicit generator changes are restricted to 3.7+ . >> * If ``coro.cr_local_context`` is an empty ``LocalContext`` object >> that ``coro`` was created with, the interpreter will set >> ``coro.cr_local_context`` to ``None``. > > I like all the ideas in this section, but this specific point feels a > bit weird. Coroutine objects need a second hidden field somewhere to > keep track of whether the object they end up with is the same one they > were created with? It feels odd to me as well, and I'm wondering if we can actually simplify this by saying: 1. Generator contexts (both sync and async) are isolated by default (__local_context__ = LocalContext()) 2. Coroutine contexts are *not* isolated by default (__local_context__ = None) Running top level task coroutines in separate execution contexts then becomes the responsibility of the event loop, which the PEP already lists as a required change in 3rd party libraries to get this all to work properly. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Aug 16 05:56:20 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 16 Aug 2017 19:56:20 +1000 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: On 16 August 2017 at 18:37, Nathaniel Smith wrote: > On Tue, Aug 15, 2017 at 11:53 PM, Jelle Zijlstra > wrote: >> Minor suggestion: Could we allow something like >> `sys.set_new_context_item(description='mylib.context', >> initial_value='spam')`? That would make it easier for type checkers to infer >> the type of a ContextItem, and it would save a line of code in the common >> case. > > This is a really handy feature in general, actually! In fact all of > asyncio's thread-locals define initial values (using a trick involving > subclassing threading.local), and I recently added this feature to > trio.TaskLocal as well just because it's so convenient. > > However, something that you realize almost immediately when trying to > use this is that in many cases, what you actually want is an initial > value *factory*. Like, if you write new_context_item(initial_value=[]) > then you're going to have a bad time. So, should we support something > like new_context_item(initializer=lambda: [])? > > The semantics are a little bit subtle. I guess it would be something > like: if ci.get() goes to find the value and fails at all levels, then > we call the factory function and assign its return value to the > *deepest* LC, EC[0]. The idea being that we're pretending that the > value was there all along in the outermost scope, you just didn't > notice before now. 
I actually wondered about this in the context of the PEP saying that "context items are set to None by default", as it isn't clear what that means for the behaviour of sys.new_execution_context(). The PEP states that the latter API creates an "empty" execution context, but the notion of a fresh EC being truly empty conflicts with the notion of all defined config items having a default value of None. I think your idea resolves that nicely: if context_item.get() failed to find a suitable context entry, it would do: base_context = ec.local_contexts[0] default_value = sys.run_with_local_context(base_context, self.default_factory) sys.run_with_local_context(base_context, self.set, default_value) The default setting for default_factory could then be to raise RuntimeError complaining that the context item isn't set in the current context. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From arj.python at gmail.com Wed Aug 16 08:06:23 2017 From: arj.python at gmail.com (Abdur-Rahmaan Janhangeer) Date: Wed, 16 Aug 2017 16:06:23 +0400 Subject: [Python-ideas] DOM syntax guide In-Reply-To: References: Message-ID: hum i'm saying that if i write a compiler for python based on the js language, is there any guideline as how to make the syntax more pythonic? Abdur-Rahmaan Janhangeer, Mauritius abdurrahmaanjanhangeer.wordpress.com On 16 Aug 2017 08:41, "Abdur-Rahmaan Janhangeer" wrote: > greetings all, > > i like python and lot and would like to use it everywhere ... upto on the > web (not django type). > > For python js-compiled versions (for makers) can you provide some syntax > guidelines for dom access ? > > > > Abdur-Rahmaan Janhangeer, > Mauritius > abdurrahmaanjanhangeer.wordpress.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed Aug 16 08:13:16 2017 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 16 Aug 2017 22:13:16 +1000 Subject: [Python-ideas] DOM syntax guide In-Reply-To: References: Message-ID: On Wed, Aug 16, 2017 at 10:06 PM, Abdur-Rahmaan Janhangeer wrote: > hum i'm saying that if i write a compiler for python based on the js > language, is there any guideline as how to make the syntax more pythonic? You may want to look at prior art, including PyPyJS and Brython. https://github.com/pypyjs/pypyjs-examples https://www.brython.info/static_doc/en/dom_api.html ChrisA From stefan at bytereef.org Wed Aug 16 10:25:53 2017 From: stefan at bytereef.org (Stefan Krah) Date: Wed, 16 Aug 2017 16:25:53 +0200 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: <20170816142553.GA2837@bytereef.org> On Wed, Aug 16, 2017 at 12:18:23AM -0700, Nathaniel Smith wrote: > > Here's the PEP 550 version 2. > > Awesome! > > Some of the changes from v1 to v2 might be a bit confusing -- in > particular the thing where ExecutionContext is now a stack of > LocalContext objects instead of just being a mapping. So here's the > big picture as I understand it: I'm still trying to digest this with very little time for it. It *is* slightly confusing. Perhaps it would be possible to name the data structures by their functionality. E.g. if ExecutionContext is a stack, use ExecutionStack? Or if the dynamic scope angle should be highlighted, perhaps ExecutionScope or even DynamicScope. This sounds like bikeshedding, but I find it difficult to have ExecutionContext, ContextItem, LocalContext in addition to the actual decimal.localcontext() and PyDecContext. For example, should PyDecContext inherit from ContextItem? 
I don't fully understand. :-/ Stefan Krah From wes.turner at gmail.com Wed Aug 16 10:31:45 2017 From: wes.turner at gmail.com (Wes Turner) Date: Wed, 16 Aug 2017 09:31:45 -0500 Subject: [Python-ideas] DOM syntax guide In-Reply-To: References: Message-ID: On Wednesday, August 16, 2017, Chris Angelico wrote: > On Wed, Aug 16, 2017 at 10:06 PM, Abdur-Rahmaan Janhangeer > > wrote: > > hum i'm saying that if i write a compiler for python based on the js > > language, is there any guideline as how to make the syntax more pythonic? > > You may want to look at prior art, including PyPyJS and Brython. > > https://github.com/pypyjs/pypyjs-examples > https://www.brython.info/static_doc/en/dom_api.html https://github.com/Knio/dominate - #examples - nested context managers -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Wed Aug 16 10:35:41 2017 From: wes.turner at gmail.com (Wes Turner) Date: Wed, 16 Aug 2017 09:35:41 -0500 Subject: [Python-ideas] DOM syntax guide In-Reply-To: References: Message-ID: On Wednesday, August 16, 2017, Wes Turner wrote: > > > On Wednesday, August 16, 2017, Chris Angelico > wrote: > >> On Wed, Aug 16, 2017 at 10:06 PM, Abdur-Rahmaan Janhangeer >> wrote: >> > hum i'm saying that if i write a compiler for python based on the js >> > language, is there any guideline as how to make the syntax more >> pythonic? >> >> You may want to look at prior art, including PyPyJS and Brython. >> >> https://github.com/pypyjs/pypyjs-examples >> https://www.brython.info/static_doc/en/dom_api.html > > > https://github.com/Knio/dominate > > - #examples > - nested context managers > > https://pyquery.readthedocs.io/en/latest/ https://pyquery.readthedocs.io/en/latest/api.html - PyQuery supports jQuery-like Pythonic DOM traversal -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Wed Aug 16 10:48:50 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 16 Aug 2017 10:48:50 -0400 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: On Wed, Aug 16, 2017 at 2:53 AM, Jelle Zijlstra wrote: [..] >> >> The below is an example of how context items can be used:: >> >> my_context = sys.new_context_item(description='mylib.context') >> my_context.set('spam') > > > Minor suggestion: Could we allow something like > `sys.set_new_context_item(description='mylib.context', > initial_value='spam')`? That would make it easier for type checkers to infer > the type of a ContextItem, and it would save a line of code in the common > case. > > With this modification, the type of new_context_item would be > > @overload > def new_context_item(*, description: str, initial_value: T) -> > ContextItem[T]: ... > @overload > def new_context_item(*, description: str) -> ContextItem[Any]: ... > > If we only allow the second variant, type checkers would need some sort of > special casing to figure out that after .set(), .get() will return the same > type. I think that trying to infer the type of CI values by its default value is not the way to go: ci = sys.ContextItem(default=1) Is CI an int? Likely. Can it be set to None? Maybe, for some use-cases it might be what you want. The correct way IMO is to extend the typing module: ci1: typing.ContextItem[int] = sys.ContextItem(default=1) # ci1: is an int, and can't be anything else. ci2: typing.ContextItem[typing.Optional[int]] = sys.ContextItem(default=42) # ci2 is 42 by default, but can be reset to None. 
ci3: typing.ContextItem[typing.Union[int, str]] = sys.ContextItem(default='spam') # ci3 can be an int or str, can't be None. This is also forward compatible with proposals to add a `default_factory` or `initializer` parameter to ContextItems. Yury From yselivanov.ml at gmail.com Wed Aug 16 11:00:43 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 16 Aug 2017 11:00:43 -0400 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: <20170816142553.GA2837@bytereef.org> References: <20170816142553.GA2837@bytereef.org> Message-ID: On Wed, Aug 16, 2017 at 10:25 AM, Stefan Krah wrote: > On Wed, Aug 16, 2017 at 12:18:23AM -0700, Nathaniel Smith wrote: >> > Here's the PEP 550 version 2. >> >> Awesome! >> >> Some of the changes from v1 to v2 might be a bit confusing -- in >> particular the thing where ExecutionContext is now a stack of >> LocalContext objects instead of just being a mapping. So here's the >> big picture as I understand it: > > I'm still trying to digest this with very little time for it. It *is* > slightly confusing. > > > Perhaps it would be possible to name the data structures by their functionality. > E.g. if ExecutionContext is a stack, use ExecutionStack? > > Or if the dynamic scope angle should be highlighted, perhaps ExecutionScope > or even DynamicScope. I'm -1 on calling this thing a "scope" or "dynamic scope", as I think it will be even more confusing to Python users. When I think of "scoping" I usually think about Python name scopes -- locals, globals, nonlocals, etc. I'm afraid that adding another dimension to this vocabulary won't help anyone. "Context" is an established term for what PEP 550 tries to accomplish. It's used in multiple languages and runtimes, and while researching this topic I didn't see anybody confused with the concept on StackOverflow/etc. > This sounds like bikeshedding, but I find it difficult to have ExecutionContext, > ContextItem, LocalContext in addition to the actual decimal.localcontext() > and PyDecContext. > > > For example, should PyDecContext inherit from ContextItem? I don't fully > understand. :-/ No, you wouldn't be able to extend ContextItem type. The way for decimal it so simply do the following: In Python: _current_ctx = sys.ContextItem('decimal context') # later when you set decimal context _current_ctx.set(DecimalContext) # whenever you need to get the current context dc = _current_ctx.get() In C: PyContextItem * _current_ctx = PyContext_NewItem("decimal context"); if (_current_ctx == NULL) { /* error */ } # later when you set decimal context PyDecContextObject *ctx; ... if (PyContext_SetItem(_current_ctx, (PyObject*)ctx)) { /* error */ } # whenever you need to get the current context PyDecContextObject *ctx = PyContext_GetItem(_current_ctx); if (ctx == NULL) { /* error */ } if (ctx == Py_None) { /* not initialized, nothing is there */ } We didn't really discuss C APIs at this point, and it's very likely that they will be adjusted, but the general idea should stay the same. All in all, the complexity of _decimal.c will only decrease with PEP 550, while getting better support for generators/async. Yury From ncoghlan at gmail.com Wed Aug 16 11:03:21 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 17 Aug 2017 01:03:21 +1000 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: <20170816142553.GA2837@bytereef.org> References: <20170816142553.GA2837@bytereef.org> Message-ID: On 17 August 2017 at 00:25, Stefan Krah wrote: > Perhaps it would be possible to name the data structures by their functionality. > E.g. 
if ExecutionContext is a stack, use ExecutionStack? > > Or if the dynamic scope angle should be highlighted, perhaps ExecutionScope > or even DynamicScope. > > This sounds like bikeshedding, but I find it difficult to have ExecutionContext, > ContextItem, LocalContext in addition to the actual decimal.localcontext() > and PyDecContext. > > For example, should PyDecContext inherit from ContextItem? I don't fully > understand. :-/ Agreed, I don't think we have the terminology quite right yet. For "ContextItem" for example, we may actually be better off calling it "ContextKey", and have the methods be "ck.get_value()" and "ck.set_value()". That would get us closer to the POSIX TSS terminology, and emphasises that the objects themselves are best seen as opaque references to a key that lets you get and set the corresponding value in the active execution context. I do think we should stick with "context" rather than bringing dynamic scopes into the mix - while dynamic scoping *is* an accurate term for what we're doing at a computer science level, Python itself tends to reserve the term scoping for the way the compiler resolves names, which we're deliberately *not* touching here. Avoiding a naming collision with decimal.localcontext() would also be desirable. Yury, what do you think about moving the ExecutionContext name to what the PEP currently calls LocalContext, and renaming the current ExecutionContext type to ExecutionContextChain? The latter name then hints at the collections.ChainMap style behaviour of ck.get_value() lookups, without making any particular claims about what the internal implementation data structures actually are. The run methods could then be sys.run_with_context_chain() (to ignore the current context entirely and use a completely separate context chain) and sys.run_with_active_context() (to append a single execution context onto the end of the current context chain) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From yselivanov.ml at gmail.com Wed Aug 16 11:22:21 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 16 Aug 2017 11:22:21 -0400 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: On Wed, Aug 16, 2017 at 4:07 AM, Nick Coghlan wrote: > TLDR: I really like this version, and the tweaks I suggest below are > just cosmetic. Thanks, Nick! > I figure if there are any major technical traps > lurking, you'll find them as you work through updating the reference > implementation. FWIW I've implemented 3-5 different variations of PEP 550 (along with HAMT) and I'm fairly confident that datastructures and optimizations will work, so no major traps there are really expected. The risk that we need to manage now is getting the API design "right". > > On 16 August 2017 at 09:55, Yury Selivanov wrote: >> Context Item Object >> ------------------- >> >> The ``sys.new_context_item(description)`` function creates a >> new ``ContextItem`` object. The ``description`` parameter is a >> ``str``, explaining the nature of the context key for introspection >> and debugging purposes. >> >> ``ContextItem`` objects have the following methods and attributes: >> >> * ``.description``: read-only description; > > It may be worth having separate "name" and "description" attributes, > similar to __name__ and __doc__ being separate on things like > functions. That way, error messages can just show "name", while > debuggers and other introspection tools can include a more detailed > description. 
Initially I wanted to have "sys.new_context_item(name)" signature, but then I thought that some users might be confused what "name" actually means. In some contexts you might say that the "name" of the CI is the name of the variable it is bound to, IOW, for "foo = CI(name="bar")' the name is "foo". But some users might think that it's "bar". OTOH, PEP 550 doesn't have any introspection APIs at this point, and the final version of it will have to have them. If we add something like "sys.get_execution_context_as_dict()", then it would be preferable for CIs to have short name-like descriptions, as opposed to multiline docstrings. So in the end, I think that we should adopt a namedtuple solution, and just make the first "ContextItem" parameter a positional-only "name": ContextItem(name: str, /) > >> Coroutine Object Modifications >> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ >> >> To achieve this, a small set of modifications to the coroutine object >> is needed: >> >> * New ``cr_local_context`` attribute. This attribute is readable >> and writable for Python code. > > For ease of introspection, it's probably worth using a common > `__local_context__` attribute name across all the different types that > support one, and encouraging other object implementations to do the > same. > > This isn't like cr_await and gi_yieldfrom, where we wanted to use > different names because they refer to different kinds of objects. We also have cr_code and gi_code, which are used for introspection purposes but refer to CodeObject. I myself don't like the mess the C-style convention created for our Python code (think of what the "dis" and "inspect" modules have to go through), so I'm +0 for having "__local_context__". > >> Acknowledgments >> =============== > [snip] > >> Thanks to Nick Coghlan for numerous suggestions and ideas on the >> mailing list, and for coming up with a case that cause the complete >> rewrite of the initial PEP version [19]_. > [snip] > >> .. [19] https://mail.python.org/pipermail/python-ideas/2017-August/046780.html > > The threading in pipermail makes it difficult to get from your reply > back to my original comment, so it may be better to link directly to > the latter: https://mail.python.org/pipermail/python-ideas/2017-August/046775.html > > And to be completely explicit about: I like your proposed approach of > leaving it up to iterator developers to decide whether or not to run > with a local context or not. If they don't manipulate any context > items, it won't matter, and if they do, it's straightforward to add a > suitable call to sys.run_in_local_context(). Fixed the link, and will update the Acknowledgments section with your paragraph (thanks!) Yury From yselivanov.ml at gmail.com Wed Aug 16 11:43:24 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 16 Aug 2017 11:43:24 -0400 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: <20170816142553.GA2837@bytereef.org> Message-ID: On Wed, Aug 16, 2017 at 11:03 AM, Nick Coghlan wrote: > On 17 August 2017 at 00:25, Stefan Krah wrote: >> Perhaps it would be possible to name the data structures by their functionality. >> E.g. if ExecutionContext is a stack, use ExecutionStack? >> >> Or if the dynamic scope angle should be highlighted, perhaps ExecutionScope >> or even DynamicScope. >> >> This sounds like bikeshedding, but I find it difficult to have ExecutionContext, >> ContextItem, LocalContext in addition to the actual decimal.localcontext() >> and PyDecContext. >> >> For example, should PyDecContext inherit from ContextItem? 
I don't fully >> understand. :-/ > > Agreed, I don't think we have the terminology quite right yet. > > For "ContextItem" for example, we may actually be better off calling > it "ContextKey", and have the methods be "ck.get_value()" and > "ck.set_value()". That would get us closer to the POSIX TSS > terminology, and emphasises that the objects themselves are best seen > as opaque references to a key that lets you get and set the > corresponding value in the active execution context. With the confusion of what "empty ExecutionContext" and "ContextItem is set to None by default", I tend to agree that "ContextKey" might be a better name. A default for "ContextKey" means something that will be returned if the lookup failed, plain and simple. > > I do think we should stick with "context" rather than bringing dynamic > scopes into the mix - while dynamic scoping *is* an accurate term for > what we're doing at a computer science level, Python itself tends to > reserve the term scoping for the way the compiler resolves names, > which we're deliberately *not* touching here. +1, I feel the same about this. > > Avoiding a naming collision with decimal.localcontext() would also be desirable. The ContextItem (or ContextKey) that decimal will be using will be an implementation detail, and it must not be exposed to the public API of the module. > > Yury, what do you think about moving the ExecutionContext name to what > the PEP currently calls LocalContext, and renaming the current > ExecutionContext type to ExecutionContextChain? While I think that the naming issue is important, the API that will be used most of the time is ContextItem. That's the name in the spotlight. > > The latter name then hints at the collections.ChainMap style behaviour > of ck.get_value() lookups, without making any particular claims about > what the internal implementation data structures actually are. > > The run methods could then be sys.run_with_context_chain() (to ignore > the current context entirely and use a completely separate context > chain) and sys.run_with_active_context() (to append a single execution > context onto the end of the current context chain) sys.run_with_context_chain and sys.run_with_active_context sound *really* confusing to me. Maybe it's because I spent too much time thinking about the current PEP 550 naming. To be honest, I really like Execution Context and Local Context names. I'm curious if other people are confused with them. Yury From stefan at bytereef.org Wed Aug 16 12:08:56 2017 From: stefan at bytereef.org (Stefan Krah) Date: Wed, 16 Aug 2017 18:08:56 +0200 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: <20170816142553.GA2837@bytereef.org> Message-ID: <20170816160856.GA2672@bytereef.org> On Wed, Aug 16, 2017 at 11:00:43AM -0400, Yury Selivanov wrote: > "Context" is an established term for what PEP 550 tries to accomplish. > It's used in multiple languages and runtimes, and while researching > this topic I didn't see anybody confused with the concept on > StackOverflow/etc. For me a context is a "single thing" that is usually used to thread state through functions. I guess I'd call "environment" what you call "context". > In C: > > PyContextItem * _current_ctx = PyContext_NewItem("decimal context"); > if (_current_ctx == NULL) { /* error */ } > > # later when you set decimal context > PyDecContextObject *ctx; > ... 
> if (PyContext_SetItem(_current_ctx, (PyObject*)ctx)) { /* error */ } > > # whenever you need to get the current context > PyDecContextObject *ctx = PyContext_GetItem(_current_ctx); > if (ctx == NULL) { /* error */ } > if (ctx == Py_None) { /* not initialized, nothing is there */ } Thanks! This makes it a lot clearer. I'd probably use (stealing Nick's key suggestion): PyEnvKey *_current_context_key = PyEnv_NewKey("___DECIMAL_CONTEXT__"); ... PyDecContextObject *ctx = PyEnv_GetItem(_current_ctx_key); Stefan Krah From stefan at bytereef.org Wed Aug 16 12:12:40 2017 From: stefan at bytereef.org (Stefan Krah) Date: Wed, 16 Aug 2017 18:12:40 +0200 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: <20170816142553.GA2837@bytereef.org> Message-ID: <20170816161240.GB2672@bytereef.org> On Thu, Aug 17, 2017 at 01:03:21AM +1000, Nick Coghlan wrote: > For "ContextItem" for example, we may actually be better off calling > it "ContextKey", and have the methods be "ck.get_value()" and > "ck.set_value()". That would get us closer to the POSIX TSS > terminology, and emphasises that the objects themselves are best seen > as opaque references to a key that lets you get and set the > corresponding value in the active execution context. +1 for "key". One is using a key to look up an item. > Avoiding a naming collision with decimal.localcontext() would also be desirable. > > Yury, what do you think about moving the ExecutionContext name to what > the PEP currently calls LocalContext, and renaming the current > ExecutionContext type to ExecutionContextChain? For me this is already a lot clearer. Otherwise I'd call it ExecutionEnvironment. Stefan Krah From yselivanov.ml at gmail.com Wed Aug 16 12:36:24 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 16 Aug 2017 12:36:24 -0400 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: On Wed, Aug 16, 2017 at 3:18 AM, Nathaniel Smith wrote: > On Tue, Aug 15, 2017 at 4:55 PM, Yury Selivanov wrote: >> Hi, >> >> Here's the PEP 550 version 2. > > Awesome! Thanks! [..] >> >> * **Local Context**, or LC, is a key/value mapping that stores the >> context of a logical thread. > > If you're more familiar with dynamic scoping, then you can think of an > LC as a single dynamic scope... > >> * **Execution Context**, or EC, is an OS-thread-specific dynamic >> stack of Local Contexts. > > ...and an EC as a stack of scopes. Looking up a ContextItem in an EC > proceeds by checking the first LC (innermost scope), then if it > doesn't find what it's looking for it checks the second LC (the > next-innermost scope), etc. Yes. We touched upon this topic in parallel threads, so I'll just briefly mention this here: I deliberately avoided using "scope" in PEP 550 naming, as "scoping" in Python is usually associated with names/globals/locals/nonlocals etc. Adding another "level" of scoping will be very confusing for users (IMO). > >> ``ContextItem`` objects have the following methods and attributes: >> >> * ``.description``: read-only description; >> >> * ``.set(o)`` method: set the value to ``o`` for the context item >> in the execution context. >> >> * ``.get()`` method: return the current EC value for the context item. >> Context items are initialized with ``None`` when created, so >> this method call never fails. > > Two issues here, that both require some expansion of this API to > reveal a *bit* more information about the EC structure. 
> > 1) For trio's cancel scope use case I described in the last, I > actually need some way to read out all the values on the LocalContext > stack. (It would also be helpful if there were some fast way to check > the depth of the ExecutionContext stack -- or at least tell whether > it's 1 deep or more-than-1 deep. I know that any cancel scopes that > are in the bottommost LC will always be attached to the given Task, so > I can set up the scope->task mapping once and re-use it indefinitely. > OTOH for scopes that are stored in higher LCs, I have to check at > every yield whether they're currently in effect. And I want to > minimize the per-yield workload as much as possible.) We can add an API for returning the full stack of values for a CI: ContextItem.iter_stack() -> Iterator # or ContextItem.get_stack() -> List Because some of the LC will be empty, what you'll get is a list with some None values in it, like: [None, val1, None, None, val2] The length of the list will tell you how deep the stack is. > > 2) For classic decimal.localcontext context managers, the idea is > still that you save/restore the value, so that you can nest multiple > context managers without having to push/pop LCs all the time. But the > above API is not actually sufficient to implement a proper > save/restore, for a subtle reason: if you do > > ci.set(ci.get()) > > then you just (potentially) moved the value from a lower LC up to the top LC. > > Here's an example of a case where this can produce user-visible effects: > > https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope-on-top-of-pep-550-draft-2.py > > There are probably a bunch of options for fixing this. But basically > we need some API that makes it possible to temporarily set a value in > the top LC, and then restore that value to what it was before (either > the previous value, or 'unset' to unshadow a value in a lower LC). One > simple option would be to make the idiom be something like: > > @contextmanager > def local_value(new_value): > state = ci.get_local_state() > ci.set(new_value) > try: > yield > finally: > ci.set_local_state(state) > > where 'state' is something like a tuple (ci in EC[-1], > EC[-1].get(ci)). A downside with this is that it's a bit error-prone > (very easy for an unwary user to accidentally use get/set instead of > get_local_state/set_local_state). But I'm sure we can come up with > something. Yeah, this is tricky. The main issue is indeed the confusion of what methods you need to call -- "get/set" or "get_local_state/set_local_state". On some level the problem is very similar to regular Python scoping rules: 1. we have local hames 2. we have global names 3. we nave 'nonlocal' modifier IOW scoping isn't easy, and you need to be conscious of what you do. It's just that we are so used to these scoping rules that they have a low cognitive effort for us. One of the ideas that I have in mind is to add another level of indirection to separate "global get" from "local set/get": 1. Rename ContextItem to ContextKey (reasoning for that in parallel thread) 2. Remove ContextKey.set() method 3. Add a new ContextKey.value() -> ContextValue ck = ContextKey() with ck.value() as val: val.set(spam) yield or val = ck.value() val.set(spam) try: yield finally: val.clear() Essentially ContextValue will be the only API to set values in execution context. ContextKey.get() will be used to get them. Nathaniel, Nick, what do you guys think? [..] >> * ``sys.get_execution_context()`` function. 
The function returns a >> copy of the current EC: an ``ExecutionContext`` instance. > > If there are enough of these functions then it might make sense to > stick them in their own module instead of adding more stuff to sys. I > guess worrying about that can wait until the API details are more firm > though. I'm OK with this idea -- pystate.c becomes way too crowded. Maybe we should just put this stuff in _contextlib.c and expose in the contextlib module. > >> * If ``coro.cr_local_context`` is an empty ``LocalContext`` object >> that ``coro`` was created with, the interpreter will set >> ``coro.cr_local_context`` to ``None``. > > I like all the ideas in this section, but this specific point feels a > bit weird. Coroutine objects need a second hidden field somewhere to > keep track of whether the object they end up with is the same one they > were created with? Yes, I planned to have a second hidden field, as Coroutines will have their cr_local_context set to NULL, and that will be their empty LC. So a second internal field is needed to disambiguate NULL -- meaning an "empty context" and NULL meaning "use outside local context". I omitted this from the PEP to make it a bit easier to digest, as this seemed to be a low-level implementation detail. > > If I set cr_local_context to something else, and then set it back to > the original value, does that trigger the magic await behavior or not? > What if I take the initial LocalContext off of one coroutine and > attach it to another, does that trigger the magic await behavior? > > Maybe it would make more sense to have two sentinel values: > UNINITIALIZED and INHERIT? All good questions. I don't like sentinels in general, I'd be more OK with a "gi_isolated_local_context" flag (we're back to square one here). But I don't think we should add it. My thinking is that once you start writing to "gi_local_context" -- all bets are off, and you manage this from now on (meaning that some internal coroutine flag will be set to 1, and the interpreter will never touch local_context of this coroutine): 1. If you write None -- it means that the generator/coroutine will not have its own LC. 2. If you write you own LC object -- the generator/coroutine will use it. > >> To enable correct Execution Context propagation into Tasks, the >> asynchronous framework needs to assist the interpreter: >> >> * When ``create_task`` is called, it should capture the current >> execution context with ``sys.get_execution_context()`` and save it >> on the Task object. > > I wonder if it would be useful to have an option to squash this > execution context down into a single LocalContext, since we know we'll > be using it for a while and once we've copied an ExecutionContext it > becomes impossible to tell the difference between one that has lots of > internal LocalContexts and one that doesn't. This could also be handy > for trio/curio's semantics where they initialize a new task's context > to be a shallow copy of the parent task: you could do > > new_task_coro.cr_local_context = get_current_context().squash() I think this would be a bit too low-level. I'd prefer to defer solving the "squashing" problem until I have a reference implementation and we can test this. Essentially, this is an optimization problem--the EC implementation can just squash the chain itself, when the chain is longer than 5 LCs. Or something like this. But exposing this to Python level would be like letting a program to tinker GCC -O flags after it's compiled IMO. [..] 
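In the toy list-of-dicts model, the internal squashing described above
is just a fold of the chain into a single mapping (sketch only; the
point being made here is that this stays a hidden optimization rather
than a Python-level API):

    def squash(stack):
        merged = {}
        for mapping in stack:     # outermost first, so inner LCs win
            merged.update(mapping)
        return [merged]           # one LC answering every lookup the
                                  # original chain could answer

    # e.g. the EC could transparently do:
    #     if len(stack) > 5:
    #         stack = squash(stack)

One observable difference is that stack depth is lost -- the very
information the trio cancel-scope discussion earlier in the thread
wanted to inspect -- which is another argument for keeping squashing
internal to the EC implementation.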
>> Contrary to coroutines though, ``yield from o`` expression in >> generators (that are not generator-based coroutines) is semantically >> equivalent to ``for v in o: yield v``, therefore the interpreter does >> not attempt to control their ``gi_local_context``. > > Hmm. I assume you're simplifying for expository purposes, but 'yield > from' isn't the same as 'for v in o: yield v'. In fact PEP 380 says: > "Motivation: [...] a piece of code containing a yield cannot be > factored out and put into a separate function in the same way as other > code. [...] If yielding of values is the only concern, this can be > performed without much difficulty using a loop such as 'for v in g: > yield v'. However, if the subgenerator is to interact properly with > the caller in the case of calls to send(), throw() and close(), things > become considerably more difficult. As will be seen later, the > necessary code is very complicated, and it is tricky to handle all the > corner cases correctly." > > So it seems to me that the whole idea of 'yield from' is that it's > supposed to handle all the tricky bits needed to guarantee that if you > take some code out of a generator and refactor it into a subgenerator, > then everything works the same as before. This suggests that 'yield > from' should do the same magic as 'await', where by default the > subgenerator shares the same LocalContext as the parent generator. > (And as a bonus it makes things simpler if 'yield from' and 'await' > work the same.) I see what you are saying here, but 'yield from' for generators is still different from awaits, as you can partially iterate the generator and *then* "yield from" from it: def foo(): g = gen() val1 = next(g) val2 = next(g) # do some computation? yield from g ... def gen(): # messing with EC between yields In general, I still think that 'yield from g' is semantically equivalent to 'for i in g: yield i' for most users. > >> Asynchronous Generators >> ----------------------- >> >> Asynchronous Generators (AG) interact with the Execution Context >> similarly to regular generators. >> >> They have an ``ag_local_context`` attribute, which, similarly to >> regular generators, can be set to ``None`` to make them use the outer >> Local Context. This is used by the new >> ``contextlib.asynccontextmanager`` decorator. >> >> The EC support of ``await`` expression is implemented using the same >> approach as in coroutines, see the `Coroutine Object Modifications`_ >> section. > > You showed how to make an iterator that acts like a generator. Is it > also possible to make an async iterator that acts like an async > generator? It's not immediately obvious, because you need to make sure > that the local context gets restored each time you re-enter the > __anext__ generator. I think it's something like: > > class AIter: > def __init__(self): > self._local_context = ... > > # Note: intentionally not async > def __anext__(self): > coro = self._real_anext() > coro.cr_local_context = self._local_context > return coro > > async def _real_anext(self): > ... > > Does that look right? Yes, seems to be correct. > >> ContextItem.get() Cache >> ----------------------- >> >> We can add three new fields to ``PyThreadState`` and >> ``PyInterpreterState`` structs: >> >> * ``uint64_t PyThreadState->unique_id``: a globally unique >> thread state identifier (we can add a counter to >> ``PyInterpreterState`` and increment it when a new thread state is >> created.) 
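Stepping back for a moment: the interleaving bug that this generator
machinery guards against is reproducible on current Python with
decimal alone -- essentially the PEP's motivating decimal problem in
runnable form:

    import decimal

    def fractions(precision, x, y):
        with decimal.localcontext() as ctx:
            ctx.prec = precision
            yield decimal.Decimal(x) / decimal.Decimal(y)
            yield decimal.Decimal(x) / decimal.Decimal(y ** 2)

    g1 = fractions(6, 1, 3)    # wants 6 digits of precision
    g2 = fractions(50, 1, 3)   # wants 50 digits

    print(next(g1))   # 6 digits, as requested
    print(next(g2))   # 50 digits, as requested
    print(next(g1))   # 50 digits! g1 resumed under g2's context,
                      # because the thread-local decimal context
                      # leaked across the suspended yield

With PEP 550 semantics, each generator's context manipulations stay in
its own local context, and the third print shows 6 digits again.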
>> >> * ``uint64_t PyInterpreterState->context_item_deallocs``: every time >> a ``ContextItem`` is GCed, all Execution Contexts in all threads >> will lose track of it. ``context_item_deallocs`` will simply >> count all ``ContextItem`` deallocations. >> >> * ``uint64_t PyThreadState->execution_context_ver``: every time >> a new item is set, or an existing item is updated, or the stack >> of execution contexts is changed in the thread, we increment this >> counter. > > I think this can be refined further (and I don't understand > context_item_deallocs -- maybe it's a mistake?). Now that you highlighted the deallocs counter and I thought about it a bit more I don't think it's needed :) I'll remove it. > AFAICT the things > that invalidate a ContextItem's cache are: > > 1) switching threadstates > 2) popping or pushing a non-empty LocalContext off the current > threadstate's ExecutionContext > 3) calling ContextItem.set() on *that* context item > > So I'd suggest tracking the thread state id, a counter of how many > non-empty LocalContexts have been pushed/popped on this thread state, > and a *per ContextItem* counter of how many times set() has been > called. Excellent idea, will be in the next version of the PEP. > >> Backwards Compatibility >> ======================= >> >> This proposal preserves 100% backwards compatibility. > > While this is mostly true in the strict sense, in practice this PEP is > useless if existing thread-local users like decimal and numpy can't > migrate to it without breaking backcompat. So maybe this section > should discuss that? The main purpose of this section is to tell if some parts of the PEP are breaking some existing code/patterns or if it imposes a significant performance penalty. PEP 550 does neither of these things. If decimal/numpy simply switch to using new APIs, everything should work as expected for them, with the exception that assigning a new decimal context (without a context manager) will be isolated in generators. Which I'd consider as a bug fix. We can add a new section to discuss the specifics. Yury From yselivanov.ml at gmail.com Wed Aug 16 12:40:26 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 16 Aug 2017 12:40:26 -0400 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: <20170816160856.GA2672@bytereef.org> References: <20170816142553.GA2837@bytereef.org> <20170816160856.GA2672@bytereef.org> Message-ID: On Wed, Aug 16, 2017 at 12:08 PM, Stefan Krah wrote: > On Wed, Aug 16, 2017 at 11:00:43AM -0400, Yury Selivanov wrote: >> "Context" is an established term for what PEP 550 tries to accomplish. >> It's used in multiple languages and runtimes, and while researching >> this topic I didn't see anybody confused with the concept on >> StackOverflow/etc. > > For me a context is a "single thing" that is usually used to thread state > through functions. > > I guess I'd call "environment" what you call "context". "environment" is also an overloaded term, and when I hear it I usually think about os.getenv(). Yury From yselivanov.ml at gmail.com Wed Aug 16 12:51:03 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 16 Aug 2017 12:51:03 -0400 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: On Wed, Aug 16, 2017 at 5:36 AM, Nick Coghlan wrote: > On 16 August 2017 at 17:18, Nathaniel Smith wrote: > [Yury wrote] [..] >>> * If ``coro.cr_local_context`` is an empty ``LocalContext`` object >>> that ``coro`` was created with, the interpreter will set >>> ``coro.cr_local_context`` to ``None``. 
>>
>> I like all the ideas in this section, but this specific point feels a
>> bit weird. Coroutine objects need a second hidden field somewhere to
>> keep track of whether the object they end up with is the same one they
>> were created with?
>
> It feels odd to me as well, and I'm wondering if we can actually
> simplify this by saying:
>
> 1. Generator contexts (both sync and async) are isolated by default
>    (__local_context__ = LocalContext())
> 2. Coroutine contexts are *not* isolated by default
>    (__local_context__ = None)
>
> Running top level task coroutines in separate execution contexts then
> becomes the responsibility of the event loop, which the PEP already
> lists as a required change in 3rd party libraries to get this all to
> work properly.

This is an interesting twist, and I like it.

This will change asyncio.Task from:

    class Task:

        def __init__(self, coro):
            ...
            self.exec_context = sys.get_execution_context()

        def step(self):
            sys.run_with_execution_context(self.exec_context,
                                           self.coro.send)

to:

    class Task:

        def __init__(self, coro):
            ...
            self.local_context = sys.new_local_context()

        def step(self):
            sys.run_with_local_context(self.local_context,
                                       self.coro.send)

And we don't need ceval to do anything for "await", which means that
with this approach we won't touch ceval.c at all.

Yury

From yselivanov.ml at gmail.com  Wed Aug 16 12:55:55 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Wed, 16 Aug 2017 12:55:55 -0400
Subject: [Python-ideas] PEP 550 v2
In-Reply-To:
References:
Message-ID:

On Wed, Aug 16, 2017 at 12:51 PM, Yury Selivanov wrote:
> On Wed, Aug 16, 2017 at 5:36 AM, Nick Coghlan wrote:
>> On 16 August 2017 at 17:18, Nathaniel Smith wrote:
>> [Yury wrote]
> [..]
>>>> * If ``coro.cr_local_context`` is an empty ``LocalContext`` object
>>>>   that ``coro`` was created with, the interpreter will set
>>>>   ``coro.cr_local_context`` to ``None``.
>>>
>>> I like all the ideas in this section, but this specific point feels a
>>> bit weird. Coroutine objects need a second hidden field somewhere to
>>> keep track of whether the object they end up with is the same one they
>>> were created with?
>>
>> It feels odd to me as well, and I'm wondering if we can actually
>> simplify this by saying:
>>
>> 1. Generator contexts (both sync and async) are isolated by default
>>    (__local_context__ = LocalContext())
>> 2. Coroutine contexts are *not* isolated by default
>>    (__local_context__ = None)
>>
>> Running top level task coroutines in separate execution contexts then
>> becomes the responsibility of the event loop, which the PEP already
>> lists as a required change in 3rd party libraries to get this all to
>> work properly.
>
> This is an interesting twist, and I like it.
>
> This will change asyncio.Task from:
>
>     class Task:
>
>         def __init__(self, coro):
>             ...
>             self.exec_context = sys.get_execution_context()
>
>         def step(self):
>             sys.run_with_execution_context(self.exec_context,
>                                            self.coro.send)
>
> to:
>
>     class Task:
>
>         def __init__(self, coro):
>             ...
>             self.local_context = sys.new_local_context()
>
>         def step(self):
>             sys.run_with_local_context(self.local_context,
>                                        self.coro.send)
>
> And we don't need ceval to do anything for "await", which means that
> with this approach we won't touch ceval.c at all.

And immediately after I hit "send" I realized that this is a bit more
complicated.

In order for Tasks to remember the full execution context of where
they were created, we need a new method that would allow to run with
*both* exec and local contexts:

    class Task:

        def __init__(self, coro):
            ...
            self.local_context = sys.new_local_context()
            self.exec_context = sys.get_execution_context()

        def step(self):
            sys.run_with_contexts(self.exec_context, self.local_context,
                                  self.coro.send)

This is needed for the following PEP example to work properly:

    current_request = sys.new_context_item(description='request')

    async def child():
        print('current request:', repr(current_request.get()))

    async def handle_request(request):
        current_request.set(request)
        event_loop.create_task(child())

    run(top_coro())

See https://www.python.org/dev/peps/pep-0550/#tasks

Yury

From stefan at bytereef.org  Wed Aug 16 13:13:04 2017
From: stefan at bytereef.org (Stefan Krah)
Date: Wed, 16 Aug 2017 19:13:04 +0200
Subject: [Python-ideas] PEP 550 v2
In-Reply-To:
References: <20170816142553.GA2837@bytereef.org>
 <20170816160856.GA2672@bytereef.org>
Message-ID: <20170816171304.GA3261@bytereef.org>

On Wed, Aug 16, 2017 at 12:40:26PM -0400, Yury Selivanov wrote:
> On Wed, Aug 16, 2017 at 12:08 PM, Stefan Krah wrote:
> > On Wed, Aug 16, 2017 at 11:00:43AM -0400, Yury Selivanov wrote:
> >> "Context" is an established term for what PEP 550 tries to accomplish.
> >> It's used in multiple languages and runtimes, and while researching
> >> this topic I didn't see anybody confused with the concept on
> >> StackOverflow/etc.
> >
> > For me a context is a "single thing" that is usually used to thread state
> > through functions.
> >
> > I guess I'd call "environment" what you call "context".
>
> "environment" is also an overloaded term, and when I hear it I usually
> think about os.getenv().

Yeah, I usually think about symbol tables.  FWIW, I find this terminology
quite reasonable:

    https://hackernoon.com/execution-context-in-javascript-319dd72e8e2c

The main points are ExecutionContextStack/FunctionalExecutionContext

vs. ExecutionContext/LocalContext.

Stefan Krah

From yselivanov.ml at gmail.com  Wed Aug 16 14:38:12 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Wed, 16 Aug 2017 14:38:12 -0400
Subject: [Python-ideas] PEP 550 v2
In-Reply-To: <20170816171304.GA3261@bytereef.org>
References: <20170816142553.GA2837@bytereef.org>
 <20170816160856.GA2672@bytereef.org>
 <20170816171304.GA3261@bytereef.org>
Message-ID:

On Wed, Aug 16, 2017 at 1:13 PM, Stefan Krah wrote:
> On Wed, Aug 16, 2017 at 12:40:26PM -0400, Yury Selivanov wrote:
>> "environment" is also an overloaded term, and when I hear it I usually
>> think about os.getenv().
>
> Yeah, I usually think about symbol tables.  FWIW, I find this terminology
> quite reasonable:
>
>     https://hackernoon.com/execution-context-in-javascript-319dd72e8e2c

Thanks for the link!  I think it actually explains the JS language
spec wrt how scoping of regular variables is implemented.

> The main points are ExecutionContextStack/FunctionalExecutionContext
>
> vs. ExecutionContext/LocalContext.

While I'm trying to avoid using scoping terminology for PEP 550,
there's one parallel -- as with regular Python scoping you have global
variables and you have local variables.

You can use locals() to access your local scope, and you can use
globals() to access your global scope.  Similarly in PEP 550, you have
your LocalContext and ExecutionContext.
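
To make the parallel concrete, here is a toy sketch using the PEP's
proposed API (untested, and assuming the default generator isolation
discussed earlier in this thread):

    example = sys.new_context_item(description='example')

    def gen():
        example.set('inner')            # goes to the generator's LC
        yield
        assert example.get() == 'inner'

    example.set('outer')                # goes to the thread's LC
    g = gen()
    next(g)
    assert example.get() == 'outer'     # the generator's write didn't leak
    next(g, None)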
We don't want to call ExecutionContext a "Global Context" because it
is fundamentally OS-thread-specific (contrary to Python globals).
LocalContexts are created for threads, generators, coroutines and are
really similar to local scoping.  Adding more names for local contexts
like CoroutineLocalContext or GeneratorLocalContext won't solve
anything either.  All in all, Local Context is what its name stands
for -- it's a local context for your current logical scope, be it a
coroutine or a generator.

At this point PEP 550 is very different from ExecutionContext in .NET,
but there are still many similarities.  That's a +1 to keep its
current name.

ExecutionContextStack and ExecutionContextChain reflect the
implementation of PEP 550 on some level, but for most Python users
they won't mean anything.  If they want to learn how the EC works,
they just need to read the PEP (or the documentation).  Otherwise they
will just use the ContextKey API and it should just work for them.

So IMO, ExecutionContext and LocalContext are really the best names of
all that were proposed so far.

Yury

From antoine at python.org  Wed Aug 16 16:12:22 2017
From: antoine at python.org (Antoine Pitrou)
Date: Wed, 16 Aug 2017 22:12:22 +0200
Subject: [Python-ideas] PEP 550 v2
In-Reply-To:
References:
Message-ID: <2b104dac-8cce-9616-1876-c386e743a5ed@python.org>

Hi,

> * ``sys.get_execution_context()`` function.  The function returns a
>   copy of the current EC: an ``ExecutionContext`` instance.

Can you explain the requirement for it being a copy?  What do you call
a copy exactly?  Does it shallow-copy the stack or does it deep copy
the context items?

> * ``uint64_t PyThreadState->unique_id``: a globally unique
>   thread state identifier (we can add a counter to
>   ``PyInterpreterState`` and increment it when a new thread state is
>   created.)

How does this interact with sub-interpreters? (same question for rest
of the PEP :-))

> * O(N) for ``sys.get_execution_context()``, where ``N`` is the
>   total number of items in the current **execution** context.

Right... but if this is a simple list copy, we are talking about an
extremely fast O(N):

>>> l = [None] * 1000
>>> %timeit l.copy()
3.76 µs ± 17.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

(what is "number of items"? number of local contexts? number of
individual context items?)

> We believe that approach #3 enables an efficient and complete
> Execution Context implementation, with excellent runtime performance.

What about the maintenance and debugging cost, though?

> Immutable mappings implemented with HAMT have O(log32 N) performance
> for both set(), get(), and merge() operations, which is essentially
> O(1) for relatively small mappings

But, for relatively small mappings, regular dicts would also be fast
enough, right?

It would be helpful for the PEP to estimate reasonable parameter sizes:
- reasonable number of context items in a local context
- reasonable number of local contexts in an execution stack

Regards

Antoine.
From yselivanov.ml at gmail.com  Wed Aug 16 17:07:41 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Wed, 16 Aug 2017 17:07:41 -0400
Subject: [Python-ideas] PEP 550 v2
In-Reply-To: <2b104dac-8cce-9616-1876-c386e743a5ed@python.org>
References: <2b104dac-8cce-9616-1876-c386e743a5ed@python.org>
Message-ID:

On Wed, Aug 16, 2017 at 4:12 PM, Antoine Pitrou wrote:
>
> Hi,
>
>> * ``sys.get_execution_context()`` function.  The function returns a
>>   copy of the current EC: an ``ExecutionContext`` instance.
>
> Can you explain the requirement for it being a copy?

When the execution context is used to schedule a function call in a
thread, or an asyncio callback in a Future, we want to take a snapshot
of all items in the EC.  In general the recommendation will be to
store immutable data in the context (same as in the .NET EC
implementation, or whenever you have some potentially shared state).

> What do you call a copy exactly?  Does it shallow-copy the stack or
> does it deep copy the context items?

Execution Context is conceptually a stack of Local Contexts.  Each
local context is a weak key mapping.  We need a shallow copy of the
EC, which is semantically equivalent to the below snippet:

    new_lc = {}
    for lc in execution_context:
        new_lc.update(lc)
    return ExecutionContext(new_lc)

>
>> * ``uint64_t PyThreadState->unique_id``: a globally unique
>>   thread state identifier (we can add a counter to
>>   ``PyInterpreterState`` and increment it when a new thread state is
>>   created.)
>
> How does this interact with sub-interpreters? (same question for rest
> of the PEP :-))

As long as PyThreadState_Get() works with sub-interpreters, all of the
PEP machinery will work too.

>
>> * O(N) for ``sys.get_execution_context()``, where ``N`` is the
>>   total number of items in the current **execution** context.
>
> Right... but if this is a simple list copy, we are talking about an
> extremely fast O(N):
>
>>>> l = [None] * 1000
>>>> %timeit l.copy()
> 3.76 µs ± 17.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>
> (what is "number of items"? number of local contexts? number of
> individual context items?)

"Number of items in the current **execution** context" =

    sum(len(local_context) for local_context in current_execution_context)

Yes, even though making a new list + merging all LCs is a relatively
fast operation, it will need to be performed on *every*
asyncio.call_soon and create_task.  The immutable stack/mappings
solution simply eliminates the problem because you can just copy by
reference, which is fast.

The #3 approach is implementable with regular dicts + copy() too, it
will just be slower in some cases (explained below).

>
>> We believe that approach #3 enables an efficient and complete
>> Execution Context implementation, with excellent runtime performance.
>
> What about the maintenance and debugging cost, though?

Contrary to Python dicts, the implementation scope for a HAMT mapping
is much smaller -- we only need get, set, and merge operations.  No
split dicts, no ordering, etc.  With the help of fuzz-testing and our
ref-counting test mode I hope that we'll be able to catch most of the
bugs.

Any solution adds to the total debugging and maintenance cost, but I
believe that in this specific case, the benefits outweigh that cost:

1. Sometimes we'll need to merge many dicts in places like
   asyncio.call_soon or async Task objects (sketched below).

2. A "set" operation might resize the dict, making it slower.

3. The "dict.copy()" optimization that the PEP mentions won't always
   be able to help us, as we will likely need to resize the dict
   often.
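
To illustrate the cost in (1), here is roughly what capturing the EC
means if LCs are plain dicts (an illustrative sketch only, with
hypothetical names):

    def capture_execution_context(lc_stack):
        # Every call_soon/create_task pays O(total number of items)
        # to merge the whole chain of LCs into one mapping.
        merged = {}
        for lc in lc_stack:
            merged.update(lc)
        return merged

With an immutable HAMT-based mapping, capturing the EC is just taking
a reference to the current (immutable) stack, and the individual set()
and merge() operations cost O(log32 N) when they do happen.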
The "dict.copy()" optimization that the PEP mentions won't be able to always help us, as we will likely need to often resize the dict. > >> Immutable mappings implemented with HAMT have O(log32N) performance > for both set(), get(), and merge() operations, which is essentially O(1) > for relatively small mappings > > But, for relatively small mappings, regular dicts would also be fast > enough, right? If all mappings are relatively small than the answer is close to "yes". We might want to periodically "squash" (or merge or compact) the chain of Local Contexts, in which case merging dicts will be more expensive than merging hamt. > > It would be helpful for the PEP to estimate reasonable parameter sizes: > - reasonable number of context items in a local context I assume that the number of context items will be relatively low. It's hard for me to imagine having more than a thousand of them. > - reasonable number of local contexts in an execution stack In a simple multi-threaded code we will only have one local context per execution context. Every time you run a generator or an asynchronous task you push a local context to the stack. Generators will have an optimization -- they will push NULL to the stack and it will be a NULL until a generator writes to its local context. It's possible to imagine a degenerative case when a generator recurses in, say, a 'decimal context' with block, which can potentially create a long chain of LCs. Long chains of LCs are not a problem in general -- once the generator is done, it pops its LCs, thus decreasing the stack size. Long chains of LCs might become a problem if, deep into recursion, a generator needs to capture the execution context (say it makes an asyncio.call_soon() call). In which case the solution is simple -- we squash chains that are longer than 5-10-some-predefined-number. In general, though, EC is something that is there and you can't really control it. If you have a thousand decimal libraries in your next YouTube-killer website, you will have large numbers of items in your Execution Context. You will inevitably start experiencing slowdowns of your code that you can't even fix (or maybe even explain). In this case, HAMT is a safer bet -- it's a guarantee that you will always have O(log32) performance for LC-stack-squashing or set operations. This is the strongest argument in favour of HAMT mapping - we implement it and it should work for all use-cases, even the for the unlikely ones. Yury From yselivanov.ml at gmail.com Wed Aug 16 22:15:20 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 16 Aug 2017 22:15:20 -0400 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: On Wed, Aug 16, 2017 at 12:55 PM, Yury Selivanov [..] > And immediately after I hit "send" I realized that this is a bit more > complicated. > > In order for Tasks to remember the full execution context of where > they were created, we need a new method that would allow to run with > *both* exec and local contexts: Never mind, the actual implementation would be as simple as: class Task: def __init__(self, coro): ... coro.cr_local_context = sys.new_local_context() self.exec_context = sys.get_execution_context() def step(): sys.run_with_execution_context(self.exec_contex , self.coro.send) No need for another "run_with_context" function. 
Yury

From arj.python at gmail.com  Wed Aug 16 22:59:55 2017
From: arj.python at gmail.com (Abdur-Rahmaan Janhangeer)
Date: Thu, 17 Aug 2017 06:59:55 +0400
Subject: [Python-ideas] DOM syntax guide
In-Reply-To:
References:
Message-ID:

Thanks all for the links!  Will look at them.  I intend making that
compiler as a fun project ^^

Abdur-Rahmaan Janhangeer,
Mauritius
abdurrahmaanjanhangeer.wordpress.com

On 16 Aug 2017 08:41, "Abdur-Rahmaan Janhangeer" wrote:

> greetings all,
>
> I like Python a lot and would like to use it everywhere ... up to on
> the web (not django type).
>
> For Python js-compiled versions (for makers), can you provide some
> syntax guidelines for dom access?
>
> Abdur-Rahmaan Janhangeer,
> Mauritius
> abdurrahmaanjanhangeer.wordpress.com

From ncoghlan at gmail.com  Thu Aug 17 05:18:50 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 17 Aug 2017 19:18:50 +1000
Subject: [Python-ideas] PEP 550 v2
In-Reply-To:
References:
Message-ID:

On 17 August 2017 at 02:36, Yury Selivanov wrote:
> Yeah, this is tricky.  The main issue is indeed the confusion of what
> methods you need to call -- "get/set" or
> "get_local_state/set_local_state".
>
> On some level the problem is very similar to regular Python scoping rules:
>
> 1. we have local names
> 2. we have global names
> 3. we have the 'nonlocal' modifier
>
> IOW scoping isn't easy, and you need to be conscious of what you do.
> It's just that we are so used to these scoping rules that they have a
> low cognitive effort for us.
>
> One of the ideas that I have in mind is to add another level of
> indirection to separate "global get" from "local set/get":
>
> 1. Rename ContextItem to ContextKey (reasoning for that in a parallel
>    thread)
>
> 2. Remove the ContextKey.set() method
>
> 3. Add a new ContextKey.value() -> ContextValue
>
>     ck = ContextKey()
>
>     with ck.value() as val:
>         val.set(spam)
>         yield
>
> or
>
>     val = ck.value()
>     val.set(spam)
>     try:
>         yield
>     finally:
>         val.clear()
>
> Essentially ContextValue will be the only API to set values in the
> execution context.  ContextKey.get() will be used to get them.
>
> Nathaniel, Nick, what do you guys think?

I think I don't want to have to try to explain to anyone what happens
if I get a context value in my current execution environment and then
send that value reference into a different execution context :)

So I'd prefer my earlier proposal of:

    # Resolve key in current execution environment
    ck.get_value()

    # Assign to key in current execution context
    ck.set_value(value)

    # Assign to key in specific execution context
    sys.run_with_active_context(ec, ck.set_value, value)

One suggestion I do like is Stefan's one of using "ExecutionContext"
to refer to the namespace that ck.set_value() writes to, and then
"ExecutionEnvironment" for the whole chain that ck.get_value() reads.

Similar to "generator" and "package", we'd still end up with "context"
being inherently ambiguous when used without qualification:

- PEP 550 execution context
- exception handling context (for chained exceptions)
- with statement context
- various context objects, like the decimal context

But we wouldn't have two different kinds of context within PEP 550
itself.  Instead, we'd have to start disambiguating the word
environment:

- PEP 550 execution environment
- process environment (i.e.
  os.environ)

The analogy between process environments and execution environments
wouldn't be exact (since the key-value pairs in process environments
are copied eagerly rather than via lazily chained lookups), but once
you account for that, the parallels between an operating system level
process environment tree and a Python level execution environment tree
as proposed in PEP 550 seem like they would be helpful rather than
confusing.

> [..]
>>> * ``sys.get_execution_context()`` function.  The function returns a
>>> copy of the current EC: an ``ExecutionContext`` instance.
>>
>> If there are enough of these functions then it might make sense to
>> stick them in their own module instead of adding more stuff to sys. I
>> guess worrying about that can wait until the API details are more firm
>> though.
>
> I'm OK with this idea -- pystate.c becomes way too crowded.
>
> Maybe we should just put this stuff in _contextlib.c and expose it in
> the contextlib module.

Yeah, I'd be OK with that - if we're going to reuse the word, it makes
sense to reuse the module to expose the related machinery.

That said, if we do go that way *and* we decide to offer a
coroutine-only backport, I see an offer of contextlib2
co-maintainership in your future ;)

>>> * If ``coro.cr_local_context`` is an empty ``LocalContext`` object
>>>   that ``coro`` was created with, the interpreter will set
>>>   ``coro.cr_local_context`` to ``None``.
>>
>> I like all the ideas in this section, but this specific point feels a
>> bit weird. Coroutine objects need a second hidden field somewhere to
>> keep track of whether the object they end up with is the same one they
>> were created with?
>
> Yes, I planned to have a second hidden field, as Coroutines will have
> their cr_local_context set to NULL, and that will be their empty LC.
> So a second internal field is needed to disambiguate between NULL
> meaning an "empty context" and NULL meaning "use the outer local
> context".
>
> I omitted this from the PEP to make it a bit easier to digest, as this
> seemed to be a low-level implementation detail.

Given that the field is writable, I think it makes more sense to just
choose a suitable default, and then rely on other code changing that
default when it's not right.

For generators: set it to an empty context by default, and have
contextlib.contextmanager (and similar wrappers) clear it.

For coroutines: set it to None by default, and have async task
managers give top level coroutines their own private context.

No hidden flags, no magic value adjustments, just different defaults
for coroutines and generators (including async generators).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Thu Aug 17 05:40:45 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 17 Aug 2017 19:40:45 +1000
Subject: [Python-ideas] PEP 550 v2
In-Reply-To:
References: <20170816142553.GA2837@bytereef.org>
 <20170816160856.GA2672@bytereef.org>
 <20170816171304.GA3261@bytereef.org>
Message-ID:

On 17 August 2017 at 04:38, Yury Selivanov wrote:
> On Wed, Aug 16, 2017 at 1:13 PM, Stefan Krah wrote:
> While I'm trying to avoid using scoping terminology for PEP 550,
> there's one parallel -- as with regular Python scoping you have global
> variables and you have local variables.
>
> You can use locals() to access your local scope, and you can use
> globals() to access your global scope.
To be honest, the difference between LocalContext and ExecutionContext
feels more like the difference between locals() and lexical closure
variables than it does the difference between locals() and globals().
It's just that where the scoping rules are a compile time thing
related to lexical closures, PEP 550 is about defining a dynamic
context.

> Similarly in PEP 550, you have your LocalContext and ExecutionContext.
> We don't want to call ExecutionContext a "Global Context" because
> it is fundamentally OS-thread-specific (contrary to Python globals).

In addition to it being different from the way the decimal module
already uses the phrase, one of the reasons I don't want to call it a
LocalContext is because doing so brings in the suggestion that it is
somehow connected to the locals() scope, and it isn't - there are
plenty of things (most notably, function calls) that will change the
active local namespace, but *won't* change the active execution
context.

> LocalContexts are created for threads, generators, coroutines and are
> really similar to local scoping.  Adding more names for local contexts
> like CoroutineLocalContext or GeneratorLocalContext won't solve
> anything either.  All in all, Local Context is what its name stands
> for -- it's a local context for your current logical scope, be it a
> coroutine or a generator.

But unlike locals() itself, it *isn't* linked to a specific frame of
execution - it's deliberately designed to be shared *between* frames.

If you don't like either of the ExecutionContext/ExecutionEnvironment
or ExecutionContext/ExecutionContextChain combinations, how would you
feel about ExecutionContext + DynamicContext?  Saying that
"ck.set_value(value) sets the value corresponding to the given context
key in the currently active execution context" is still my preferred
terminology for setting values, and I think the following would work
well for reading values:

ck.get_value() attempts to look up the value for that key in the
currently active execution context.  If it doesn't find one, it then
tries each of the execution contexts in the currently active dynamic
context.  If it *still* doesn't find one, then it will set the default
value in the outermost execution context and then return that value.

One thing I like about that phrasing is that we'd be using the word
dynamic in exactly the same sense that dynamic scoping uses it, and
the dynamic context mechanism would become PEP 550's counterpart to
the lexical closure support in Python's normal scoping rules.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Thu Aug 17 05:46:58 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 17 Aug 2017 19:46:58 +1000
Subject: [Python-ideas] PEP 550 v2
In-Reply-To:
References:
Message-ID:

On 17 August 2017 at 02:55, Yury Selivanov wrote:
> And immediately after I hit "send" I realized that this is a bit more
> complicated.
>
> In order for Tasks to remember the full execution context of where
> they were created, we need a new method that would allow to run with
> *both* exec and local contexts:
>
>     class Task:
>
>         def __init__(self, coro):
>             ...
>             self.local_context = sys.new_local_context()
>             self.exec_context = sys.get_execution_context()
>
>         def step(self):
>             sys.run_with_contexts(self.exec_context, self.local_context,
>                                   self.coro.send)

I don't think that's entirely true, since you can nest the calls even
without a combined API:

    sys.run_with_execution_context(self.exec_context,
        sys.run_with_local_context, self.local_context, self.coro.send)

Offering a combined API may still make sense for usability and
efficiency reasons, but it isn't strictly necessary.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Fri Aug 18 01:09:51 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 18 Aug 2017 15:09:51 +1000
Subject: [Python-ideas] PEP 550 v2
In-Reply-To:
References:
Message-ID:

On 17 August 2017 at 01:22, Yury Selivanov wrote:
> On Wed, Aug 16, 2017 at 4:07 AM, Nick Coghlan wrote:
>>> Coroutine Object Modifications
>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>
>>> To achieve this, a small set of modifications to the coroutine object
>>> is needed:
>>>
>>> * New ``cr_local_context`` attribute.  This attribute is readable
>>>   and writable for Python code.
>>
>> For ease of introspection, it's probably worth using a common
>> `__local_context__` attribute name across all the different types that
>> support one, and encouraging other object implementations to do the
>> same.
>>
>> This isn't like cr_await and gi_yieldfrom, where we wanted to use
>> different names because they refer to different kinds of objects.
>
> We also have cr_code and gi_code, which are used for introspection
> purposes but refer to CodeObject.

Right, hence https://bugs.python.org/issue31230 :)

(That suggestion is prompted by the fact that if we'd migrated gi_code
to __code__ in 3.0, the same way we migrated func_code, then cr_code
and ag_code would almost certainly have followed the same
dunder-naming convention, and
https://github.com/python/cpython/pull/3077 would never have been
necessary)

> I myself don't like the mess the C-style convention created for our
> Python code (think of what the "dis" and "inspect" modules have to go
> through), so I'm +0 for having "__local_context__".

I'm starting to think this should be __private_context__ (to convey
the *intent* of the attribute), rather than naming it after the type
that it's expected to store.

Thinking about this particular attribute name did prompt the question
of how we want PEP 550 to interact with the exec builtin, though, as
well as raising some questions around a number of other code execution
cases:

1. What is the execution context for top level code in a module?
2. What is the execution context for the import machinery in an import
   statement?
3. What is the execution context for the import machinery when invoked
   via importlib?
4. What is the execution context for the import machinery when invoked
   via the C API?
5. What is the execution context for the import machinery when invoked
   via the runpy module?
6. What is the execution context for things like the timeit module,
   templating engines, etc?
7. What is the execution context for codecs and codec error handlers?
8. What is the execution context for __del__ methods and weakref
   callbacks?
9. What is the execution context for trace hooks and other really low
   level machinery?
10. What is the execution context for displayhook and excepthook?
I think a number of those (top level module code executed via the
import system, the timeit module, templating engines) can be addressed
by saying that the exec builtin always creates a completely fresh
execution context by default (with no access to the parent's execution
context), and will gain a new keyword-only parameter that allows you
to specify an execution context to use.  That way, exec'ed code will
be independent by default, but users of exec() will be able to opt in
to handling it like a normal function call by passing in the current
context.

The default REPL, the code module and the IDLE shell window would need
to be updated so that they use a shared context for evaluating the
user supplied code snippets, while keeping their own context separate.

While top-level code would always run in a completely fresh context
for imports, the runpy module would expose the same setting as the
exec builtin, so the executed code would be isolated by default, but
you could opt in to using a particular execution context if you wanted
to.

Codecs and codec error handlers I think will be best handled in a way
similar to generators, where they have their own private context (so
they can't alter the caller's context), but can *read* the caller's
context (so the context can be used as a way of providing
context-dependent codec settings).

That "read-only" access model also feels like the right option for the
import machinery - regardless of whether it's accessed via the import
statement, importlib, the C API, or the runpy module, the import
machinery should be able to *read* the dynamic context, but not make
persistent changes to it.

Since they can be executed at arbitrary points in the code, it feels
to me that __del__ methods and weakref callbacks should *always* be
executed in a completely pristine execution context, with no access
whatsoever to any thread's dynamic context.

I think we should leave the execution context alone for the really low
level hooks, and simply point out that yes, these have the ability to
do weird things to the execution context, just as they have the power
to do weird things to local variables, so they need to be handled with
care.

For displayhook and excepthook, I don't have a particularly strong
intuition, so my default recommendation would be the read-only access
proposed for generators, codecs, and the import machinery.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From stefan_ml at behnel.de  Fri Aug 18 02:12:40 2017
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 18 Aug 2017 08:12:40 +0200
Subject: [Python-ideas] PEP 550 v2
In-Reply-To:
References:
Message-ID:

Nathaniel Smith wrote on 16.08.2017 at 09:18:
> On Tue, Aug 15, 2017 at 4:55 PM, Yury Selivanov wrote:
>> Here's the PEP 550 version 2.
> Awesome!

+1

>> Backwards Compatibility
>> =======================
>>
>> This proposal preserves 100% backwards compatibility.
>
> While this is mostly true in the strict sense, in practice this PEP is
> useless if existing thread-local users like decimal and numpy can't
> migrate to it without breaking backcompat. So maybe this section
> should discuss that?
>
> (For example, one constraint on the design is that we can't provide
> only a pure push/pop API, even though that's what would be most
> convenient for context managers like decimal.localcontext or
> numpy.errstate, because we also need to provide some backcompat story
> for legacy functions like decimal.setcontext and numpy.seterr.)
I agree with Nathaniel that many projects that can benefit from this
feature will need to keep supporting older Python versions as well. In
the case of Cython, that's Py2.6+.  We already have the problem that
the asynchronous finalisation of async generators cannot be supported
in older Python versions ("old" as in Py3.5 and before), so we end up
with a language feature that people can use in Py2.6, but not
completely/safely.

I can't say yet how difficult it will be to integrate the new
infrastructure that this PEP proposes into a backwards compatible code
base, but if there's something we can think of now in order to help
projects keep supporting older Python versions in the same code base,
given the constraints of their existing APIs and semantics - that
would be great.

Stefan

From ncoghlan at gmail.com  Fri Aug 18 04:50:11 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 18 Aug 2017 18:50:11 +1000
Subject: [Python-ideas] PEP 550 v2
In-Reply-To:
References:
Message-ID:

On 18 August 2017 at 16:12, Stefan Behnel wrote:
> Nathaniel Smith wrote on 16.08.2017 at 09:18:
>> On Tue, Aug 15, 2017 at 4:55 PM, Yury Selivanov wrote:
>>> Backwards Compatibility
>>> =======================
>>>
>>> This proposal preserves 100% backwards compatibility.
>>
>> While this is mostly true in the strict sense, in practice this PEP is
>> useless if existing thread-local users like decimal and numpy can't
>> migrate to it without breaking backcompat. So maybe this section
>> should discuss that?
>>
>> (For example, one constraint on the design is that we can't provide
>> only a pure push/pop API, even though that's what would be most
>> convenient for context managers like decimal.localcontext or
>> numpy.errstate, because we also need to provide some backcompat story
>> for legacy functions like decimal.setcontext and numpy.seterr.)
>
> I agree with Nathaniel that many projects that can benefit from this
> feature will need to keep supporting older Python versions as well. In
> the case of Cython, that's Py2.6+.  We already have the problem that
> the asynchronous finalisation of async generators cannot be supported
> in older Python versions ("old" as in Py3.5 and before), so we end up
> with a language feature that people can use in Py2.6, but not
> completely/safely.
>
> I can't say yet how difficult it will be to integrate the new
> infrastructure that this PEP proposes into a backwards compatible code
> base, but if there's something we can think of now in order to help
> projects keep supporting older Python versions in the same code base,
> given the constraints of their existing APIs and semantics - that
> would be great.

One aspect of this that we're considering is to put the Python level
API in contextlib rather than in sys.  That has the pragmatic benefit
that contextlib2 then becomes the natural home for an API backport,
and we should be able to get the full *explicit* API working on older
versions (even if it means introducing an optional C extension module
as a dependency to get that part of the API working fully).

To backport the isolation of generators, we'd likely be able to
provide a decorator that explicitly isolated generators, but it
wouldn't be feasible to backport implicit isolation.

The same would go for the various other proposals for implicit
isolation - when running on older versions, the general principle
would be "if you (or a library/framework you're using) didn't
explicitly isolate the execution context, assume it's not isolated".

Cheers,
Nick.
-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From steve at pearwood.info  Fri Aug 18 08:06:10 2017
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 18 Aug 2017 22:06:10 +1000
Subject: [Python-ideas] How do you think about these language extensions?
In-Reply-To:
References:
Message-ID: <20170818120609.GS7395@ando.pearwood.info>

Hello Thautwarm, and welcome!

Sorry for the delay in responding, but this has been a very busy week
for me personally, and an even busier week for my inbox, and so I
missed your post until now.

On Sun, Aug 13, 2017 at 12:49:45PM +0000, Thautwarm wrote:
>
> Hi all,
>
> I've just finished a language extension for CPython 3.6.x to support
> some additional grammars like Pattern Matching. And it's compatible
> with CPython.

It is really good to see some actual practical experiments for these
features, rather than just talking about them. Thank you!

[...]
> # where syntax
>
> from math import pi
> r = 1 # the radius
> h = 10 # the height
> S = (2*S_top + S_side) where:
>     S_top = pi*r**2
>     S_side = C * h where:
>         C = 2*pi*r

This has been suggested a few times. The first time, I disliked it,
but I've come around to seeing its value. I like it.

I wonder: could we make the "where" clause delay evaluation until the
entire block was compiled, so that we could write something like this:

    S = (2*S_top + S_side) where:
        S_top = pi*r**2
        S_side = C * h  # C is defined further on
        C = 2*pi*r

That's more how "where" is used mathematically.

> # lambda&curry :
>
> lambda x: lambda y: lambda z: ret where:
>     ret = x+y
>     ret -= z
> .x -> .y -> .z -> ret where:
>     ret = x+y
>     ret -= z
> as-with x def as y def as z def ret where:
>     ret = x+y
>     ret -= z

I'm afraid I can't make heads or tails of that. Apart from guessing
that it creates a function, I have no idea what it would do.

> # arrow transform (to avoid endless parentheses and try to be more
> # readable.)
>
> >> range(5) -> map(.x->x+2, _) -> list(_)
> >> [2,3,4,5,6]

I like the idea of chained function calls, like pipes in shell
languages such as bash. I've written a proof-of-concept for that:

http://code.activestate.com/recipes/580625-collection-pipeline-in-python/

I prefer | to -> but that's just a personal preference.

I don't like the use of _ in there. Underscore already has a number of
special meanings, such as:

- a convention for "don't care"

- in the interactive interpreter, the last value calculated

- used for internationalisation

I don't think that giving _ yet another special meaning, and this one
built in to the language, is a good idea.

> # pattern matching use "condic" as keyword is for avoiding the
> # conflictions against the standard libraries and packages from third
> # party. "switch" and "match" both lead to conflictions.

This is a hard problem to deal with, but "condic" sounds awful. What is
it supposed to mean? Short for "condition"?

> condic+(type) 1:
>     case a:int => assert a == 1 and type(a) == 1
>     [>]
>     case 0 => assert 1 > 0
>     [is not]
>     case 1 => assert 1 is not 1
>     otherwise => print("nothing")
>
> condic+() [1,2,3]:
>     case (a,*b)->b:list => sum(b)
>     +[]
>     case [] => print('empty list')
>     +[==]
>     case (a,b):(1,2) => print("the list is [1,2]")

I don't know how to read those.

[...]
> Here is an example to use flowpython, which gives the permutations of
> a sequence.
>
> from copy import deepcopy
> permutations = .seq -> seq_seq where:
>     condic+[] seq:
>         case (a, )  => seq_seq = [a,]
>         case (a, b) => seq_seq = [[a,b],[b,a]]
>         case (a,*b) =>
>             seq_seq = permutations(b) -> map(.x -> insertAll(x, a), _) -> sum(_, []) where:
>                 insertAll = . x, a -> ret where:
>                     ret = [ deepcopy(x) -> _.insert(i, a) or _ for i in (len(x) -> range(_+1)) ]

I find that almost unreadable. Too many new features all at once, it's
like trying to read a completely unfamiliar language. How would you
translate that into regular Python?

Thanks for your experiments!

-- 
Steve

From python at mrabarnett.plus.com  Fri Aug 18 09:16:14 2017
From: python at mrabarnett.plus.com (MRAB)
Date: Fri, 18 Aug 2017 14:16:14 +0100
Subject: [Python-ideas] How do you think about these language extensions?
In-Reply-To: <20170818120609.GS7395@ando.pearwood.info>
References: <20170818120609.GS7395@ando.pearwood.info>
Message-ID: <55b24282-ca1a-92bf-d7ad-d3e4256bc606@mrabarnett.plus.com>

On 2017-08-18 13:06, Steven D'Aprano wrote:
> Hello Thautwarm, and welcome!
[snip]

>> # pattern matching use "condic" as keyword is for avoiding the
>> # conflictions against the standard libraries and packages from third
>> # party. "switch" and "match" both lead to conflictions.
>
> This is a hard problem to deal with, but "condic" sounds awful. What is
> it supposed to mean? Short for "condition"?
>
FWIW, Lisp has COND.

[snip]

From rosuav at gmail.com  Fri Aug 18 11:06:17 2017
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 19 Aug 2017 01:06:17 +1000
Subject: [Python-ideas] How do you think about these language extensions?
In-Reply-To: <20170818120609.GS7395@ando.pearwood.info>
References: <20170818120609.GS7395@ando.pearwood.info>
Message-ID:

On Fri, Aug 18, 2017 at 10:06 PM, Steven D'Aprano wrote:
>> # arrow transform (to avoid endless parentheses and try to be more
>> # readable.)
>>
>> >> range(5) -> map(.x->x+2, _) -> list(_)
>> >> [2,3,4,5,6]
>
> I like the idea of chained function calls, like pipes in shell
> languages such as bash. I've written a proof-of-concept for that:
>
> http://code.activestate.com/recipes/580625-collection-pipeline-in-python/
>
> I prefer | to -> but that's just a personal preference.
>
> I don't like the use of _ in there. Underscore already has a number of
> special meanings, such as:
>
> - a convention for "don't care"
>
> - in the interactive interpreter, the last value calculated
>
> - used for internationalisation
>
> I don't think that giving _ yet another special meaning, and this one
> built in to the language, is a good idea.

AIUI it's not a new meaning, but another variant of the second of those
examples: it means "the last value calculated". However, I'd prefer to
see it done with something that's otherwise illegal syntax - so unless
the expression is to the right of a "->", you cannot use that symbol in
that way.

I'm on the fence as to whether it'd be better to allow an implicit last
argument (or implicit first argument), so you can say "-> list()"
without the symbol.

ChrisA

From bagrat at aznauryan.org  Fri Aug 18 11:09:12 2017
From: bagrat at aznauryan.org (Bagrat Aznauryan)
Date: Fri, 18 Aug 2017 15:09:12 +0000
Subject: [Python-ideas] More Metadata for Variable Annotations
Message-ID:

# Abstract

Before the holy PEP-526 the only option for type hints was comments.
And before PEP-484 the docstrings were the main place where variable
metadata would go. That variable metadata would include:

* the type
* the human-readable description
* some value constraints (e.g.
  a range for an integer variable)

PEP-526 introduced the awesome syntax sugar, which made the first part
of the metadata -- the type -- easily introspectable at runtime.
However, if you still need to add the description and the value
constraints to the variable metadata, you still need to fall back to
the docstring option.

The idea is to make it possible to include all of the mentioned
metadata in the variable annotations.

# Rationale

Having the type specified using the supported annotation syntax and
the rest of the metadata in the docstrings adds duplication and
complexity for further maintenance. Moreover, if you need the
docstring-contained metadata to be used at runtime, you need to
implement a parser or pick one from the existing ones, which adds
another dependency to your application.

The need for the metadata other than the type might prove to be
common. A typical example is generating the JSON Schema for a class,
e.g. to be used for the OpenAPI definition of your API.

# Possible Solutions

## A wrapper

The proposal is to introduce a new wrapper (probably a function), that
will accept the type as the first positional argument and additional
keyword arguments for metadata. The wrapper will map the keyword
arguments to the type object as attributes and return it. The code
would look like this:

```
foo: wrapper(
    int,
    description="bar",
    minimum=0,
    maximum=100
)
```

Later, the metadata can be accessed as the annotation attributes, like
e.g.:

```
__annotations__['foo'].description
```

## Annotation as a tuple

This solution does not require any code change in Python, but will
force other tools to change their parsing (e.g. mypy). The proposal is
that when the annotation is a tuple instance, the first element is
used as the type of the variable, and the rest is ignored or treated
as additional metadata. This will make it possible to add the metadata
into a separate dictionary as the second element of the annotation
tuple. For example:

```
foo: (
    int,
    {
        "description": "bar",
        "minimum": 0,
        "maximum": 100
    }
)
```

The annotation will be stored as is, so to access the metadata at
runtime, one would need to explicitly access the second item of the
annotation tuple.

# Summary

This option would help to have well annotated code which will be
self-descriptive and provide the ability to generate schemas and other
definitions (e.g. OpenAPI) automatically and without duplication.

The proposed solutions are definitely not perfect and not the main
point of this email. The target is to describe the idea and motivation
and start a discussion.

From yselivanov.ml at gmail.com  Fri Aug 18 11:52:47 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Fri, 18 Aug 2017 11:52:47 -0400
Subject: [Python-ideas] PEP 550 v2
In-Reply-To:
References:
Message-ID:

On Fri, Aug 18, 2017 at 2:12 AM, Stefan Behnel wrote:
> Nathaniel Smith wrote on 16.08.2017 at 09:18:
>> On Tue, Aug 15, 2017 at 4:55 PM, Yury Selivanov wrote:
>>> Here's the PEP 550 version 2.
>> Awesome!
>
> +1
>
>>> Backwards Compatibility
>>> =======================
>>>
>>> This proposal preserves 100% backwards compatibility.
>>
>> While this is mostly true in the strict sense, in practice this PEP is
>> useless if existing thread-local users like decimal and numpy can't
>> migrate to it without breaking backcompat. So maybe this section
>> should discuss that?
>>
>> (For example, one constraint on the design is that we can't provide
>> only a pure push/pop API, even though that's what would be most
>> convenient for context managers like decimal.localcontext or
>> numpy.errstate, because we also need to provide some backcompat story
>> for legacy functions like decimal.setcontext and numpy.seterr.)
>
> I agree with Nathaniel that many projects that can benefit from this
> feature will need to keep supporting older Python versions as well. In
> the case of Cython, that's Py2.6+.  We already have the problem that
> the asynchronous finalisation of async generators cannot be supported
> in older Python versions ("old" as in Py3.5 and before), so we end up
> with a language feature that people can use in Py2.6, but not
> completely/safely.
>
> I can't say yet how difficult it will be to integrate the new
> infrastructure that this PEP proposes into a backwards compatible code
> base, but if there's something we can think of now in order to help
> projects keep supporting older Python versions in the same code base,
> given the constraints of their existing APIs and semantics - that
> would be great.

I think it's Cython's quest to try to backport support of all new
Python 3.x language features to be 2.6-compatible, which sometimes can
be questionable.  You can add support for PEP 550 semantics to code
that was compiled with Cython, but pure Python code won't be able to
support it.  This, in my opinion, could cause more confusion than
benefit, so for Cython I think the solution is to do nothing in this
case.

We'll (maybe) backport some functionality to contextlib2.  In my
opinion, any code that uses contextlib2 in Python should work exactly
the same when it's compiled with Cython.

Yury

From yselivanov.ml at gmail.com  Fri Aug 18 12:17:11 2017
From: yselivanov.ml at gmail.com (Yury Selivanov)
Date: Fri, 18 Aug 2017 12:17:11 -0400
Subject: [Python-ideas] PEP 550 v2
In-Reply-To:
References:
Message-ID:

On Fri, Aug 18, 2017 at 1:09 AM, Nick Coghlan wrote:
> On 17 August 2017 at 01:22, Yury Selivanov wrote:
>> On Wed, Aug 16, 2017 at 4:07 AM, Nick Coghlan wrote:
>>>> Coroutine Object Modifications
>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>
>>>> To achieve this, a small set of modifications to the coroutine object
>>>> is needed:
>>>>
>>>> * New ``cr_local_context`` attribute.  This attribute is readable
>>>>   and writable for Python code.
>>>
>>> For ease of introspection, it's probably worth using a common
>>> `__local_context__` attribute name across all the different types that
>>> support one, and encouraging other object implementations to do the
>>> same.
>>>
>>> This isn't like cr_await and gi_yieldfrom, where we wanted to use
>>> different names because they refer to different kinds of objects.
>>
>> We also have cr_code and gi_code, which are used for introspection
>> purposes but refer to CodeObject.
>
> Right, hence https://bugs.python.org/issue31230 :)
>
> (That suggestion is prompted by the fact that if we'd migrated gi_code
> to __code__ in 3.0, the same way we migrated func_code, then cr_code
> and ag_code would almost certainly have followed the same
> dunder-naming convention, and
> https://github.com/python/cpython/pull/3077 would never have been
> necessary)
>
>> I myself don't like the mess the C-style convention created for our
>> Python code (think of what the "dis" and "inspect" modules have to go
>> through), so I'm +0 for having "__local_context__".
>
> I'm starting to think this should be __private_context__ (to convey
> the *intent* of the attribute), rather than naming it after the type
> that it's expected to store.

I've been thinking a lot about the terminology, and I have another
variant to consider: ExecutionContext is a stack of LogicalContexts.
Coroutines/generators will thus have a __logical_context__ attribute.
I think that the "logical" term better conveys the meaning than
"private" or "dynamic".

>
> Thinking about this particular attribute name did prompt the question
> of how we want PEP 550 to interact with the exec builtin, though, as
> well as raising some questions around a number of other code execution
> cases:
>
> 1. What is the execution context for top level code in a module?

Whatever the execution context of the current thread that is importing
the code is.  Which would usually be the main thread.

> 2. What is the execution context for the import machinery in an import
>    statement?
> 3. What is the execution context for the import machinery when invoked
>    via importlib?

Whatever the execution context is that invoked the import machinery,
be it "__import__()", an "import" statement, or
"importlib.load_module".

> 4. What is the execution context for the import machinery when invoked
>    via the C API?
> 5. What is the execution context for the import machinery when invoked
>    via the runpy module?
> 6. What is the execution context for things like the timeit module,
>    templating engines, etc?
> 7. What is the execution context for codecs and codec error handlers?
> 8. What is the execution context for __del__ methods and weakref
>    callbacks?

In general, the EC behaves just like TLS in all these cases; there's
literally no difference.

> 9. What is the execution context for trace hooks and other really low
>    level machinery?
> 10. What is the execution context for displayhook and excepthook?

Speaking of sys.displayhook and the sys.std* streams -- these APIs are
fundamentally incompatible with PEP 550 or any possible context
isolation.  These things are essentially *global* variables in the sys
module, and there's tons of code out there that *expects* them to
behave like globals.  If a user changes displayhook they expect it to
work across all threads.  If we want to make displayhook or the
sys.std* streams context-aware, we will need new APIs for them with
new properties/expectations.  Simply forcing them to use the execution
context would be backwards incompatible.

PEP 550 won't try to change how displayhooks, excepthooks, trace
functions, sys.stdout etc. work -- this is out of its scope.  We can't
refactor half of the sys module as part of one PEP.

>
> I think a number of those (top level module code executed via the
> import system, the timeit module, templating engines) can be addressed
> by saying that the exec builtin always creates a completely fresh
> execution context by default (with no access to the parent's execution
> context), and will gain a new keyword-only parameter that allows you
> to specify an execution context to use.  That way, exec'ed code will
> be independent by default, but users of exec() will be able to opt in
> to handling it like a normal function call by passing in the current
> context.

"exec" uses the outer globals/locals if you don't pass them explicitly
-- the code isn't isolated by default.  Isolation for "exec" is
opt-in:

    ]]] a = 1
    ]]] exec('print(a); b = 2')
    1
    ]]] b
    2

Therefore, with regards to PEP 550, it should execute the code with
the current EC/LC.  We should also add new keyword arguments to
provide a custom LC and EC (same as we do for locals/globals).
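
Something along these lines (a hypothetical signature sketch, not part
of the PEP yet; the parameter names are placeholders):

    exec(code, globals=None, locals=None, *,
         logical_context=None, execution_context=None)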
> The default REPL, the code module and the IDLE shell window
> would need to be updated so that they use a shared context for
> evaluating the user supplied code snippets, while keeping their own
> context separate.
>
> While top-level code would always run in a completely fresh context
> for imports, the runpy module would expose the same setting as the
> exec builtin, so the executed code would be isolated by default, but
> you could opt in to using a particular execution context if you wanted
> to.
>
> Codecs and codec error handlers I think will be best handled in a way
> similar to generators, where they have their own private context (so
> they can't alter the caller's context), but can *read* the caller's
> context (so the context can be used as a way of providing
> context-dependent codec settings).
>
> That "read-only" access model also feels like the right option for the
> import machinery - regardless of whether it's accessed via the import
> statement, importlib, the C API, or the runpy module, the import
> machinery should be able to *read* the dynamic context, but not make
> persistent changes to it.
>
> Since they can be executed at arbitrary points in the code, it feels
> to me that __del__ methods and weakref callbacks should *always* be
> executed in a completely pristine execution context, with no access
> whatsoever to any thread's dynamic context.
>
> I think we should leave the execution context alone for the really low
> level hooks, and simply point out that yes, these have the ability to
> do weird things to the execution context, just as they have the power
> to do weird things to local variables, so they need to be handled with
> care.
>
> For displayhook and excepthook, I don't have a particularly strong
> intuition, so my default recommendation would be the read-only access
> proposed for generators, codecs, and the import machinery.

I really think that in 3.7 we should just implement PEP 550 with its
current scope, and defer system refactorings to 3.8.  Many such
refactorings will probably deserve their own PEP, as, for example,
changing sys.stdout semantics is a really complex topic.  At this
point we are trying to solve the problem of making a replacement for
TLS that supports generators and async.

Yury

From levkivskyi at gmail.com  Fri Aug 18 13:48:19 2017
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Fri, 18 Aug 2017 19:48:19 +0200
Subject: [Python-ideas] More Metadata for Variable Annotations
In-Reply-To:
References:
Message-ID:

Hi Bagrat,

Thanks for a detailed proposal! Indeed, some projects might want to
have some additional metadata attached to a variable/argument besides
its type.  However, I think it would be more productive to first
discuss this on a more specialized forum like
https://github.com/python/typing/issues

Note that similar proposals have been discussed and rejected before,
see for example
https://www.python.org/dev/peps/pep-0484/#what-about-existing-uses-of-annotations
so you would need to have a strong argument, for example some popular
projects that would benefit from your proposal.

-- 
Ivan

On 18 August 2017 at 17:09, Bagrat Aznauryan wrote:

> # Abstract
>
> Before the holy PEP-526 the only option for type hints was comments.
> And before PEP-484 the docstrings were the main place where variable
> metadata would go.
> That variable metadata would include:
>
> * the type
> * the human-readable description
> * some value constraints (e.g. a range for an integer variable)
>
> PEP-526 introduced the awesome syntactic sugar, which made the first
> part of the metadata -- the type -- easily introspectable at runtime.
> However, if you still need to add the description and the value
> constraints to the variable metadata, you still need to fall back to
> the docstring option.
>
> The idea is to make it possible to include all of the mentioned
> metadata in the variable annotations.
>
> # Rationale
>
> Having the type specified using the supported annotation syntax and
> the rest of the metadata in the docstrings adds duplication and
> complexity for further maintenance. Moreover, if you need the
> docstring-contained metadata to be used at runtime, you need to
> implement a parser or pick one from the existing ones, which adds
> another dependency to your application.
>
> The need for metadata other than the type may well prove to be
> common. A typical example is generating the JSON Schema for a class,
> e.g. to be used for the OpenAPI definition of your API.
>
> # Possible Solutions
>
> ## A wrapper
>
> The proposal is to introduce a new wrapper (probably a function),
> that will accept the type as the first positional argument and
> additional keyword arguments for metadata. The wrapper will map the
> keyword arguments to the type object as attributes and return it.
> The code would look like this:
>
> ```
> foo: wrapper(
>     int,
>     description="bar",
>     minimum=0,
>     maximum=100
> )
> ```
>
> Later, the metadata can be accessed as the annotation attributes,
> like e.g.:
>
> ```
> __annotations__['foo'].description
> ```
>
> ## Annotation as a tuple
>
> This solution does not require any code change in Python, but will
> force other tools to change their parsing (e.g. mypy). The proposal
> is that when the annotation is a tuple instance, the first element is
> used as the type of the variable, and the rest is either ignored or
> treated as additional metadata. This makes it possible to add the
> metadata as a separate dictionary in the second element of the
> annotation tuple. For example:
>
> ```
> foo: (
>     int,
>     {
>         "description": "bar",
>         "minimum": 0,
>         "maximum": 100
>     }
> )
> ```
>
> The annotation will be stored as is, so to access the metadata at
> runtime, one would need to explicitly access the second item of the
> annotation tuple.
>
> # Summary
>
> This option would help to have well-annotated code which is
> self-descriptive, and make it possible to generate schemas and other
> definitions (e.g. OpenAPI) automatically and without duplication.
>
> The proposed solutions are definitely not perfect and not the main
> point of this email. The target is to describe the idea and the
> motivation and start a discussion.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chris.barker at noaa.gov Fri Aug 18 14:47:40 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 18 Aug 2017 11:47:40 -0700
Subject: [Python-ideas] How do you think about these language extensions?
In-Reply-To: References: <20170818120609.GS7395@ando.pearwood.info>
Message-ID:

>> # arrow transform (to avoid endless parentheses and try to be more
>> readable.)
>> >> range(5) -> map(.x->x+2, _) -> list(_)
>> >> [2,3,4,5,6]

> I like the idea of chained function calls

Parentheses aren't that bad, and as far as I can tell, this is just
another way to call a function on the results of a function. The above
is now spelled:

list(map(lambda x: x+2, range(5)))

which seems fine with me -- the only improvement I see is a more
compact way to spell lambda. (Though really, a list comp is considered
more "pythonic" these days, yes?)

[x+2 for x in range(5)]

Nicely, we have list comps and generator expressions, so we can avoid
the list() call. I know this was a simple example for demonstration's
sake, but it doesn't look like an improvement to me. Of course, in
this case, it's chaining iterations, not "ordinary" functions, so
maybe it would make more sense in other contexts.

Also, we need to remember that functions can take *args, **kwargs,
etc, and can return a tuple of just about anything -- not sure how
well that maps to the "pipe" model.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From guido at python.org Fri Aug 18 16:34:51 2017
From: guido at python.org (Guido van Rossum)
Date: Fri, 18 Aug 2017 13:34:51 -0700
Subject: [Python-ideas] More Metadata for Variable Annotations
In-Reply-To: References: Message-ID:

A similar approach (though only for class/instance variables) is taken
by the 'attrs' package and by the proposal currently code-named
"dataclasses" ( https://github.com/ericvsmith/dataclasses).

On Fri, Aug 18, 2017 at 10:48 AM, Ivan Levkivskyi wrote:

> Hi Bagrat,
>
> Thanks for a detailed proposal! Indeed, some projects might want to
> have some additional metadata attached to a variable/argument besides
> its type. However, I think it would be more productive to first
> discuss this on a more specialized forum like
> https://github.com/python/typing/issues
> [...]

--
--Guido van Rossum (python.org/~guido)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wes.turner at gmail.com Fri Aug 18 17:23:50 2017
From: wes.turner at gmail.com (Wes Turner)
Date: Fri, 18 Aug 2017 16:23:50 -0500
Subject: [Python-ideas] More Metadata for Variable Annotations
In-Reply-To: References: Message-ID:

PyContracts supports things like numpy array constraints
https://andreacensi.github.io/contracts/reference.html#contracts-language-reference

> You can specify that the value must be a list, and specify optional
> constraints for its length and for its elements. ...

You mentioned JSON Schema. For RDF (e.g. JSONLD, #CSVW), there are a
number of relevant variable annotations that should be useful as
schema:

> A data table with 7 metadata header rows (column label, property URI
> path, DataType, unit, accuracy, precision, significant figures):

https://wrdrd.github.io/docs/consulting/linkedreproducibility#csv-csvw-and-metadata-rows

On Friday, August 18, 2017, Bagrat Aznauryan wrote:

> # Abstract
>
> Before the holy PEP-526 the only option for type hints was comments.
> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From ethan at stoneleaf.us Fri Aug 18 21:25:46 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 18 Aug 2017 18:25:46 -0700 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: <20170816142553.GA2837@bytereef.org> <20170816160856.GA2672@bytereef.org> <20170816171304.GA3261@bytereef.org> Message-ID: <5997939A.1090404@stoneleaf.us> On 08/17/2017 02:40 AM, Nick Coghlan wrote: > On 17 August 2017 at 04:38, Yury Selivanov wrote: > ck.get_value() attempts to look up the value for that key in the > currently active execution context. > If it doesn't find one, it then tries each of the execution > contexts in the currently active dynamic context. > If it *still* doesn't find one, then it will set the default value > in the outermost execution context and then return that value. For what it's worth, I find the term DynamicContext much easier to understand with relation to these concepts. -- ~Ethan~ From ethan at stoneleaf.us Fri Aug 18 21:41:34 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 18 Aug 2017 18:41:34 -0700 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: <20170816142553.GA2837@bytereef.org> Message-ID: <5997974E.9010801@stoneleaf.us> On 08/16/2017 08:43 AM, Yury Selivanov wrote: > To be honest, I really like Execution Context and Local Context names. > I'm curious if other people are confused with them. +1 confused :/ -- ~Ethan~ From guido at python.org Fri Aug 18 22:26:15 2017 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Aug 2017 19:26:15 -0700 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: <5997974E.9010801@stoneleaf.us> References: <20170816142553.GA2837@bytereef.org> <5997974E.9010801@stoneleaf.us> Message-ID: I'm also confused by these, because they share the noun part of their name, but their use and meaning is quite different. The PEP defines an EC as a stack of LCs, and (apart from strings :-) it's usually not a good idea to use the same term for a container and its items. On Fri, Aug 18, 2017 at 6:41 PM, Ethan Furman wrote: > On 08/16/2017 08:43 AM, Yury Selivanov wrote: > > To be honest, I really like Execution Context and Local Context names. >> I'm curious if other people are confused with them. >> > > +1 confused :/ > > -- > ~Ethan~ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri Aug 18 23:57:10 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 19 Aug 2017 13:57:10 +1000 Subject: [Python-ideas] How do you think about these language extensions? In-Reply-To: References: <20170818120609.GS7395@ando.pearwood.info> Message-ID: <20170819035710.GU7395@ando.pearwood.info> On Fri, Aug 18, 2017 at 11:47:40AM -0700, Chris Barker wrote: > >> # arrow transform (to avoid endless parentheses and try to be more > readable. > > > >> > > >> >> range(5) -> map(.x->x+2, _) -> list(_) > > >> >> [2,3,4,5,6] > > > > > > I like the idea of chained function calls > > > parentheses aren't that bad, and as far as I can tell, this is just another > way to call a function on the results of a function. I wouldn't say that parens are evil, but they're pretty noisy and distracting. 
I remember an old joke that claimed to prove that the US Defence
Department was using Lisp for the SDI ("Star Wars") software: somebody
had found a page covered completely edge to edge in nothing but
closing brackets:

))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))
)))))))))))))))))))))
... etc

Your example has a fairly short pipeline of calls:

> list(map(lambda x: x+2, range(5)))

But even this has two clear problems:

- the trailing brackets ))) are just noise, like the SDI joke above;

- you have to read it backwards, right to left, to make sense of it.

Imagine if you had a chain of ten or twenty calls:

)))))))))) ... you get the picture

But ultimately that's a relatively minor nuisance rather than a major
problem. The thing that makes long chains of function calls painful is
that you have to read them backwards:

- first range() is called;
- then map;
- finally list

even though we write them in the opposite order. When we reason about
the code, say to write it in the first place, or to read the
expression and understand it, I would guess that most people reason
something like this:

- start with our input data, range()
- call map on it to generate new values;
- call list to generate a list.

When writing code like this, I frequently find myself having to work
backwards compared to how we write the order of function calls:

range(5)
# move editor insertion point backwards
map(...)
# move editor insertion point backwards
list(...)

Half of my key presses are moving backwards over code I've just
written to insert a function call which is executed *after* what I
wrote, but needs to be written *before* what I just wrote.

For a short example like this, where we can easily keep the three
function calls in short-term memory, it isn't so bad, but short-term
memory is very limited ("magic number seven, plus or minus two") and
if you're already thinking about a couple of previous operations on
earlier lines of code, you don't have a lot of stack space left for a
long chain of operations.

And that's why we often fall back to temporary variables and an
imperative style:

data = range(5)
data = map(..., data)
data = list(data)

Perhaps not in such a short example, but for longer ones, very
frequently. We can write the code in the same order that it is
executed with a pipeline and avoid needing to push functions into our
short-term memory when either reading or writing:

range(5) -> map(lambda...) -> list

This way of thinking combines the strengths of postfix notation and
function call notation, without the disadvantages of either. This is
very successful in shell scripting languages like bash. I don't want
to oversell it as a panacea that solves everything, but it really is a
powerful (and underused) software paradigm.

> which seems fine with me -- the only improvement I see is a more compact
> way to spell lambda. (Though really, a list comp is considered more
> "pythonic" these days, yes?)
>
> [x+2 for x in range(5)]

Aye, for such a short example.
But consider a longer one: find the earliest date in a bunch of lines
of text:

result = (myfile.readlines()
          -> map(str.strip)
          -> filter( lambda s: not s.startswith('#') )
          -> sorted
          -> collapse  # collapse runs of identical lines
          -> extract_dates
          -> map(date_to_seconds)
          -> min
          )

(I've assumed that the functions map and filter have some sort of
automatic currying, like in Haskell; if you don't like that, then just
pretend I spelled them Map and Filter instead :-)

That's nice and easy to read and write: I wrote down exactly the steps
I would have taken to solve the problem, in the same order that they
need to be taken. Formatting is a breeze: the hardest decision was how
far to indent subsequent lines.

Compare it to this:

result = min(map(date_to_seconds, extract_dates(collapse(sorted(
    filter(lambda s: not s.startswith('#'),
    map(str.strip, myfile.readlines())))))))

You have to read all the way to the end to find out the most important
part, namely what data you are operating on! And then you have to read
backwards to understand what is done to the data. And finally you have
to be prepared for a whole lot of arguments from your co-workers about
how to format it :-)

# Either the ugliest thing ever, or the One True Way
result = min(
    map(
        date_to_seconds,
        extract_dates(
            collapse(
                sorted(
                    filter(
                        lambda s: not s.startswith('#'),
                        map(
                            str.strip,
                            myfile.readlines()
                        )
                    )
                )
            )
        )
    )
)

[...]

> Also, we need to remember that functions can take *args, **kwargs, etc,
> and can return a tuple of just about anything -- not sure how well that
> maps to the "pipe" model.

Not everything maps well to the function pipeline model. But enough
things do that I believe it is a powerful tool in the programmer's
toolkit.

--
Steve

From mertz at gnosis.cx Sat Aug 19 01:33:40 2017
From: mertz at gnosis.cx (David Mertz)
Date: Fri, 18 Aug 2017 22:33:40 -0700
Subject: [Python-ideas] How do you think about these language extensions?
In-Reply-To: <20170819035710.GU7395@ando.pearwood.info>
References: <20170818120609.GS7395@ando.pearwood.info>
<20170819035710.GU7395@ando.pearwood.info>
Message-ID:

This is pretty easy to write without any syntax changes, just using a
higher-order function `compose()` (possible implementation at foot).
Again, I'll assume auto-currying like the map/filter versions of those
functions in toolz, as Steven does:

> result = (myfile.readlines()
>           -> map(str.strip)
>           -> filter( lambda s: not s.startswith('#') )
>           -> sorted
>           -> collapse  # collapse runs of identical lines
>           -> extract_dates
>           -> map(date_to_seconds)
>           -> min
>           )

result = compose(map(str.strip),
                 filter(lambda s: not s.startswith('#')),
                 sorted,
                 collapse,
                 extract_dates,
                 map(date_to_seconds),
                 min
                 )(myfile.readlines())

Pretty much exactly the same thing with just a utility HOF. There's
one that behaves right in `toolz`/`cytoolz`, or I've used this one in
some publications and teaching material:

def compose(*funcs):
    """Return a new function s.t.
    compose(f,g,...)(x) == f(g(...(x)))
    """
    def inner(data, funcs=funcs):
        result = data
        for f in reversed(funcs):
            result = f(result)
        return result
    return inner

--
Keeping medicines from the bloodstreams of the sick; food from the
bellies of the hungry; books from the hands of the uneducated;
technology from the underdeveloped; and putting advocates of freedom
in prisons. Intellectual property is to the 21st century what the
slave trade was to the 16th.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From steve at pearwood.info Sat Aug 19 06:42:03 2017
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 19 Aug 2017 20:42:03 +1000
Subject: [Python-ideas] How do you think about these language extensions?
In-Reply-To: References: <20170818120609.GS7395@ando.pearwood.info>
<20170819035710.GU7395@ando.pearwood.info>
Message-ID: <20170819104202.GV7395@ando.pearwood.info>

On Fri, Aug 18, 2017 at 10:33:40PM -0700, David Mertz wrote:

> This is pretty easy to write without any syntax changes, just using a
> higher-order function `compose()` (possible implementation at foot).
> Again, I'll assume auto-currying like the map/filter versions of those
> functions in toolz, as Steven does:
[...]
> result = compose(map(str.strip),
>                  filter(lambda s: not s.startswith('#')),
>                  sorted,
>                  collapse,
>                  extract_dates,
>                  map(date_to_seconds),
>                  min
>                  )(myfile.readlines())

A ~~slight~~ major nit: given the implementation of compose you quote
below, this applies the functions in the wrong order. min() is called
first, and map(str.strip) last.

But apart from being completely wrong *wink* that's not too bad :-)

Now we start bike-shedding the aesthetics of what looks better and
reads more nicely. Your version is pretty good, except:

1) The order of function composition is backwards to that normally
expected (more on this below);

2) there's that unfortunate call to "compose" which isn't actually
part of the algorithm, it's just scaffolding to make it work;

3) the data being operated on is still at the far end of the chain,
instead of the start;

4) and I believe that teaching a chain of function calls is easier
than teaching higher order function composition. Much easier.

The standard mathematical definition of function composition operates
left to right:

(f∘g∘h)(x) = f(g(h(x)))

http://mathworld.wolfram.com/Composition.html

And that's precisely what your implementation does. Given your
implementation quoted below:

py> def add_one(x): return x + 1
...
py> def double(x): return 2*x
...
py> def take_one(x): return x - 1
...
py>
py> compose(add_one,
...         double,
...         take_one)(10)
19
py>
py> add_one(double(take_one(10)))
19

which is the mathematically expected behaviour. But for chaining, we
want the operations in the opposite order:

10 -> add_one -> double -> take_one

which is equivalent to:

take_one(double(add_one(10)))

So to use composition for chaining, we need:

- a non-standard implementation of composition, which operates in the
reverse to what mathematicians and functional programmers expect;

- AND remember to use this rcompose() instead of compose()

- stick to the standard compose(), but put the functions in the
reverse order to what we want;

- or use the standard compose, but use even more scaffolding to
make it work:

result = compose(*reversed(
    ( map(str.strip),
      filter(lambda s: not s.startswith('#')),
      sorted,
      collapse,
      extract_dates,
      map(date_to_seconds),
      min
    )))(myfile.readlines())

> def compose(*funcs):
>     """Return a new function s.t.
>     compose(f,g,...)(x) == f(g(...(x)))
>     """
>     def inner(data, funcs=funcs):
>         result = data
>         for f in reversed(funcs):
>             result = f(result)
>         return result
>     return inner

--
Steve

From twshere at outlook.com Sat Aug 19 06:34:16 2017
From: twshere at outlook.com (Thautwarm)
Date: Sat, 19 Aug 2017 10:34:16 +0000
Subject: [Python-ideas] How do you think about these language extensions?(Thautwarm)
Message-ID:

Hi, all! I want to reply to many people, and it might annoy you if I
write multiple replies... As a result, I write them all in one post.
----------------------------------------------------------------------------------
To Christopher Barker, Ph.D.
----------------------------------------------------------------------------------

Hi, Dr. Christopher Barker,

Just as you said,

> Parentheses aren't that bad, and as far as I can tell, this is just
> another way to call a function on the results of a function.
> The above is now spelled:
> list(map(lambda x: x+2, range(5)))
> which seems fine with me -- the only improvement I see is a more compact
> way to spell lambda. (Though really, a list comp is considered more
> "pythonic" these days, yes?)
> [x+2 for x in range(5)]
> Nicely, we have list comps and generator expressions, so we can avoid
> the list() call.

I'll try to say something about why I think we need this grammar; the
reasons are not just to remove parentheses.

Consider this way to define a variable:

>> var = expr() -> g1(_) if f(_) else g2(_)

which equals

>> test = expr()
>> var = g1(test) if f(test) else g2(test)

which means that we have to use a temporary variable "test" to define
"var". I think the second example is a bit lengthy, isn't it?

The reason why I take this kind of grammar is that I can "flatten the
programming logic". In other words, I can clearly state what I mean to
say in the order of my thinking. For example,

>> lambda x: f(g(x)) -> map(_, range(100))

The code above means that I'm stressing what (an action) I'm going to
do to the object "range(100)". However, sometimes the actions are not
important, so if we want to stress what we're operating on, we write
this code:

>> range(100) -> map( lambda x:f(g(x)), _ )

Additionally, coding with chaining expressions makes me feel like
writing a poem (although it's a little difficult for me :)

What do you think about writing the following code?

>> someone -> dosomething( _, options=options) \
>>         -> is_meeting_some_conditions( _ ) \
>>         -> result1() if _ else result2()
>>     where:
>>         options = ...
>>         result1 = lambda: ...
>>         result2 = lambda: ...
>>         def dosomething(obj, options) -> Any: ...
>>         def is_meeting_some_conditions( event : Any ) -> bool : ...

In my opinion, it's quite readable and "smooth". To be honest, I think
we could totally do coding like chatting, and it can be quite
enjoyable.

However, I'm not sure whether '->' is a good choice; it didn't lead to
any conflicts at all when I compiled the CPython source code.
Moreover, it can be easily changed in Grammar/Grammar, so I don't
think it's crucial.

Finally,

> Also, we need to remember that functions can take *args, **kwargs, etc,
> and can return a tuple of just about anything -- not sure how well that
> maps to the "pipe" model.

I don't think the "pipe" model is the right way to look at this. We
don't need to worry about this problem with the grammar I've
implemented :)

>> (lambda x: (x%5, x) ) -> max( range(99), key = _)
>> 94

>> def max_from_seq(*args): return max(args)
>> [1,2,3] -> max_from_seq(*_)
>> 3

Thautwarm

----------------------------------------------------------------------------------
To David Mertz
----------------------------------------------------------------------------------

I think what you said is partially correct, but auto-currying alone is
not flexible enough. For sure you can do this with a "compose"
function:

> ... -> ... -> ... -> map(lambda x:x+1, _)

However, the evaluated result can be used as any argument of map and
other callable objects:

> ... -> ... -> ... -> map(_ , range(100))
> ... -> ... -> ... -> min([1,2,3], key = _ )
Thautwarm

----------------------------------------------------------------------------------
To Chris Angelico
----------------------------------------------------------------------------------

To be honest, I'm in favor of the grammar you prefer, like
"expr1 | expr2 | expr3". However, it might be annoying that I'd first
have to define a lot of kinds of pipeline operators like Map, Reduce
and so on.

As for whether to allow an implicit first/last argument, it seems to
be a good idea, but two points are in the way:

1. We need to change almost all the C functions related to expressions
in the source code located at Python/ast.c, while implementing the
grammar I'm using now needs nothing more than adding a new C function
there.

2. Implicit arguments make it impossible to use expressions like the
following.

> ... -> func(some_var, some_var, _, some_eval(), some_key = _)

In other words, implicit arguments weaken the grammar; we need one
more "where syntax" to do the same thing:

> some -> new_func where:
>     new_func = lambda x: func(some_var, some_var, x, some_eval(), some_key = x)

Emmm... I'm not sure about that; what do you think?

Thautwarm

----------------------------------------------------------------------------------
To Steven D'Aprano
----------------------------------------------------------------------------------

Thank you very much for your reply; it encouraged me a lot. I've just
read your most recent post, and it seems that you've suffered a lot
from the parentheses, and so did I.

> Half of my key presses are moving backwards over code I've just written
> to insert a function call which is executed *after* what I wrote, but
> needs to be written *before* what I just wrote.

I couldn't agree more with what you've said here!!! My opinions about
"chaining and pipeline" can be found in my reply to Chris Barker;
sorry that I cannot repeat myself in the same post.

>> # where syntax
>>
>> from math import pi
>> r = 1 # the radius
>> h = 10 # the height
>> S = (2*S_top + S_side) where:
>>     S_top = pi*r**2
>>     S_side = C * h where:
>>         C = 2*pi*r

> This has been suggested a few times. The first time, I disliked it, but
> I've come around to seeing its value. I like it.

> I wonder: could we make the "where" clause delay evaluation until the
> entire block was compiled, so that we could write something like this:
>
> S = (2*S_top + S_side) where:
>     S_top = pi*r**2
>     S_side = C * h  # C is defined further on
>     C = 2*pi*r
>
> That's more how "where" is used mathematically.

To be honest, I'm not sure what to say about that idea. The grammar
you've just considered is quite Haskell-like, I think. And the reason
why I want to use the "where syntax" is to divide the programming
logic clearly into different layers.

For example, sometimes we just need to know that the surface area of a
cylinder is

    2*S_top + S_side

If someone sees the code, they may not need to know how S_top and
S_side are evaluated; knowing what they mean is enough. And if you
want to know more about how S_side and S_top are evaluated, just look
at the nested "where syntax" and find the answers.

Here is another example, about forward propagation in neural networks.

    # input_layer[i] : "np.ndarray[:]"    = np.array( ... )
    # weight[i]      : "np.ndarray[:][:]" = np.array( ... )

    output_layer[i] = activate(input_layer[i]) where:

        """ logic layer 1 """

        def activate( layer ):
            ...
            return activation[i](layer)  # for example, activation[i] = lambda x:x
    input_layer[i] = forward(weight[i-1], output_layer[i-1].T) where:

        """ logic layer 2 """

        def forward(weight, output):
            ...
            # if it's a normal multi-layer perceptron.
            return np.matmul(weight, output.T)

For some people, their work just requires them to know that forward
propagation of a neural network means that the output layer is
generated from the input layer by some transformation. Those who want
to know what the transformation is can go to the next "where syntax"
and find the definition of the transformation, which is named
"activate". And those who want to know how a neural network works with
multiple layers can see that each input layer is defined by the last
output_layer and the last weight matrix -- that is how the NN goes
forward.

I think using "where syntax" in this way to deconstruct the
programming logic can strengthen readability a lot!

Next, I'm going to say something about pattern matching, and transform
the example into regular Python to make it clear to understand.

>> Here is an example to use flowpython, which gives the permutations of a sequence.
>>
>> from copy import deepcopy
>> permutations = .seq -> seq_seq where:
>>     condic+[] seq:
>>         case (a, )  => seq_seq = [a,]
>>         case (a, b) => seq_seq = [[a,b],[b,a]]
>>         case (a,*b) =>
>>             seq_seq = permutations(b) -> map(.x -> insertAll(x, a), _) -> sum(_, []) where:
>>                 insertAll = . x, a -> ret where:
>>                     ret = [ deepcopy(x) -> _.insert(i, a) or _ for i in (len(x) -> range(_+1)) ]

> I find that almost unreadable. Too many new features all at once, it's
> like trying to read a completely unfamiliar language.

> How would you translate that into regular Python?

This algorithm can be fixed a little, because the second case is
redundant. And here is the regular Python code transformed from the
code above.

from copy import deepcopy

def permutations(seq):
    try:
        # the first case
        (a, ) = seq
        # return a list of permutations: [[a]] rather than [a],
        # so that insertAll below always receives a list
        return [[a]]
    except:
        try:
            # the third case (the second case is redundant)
            def insertAll(x, a):
                # insertAll([1,2,3], 0) ->
                # [[0, 1, 2, 3], [1, 0, 2, 3], [1, 2, 0, 3], [1, 2, 3, 0]]
                ret = []
                for i in range( len(x) + 1 ):
                    tmp = deepcopy(x)
                    tmp.insert(i, a)
                    ret.append(tmp)
                return ret

            (a, *b) = seq
            tmp = permutations(b)
            tmp = map(lambda x : insertAll(x, a) , tmp)
            # sum([[1,2,3], [-1,-2,-3]], []) -> [1,2,3,-1,-2,-3]
            return sum(tmp, [])
        except:
            # no otherwise!
            pass

To be continued... (sorry for my lack of time)

Thautwarm

----------------------------------------------------------------------------------

I'm sorry that I have to do some other work now and haven't finished
writing down all I want to say. I'd like to continue replying to the
posts tomorrow; it's quite a pleasure to discuss these topics with you
all!!!

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mertz at gnosis.cx Sat Aug 19 12:05:36 2017
From: mertz at gnosis.cx (David Mertz)
Date: Sat, 19 Aug 2017 09:05:36 -0700
Subject: [Python-ideas] How do you think about these language extensions?
In-Reply-To: <20170819104202.GV7395@ando.pearwood.info>
References: <20170818120609.GS7395@ando.pearwood.info>
<20170819035710.GU7395@ando.pearwood.info>
<20170819104202.GV7395@ando.pearwood.info>
Message-ID:

You are right, of course. Mine does the order wrong. But an
'rcompose()' or 'pipe()' or 'funchain()' is easy enough to put in the
right order.
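For instance, here is a minimal sketch under the same assumptions as
before (the name 'rcompose' is mine; the only change from the
compose() given earlier is dropping the reversed() call, so the
functions apply left to right, pipeline style):

    def rcompose(*funcs):
        """Return a new function s.t.
        rcompose(f, g, ...)(x) applies f first, then g, etc."""
        def inner(data, funcs=funcs):
            result = data
            for f in funcs:  # no reversed(): left-to-right order
                result = f(result)
            return result
        return inner

    # rcompose(take_one, double, add_one)(10)
    # == add_one(double(take_one(10))) == 19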
On Aug 19, 2017 3:44 AM, "Steven D'Aprano" wrote:

> On Fri, Aug 18, 2017 at 10:33:40PM -0700, David Mertz wrote:
>
> > This is pretty easy to write without any syntax changes, just using a
> > higher-order function `compose()` (possible implementation at foot).
>
> A ~~slight~~ major nit: given the implementation of compose you quote
> below, this applies the functions in the wrong order. min() is called
> first, and map(str.strip) last.
>
> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mertz at gnosis.cx Sat Aug 19 17:13:35 2017
From: mertz at gnosis.cx (David Mertz)
Date: Sat, 19 Aug 2017 14:13:35 -0700
Subject: [Python-ideas] How do you think about these language extensions?
In-Reply-To: <20170819104202.GV7395@ando.pearwood.info>
References: <20170818120609.GS7395@ando.pearwood.info>
<20170819035710.GU7395@ando.pearwood.info>
<20170819104202.GV7395@ando.pearwood.info>
Message-ID:

On Aug 19, 2017 3:44 AM, "Steven D'Aprano" wrote:

> 2) there's that unfortunate call to "compose" which isn't actually
> part of the algorithm, it's just scaffolding to make it work;

I see this as an ADVANTAGE, actually. We can save the composed
function under another name before applying it to various data later.
Or 'rcomposed' or whatever name.

Moreover, composition is associative:

op1 = compose(a, b, c)
op2 = compose(d, e, f)
op3 = compose(op1, op2)

This is useful for creating compound operations that might be useful
in themselves. The pipe operator doesn't lend itself nearly as well to
this scenario.

FWIW, while I think using a different function name is better, you
could use a 'reversed=True' keyword argument on a compose() function.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From njs at pobox.com Sat Aug 19 17:33:53 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 19 Aug 2017 14:33:53 -0700
Subject: [Python-ideas] PEP 550 v2
In-Reply-To: <50759657-9920-405a-8c08-7618024176e4@googlegroups.com>
References: <50759657-9920-405a-8c08-7618024176e4@googlegroups.com>
Message-ID:

On Sat, Aug 19, 2017 at 12:09 PM, Neil Girdhar wrote:
> Cool to see this on python-ideas. I'm really looking forward to this PEP
> 550 or 521.
>
> On Wednesday, August 16, 2017 at 3:19:29 AM UTC-4, Nathaniel Smith wrote:
>> 2) For classic decimal.localcontext context managers, the idea is
>> still that you save/restore the value, so that you can nest multiple
>> context managers without having to push/pop LCs all the time. But the
>> above API is not actually sufficient to implement a proper
>> save/restore, for a subtle reason: if you do
>>
>> ci.set(ci.get())
>>
>> then you just (potentially) moved the value from a lower LC up to the top
>> LC.
>
> I agree with Nathaniel that this is an issue with the current API. I don't
> think it's a good idea to have set and get methods. It would be much better
> to reflect the underlying ExecutionContext *stack* in the API by exposing a
> mutating *context manager* on the Context Key object instead of set. For
> example,
>
> my_context = sys.new_context_key('my_context')
>
> options = my_context.get()
> options.some_mutating_method()
>
> with my_context.mutate(options):
>     # Do whatever you want with the mutated context
> # Now, the context is reverted.
>
> Similarly, instead of
>
> my_context.set('spam')
>
> you would do
>
> with my_context.mutate('spam'):
>     # Do whatever you want with the mutated context
> # Now, the context is reverted.

Unfortunately, I don't think we can eliminate the set() operation
entirely, because the libraries we want to migrate to using this --
like decimal and numpy -- generally provide set() operations in their
public API. (See: decimal.setcontext, numpy.seterr, ...) They're
generally not recommended for use in new code, but they do exist and
are covered by compatibility guarantees, so we need some way to
implement them using the PEP 550 API.

OTOH we can certainly provide a context manager like this and make it
the obvious convenient thing to use (and which also happens to do the
right thing). We could potentially also give the 'set' primitive an
ugly name to remind people that it has this pitfall, like make it
'set_in_top_context' or something.

-n

--
Nathaniel J.
Smith -- https://vorpus.org From njs at pobox.com Sat Aug 19 17:42:21 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 19 Aug 2017 14:42:21 -0700 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: <5997939A.1090404@stoneleaf.us> References: <20170816142553.GA2837@bytereef.org> <20170816160856.GA2672@bytereef.org> <20170816171304.GA3261@bytereef.org> <5997939A.1090404@stoneleaf.us> Message-ID: On Fri, Aug 18, 2017 at 6:25 PM, Ethan Furman wrote: > On 08/17/2017 02:40 AM, Nick Coghlan wrote: >> >> On 17 August 2017 at 04:38, Yury Selivanov wrote: > > >> ck.get_value() attempts to look up the value for that key in the >> currently active execution context. >> If it doesn't find one, it then tries each of the execution >> contexts in the currently active dynamic context. >> If it *still* doesn't find one, then it will set the default value >> in the outermost execution context and then return that value. > > > For what it's worth, I find the term DynamicContext much easier to > understand with relation to these concepts. I really like DynamicContext -- if you know the classic dynamic/static terminology in language design then it works as a precise technical description, but it also makes sense as plain non-technical English. And it avoids the confusingly overloaded word "scope". Apropos Guido's point about container naming, how about DynamicContext and DynamicContextStack? That's only 3 letters longer than ExecutionContext. -n -- Nathaniel J. Smith -- https://vorpus.org From mistersheik at gmail.com Sat Aug 19 15:09:30 2017 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 19 Aug 2017 12:09:30 -0700 (PDT) Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: <50759657-9920-405a-8c08-7618024176e4@googlegroups.com> Cool to see this on python-ideas. I'm really looking forward to this PEP 550 or 521. On Wednesday, August 16, 2017 at 3:19:29 AM UTC-4, Nathaniel Smith wrote: > > On Tue, Aug 15, 2017 at 4:55 PM, Yury Selivanov > wrote: > > Hi, > > > > Here's the PEP 550 version 2. > > Awesome! > > Some of the changes from v1 to v2 might be a bit confusing -- in > particular the thing where ExecutionContext is now a stack of > LocalContext objects instead of just being a mapping. So here's the > big picture as I understand it: > > In discussions on the mailing list and off-line, we realized that the > main reason people use "thread locals" is to implement fake dynamic > scoping. Of course, generators/async/await mean that currently it's > impossible to *really* fake dynamic scoping in Python -- that's what > PEP 550 is trying to fix. So PEP 550 v1 essentially added "generator > locals" as a refinement of "thread locals". But... it turns out that > "generator locals" aren't enough to properly implement dynamic scoping > either! So the goal in PEP 550 v2 is to provide semantics strong > enough to *really* get this right. > > I wrote up some notes on what I mean by dynamic scoping, and why > neither thread-locals nor generator-locals can fake it: > > > https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope.ipynb > > > Specification > > ============= > > > > Execution Context is a mechanism of storing and accessing data specific > > to a logical thread of execution. We consider OS threads, > > generators, and chains of coroutines (such as ``asyncio.Task``) > > to be variants of a logical thread. 
> > > > In this specification, we will use the following terminology: > > > > * **Local Context**, or LC, is a key/value mapping that stores the > > context of a logical thread. > > If you're more familiar with dynamic scoping, then you can think of an > LC as a single dynamic scope... > > > * **Execution Context**, or EC, is an OS-thread-specific dynamic > > stack of Local Contexts. > > ...and an EC as a stack of scopes. Looking up a ContextItem in an EC > proceeds by checking the first LC (innermost scope), then if it > doesn't find what it's looking for it checks the second LC (the > next-innermost scope), etc. > > > ``ContextItem`` objects have the following methods and attributes: > > > > * ``.description``: read-only description; > > > > * ``.set(o)`` method: set the value to ``o`` for the context item > > in the execution context. > > > > * ``.get()`` method: return the current EC value for the context item. > > Context items are initialized with ``None`` when created, so > > this method call never fails. > > Two issues here, that both require some expansion of this API to > reveal a *bit* more information about the EC structure. > > 1) For trio's cancel scope use case I described in the last, I > actually need some way to read out all the values on the LocalContext > stack. (It would also be helpful if there were some fast way to check > the depth of the ExecutionContext stack -- or at least tell whether > it's 1 deep or more-than-1 deep. I know that any cancel scopes that > are in the bottommost LC will always be attached to the given Task, so > I can set up the scope->task mapping once and re-use it indefinitely. > OTOH for scopes that are stored in higher LCs, I have to check at > every yield whether they're currently in effect. And I want to > minimize the per-yield workload as much as possible.) > > 2) For classic decimal.localcontext context managers, the idea is > still that you save/restore the value, so that you can nest multiple > context managers without having to push/pop LCs all the time. But the > above API is not actually sufficient to implement a proper > save/restore, for a subtle reason: if you do > > ci.set(ci.get()) > > then you just (potentially) moved the value from a lower LC up to the top > LC. > I agree with Nathaniel that this is an issue with the current API. I don't think it's a good idea to have set and get methods. It would be much better to reflect the underlying ExecutionContext *stack* in the API by exposing a mutating *context manager* on the Context Key object instead of set. For example, my_context = sys.new_context_key('my_context') options = my_context.get() options.some_mutating_method() with my_context.mutate(options): # Do whatever you want with the mutated context # Now, the context is reverted. Similarly, instead of my_context.set('spam') you would do with my_context.mutate('spam'): # Do whatever you want with the mutated context # Now, the context is reverted. > > Here's an example of a case where this can produce user-visible effects: > > > https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope-on-top-of-pep-550-draft-2.py > > There are probably a bunch of options for fixing this. But basically > we need some API that makes it possible to temporarily set a value in > the top LC, and then restore that value to what it was before (either > the previous value, or 'unset' to unshadow a value in a lower LC). 
One > simple option would be to make the idiom be something like: > > @contextmanager > def local_value(new_value): > state = ci.get_local_state() > ci.set(new_value) > try: > yield > finally: > ci.set_local_state(state) > > where 'state' is something like a tuple (ci in EC[-1], > EC[-1].get(ci)). A downside with this is that it's a bit error-prone > (very easy for an unwary user to accidentally use get/set instead of > get_local_state/set_local_state). But I'm sure we can come up with > something. > > > Manual Context Management > > ------------------------- > > > > Execution Context is generally managed by the Python interpreter, > > but sometimes it is desirable for the user to take the control > > over it. A few examples when this is needed: > > > > * running a computation in ``concurrent.futures.ThreadPoolExecutor`` > > with the current EC; > > > > * reimplementing generators with iterators (more on that later); > > > > * managing contexts in asynchronous frameworks (implement proper > > EC support in ``asyncio.Task`` and ``asyncio.loop.call_soon``.) > > > > For these purposes we add a set of new APIs (they will be used in > > later sections of this specification): > > > > * ``sys.new_local_context()``: create an empty ``LocalContext`` > > object. > > > > * ``sys.new_execution_context()``: create an empty > > ``ExecutionContext`` object. > > > > * Both ``LocalContext`` and ``ExecutionContext`` objects are opaque > > to Python code, and there are no APIs to modify them. > > > > * ``sys.get_execution_context()`` function. The function returns a > > copy of the current EC: an ``ExecutionContext`` instance. > > If there are enough of these functions then it might make sense to > stick them in their own module instead of adding more stuff to sys. I > guess worrying about that can wait until the API details are more firm > though. > > > * If ``coro.cr_local_context`` is an empty ``LocalContext`` object > > that ``coro`` was created with, the interpreter will set > > ``coro.cr_local_context`` to ``None``. > > I like all the ideas in this section, but this specific point feels a > bit weird. Coroutine objects need a second hidden field somewhere to > keep track of whether the object they end up with is the same one they > were created with? > > If I set cr_local_context to something else, and then set it back to > the original value, does that trigger the magic await behavior or not? > What if I take the initial LocalContext off of one coroutine and > attach it to another, does that trigger the magic await behavior? > > Maybe it would make more sense to have two sentinel values: > UNINITIALIZED and INHERIT? > > > To enable correct Execution Context propagation into Tasks, the > > asynchronous framework needs to assist the interpreter: > > > > * When ``create_task`` is called, it should capture the current > > execution context with ``sys.get_execution_context()`` and save it > > on the Task object. > > I wonder if it would be useful to have an option to squash this > execution context down into a single LocalContext, since we know we'll > be using it for a while and once we've copied an ExecutionContext it > becomes impossible to tell the difference between one that has lots of > internal LocalContexts and one that doesn't. 
This could also be handy > for trio/curio's semantics where they initialize a new task's context > to be a shallow copy of the parent task: you could do > > new_task_coro.cr_local_context = get_current_context().squash() > > and then skip having to wrap every send() call in a run_in_context. > > > Generators > > ---------- > > > > Generators in Python, while similar to Coroutines, are used in a > > fundamentally different way. They are producers of data, and > > they use ``yield`` expression to suspend/resume their execution. > > > > A crucial difference between ``await coro`` and ``yield value`` is > > that the former expression guarantees that the ``coro`` will be > > executed fully, while the latter is producing ``value`` and > > suspending the generator until it gets iterated again. > > > > Generators, similarly to coroutines, have a ``gi_local_context`` > > attribute, which is set to an empty Local Context when created. > > > > Contrary to coroutines though, ``yield from o`` expression in > > generators (that are not generator-based coroutines) is semantically > > equivalent to ``for v in o: yield v``, therefore the interpreter does > > not attempt to control their ``gi_local_context``. > > Hmm. I assume you're simplifying for expository purposes, but 'yield > from' isn't the same as 'for v in o: yield v'. In fact PEP 380 says: > "Motivation: [...] a piece of code containing a yield cannot be > factored out and put into a separate function in the same way as other > code. [...] If yielding of values is the only concern, this can be > performed without much difficulty using a loop such as 'for v in g: > yield v'. However, if the subgenerator is to interact properly with > the caller in the case of calls to send(), throw() and close(), things > become considerably more difficult. As will be seen later, the > necessary code is very complicated, and it is tricky to handle all the > corner cases correctly." > > So it seems to me that the whole idea of 'yield from' is that it's > supposed to handle all the tricky bits needed to guarantee that if you > take some code out of a generator and refactor it into a subgenerator, > then everything works the same as before. This suggests that 'yield > from' should do the same magic as 'await', where by default the > subgenerator shares the same LocalContext as the parent generator. > (And as a bonus it makes things simpler if 'yield from' and 'await' > work the same.) > > > Asynchronous Generators > > ----------------------- > > > > Asynchronous Generators (AG) interact with the Execution Context > > similarly to regular generators. > > > > They have an ``ag_local_context`` attribute, which, similarly to > > regular generators, can be set to ``None`` to make them use the outer > > Local Context. This is used by the new > > ``contextlib.asynccontextmanager`` decorator. > > > > The EC support of ``await`` expression is implemented using the same > > approach as in coroutines, see the `Coroutine Object Modifications`_ > > section. > > You showed how to make an iterator that acts like a generator. Is it > also possible to make an async iterator that acts like an async > generator? It's not immediately obvious, because you need to make sure > that the local context gets restored each time you re-enter the > __anext__ generator. I think it's something like: > > class AIter: > def __init__(self): > self._local_context = ... 
> > # Note: intentionally not async > def __anext__(self): > coro = self._real_anext() > coro.cr_local_context = self._local_context > return coro > > async def _real_anext(self): > ... > > Does that look right? > > > ContextItem.get() Cache > > ----------------------- > > > > We can add three new fields to ``PyThreadState`` and > > ``PyInterpreterState`` structs: > > > > * ``uint64_t PyThreadState->unique_id``: a globally unique > > thread state identifier (we can add a counter to > > ``PyInterpreterState`` and increment it when a new thread state is > > created.) > > > > * ``uint64_t PyInterpreterState->context_item_deallocs``: every time > > a ``ContextItem`` is GCed, all Execution Contexts in all threads > > will lose track of it. ``context_item_deallocs`` will simply > > count all ``ContextItem`` deallocations. > > > > * ``uint64_t PyThreadState->execution_context_ver``: every time > > a new item is set, or an existing item is updated, or the stack > > of execution contexts is changed in the thread, we increment this > > counter. > > I think this can be refined further (and I don't understand > context_item_deallocs -- maybe it's a mistake?). AFAICT the things > that invalidate a ContextItem's cache are: > > 1) switching threadstates > 2) popping or pushing a non-empty LocalContext off the current > threadstate's ExecutionContext > 3) calling ContextItem.set() on *that* context item > > So I'd suggest tracking the thread state id, a counter of how many > non-empty LocalContexts have been pushed/popped on this thread state, > and a *per ContextItem* counter of how many times set() has been > called. > > > Backwards Compatibility > > ======================= > > > > This proposal preserves 100% backwards compatibility. > > While this is mostly true in the strict sense, in practice this PEP is > useless if existing thread-local users like decimal and numpy can't > migrate to it without breaking backcompat. So maybe this section > should discuss that? > > (For example, one constraint on the design is that we can't provide > only a pure push/pop API, even though that's what would be most > convenient for context managers like decimal.localcontext or > numpy.errstate, because we also need to provide some backcompat story > for legacy functions like decimal.setcontext and numpy.seterr.) > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Aug 19 20:45:11 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 20 Aug 2017 10:45:11 +1000 Subject: [Python-ideas] How do you think about these language extensions? In-Reply-To: References: <20170818120609.GS7395@ando.pearwood.info> <20170819035710.GU7395@ando.pearwood.info> <20170819104202.GV7395@ando.pearwood.info> Message-ID: <20170820004510.GW7395@ando.pearwood.info> On Sat, Aug 19, 2017 at 09:05:36AM -0700, David Mertz wrote: > You are right, of course. Mine does the order wrong. But an 'rcompose()' or > 'pipe()' or 'funchain()' is easy enough to put in the right order. Indeed. I said earlier that your solution (corrected for its error) was a pretty neat solution, and it was mostly down to a sense of aesthetics which we might prefer.
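And David is right that the helper itself is trivial. Something like
this is all it takes (a sketch only -- 'rcompose' is David's name for
it, the body below is my guess at what he means):

    def rcompose(*functions):
        # rcompose(f, g, h)(x) computes h(g(f(x))):
        # the functions are applied in the order they are given.
        def composed(value):
            for function in functions:
                value = function(value)
            return value
        return composed

    process = rcompose(str.strip, str.upper)
    process('  spam  ')  # returns 'SPAM'

So the case for dedicated syntax is not that the helper is hard to
write.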
I think a pipe or arrow is aesthetically nicer, and speaks much more closely to the intent. Analogy: We don't need operators + - * / etc, since it's trivial to get the same effect using the functions in the operator module. But operators look nicer and are closer to the way people think of arithmetic. I think that function composition is a neat and powerful tool for those who already think functionally, but higher order functions are harder to teach and even experts can mess them up. (The lesson here is that the pipe operator | is like a postfix version of the composition operator ∘.) -- Steve From barry at barrys-emacs.org Sun Aug 20 17:01:06 2017 From: barry at barrys-emacs.org (Barry) Date: Sun, 20 Aug 2017 22:01:06 +0100 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: I'm not clear why there is a new_context_key which seems not to be a key. It seems that the object is a container for a single value. Key.set( value ) does not feel right. Container.set( value ) is fine. Barry > On 16 Aug 2017, at 00:55, Yury Selivanov wrote: > > Hi, > > Here's the PEP 550 version 2. Thanks to a very active and insightful > discussion here on Python-ideas, we've discovered a number of > problems with the first version of the PEP. This version is a complete > rewrite (only Abstract, Rationale, and Goals sections were not updated). > > The updated PEP is live on python.org: > https://www.python.org/dev/peps/pep-0550/ > > There is no reference implementation at this point, but I'm confident > that this version of the spec will have the same extremely low > runtime overhead as the first version. Thanks to the new ContextItem > design, accessing values in the context is even faster now. > > Thank you! > > > PEP: 550 > Title: Execution Context > Version: $Revision$ > Last-Modified: $Date$ > Author: Yury Selivanov > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 11-Aug-2017 > Python-Version: 3.7 > Post-History: 11-Aug-2017, 15-Aug-2017 > > > Abstract > ======== > > This PEP proposes a new mechanism to manage execution state--the > logical environment in which a function, a thread, a generator, > or a coroutine executes. > > A few examples of where having a reliable state storage is required: > > * Context managers like decimal contexts, ``numpy.errstate``, > and ``warnings.catch_warnings``; > > * Storing request-related data such as security tokens and request > data in web applications, implementing i18n; > > * Profiling, tracing, and logging in complex and large code bases. > > The usual solution for storing state is to use a Thread-local Storage > (TLS), implemented in the standard library as ``threading.local()``. > Unfortunately, TLS does not work for the purpose of state isolation > for generators or asynchronous code, because such code executes > concurrently in a single thread. > > > Rationale > ========= > > Traditionally, a Thread-local Storage (TLS) is used for storing the > state. However, the major flaw of using the TLS is that it works only > for multi-threaded code. It is not possible to reliably contain the > state within a generator or a coroutine.
For example, consider > the following generator:: > > def calculate(precision, ...): > with decimal.localcontext() as ctx: > # Set the precision for decimal calculations > # inside this block > ctx.prec = precision > > yield calculate_something() > yield calculate_something_else() > > Decimal context is using a TLS to store the state, and because TLS is > not aware of generators, the state can leak. If a user iterates over > the ``calculate()`` generator with different precisions one by one > using a ``zip()`` built-in, the above code will not work correctly. > For example:: > > g1 = calculate(precision=100) > g2 = calculate(precision=50) > > items = list(zip(g1, g2)) > > # items[0] will be a tuple of: > # first value from g1 calculated with 100 precision, > # first value from g2 calculated with 50 precision. > # > # items[1] will be a tuple of: > # second value from g1 calculated with 50 precision (!!!), > # second value from g2 calculated with 50 precision. > > An even scarier example would be using decimals to represent money > in an async/await application: decimal calculations can suddenly > lose precision in the middle of processing a request. Currently, > bugs like this are extremely hard to find and fix. > > Another common need for web applications is to have access to the > current request object, or security context, or, simply, the request > URL for logging or submitting performance tracing data:: > > async def handle_http_request(request): > context.current_http_request = request > > await ... > # Invoke your framework code, render templates, > # make DB queries, etc, and use the global > # 'current_http_request' in that code. > > # This isn't currently possible to do reliably > # in asyncio out of the box. > > These examples are just a few out of many, where a reliable way to > store context data is absolutely needed. > > The inability to use TLS for asynchronous code has lead to > proliferation of ad-hoc solutions, which are limited in scope and > do not support all required use cases. > > Current status quo is that any library, including the standard > library, that uses a TLS, will likely not work as expected in > asynchronous code or with generators (see [3]_ as an example issue.) > > Some languages that have coroutines or generators recommend to > manually pass a ``context`` object to every function, see [1]_ > describing the pattern for Go. This approach, however, has limited > use for Python, where we have a huge ecosystem that was built to work > with a TLS-like context. Moreover, passing the context explicitly > does not work at all for libraries like ``decimal`` or ``numpy``, > which use operator overloading. > > .NET runtime, which has support for async/await, has a generic > solution of this problem, called ``ExecutionContext`` (see [2]_). > On the surface, working with it is very similar to working with a TLS, > but the former explicitly supports asynchronous code. > > > Goals > ===== > > The goal of this PEP is to provide a more reliable alternative to > ``threading.local()``. It should be explicitly designed to work with > Python execution model, equally supporting threads, generators, and > coroutines. > > An acceptable solution for Python should meet the following > requirements: > > * Transparent support for code executing in threads, coroutines, > and generators with an easy to use API. > > * Negligible impact on the performance of the existing code or the > code that will be using the new mechanism. 
> > * Fast C API for packages like ``decimal`` and ``numpy``. > > Explicit is still better than implicit, hence the new APIs should only > be used when there is no acceptable way of passing the state > explicitly. > > > Specification > ============= > > Execution Context is a mechanism of storing and accessing data specific > to a logical thread of execution. We consider OS threads, > generators, and chains of coroutines (such as ``asyncio.Task``) > to be variants of a logical thread. > > In this specification, we will use the following terminology: > > * **Local Context**, or LC, is a key/value mapping that stores the > context of a logical thread. > > * **Execution Context**, or EC, is an OS-thread-specific dynamic > stack of Local Contexts. > > * **Context Item**, or CI, is an object used to set and get values > from the Execution Context. > > Please note that throughout the specification we use simple > pseudo-code to illustrate how the EC machinery works. The actual > algorithms and data structures that we will use to implement the PEP > are discussed in the `Implementation Strategy`_ section. > > > Context Item Object > ------------------- > > The ``sys.new_context_item(description)`` function creates a > new ``ContextItem`` object. The ``description`` parameter is a > ``str``, explaining the nature of the context key for introspection > and debugging purposes. > > ``ContextItem`` objects have the following methods and attributes: > > * ``.description``: read-only description; > > * ``.set(o)`` method: set the value to ``o`` for the context item > in the execution context. > > * ``.get()`` method: return the current EC value for the context item. > Context items are initialized with ``None`` when created, so > this method call never fails. > > The below is an example of how context items can be used:: > > my_context = sys.new_context_item(description='mylib.context') > my_context.set('spam') > > # Later, to access the value of my_context: > print(my_context.get()) > > > Thread State and Multi-threaded code > ------------------------------------ > > Execution Context is implemented on top of Thread-local Storage. > For every thread there is a separate stack of Local Contexts -- > mappings of ``ContextItem`` objects to their values in the LC. > New threads always start with an empty EC. > > For CPython:: > > PyThreadState: > execution_context: ExecutionContext([ > LocalContext({ci1: val1, ci2: val2, ...}), > ... > ]) > > The ``ContextItem.get()`` and ``.set()`` methods are defined as > follows (in pseudo-code):: > > class ContextItem: > > def get(self): > tstate = PyThreadState_Get() > > for local_context in reversed(tstate.execution_context): > if self in local_context: > return local_context[self] > > def set(self, value): > tstate = PyThreadState_Get() > > if not tstate.execution_context: > tstate.execution_context = [LocalContext()] > > tstate.execution_context[-1][self] = value > > With the semantics defined so far, the Execution Context can already > be used as an alternative to ``threading.local()``:: > > def print_foo(): > print(ci.get() or 'nothing') > > ci = sys.new_context_item(description='test') > ci.set('foo') > > # Will print "foo": > print_foo() > > # Will print "nothing": > threading.Thread(target=print_foo).start() > > > Manual Context Management > ------------------------- > > Execution Context is generally managed by the Python interpreter, > but sometimes it is desirable for the user to take the control > over it. 
A few examples when this is needed: > > * running a computation in ``concurrent.futures.ThreadPoolExecutor`` > with the current EC; > > * reimplementing generators with iterators (more on that later); > > * managing contexts in asynchronous frameworks (implement proper > EC support in ``asyncio.Task`` and ``asyncio.loop.call_soon``.) > > For these purposes we add a set of new APIs (they will be used in > later sections of this specification): > > * ``sys.new_local_context()``: create an empty ``LocalContext`` > object. > > * ``sys.new_execution_context()``: create an empty > ``ExecutionContext`` object. > > * Both ``LocalContext`` and ``ExecutionContext`` objects are opaque > to Python code, and there are no APIs to modify them. > > * ``sys.get_execution_context()`` function. The function returns a > copy of the current EC: an ``ExecutionContext`` instance. > > The runtime complexity of the actual implementation of this function > can be O(1), but for the purposes of this section it is equivalent > to:: > > def get_execution_context(): > tstate = PyThreadState_Get() > return copy(tstate.execution_context) > > * ``sys.run_with_execution_context(ec: ExecutionContext, func, *args, > **kwargs)`` runs ``func(*args, **kwargs)`` in the provided execution > context:: > > def run_with_execution_context(ec, func, *args, **kwargs): > tstate = PyThreadState_Get() > > old_ec = tstate.execution_context > > tstate.execution_context = ExecutionContext( > ec.local_contexts + [LocalContext()] > ) > > try: > return func(*args, **kwargs) > finally: > tstate.execution_context = old_ec > > Any changes to Local Context by ``func`` will be ignored. > This allows to reuse one ``ExecutionContext`` object for multiple > invocations of different functions, without them being able to > affect each other's environment:: > > ci = sys.new_context_item('example') > ci.set('spam') > > def func(): > print(ci.get()) > ci.set('ham') > > ec = sys.get_execution_context() > > sys.run_with_execution_context(ec, func) > sys.run_with_execution_context(ec, func) > > # Will print: > # spam > # spam > > * ``sys.run_with_local_context(lc: LocalContext, func, *args, > **kwargs)`` runs ``func(*args, **kwargs)`` in the current execution > context using the specified local context. > > Any changes that ``func`` does to the local context will be > persisted in ``lc``. This behaviour is different from the > ``run_with_execution_context()`` function, which always creates > a new throw-away local context. 
> > In pseudo-code:: > > def run_with_local_context(lc, func, *args, **kwargs): > tstate = PyThreadState_Get() > > old_ec = tstate.execution_context > > tstate.execution_context = ExecutionContext( > old_ec.local_contexts + [lc] > ) > > try: > return func(*args, **kwargs) > finally: > tstate.execution_context = old_ec > > Using the previous example:: > > ci = sys.new_context_item('example') > ci.set('spam') > > def func(): > print(ci.get()) > ci.set('ham') > > ec = sys.get_execution_context() > lc = sys.new_local_context() > > sys.run_with_local_context(lc, func) > sys.run_with_local_context(lc, func) > > # Will print: > # spam > # ham > > As an example, let's make a subclass of > ``concurrent.futures.ThreadPoolExecutor`` that preserves the execution > context for scheduled functions:: > > class Executor(concurrent.futures.ThreadPoolExecutor): > > def submit(self, fn, *args, **kwargs): > context = sys.get_execution_context() > > fn = functools.partial( > sys.run_with_execution_context, context, > fn, *args, **kwargs) > > return super().submit(fn) > > > EC Semantics for Coroutines > --------------------------- > > Python :pep:`492` coroutines are used to implement cooperative > multitasking. For a Python end-user they are similar to threads, > especially when it comes to sharing resources or modifying > the global state. > > An event loop is needed to schedule coroutines. Coroutines that > are explicitly scheduled by the user are usually called Tasks. > When a coroutine is scheduled, it can schedule other coroutines using > an ``await`` expression. In async/await world, awaiting a coroutine > is equivalent to a regular function call in synchronous code. Thus, > Tasks are similar to threads. > > By drawing a parallel between regular multithreaded code and > async/await, it becomes apparent that any modification of the > execution context within one Task should be visible to all coroutines > scheduled within it. Any execution context modifications, however, > must not be visible to other Tasks executing within the same OS > thread. > > > Coroutine Object Modifications > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > To achieve this, a small set of modifications to the coroutine object > is needed: > > * New ``cr_local_context`` attribute. This attribute is readable > and writable for Python code. > > * When a coroutine object is instantiated, its ``cr_local_context`` > is initialized with an empty Local Context. > > * Coroutine's ``.send()`` and ``.throw()`` methods are modified as > follows (in pseudo-C):: > > if coro.cr_local_context is not None: > tstate = PyThreadState_Get() > > tstate.execution_context.push(coro.cr_local_context) > > try: > # Perform the actual `Coroutine.send()` or > # `Coroutine.throw()` call. > return coro.send(...) > finally: > coro.cr_local_context = tstate.execution_context.pop() > else: > # Perform the actual `Coroutine.send()` or > # `Coroutine.throw()` call. > return coro.send(...) > > * When Python interpreter sees an ``await`` instruction, it inspects > the ``cr_local_context`` attribute of the coroutine that is about > to be awaited. For ``await coro``: > > * If ``coro.cr_local_context`` is an empty ``LocalContext`` object > that ``coro`` was created with, the interpreter will set > ``coro.cr_local_context`` to ``None``. > > * If ``coro.cr_local_context`` was modified by Python code, the > interpreter will leave it as is. 
> > This makes any changes to execution context made by nested coroutine > calls within a Task to be visible throughout the Task:: > > ci = sys.new_context_item('example') > > async def nested(): > ci.set('nested') > > async def main(): > ci.set('main') > print('before:', ci.get()) > await nested() > print('after:', ci.get()) > > # Will print: > # before: main > # after: nested > > Essentially, coroutines work with Execution Context items similarly > to threads, and ``await`` expression acts like a function call. > > This mechanism also works for ``yield from`` in generators decorated > with ``@types.coroutine`` or ``@asyncio.coroutine``, which are > called "generator-based coroutines" according to :pep:`492`, > and should be fully compatible with native async/await coroutines. > > > Tasks > ^^^^^ > > In asynchronous frameworks like asyncio, coroutines are run by > an event loop, and need to be explicitly scheduled (in asyncio > coroutines are run by ``asyncio.Task``.) > > With the currently defined semantics, the interpreter makes > coroutines linked by an ``await`` expression share the same > Local Context. > > The interpreter, however, is not aware of the Task concept, and > cannot help with ensuring that new Tasks started in coroutines > use the correct EC:: > > current_request = sys.new_context_item(description='request') > > async def child(): > print('current request:', repr(current_request.get())) > > async def handle_request(request): > current_request.set(request) > event_loop.create_task(child) > > run(top_coro()) > > # Will print: > # current request: None > > To enable correct Execution Context propagation into Tasks, the > asynchronous framework needs to assist the interpreter: > > * When ``create_task`` is called, it should capture the current > execution context with ``sys.get_execution_context()`` and save it > on the Task object. > > * When the Task object runs its coroutine object, it should execute > ``.send()`` and ``.throw()`` methods within the captured > execution context, using the ``sys.run_with_execution_context()`` > function. > > With help from the asynchronous framework, the above snippet will > run correctly, and the ``child()`` coroutine will be able to access > the current request object through the ``current_request`` > Context Item. > > > Event Loop Callbacks > ^^^^^^^^^^^^^^^^^^^^ > > Similarly to Tasks, functions like asyncio's ``loop.call_soon()`` > should capture the current execution context with > ``sys.get_execution_context()`` and execute callbacks > within it with ``sys.run_with_execution_context()``. > > This way the following code will work:: > > current_request = sys.new_context_item(description='request') > > def log(): > request = current_request.get() > print(request) > > async def request_handler(request): > current_request.set(request) > get_event_loop().call_soon(log) > > > Generators > ---------- > > Generators in Python, while similar to Coroutines, are used in a > fundamentally different way. They are producers of data, and > they use ``yield`` expression to suspend/resume their execution. > > A crucial difference between ``await coro`` and ``yield value`` is > that the former expression guarantees that the ``coro`` will be > executed fully, while the latter is producing ``value`` and > suspending the generator until it gets iterated again. > > Generators, similarly to coroutines, have a ``gi_local_context`` > attribute, which is set to an empty Local Context when created.
> > Contrary to coroutines though, ``yield from o`` expression in > generators (that are not generator-based coroutines) is semantically > equivalent to ``for v in o: yield v``, therefore the interpreter does > not attempt to control their ``gi_local_context``. > > > EC Semantics for Generators > ^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Every generator object has its own Local Context that stores > only its own local modifications of the context. When a generator > is being iterated, its local context will be put in the EC stack > of the current thread. This means that the generator will be able > to access items from the surrounding context:: > > local = sys.new_context_item("local") > glob = sys.new_context_item("global") > > def generator(): > local.set('inside gen:') > while True: > print(local.get(), glob.get()) > yield > > g = generator() > > local.set('hello') > glob.set('spam') > next(g) > > local.set('world') > glob.set('ham') > next(g) > > # Will print: > # inside gen: spam > # inside gen: ham > > Any changes to the EC in nested generators are invisible to the outer > generator:: > > local = sys.new_context_item("local") > > def inner_gen(): > local.set('spam') > yield > > def outer_gen(): > local.set('ham') > yield from inner_gen() > print(local.get()) > > list(outer_gen()) > > # Will print: > # ham > > > Running generators without LC > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Similarly to coroutines, generators with ``gi_local_context`` > set to ``None`` simply use the outer Local Context. > > The ``@contextlib.contextmanager`` decorator uses this mechanism to > allow its generator to affect the EC:: > > item = sys.new_context_item('test') > > @contextmanager > def context(x): > old = item.get() > item.set(x) > try: > yield > finally: > item.set(old) > > with context('spam'): > > with context('ham'): > print(1, item.get()) > > print(2, item.get()) > > # Will print: > # 1 ham > # 2 spam > > > Implementing Generators with Iterators > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > The Execution Context API allows to fully replicate EC behaviour > imposed on generators with a regular Python iterator class:: > > class Gen: > > def __init__(self): > self.local_context = sys.new_local_context() > > def __iter__(self): > return self > > def __next__(self): > return sys.run_with_local_context( > self.local_context, self._next_impl) > > def _next_impl(self): > # Actual __next__ implementation. > ... > > > Asynchronous Generators > ----------------------- > > Asynchronous Generators (AG) interact with the Execution Context > similarly to regular generators. > > They have an ``ag_local_context`` attribute, which, similarly to > regular generators, can be set to ``None`` to make them use the outer > Local Context. This is used by the new > ``contextlib.asynccontextmanager`` decorator. > > The EC support of ``await`` expression is implemented using the same > approach as in coroutines, see the `Coroutine Object Modifications`_ > section. > > > Greenlets > --------- > > Greenlet is an alternative implementation of cooperative > scheduling for Python. Although greenlet package is not part of > CPython, popular frameworks like gevent rely on it, and it is > important that greenlet can be modified to support execution > contexts. > > In a nutshell, greenlet design is very similar to design of > generators. The main difference is that for generators, the stack > is managed by the Python interpreter.
Greenlet works outside of the > Python interpreter, and manually saves some ``PyThreadState`` > fields and pushes/pops the C-stack. Thus the ``greenlet`` package > can be easily updated to use the new low-level `C API`_ to enable > full support of EC. > > > New APIs > ======== > > Python > ------ > > Python APIs were designed to completely hide the internal > implementation details, but at the same time provide enough control > over EC and LC to re-implement all of Python built-in objects > in pure Python. > > 1. ``sys.new_context_item(description='...')``: create a > ``ContextItem`` object used to access/set values in EC. > > 2. ``ContextItem``: > > * ``.description``: read-only attribute. > * ``.get()``: return the current value for the item. > * ``.set(o)``: set the current value in the EC for the item. > > 3. ``sys.get_execution_context()``: return the current > ``ExecutionContext``. > > 4. ``sys.new_execution_context()``: create a new empty > ``ExecutionContext``. > > 5. ``sys.new_local_context()``: create a new empty ``LocalContext``. > > 6. ``sys.run_with_execution_context(ec: ExecutionContext, > func, *args, **kwargs)``. > > 7. ``sys.run_with_local_context(lc: LocalContext, > func, *args, **kwargs)``. > > > C API > ----- > > 1. ``PyContextItem * PyContext_NewItem(char *desc)``: create a > ``PyContextItem`` object. > > 2. ``PyObject * PyContext_GetItem(PyContextItem *)``: get the > current value for the context item. > > 3. ``int PyContext_SetItem(PyContextItem *, PyObject *)``: set > the current value for the context item. > > 4. ``PyLocalContext * PyLocalContext_New()``: create a new empty > ``PyLocalContext``. > > 5. ``PyExecutionContext * PyExecutionContext_New()``: create a new empty > ``PyExecutionContext``. > > 6. ``PyExecutionContext * PyExecutionContext_Get()``: get the > EC for the active thread state. > > 7. ``int PyExecutionContext_Set(PyExecutionContext *)``: set the > passed EC object as the current for the active thread state. > > 8. ``int PyExecutionContext_SetWithLocalContext(PyExecutionContext *, > PyLocalContext *)``: allows to implement > ``sys.run_with_local_context`` Python API. > > > Implementation Strategy > ======================= > > LocalContext is a Weak Key Mapping > ---------------------------------- > > Using a weak key mapping for ``LocalContext`` implementation > enables the following properties with regards to garbage > collection: > > * ``ContextItem`` objects are strongly-referenced only from the > application code, not from any of the Execution Context > machinery or values they point to. This means that there > are no reference cycles that could extend their lifespan > longer than necessary, or prevent their garbage collection. > > * Values put in the Execution Context are guaranteed to be kept > alive while there is a ``ContextItem`` key referencing them in > the thread. > > * If a ``ContextItem`` is garbage collected, all of its values will > be removed from all contexts, allowing them to be GCed if needed. > > * If a thread has ended its execution, its thread state will be > cleaned up along with its ``ExecutionContext``, cleaning > up all values bound to all Context Items in the thread. > > > ContextItem.get() Cache > ----------------------- > > We can add three new fields to ``PyThreadState`` and > ``PyInterpreterState`` structs: > > * ``uint64_t PyThreadState->unique_id``: a globally unique > thread state identifier (we can add a counter to > ``PyInterpreterState`` and increment it when a new thread state is > created.)
> > * ``uint64_t PyInterpreterState->context_item_deallocs``: every time > a ``ContextItem`` is GCed, all Execution Contexts in all threads > will lose track of it. ``context_item_deallocs`` will simply > count all ``ContextItem`` deallocations. > > * ``uint64_t PyThreadState->execution_context_ver``: every time > a new item is set, or an existing item is updated, or the stack > of execution contexts is changed in the thread, we increment this > counter. > > The above three fields allow implementing a fast cache path in > ``ContextItem.get()``, in pseudo-code:: > > class ContextItem: > > def get(self): > tstate = PyThreadState_Get() > > if (self.last_tstate_id == tstate.unique_id and > self.last_ver == tstate.execution_context_ver and > self.last_deallocs == > tstate.interp.context_item_deallocs): > return self.last_value > > value = None > for mapping in reversed(tstate.execution_context): > if self in mapping: > value = mapping[self] > break > > self.last_value = value > self.last_tstate_id = tstate.unique_id > self.last_ver = tstate.execution_context_ver > self.last_deallocs = tstate.interp.context_item_deallocs > > return value > > This is similar to the trick that the decimal C implementation uses > for caching the current decimal context, and will have the same > performance characteristics, but available to all > Execution Context users. > > > Approach #1: Use a dict for LocalContext > ---------------------------------------- > > The straightforward way of implementing the proposed EC > mechanisms is to create a ``WeakKeyDict`` on top of Python > ``dict`` type. > > To implement the ``ExecutionContext`` type we can use Python > ``list`` (or a custom stack implementation with some > pre-allocation optimizations). > > This approach will have the following runtime complexity: > > * O(M) for ``ContextItem.get()``, where ``M`` is the number of > Local Contexts in the stack. > > It is important to note that ``ContextItem.get()`` will implement > a cache making the operation O(1) for packages like ``decimal`` > and ``numpy``. > > * O(1) for ``ContextItem.set()``. > > * O(N) for ``sys.get_execution_context()``, where ``N`` is the > total number of items in the current **execution** context. > > > Approach #2: Use HAMT for LocalContext > -------------------------------------- > > Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT) > to implement high performance immutable collections [5]_, [6]_. > > Immutable mappings implemented with HAMT have O(log\ :sub:`32`\ N) > performance for both ``set()``, ``get()``, and ``merge()`` operations, > which is essentially O(1) for relatively small mappings > (read about HAMT performance in CPython in the > `Appendix: HAMT Performance`_ section.) > > In this approach we use the same design of the ``ExecutionContext`` > as in Approach #1, but we will use HAMT backed weak key Local Context > implementation. With that we will have the following runtime > complexity: > > * O(M * log\ :sub:`32`\ N) for ``ContextItem.get()``, > where ``M`` is the number of Local Contexts in the stack, > and ``N`` is the number of items in the EC. The operation will > essentially be O(M), because execution contexts are normally not > expected to have more than a few dozen items. > > (``ContextItem.get()`` will have the same caching mechanism as in > Approach #1.) > > * O(log\ :sub:`32`\ N) for ``ContextItem.set()`` where ``N`` is the > number of items in the current **local** context. This will > essentially be an O(1) operation most of the time.
> > * O(log\ :sub:`32`\ N) for ``sys.get_execution_context()``, where > ``N`` is the total number of items in the current **execution** > context. > > Essentially, using HAMT for Local Contexts instead of Python dicts, > allows to bring down the complexity of ``sys.get_execution_context()`` > from O(N) to O(log\ :sub:`32`\ N) because of the more efficient > merge algorithm. > > > Approach #3: Use HAMT and Immutable Linked List > ----------------------------------------------- > > We can make an alternative ``ExecutionContext`` design by using > a linked list. Each ``LocalContext`` in the ``ExecutionContext`` > object will be wrapped in a linked-list node. > > ``LocalContext`` objects will use an HAMT backed weak key > implementation described in the Approach #2. > > Every modification to the current ``LocalContext`` will produce a > new version of it, which will be wrapped in a **new linked list > node**. Essentially this means, that ``ExecutionContext`` is an > immutable forest of ``LocalContext`` objects, and can be safely > copied by reference in ``sys.get_execution_context()`` (eliminating > the expensive "merge" operation.) > > With this approach, ``sys.get_execution_context()`` will be an > **O(1) operation**. > > > Summary > ------- > > We believe that approach #3 enables an efficient and complete > Execution Context implementation, with excellent runtime performance. > > `ContextItem.get() Cache`_ enables fast retrieval of context items > for performance critical libraries like decimal and numpy. > > Fast ``sys.get_execution_context()`` enables efficient management > of execution contexts in asynchronous libraries like asyncio. > > > Design Considerations > ===================== > > Can we fix ``PyThreadState_GetDict()``? > --------------------------------------- > > ``PyThreadState_GetDict`` is a TLS, and some of its existing users > might depend on it being just a TLS. Changing its behaviour to follow > the Execution Context semantics would break backwards compatibility. > > > PEP 521 > ------- > > :pep:`521` proposes an alternative solution to the problem: > enhance Context Manager Protocol with two new methods: ``__suspend__`` > and ``__resume__``. To make it compatible with async/await, > the Asynchronous Context Manager Protocol will also need to be > extended with ``__asuspend__`` and ``__aresume__``. > > This allows to implement context managers like decimal context and > ``numpy.errstate`` for generators and coroutines. > > The following code:: > > class Context: > > def __enter__(self): > self.old_x = get_execution_context_item('x') > set_execution_context_item('x', 'something') > > def __exit__(self, *err): > set_execution_context_item('x', self.old_x) > > would become this:: > > local = threading.local() > > class Context: > > def __enter__(self): > self.old_x = getattr(local, 'x', None) > local.x = 'something' > > def __suspend__(self): > local.x = self.old_x > > def __resume__(self): > local.x = 'something' > > def __exit__(self, *err): > local.x = self.old_x > > Besides complicating the protocol, the implementation will likely > negatively impact performance of coroutines, generators, and any code > that uses context managers, and will notably complicate the > interpreter implementation. > > :pep:`521` also does not provide any mechanism to propagate state > in a local context, like storing a request object in an HTTP request > handler to have better logging. Nor does it solve the leaking state > problem for greenlet/gevent. 
> > > Can Execution Context be implemented outside of CPython? > -------------------------------------------------------- > > Because async/await code needs an event loop to run it, an EC-like > solution can be implemented in a limited way for coroutines. > > Generators, on the other hand, do not have an event loop or > trampoline, making it impossible to intercept their ``yield`` points > outside of the Python interpreter. > > > Backwards Compatibility > ======================= > > This proposal preserves 100% backwards compatibility. > > > Appendix: HAMT Performance > ========================== > > To assess if HAMT can be used for Execution Context, we implemented > it in CPython [7]_. > > .. figure:: pep-0550-hamt_vs_dict.png > :align: center > :width: 100% > > Figure 1. Benchmark code can be found here: [9]_. > > Figure 1 shows that HAMT indeed displays O(1) performance for all > benchmarked dictionary sizes. For dictionaries with less than 100 > items, HAMT is a bit slower than Python dict/shallow copy. > > .. figure:: pep-0550-lookup_hamt.png > :align: center > :width: 100% > > Figure 2. Benchmark code can be found here: [10]_. > > Figure 2 shows comparison of lookup costs between Python dict > and an HAMT immutable mapping. HAMT lookup time is 30-40% worse > than Python dict lookups on average, which is a very good result, > considering how well Python dicts are optimized. > > Note, that according to [8]_, HAMT design can be further improved. > > > Acknowledgments > =============== > > I thank Elvis Pranskevichus and Victor Petrovykh for countless > discussions around the topic and PEP proof reading and edits. > > Thanks to Nathaniel Smith for proposing the ``ContextItem`` design > [17]_ [18]_, for pushing the PEP towards a more complete design, and > coming up with the idea of having a stack of contexts in the thread > state. > > Thanks to Nick Coghlan for numerous suggestions and ideas on the > mailing list, and for coming up with a case that cause the complete > rewrite of the initial PEP version [19]_. > > > References > ========== > > .. [1] https://blog.golang.org/context > > .. [2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx > > .. [3] https://github.com/numpy/numpy/issues/9444 > > .. [4] http://bugs.python.org/issue31179 > > .. [5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie > > .. [6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html > > .. [7] https://github.com/1st1/cpython/tree/hamt > > .. [8] https://michael.steindorfer.name/publications/oopsla15.pdf > > .. [9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd > > .. [10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e > > .. [11] https://github.com/1st1/cpython/tree/pep550 > > .. [12] https://www.python.org/dev/peps/pep-0492/#async-await > > .. [13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py > > .. [14] https://github.com/MagicStack/pgbench > > .. [15] https://github.com/python/performance > > .. [16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c > > .. [17] https://mail.python.org/pipermail/python-ideas/2017-August/046752.html > > .. [18] https://mail.python.org/pipermail/python-ideas/2017-August/046772.html > > .. [19] https://mail.python.org/pipermail/python-ideas/2017-August/046780.html > > > Copyright > ========= > > This document has been placed in the public domain. 
> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From mistersheik at gmail.com Sun Aug 20 21:32:19 2017 From: mistersheik at gmail.com (Neil Girdhar) Date: Sun, 20 Aug 2017 18:32:19 -0700 (PDT) Subject: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase Message-ID: This question describes an example of the problem: https://stackoverflow.com/questions/8416208/in-python-is-there-a-good-idiom-for-using-context-managers-in-setup-teardown. You want to invoke a context manager in your setup/teardown, but the easiest way to do that is to override run, which seems ugly. Why not add two methods to unittest.TestCase whose default implementations are given below: class TestCase: @contextmanager def method_context(self): self.setUp() try: yield finally: self.tearDown() @contextmanager def class_context(self): self.setUpClass() try: yield finally: self.tearDownClass() Then, if for example someone wants to use a context manager in setUp, they can do so: class SomeTest(TestCase): @contextmanager def method_context(self): with np.errstate(all='raise'): with super().method_context(): yield Best, Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From ned at nedbatchelder.com Mon Aug 21 12:02:17 2017 From: ned at nedbatchelder.com (Ned Batchelder) Date: Mon, 21 Aug 2017 12:02:17 -0400 Subject: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase In-Reply-To: References: Message-ID: On 8/20/17 9:32 PM, Neil Girdhar wrote: > This question describes an example of the > problem: https://stackoverflow.com/questions/8416208/in-python-is-there-a-good-idiom-for-using-context-managers-in-setup-teardown. > You want to invoke a context manager in your setup/teardown, but > the easiest way to do that is to override run, which seems ugly. > > Why not add two methods to unittest.TestCase whose default > implementations are given below: > > class TestCase: > > @contextmanager > def method_context(self): > self.setUp() > try: > yield > finally: > self.tearDown() > > @contextmanager > def class_context(self): > self.setUpClass() > try: > yield > finally: > self.tearDownClass() > > > Then, if for example someone wants to use a context manager in setUp, > they can do so: > > class SomeTest(TestCase): > > @contextmanager > def method_context(self): > with np.errstate(all='raise'): > with super().method_context(): > yield > > Best, > > Neil > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ I've achieved a similar effect with this: def setup_with_context_manager(testcase, cm): """Use a contextmanager to setUp a test case. If you have a context manager you like:: with ctxmgr(a, b, c) as v: # do something with v and you want to have that effect for a test case, call this function from your setUp, and it will start the context manager for your test, and end it when the test is done:: def setUp(self): self.v = setup_with_context_manager(self, ctxmgr(a, b, c)) def test_foo(self): # do something with self.v """ val = cm.__enter__() testcase.addCleanup(cm.__exit__, None, None, None) return val
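By the way, if a test needs several context managers at once, the same
trick generalizes with contextlib.ExitStack (a sketch along the same
lines -- this helper is not an existing unittest API):

    import contextlib

    def setup_with_context_managers(testcase, *cms):
        # Enter each context manager for the duration of the test.
        # The ExitStack unwinds them in reverse order at cleanup time.
        stack = contextlib.ExitStack()
        testcase.addCleanup(stack.close)
        return [stack.enter_context(cm) for cm in cms]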
I think the use is easier than yours, which needs too much super and @contextmanager boilerplate. --Ned. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimjjewett at gmail.com Mon Aug 21 17:05:46 2017 From: jimjjewett at gmail.com (Jim J. Jewett) Date: Mon, 21 Aug 2017 17:05:46 -0400 Subject: [Python-ideas] PEP 550 dumbed down Message-ID: I know I'm not the only one who is confused by at least some of the alternative terminology choices. I suspect I'm not the only one who sometimes missed part of the argument because I was distracted figuring out what the objects were, and forgot to verify what was being done and why. I also suspect that it could be much simpler to follow if the API were designed in the abstract, with the implementation left for later. So is the following API missing anything important? (1) Get the current (writable) context. Currently proposed as a sys.* call, but I think injecting into __builtins__ or globals would work as well. (2) Get a value from the current context, by string key. Currently proposed as key.get, rather than env.__getitem__ (3) Write a value to the current context, by string key. Currently proposed as key.set, rather than env.__setitem__ (4) Create a new (writable) empty context. (5) Create a copy of the current context, so that changes can be isolated. The copy will not be able to change anything in the current context, though it can shadow keys. (6) Choose which context to use when calling another function/generator/iterator/etc. At this point, it looks an awful lot like a subset of ChainMap, except that: (A) The current mapping is available through a series of sys.* calls. (why not a builtin? Or at least a global, injected when a different environment is needed?) (B) Concurrency APIs are supposed to ensure that each process/thread/Task/worker is using its own private context, unless the call explicitly requests a shared or otherwise different context. (C) The current API requires users to initialize every key before it can be added to a context. This is presumably to support limits of the proposed implementation. If the semantics are right, and collections.ChainMap is rejected only for efficiency, please say so in the PEP. If the semantics are wrong, please explain how they differ. Sample code: olduser = env["username"] env["reason"] = "Spanish Inquisition" with env.copy(): env["username"] = "secret admin" foo() print("debugging", env["foodebug"]) bar() with env.empty(): assert "username" not in env assert env["username"] is olduser
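To make the comparison concrete, here is roughly the model I have in
mind, written on top of collections.ChainMap (a sketch only, to pin
down the semantics -- the helper names are invented, and efficiency is
deliberately ignored):

    from collections import ChainMap
    from contextlib import contextmanager

    env = ChainMap({})  # current context; writes go to the front map

    @contextmanager
    def copied_context():
        # Isolate changes: reads fall through to the old maps, while
        # writes land in a fresh front map and vanish on exit.
        global env
        old, env = env, env.new_child()
        try:
            yield env
        finally:
            env = old

    @contextmanager
    def empty_context():
        # Run with a completely fresh, empty context.
        global env
        old, env = env, ChainMap({})
        try:
            yield env
        finally:
            env = old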
-jJ From chris.barker at noaa.gov Mon Aug 21 18:44:52 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 21 Aug 2017 15:44:52 -0700 Subject: [Python-ideas] How do you think about these language extensions?(Thautwarm) In-Reply-To: References: Message-ID: On Sat, Aug 19, 2017 at 3:34 AM, ?? ? wrote: > Could you please think this way to define a variable: > > >> var = expr() -> g1(_) if f(_) else g2(_) > > which equals > > >> test = f(expr()) > >> var = g1(test) if f(test) else g2(test) > OK, I do see this as a nice way to avoid as many "temp" variables, though in this example, I am confused: in the above version, it seems to me that the equivalent wordy version is: temp = expr() var = g1(temp) if f(temp) else g(temp) rather than the f(expr()) -- i.e. you seem to have called f() on expr an extra time? Maybe just a typo. or, of course: var = g1(expr()) if f(expr()) else g(expr()) which I can see would be bad if expr() is expensive (or even worse, has side effects) so I'm coming around to this, though you could currently write that as: _ = expr(); g1(_) if f(_) else g(_) not that different! Also, there is something to be said for giving a name to expr() -- it may make the code more readable. > In another words,I can clearly state what I mean to say in order of my > thinking. > well, in the above case, particularly if you use a meaningful name, rather than "temp" or "test", then you are still writing it in the order of meaning. (though elsewhere in this thread there are better examples of how the current nested function call syntax does reverse the logical order of operations) > For example, > > >> lambda x: f(g(x)) -> map(_, range(100)) > > The codes above means that I'm stressing on what(an action) I'm going to > do on an object "range(100)". > This is still putting the range(100) at the end of the expression, rather than making it clear that you are starting with it. and putting that much logic in a lambda can be confusing -- in fact, I'm still not sure what that does! (I guess I am still not sure if the order of operations is the lambda expression (f(g(x))) or the whole thing? if not the whole thing, then: is it the same as ?: (f(g(x)) for x in range(100)) I'm also seeing a nested function there -- f(g(x)) which is what I thought you were trying to avoid -- maybe: lambda x: (g(x) -> f(_)) -> map(_, range(100)) ??? In general, much of this seems to be trying to make map cleaner or more clear -- but python has comprehensions, which so far work better, and are more compact and clear for the examples you have provided. Granted, deeply nested comprehensions can be pretty ugly -- maybe this will be clearer for those??
So still having a hard time comeing up with an example that's notable better... >> someone -> dosomething( _, options=options) \ > -> is_meeting_some_conditions( _ ) \ > -> result1() if _ else result2() where: > options = ... > result1 = lambda: ... > result2 = lambda: ... > def dosomething(obj, options) -> Any: > ... > > def is_meeting_some_conditions( event : Any ) -> bool : > ... > again with the lambdas -- this is all making me think that this is about making Python a better functional language, which I'm not sure is a goal of Python... but anyway, the real extra there is the where: clause But that seems to be doing the opposite -- putting the definitions of what you are actually doing AFTER the logic> I'm going to chain all this logic together and by the way, this is what that logic is... If we really wanted to have a kind of context like that, maybe something more like a context manager on the fly: with: options = ... result1 = lambda: ... result2 = lambda: ... def dosomething(obj, options) -> Any: ... def is_meeting_some_conditions( event : Any ) -> bool : ... do: (result1() if is_meeting_some_conditions( dosomething( someone, options=options)) else result2() > Also, we need to remember that functions can take *args, **kwargs, etc, > > and can return a tuple of just about anything -- not sure how well that > > maps to the "pipe" model. > > I think that using "pipe" model cannot be the right choice. > > We don't need to worry about this problem if we use the grammar I've > implemented yet :) > > >> (lambda x: (x%5, x) ) -> max( range(99), key = _) > >> 94 > > >> def max_from_seq(*args): return max(args) > >> [1,2,3] -> max_from_seq(*_) > >> 3 > this gets uglier if we have both *args and **kwargs..... Which maybe is OK -- don't use it with complex structures like that. For example, sometimes we just need to know that surface area of a > cylinder is > > 2*S_top + S_side > > If someone see the codes, he may not need to know how S_top and S_side > are evaluated,getting > a knowledge of what it means to is enough. > And if you want to get more about how to evaluate S_side and S_top, just > see > the next "where syntax" and find the answers. > how is that clearer than: S_topo = something S_side = something else surface_area = 2*S_top + S_side ??? (Or, of course, defining a function) Sure, we see the: some expression..."where" some definitions structure a lot in technical papers, but frankly: I'd probably rather see the definitions first and/or the definitions are often only there to support you if you don't already know the nomenclature -- when you go back to read the paper again, you may not need the where. Coding is different, I'd rather see stuff defined BEFORE it is used. >> Here is an example to use flowpython, which gives the permutations of a > sequence. > >> > >> from copy import deepcopy > >> permutations = .seq -> seq_seq where: > >> condic+[] seq: > >> case (a, ) => seq_seq = [a,] > >> case (a, b) => seq_seq = [[a,b],[b,a]] > >> case (a,*b) => > >> seq_seq = permutations(b) -> map(.x -> insertAll(x, a), > _) -> sum(_, []) where: > >> insertAll = . x, a -> ret where: > >> ret = [ deepcopy(x) -> _.insert(i, a) or _ for > i in (len(x) -> range(_+1)) ] > > > I find that almost unreadable. > me too. > Too many new features all at once, it's > > like trying to read a completely unfamiliar language. > exactly -- this seems to be an effort to make Python a different language! This algorithm can be fixed a little because the second case is redundant. 
> And here is the regular Python codes transformed > from the codes above. > looks like we lost indenting, so I'm going to try to fix that: from copy import deepcopy def permutations(seq): try: # the first case (a, ) = seq return [a ,] except: try: # the third case (the second case is redundant) def insertAll(x, a): # insertAll([1,2,3], 0) -> [[0, 1, 2, 3], [1, 0, 2, 3], [1, 2, 0, 3], [1, 2, 3, 0]] ret = [] for i in range( len(x) + 1 ): tmp = deepcopy(x) tmp.insert(i, a) ret.append(tmp) return ret (a, *b) = seq tmp = permutations(b) tmp = map(lambda x : insertAll(x, a) , tmp) return sum(tmp, []) # sum([[1,2,3], [-1,-2,-3]], []) -> [1,2,3,-1,-2,-3] except: # no otherwise! pass Have I got that right? but anyway, there has GOT to be a more pythonic way to write that! And I say that because this feels to me like trying to write functional code in Python in an unnatural-for-python way, then saying we need to add features to python to make that natural. SoL I think the challenge is: find some nice compeling examples write them in a nice pythonic way show us that that these new features would allow a cleaner, more readable solution. Steven did have a nice example of that: result = (myfile.readlines() -> map(str.strip) -> filter( lambda s: not s.startwith('#') ) -> sorted -> collapse # collapse runs of identical lines -> extract_dates -> map(date_to_seconds) -> min ) Though IIUC, the proposal would make that: result = (myfile.readlines() -> map(str.strip, _) -> filter( lambda s: not s.startwith('#'), _ ) -> sorted( _ ) -> collapse( _ ) # collapse runs of identical lines -> extract_dates( _ ) -> map(date_to_seconds, _) -> min(_) ) The current Python for that might be: result = min((date_to_seconds(d) for d in extract_dates( collapse( sorted([s for s in (s.strip() for line in myfile.readlines) if not s.startswith] ))))) Which really does make the point that nesting comprehension gets ugly fast! So "don't do that": lines = collapse(sorted((l.strip().split("#")[0] for l in myfile.readlines()))) dates = min((date_to_seconds(extract_date(l)) for l in lines)) or any number of other ways -- clearer, less clear?? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Mon Aug 21 19:56:57 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 21 Aug 2017 19:56:57 -0400 Subject: [Python-ideas] PEP 550 dumbed down In-Reply-To: References: Message-ID: Hi Jim, In short, yes, we can "dumb down" PEP 550 to a chain of maps. PEP 550 does the following on top of that dumbed down version: 0. Adds execution_context "chain" root to PyThreadState. 1. Extends (async-)generator objects to support this chaining -- each generator has its own "env" to accumulate its changes. 2. ContextKey is an object that we use to work with EC. Compared to using strings, using an object allows us to implement caching (important for numpy and decimal-like libs) and avoids name clashes. 3. Yes, efficiency is important. If you start an asyncio.Task, or schedule an asyncio callback, or want to run some code in a separate OS thread, you need to capture the current EC -- make a shallow copy of all LCs in it. 
That's expensive, and the PEP solves this problem by using special
data structures (a), and providing just enough APIs to work with the EC
so that those data structures are not exposed to the end user (b).

4. Provides common APIs that will be used by asyncio, decimal, numpy, etc.

> (A) The current mapping is available through a series of sys.* calls.
> (why not a builtin? Or at least a global, injected when a different
> environment is needed?)

This was never proposed :) I decided to put the new APIs in the sys
module, as we are usually conservative about adding new globals, and the
feature is low-level (like working with frames).

> If the semantics are right, and collections.ChainMap is rejected only
> for efficiency, please say so in the PEP.

`collections.ChainMap` on its own is not a solution, it's one of the
possible implementations. Efficiency is indeed the reason why using
ChainMap is not an option (see (3) above).

This whole "capturing of execution context" topic is not covered well
enough in the PEP, and is something that we'll fix in the next version
(soon).

Yury

From greg at krypto.org Mon Aug 21 20:38:32 2017
From: greg at krypto.org (Gregory P. Smith)
Date: Tue, 22 Aug 2017 00:38:32 +0000
Subject: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase
In-Reply-To: References: Message-ID:

Neil, you might also bring this up on the
http://lists.idyll.org/listinfo/testing-in-python list, as I suspect
people there have opinions on this topic.

-gps

On Mon, Aug 21, 2017 at 9:07 AM Ned Batchelder wrote:

> On 8/20/17 9:32 PM, Neil Girdhar wrote:
>
> This question describes an example of the problem:
> https://stackoverflow.com/questions/8416208/in-python-is-there-a-good-idiom-for-using-context-managers-in-setup-teardown.
> You want to invoke a context manager in your setup/tearing-down, but the
> easiest way to do that is to override run, which seems ugly.
>
> Why not add two methods to unittest.TestCase whose default implementations
> are given below:
>
>     class TestCase:
>
>         @contextmanager
>         def method_context(self):
>             self.setUp()
>             try:
>                 yield
>             finally:
>                 self.tearDown()
>
>         @contextmanager
>         def class_context(self):
>             self.setUpClass()
>             try:
>                 yield
>             finally:
>                 self.tearDownClass()
>
> Then, if for example someone wants to use a context manager in setUp, they
> can do so:
>
>     class SomeTest(TestCase):
>
>         @contextmanager
>         def method_context(self):
>             with np.errstate(all='raise'):
>                 with super().method_context():
>                     yield
>
> Best,
>
> Neil
>
> I've achieved a similar effect with this:
>
>     def setup_with_context_manager(testcase, cm):
>         """Use a contextmanager to setUp a test case.
>
>         If you have a context manager you like::
>
>             with ctxmgr(a, b, c) as v:
>                 # do something with v
>
>         and you want to have that effect for a test case, call this
>         function from your setUp, and it will start the context manager
>         for your test, and end it when the test is done::
>
>             def setUp(self):
>                 self.v = setup_with_context_manager(self, ctxmgr(a, b, c))
>
>             def test_foo(self):
>                 # do something with self.v
>
>         """
>         val = cm.__enter__()
>         testcase.addCleanup(cm.__exit__, None, None, None)
>         return val
>
> I think the use is easier than yours, which needs too much super and
> @contextmanager boilerplate.
>
> --Ned.
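For concreteness, a minimal self-contained run of that helper (the
TemporaryDirectory here is just a stand-in resource):

    import os
    import tempfile
    import unittest

    def setup_with_context_manager(testcase, cm):
        # enter the context manager now, schedule its exit as a cleanup
        val = cm.__enter__()
        testcase.addCleanup(cm.__exit__, None, None, None)
        return val

    class ExampleTest(unittest.TestCase):
        def setUp(self):
            self.tmpdir = setup_with_context_manager(
                self, tempfile.TemporaryDirectory())

        def test_tmpdir_exists(self):
            self.assertTrue(os.path.isdir(self.tmpdir))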
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Mon Aug 21 23:53:17 2017 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 21 Aug 2017 22:53:17 -0500 Subject: [Python-ideas] How do you think about these language extensions?(Thautwarm) In-Reply-To: References: Message-ID: https://github.com/kachayev/fn.py/blob/master/README.rst#scala-style-lambdas-definition On Monday, August 21, 2017, Chris Barker wrote: > > > On Sat, Aug 19, 2017 at 3:34 AM, ?? ? > wrote: > >> Could you please think this way to define a variable: >> >> >> var = expr() -> g1(_) if f(_) else g2(_) >> >> which equals >> >> >> test = f(expr()) >> >> var = g1(test) if f(test) else g2(test) >> > > OK, I do see this as a nice way to avoid as many "temp" variables, though > in this example, I am confused: > > in the above version, It seems to me that the equivelent wordy version is: > > temp = expr() > var = g1(temp) if f(temp) else g(temp) > > rather than the f(expr()) -- i.e. you seem to have called f() on expr and > extra time? Maybe just a typo. > > or, of course: > > var = g1(expr()) if f(expr()) else g(expr()) > > which I can see would be bad if expr() is expensive (or even worse, has > side effects) > > so I'm coming around this this, though you could currently write that as: > > _ = expr(); g1(_) if f(_) else g(_) > > not that different! > > Also, there is something to be said for giving a name to expr() -- it may > make the code more readable. > > > In another words,I can clearly state what I mean to say in order of my >> thinking. >> > > well, in the above case, particularly if you use a meaningful name, rather > than "temp" or "test", then you are still writing in in the order of > meaning. > > (though elsewhere in this thread there are better examples of how the > current nested function call syntax does reverse the logical order of > operations) > > >> For example, >> >> >> lambda x: f(g(x)) -> map(_, range(100)) >> >> The codes above means that I'm stressing on what(an action) I'm going to >> do on an object "range(100)". >> > > This is still putting the range(100) at the end of the expression, rather > than making it clear that you are starting with it. > > and putting that much logic in a lambda can be confusing -- in fact, I'm > still not sure what that does! (I guess I am still not sure of the order of > operations is the lambda expression (f(g(x))) or the whole thing? if not > the whole thing, then: > > is it the same as ?: > > (f(g(x)) for x in range(100)) > > I'm also seeing a nested function there -- f(g(x)) which is what I > thought you were trying to avoid -- maybe: > > lambda x: (g(x) -> f(_)) -> map(_, range(100)) > > ??? > > In general, much of this seems to be trying to make map cleaner or more > clear -- but python has comprehensions, which so far work better, and are > more compact and clear for the examples you have provided. > > granted, deeply nested comprehensions can be pretty ugly -- maybe this > will be clearer for those?? 
> > > However, sometimes the actions are not important, so if we want to stress >> on what we're going to do something on, we write this codes: >> >> >> range(100) -> map( lambda x:f(g(x)), _ ) >> > > OK, so THAT makes more sense to me -- start with the "source data", then > go to the action on it. > > but again, is that really clearer than the comprehension (generator > expression - why don't we call that a generator comprehension?): > > (f(g(x)) for x in range(100)) > > maybe this would be better: > > range(100) -> (f(g(x)) for x in _) > > it does put the source data up front -- and could be nicer for nested > comprehensions. > > Hmm, maybe this is an example of the kind of thing I've needed to do is > illustrative: > > > [s.upper() for s in > (s.replace('"','') for s in > (s.strip() for s in > line.split()))] > > would be better as: > > line.split() -> (s.strip() for s in _) -> (s.replace('"','') for s in _) > -> [s.upper() for s in _] > > though, actually, really best as: > > [s.strip().replace('"','').upper() for s in line.split()] > > (which only works for methods, not general functions) > > but for functions: > > [fun3(fun2(fun1(x))) for x in an_iterable] > > > so, backwards logic, but that's it for the benefit. > > So still having a hard time comeing up with an example that's notable > better... > > >> someone -> dosomething( _, options=options) \ >> -> is_meeting_some_conditions( _ ) \ >> -> result1() if _ else result2() where: >> options = ... >> result1 = lambda: ... >> result2 = lambda: ... >> def dosomething(obj, options) -> Any: >> ... >> >> def is_meeting_some_conditions( event : Any ) -> bool : >> ... >> > > again with the lambdas -- this is all making me think that this is about > making Python a better functional language, which I'm not sure is a goal of > Python... > > but anyway, the real extra there is the where: clause > > But that seems to be doing the opposite -- putting the definitions of what > you are actually doing AFTER the logic> > > I'm going to chain all this logic together > and by the way, this is what that logic is... > > If we really wanted to have a kind of context like that, maybe something > more like a context manager on the fly: > > with: > options = ... > result1 = lambda: ... > result2 = lambda: ... > def dosomething(obj, options) -> Any: > ... > > def is_meeting_some_conditions( event : Any ) -> bool : > ... > do: > (result1() if is_meeting_some_conditions( > dosomething( someone, options=options)) > else result2() > > > Also, we need to remember that functions can take *args, **kwargs, etc, >> > and can return a tuple of just about anything -- not sure how well that >> > maps to the "pipe" model. >> >> I think that using "pipe" model cannot be the right choice. >> >> We don't need to worry about this problem if we use the grammar I've >> implemented yet :) >> >> >> (lambda x: (x%5, x) ) -> max( range(99), key = _) >> >> 94 >> >> >> def max_from_seq(*args): return max(args) >> >> [1,2,3] -> max_from_seq(*_) >> >> 3 >> > > this gets uglier if we have both *args and **kwargs..... > > Which maybe is OK -- don't use it with complex structures like that. > > For example, sometimes we just need to know that surface area of a >> cylinder is >> >> 2*S_top + S_side >> >> If someone see the codes, he may not need to know how S_top and S_side >> are evaluated,getting >> a knowledge of what it means to is enough. >> And if you want to get more about how to evaluate S_side and S_top, just >> see >> the next "where syntax" and find the answers. 
>> > > how is that clearer than: > > S_topo = something > S_side = something else > surface_area = 2*S_top + S_side > > ??? > (Or, of course, defining a function) > > Sure, we see the: some expression..."where" some definitions structure a > lot in technical papers, but frankly: > > I'd probably rather see the definitions first > > and/or > > the definitions are often only there to support you if you don't already > know the nomenclature -- when you go back to read the paper again, you may > not need the where. Coding is different, I'd rather see stuff defined > BEFORE it is used. > > > >> Here is an example to use flowpython, which gives the permutations of a >> sequence. >> >> >> >> from copy import deepcopy >> >> permutations = .seq -> seq_seq where: >> >> condic+[] seq: >> >> case (a, ) => seq_seq = [a,] >> >> case (a, b) => seq_seq = [[a,b],[b,a]] >> >> case (a,*b) => >> >> seq_seq = permutations(b) -> map(.x -> insertAll(x, >> a), _) -> sum(_, []) where: >> >> insertAll = . x, a -> ret where: >> >> ret = [ deepcopy(x) -> _.insert(i, a) or _ >> for i in (len(x) -> range(_+1)) ] >> >> > I find that almost unreadable. >> > > me too. > > >> Too many new features all at once, it's >> > like trying to read a completely unfamiliar language. >> > > exactly -- this seems to be an effort to make Python a different language! > > This algorithm can be fixed a little because the second case is redundant. >> And here is the regular Python codes transformed >> from the codes above. >> > > looks like we lost indenting, so I'm going to try to fix that: > > from copy import deepcopy > > def permutations(seq): > try: > # the first case > (a, ) = seq > return [a ,] > except: > try: > # the third case (the second case is redundant) > def insertAll(x, a): > # insertAll([1,2,3], 0) -> [[0, 1, 2, 3], [1, 0, 2, 3], > [1, 2, 0, 3], [1, 2, 3, 0]] > ret = [] > for i in range( len(x) + 1 ): > tmp = deepcopy(x) > tmp.insert(i, a) > ret.append(tmp) > return ret > > (a, *b) = seq > tmp = permutations(b) > tmp = map(lambda x : insertAll(x, a) , tmp) > > return sum(tmp, []) # sum([[1,2,3], [-1,-2,-3]], []) -> > [1,2,3,-1,-2,-3] > except: > # no otherwise! > pass > > Have I got that right? but anyway, there has GOT to be a more pythonic way > to write that! And I say that because this feels to me like trying to write > functional code in Python in an unnatural-for-python way, then saying we > need to add features to python to make that natural. > > SoL I think the challenge is: > > find some nice compeling examples > write them in a nice pythonic way > show us that that these new features would allow a cleaner, more readable > solution. > > Steven did have a nice example of that: > > result = (myfile.readlines() > -> map(str.strip) > -> filter( lambda s: not s.startwith('#') ) > -> sorted > -> collapse # collapse runs of identical lines > -> extract_dates > -> map(date_to_seconds) > -> min > ) > > Though IIUC, the proposal would make that: > > result = (myfile.readlines() > -> map(str.strip, _) > -> filter( lambda s: not s.startwith('#'), _ ) > -> sorted( _ ) > -> collapse( _ ) # collapse runs of identical lines > -> extract_dates( _ ) > -> map(date_to_seconds, _) > -> min(_) > ) > > > The current Python for that might be: > > result = min((date_to_seconds(d) for d in > extract_dates( > collapse( > sorted([s for s in > (s.strip() for line in myfile.readlines) > if not s.startswith] > ))))) > > Which really does make the point that nesting comprehension gets ugly fast! 
> > So "don't do that": > > lines = collapse(sorted((l.strip().split("#")[0] for l in > myfile.readlines()))) > dates = min((date_to_seconds(extract_date(l)) for l in lines)) > > or any number of other ways -- clearer, less clear?? > > -CHB > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Aug 22 01:34:52 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 22 Aug 2017 15:34:52 +1000 Subject: [Python-ideas] Fwd: Please consider adding context manager versions of setUp/tearDown to unittest.TestCase In-Reply-To: References: Message-ID: Folks, this has come up before, but: please don't post through Google Groups, as it breaks everyone else's ability to easily reply to the entire mailing list. ---------- Forwarded message ---------- From: Nick Coghlan Date: 22 August 2017 at 15:32 Subject: Re: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase To: Neil Girdhar Cc: python-ideas On 21 August 2017 at 11:32, Neil Girdhar wrote: > This question describes an example of the problem: > https://stackoverflow.com/questions/8416208/in-python-is-there-a-good-idiom-for-using-context-managers-in-setup-teardown. > You want to invoke a context manager in your setup/tearing-down, but the > easiest way to do that is to override run, which seems ugly. Using context managers when you can't use a with statement is one of the main use cases for contextlib.ExitStack(): def setUp(self): self._resource_stack = stack = contextlib.ExitStack() self._resource = stack.enter_context(MyResource()) def tearDown(self): self._resource_stack.close() I posted that as an additional answer to the question: https://stackoverflow.com/questions/8416208/in-python-is-there-a-good-idiom-for-using-context-managers-in-setup-teardown/45809502#45809502 Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From k7hoven at gmail.com Tue Aug 22 02:47:07 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 22 Aug 2017 09:47:07 +0300 Subject: [Python-ideas] PEP 550 dumbed down In-Reply-To: References: Message-ID: On Tue, Aug 22, 2017 at 2:56 AM, Yury Selivanov wrote: > Hi Jim, > > In short, yes, we can "dumb down" PEP 550 to a chain of maps. > > I think it's also good to think about the actual problem(s) that are being solved, without going too deeply into the implementation. It might be useful to look at all the motivating use cases and to make sure this is really the best way to provide a solution to them. > PEP 550 does the following on top of that dumbed down version: > > ?[...]? > 2. ContextKey is an object that we use to work with EC. Compared to > using strings, using an object allows us to implement caching > (important for numpy and decimal-like libs) and avoids name clashes. > > ?How exactly is caching dependent on the proposed ContextKey thing? To avoid a dict-lookup or similar to get the cached value? But now we need to look up the key object from somewhere? [...] > > 4. Provides common APIs that will be used by asyncio, decimal, numpy, etc. > > Which APIs? The C API you mean? Something that is not in Jim's list? Something that is (not) in the PEP? 
People need to get a clear picture of what is being proposed.

-- Koos

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Tue Aug 22 08:42:02 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 22 Aug 2017 22:42:02 +1000
Subject: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase
In-Reply-To: References: Message-ID:

On 22 August 2017 at 15:34, Nick Coghlan wrote:
> On 21 August 2017 at 11:32, Neil Girdhar wrote:
>> This question describes an example of the problem:
>> https://stackoverflow.com/questions/8416208/in-python-is-there-a-good-idiom-for-using-context-managers-in-setup-teardown.
>> You want to invoke a context manager in your setup/tearing-down, but the
>> easiest way to do that is to override run, which seems ugly.
>
> Using context managers when you can't use a with statement is one of
> the main use cases for contextlib.ExitStack():
>
>     def setUp(self):
>         self._resource_stack = stack = contextlib.ExitStack()
>         self._resource = stack.enter_context(MyResource())
>
>     def tearDown(self):
>         self._resource_stack.close()
>
> I posted that as an additional answer to the question:
> https://stackoverflow.com/questions/8416208/in-python-is-there-a-good-idiom-for-using-context-managers-in-setup-teardown/45809502#45809502

Sjoerd pointed out off-list that this doesn't cover the case where
you're acquiring multiple resources and one of the later acquisitions
fails, so I added the ExitStack idiom that covers that case (using
stack.pop_all() as the last operation in a with statement):

    def setUp(self):
        with contextlib.ExitStack() as stack:
            self._resource1 = stack.enter_context(GetResource())
            self._resource2 = stack.enter_context(GetOtherResource())
            # Failures before here -> immediate cleanup
            self.addCleanup(stack.pop_all().close)
            # Now cleanup won't happen until the cleanup functions run

I also remember that using addCleanup lets you avoid defining tearDown
entirely.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From prometheus235 at gmail.com Tue Aug 22 14:21:54 2017
From: prometheus235 at gmail.com (Nick Timkovich)
Date: Tue, 22 Aug 2017 13:21:54 -0500
Subject: [Python-ideas] Fwd: Please consider adding context manager versions of setUp/tearDown to unittest.TestCase
In-Reply-To: References: Message-ID:

On Tue, Aug 22, 2017 at 12:34 AM, Nick Coghlan wrote:

> Folks, this has come up before, but: please don't post through Google
> Groups, as it breaks everyone else's ability to easily reply to the
> entire mailing list.
>

Mentioning this is probably going to do nothing, especially for new,
future users. Can you block python-ideas at googlegroups.com (or if it's
CC'd or whatever) from posting if you just want the Groups page to be a
read-only thing?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chris.barker at noaa.gov Tue Aug 22 18:08:07 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 22 Aug 2017 15:08:07 -0700
Subject: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase
In-Reply-To: References: Message-ID:

** Caution: cranky curmudgeonly opinionated comment ahead: **

unittest is such an ugly Java-esque static mess of an API that there's
really no point in trying to clean it up and make it more pythonic -- go
off and use pytest and be happier.
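For a taste of the difference, a minimal sketch (the pytest version needs
no class, no setUp, and uses plain asserts):

    # unittest style
    import unittest

    class TestData(unittest.TestCase):
        def setUp(self):
            self.data = [1, 2, 3]

        def test_sum(self):
            self.assertEqual(sum(self.data), 6)

    # pytest style: a fixture is injected by argument name
    import pytest

    @pytest.fixture
    def data():
        return [1, 2, 3]

    def test_sum(data):
        assert sum(data) == 6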
-CHB On Tue, Aug 22, 2017 at 5:42 AM, Nick Coghlan wrote: > On 22 August 2017 at 15:34, Nick Coghlan wrote: > > On 21 August 2017 at 11:32, Neil Girdhar wrote: > >> This question describes an example of the problem: > >> https://stackoverflow.com/questions/8416208/in-python- > is-there-a-good-idiom-for-using-context-managers-in-setup-teardown. > >> You want to invoke a context manager in your setup/tearing-down, but the > >> easiest way to do that is to override run, which seems ugly. > > > > Using context managers when you can't use a with statement is one of > > the main use cases for contextlib.ExitStack(): > > > > def setUp(self): > > self._resource_stack = stack = contextlib.ExitStack() > > self._resource = stack.enter_context(MyResource()) > > > > def tearDown(self): > > self._resource_stack.close() > > > > I posted that as an additional answer to the question: > > https://stackoverflow.com/questions/8416208/in-python- > is-there-a-good-idiom-for-using-context-managers-in- > setup-teardown/45809502#45809502 > > Sjoerd pointed out off-list that this doesn't cover the case where > you're acquiring multiple resources and one of the later acquisitions > fails, so I added the ExitStack idiom that covers that case (using > stack.pop_all() as the last operation in a with statement): > > def setUp(self): > with contextlib.ExitStack() as stack: > self._resource1 = stack.enter_context(GetResource()) > self._resource2 = stack.enter_context(GetOtherResource()) > # Failures before here -> immediate cleanup > self.addCleanup(stack.pop_all().close) > # Now cleanup won't happen until the cleanup functions run > > I also remember that using addCleanup lets you avoid defining tearDown > entirely. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Tue Aug 22 18:20:50 2017 From: rymg19 at gmail.com (rymg19 at gmail.com) Date: Tue, 22 Aug 2017 18:20:50 -0400 Subject: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase In-Reply-To: <> <> References: <> <> <> <> Message-ID: TBH you're completely right. Every time I see someone using unittest andItsHorriblyUnpythonicNames, I want to kill a camel. Sometimes, though, I feel like part of the struggle is the alternative. If you dislike unittest, but pytest is too "magical" for you, what do you use? Many Python testing tools like nose are just test *runners*, so you still need something else. In the end, many just end up back at unittest, maybe with nose on top. As much as I hate JavaScript, their testing libraries are leagues above what Python has. -- Ryan (????) Yoko Shimomura, ryo (supercell/EGOIST), Hiroyuki Sawano >> everyone elsehttp://refi64.com On Aug 22, 2017 at 5:09 PM, > wrote: ** Caution: cranky curmudgeonly opinionated comment ahead: ** unitest is such an ugly Java-esque static mess of an API that there's really no point in trying to clean it up and make it more pythonic -- go off and use pytest and be happier. 
-CHB On Tue, Aug 22, 2017 at 5:42 AM, Nick Coghlan wrote: > On 22 August 2017 at 15:34, Nick Coghlan wrote: > > On 21 August 2017 at 11:32, Neil Girdhar wrote: > >> This question describes an example of the problem: > >> https://stackoverflow.com/questions/8416208/in-python- > is-there-a-good-idiom-for-using-context-managers-in-setup-teardown. > >> You want to invoke a context manager in your setup/tearing-down, but the > >> easiest way to do that is to override run, which seems ugly. > > > > Using context managers when you can't use a with statement is one of > > the main use cases for contextlib.ExitStack(): > > > > def setUp(self): > > self._resource_stack = stack = contextlib.ExitStack() > > self._resource = stack.enter_context(MyResource()) > > > > def tearDown(self): > > self._resource_stack.close() > > > > I posted that as an additional answer to the question: > > https://stackoverflow.com/questions/8416208/in-python- > is-there-a-good-idiom-for-using-context-managers-in- > setup-teardown/45809502#45809502 > > Sjoerd pointed out off-list that this doesn't cover the case where > you're acquiring multiple resources and one of the later acquisitions > fails, so I added the ExitStack idiom that covers that case (using > stack.pop_all() as the last operation in a with statement): > > def setUp(self): > with contextlib.ExitStack() as stack: > self._resource1 = stack.enter_context(GetResource()) > self._resource2 = stack.enter_context(GetOtherResource()) > # Failures before here -> immediate cleanup > self.addCleanup(stack.pop_all().close) > # Now cleanup won't happen until the cleanup functions run > > I also remember that using addCleanup lets you avoid defining tearDown > entirely. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From prometheus235 at gmail.com Tue Aug 22 19:14:21 2017 From: prometheus235 at gmail.com (Nick Timkovich) Date: Tue, 22 Aug 2017 18:14:21 -0500 Subject: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase In-Reply-To: References: Message-ID: Knowing nothing about the JavaScript ecosystem (other than that leftpad is apparently not a joke and everything needs more jQuery), what are the leagues-above testing libraries? On Tue, Aug 22, 2017 at 5:20 PM, rymg19 at gmail.com wrote: > TBH you're completely right. Every time I see someone using unittest > andItsHorriblyUnpythonicNames, I want to kill a camel. > > Sometimes, though, I feel like part of the struggle is the alternative. If > you dislike unittest, but pytest is too "magical" for you, what do you use? > Many Python testing tools like nose are just test *runners*, so you still > need something else. In the end, many just end up back at unittest, maybe > with nose on top. 
> > As much as I hate JavaScript, their testing libraries are leagues above > what Python has. > > -- > Ryan (????) > Yoko Shimomura, ryo (supercell/EGOIST), Hiroyuki Sawano >> everyone elsehttp://refi64.com > > On Aug 22, 2017 at 5:09 PM, > wrote: > > ** Caution: cranky curmudgeonly opinionated comment ahead: ** > > > unitest is such an ugly Java-esque static mess of an API that there's > really no point in trying to clean it up and make it more pythonic -- go > off and use pytest and be happier. > > -CHB > > > > On Tue, Aug 22, 2017 at 5:42 AM, Nick Coghlan wrote: > >> On 22 August 2017 at 15:34, Nick Coghlan wrote: >> > On 21 August 2017 at 11:32, Neil Girdhar wrote: >> >> This question describes an example of the problem: >> >> https://stackoverflow.com/questions/8416208/in-python-is- >> there-a-good-idiom-for-using-context-managers-in-setup-teardown. >> >> You want to invoke a context manager in your setup/tearing-down, but >> the >> >> easiest way to do that is to override run, which seems ugly. >> > >> > Using context managers when you can't use a with statement is one of >> > the main use cases for contextlib.ExitStack(): >> > >> > def setUp(self): >> > self._resource_stack = stack = contextlib.ExitStack() >> > self._resource = stack.enter_context(MyResource()) >> > >> > def tearDown(self): >> > self._resource_stack.close() >> > >> > I posted that as an additional answer to the question: >> > https://stackoverflow.com/questions/8416208/in-python-is- >> there-a-good-idiom-for-using-context-managers-in-setup- >> teardown/45809502#45809502 >> >> Sjoerd pointed out off-list that this doesn't cover the case where >> you're acquiring multiple resources and one of the later acquisitions >> fails, so I added the ExitStack idiom that covers that case (using >> stack.pop_all() as the last operation in a with statement): >> >> def setUp(self): >> with contextlib.ExitStack() as stack: >> self._resource1 = stack.enter_context(GetResource()) >> self._resource2 = stack.enter_context(GetOtherResource()) >> # Failures before here -> immediate cleanup >> self.addCleanup(stack.pop_all().close) >> # Now cleanup won't happen until the cleanup functions run >> >> I also remember that using addCleanup lets you avoid defining tearDown >> entirely. >> >> Cheers, >> Nick. >> >> -- >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ Python-ideas mailing list > Python-ideas at python.org https://mail.python.org/ > mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/ > codeofconduct/ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From steve at pearwood.info Tue Aug 22 19:37:56 2017
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 23 Aug 2017 09:37:56 +1000
Subject: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase
In-Reply-To: References: <> <> <> <>
Message-ID: <20170822233755.GB7395@ando.pearwood.info>

On Tue, Aug 22, 2017 at 06:20:50PM -0400, rymg19 at gmail.com wrote:

> TBH you're completely right. Every time I see someone using unittest
> andItsHorriblyUnpythonicNames, I want to kill a camel.

If your only complaint about unittest is that
you_miss_writing_underscores_between_all_the_words, then unittest must
be pretty good.

--
Steve

From chris.barker at noaa.gov Tue Aug 22 20:24:49 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 22 Aug 2017 17:24:49 -0700
Subject: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase
In-Reply-To: References: Message-ID:

On Tue, Aug 22, 2017 at 5:19 PM, Chris Barker wrote:

> anyway, that's enough ranting.....

Got carried away with the ranting, and didn't flesh out my point.

My point is that unittest is a very static, not very pythonic framework --
if you are productive with it, great, but I don't think it's worth trying
to add more pythonic niceties to it. Chances are pytest (or nose2?) may
already have them, or, if not, the simpler structure of pytest tests makes
them easier to write yourself.

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chris.barker at noaa.gov Tue Aug 22 20:19:33 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 22 Aug 2017 17:19:33 -0700
Subject: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase
In-Reply-To: References: Message-ID:

Getting kind of OT, but:

> ... pytest is too "magical" for you,

I do get confused a bit sometimes, but for the most part, I simply don't
use the magic -- pytest does a great job of making the simple things
simple.

> what do you use? Many Python testing tools like nose are just test
> *runners*, so you still need something else.

nose did provide a number of utilities to make testing friendly, but it is
apparently dead, and AFAICT, nose2 is mostly a test runner for unittest2 :-(

I converted to pytest a while back, mostly inspired by its wonderful
reporting of the details of test failures.

> If your only complaint about unittest is that
> you_miss_writing_underscores_between_all_the_words, then unittest must
> be pretty good.

For my part, I kinda liked StudlyCaps before I drank the pep8 kool-aid.

What I dislike about unittest is that it is a pile of almost completely
worthless boilerplate that you have to write. What the heck are all those
assertThis methods for? I always thought they were ridiculous, but then I
went in to write a new one (for math.isclose(), which was rejected, and
one of these days I may add it to assertAlmostEqual ... and
assertNotAlmostEqual ! ) -- lo and behold, the entire purpose of the
assert methods is to create a nice message when the test fails. really!
This in a dynamic language with wonderful introspection capabilities.

So that's most of the code in unittest -- completely worthless boilerplate
that just makes you have to type more.
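To be concrete about that point, a tiny illustration -- the method exists
almost entirely to build the failure message:

    import unittest

    class T(unittest.TestCase):
        def test_eq(self):
            # fails with "AssertionError: 2 != 3" -- that message is
            # essentially all that assertEqual adds over a bare assert
            self.assertEqual(1 + 1, 3)

    # pytest recovers the same information by introspecting a plain assert:
    def test_eq():
        assert 1 + 1 == 3   # reported as: assert (1 + 1) == 3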
Then there is the fixture stuff -- not too bad, but still a lot klunkier than pytest fixtures. And no parameterized testing -- that's a killer feature (that nose provided as well) anyway, that's enough ranting..... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Tue Aug 22 22:05:17 2017 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 23 Aug 2017 02:05:17 +0000 Subject: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase In-Reply-To: References: Message-ID: Like you, I used nose and then switched to pytest. The reason I proposed this for unittest is because pytest and nose and (I think) most of the other testing frameworks inherit from unittest, so improving unittest has downstream benefits. I may nevertheless propose this to the pytest people if this doesn't make it into unittest. On Tue, Aug 22, 2017 at 8:26 PM Chris Barker wrote: > On Tue, Aug 22, 2017 at 5:19 PM, Chris Barker > wrote: > >> anyway, that's enough ranting..... >> > > Got carried away with the ranting, and didn't flesh out my point. > > My point is that unittest is a very static, not very pythonic framework -- > if you are productive with it, great, but I don't think it's worth trying > to add more pythonic niceties to. Chances are pytest (Or nose2?) may > already have them, or, if not, the simpler structure of pytest tests make > them easier to write yourself. > > -CHB > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/cF_4IlJq698/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/cF_4IlJq698/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Aug 23 03:30:46 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 23 Aug 2017 17:30:46 +1000 Subject: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase In-Reply-To: References: Message-ID: On 23 August 2017 at 08:20, rymg19 at gmail.com wrote: > TBH you're completely right. Every time I see someone using unittest > andItsHorriblyUnpythonicNames, I want to kill a camel. 
> > Sometimes, though, I feel like part of the struggle is the alternative. If > you dislike unittest, but pytest is too "magical" for you, what do you use? > Many Python testing tools like nose are just test *runners*, so you still > need something else. In the end, many just end up back at unittest, maybe > with nose on top. A snake_case helper API for unittest that I personally like is hamcrest, since that also separates out the definition of testing assertions from being part of a test case: https://pypi.python.org/pypi/PyHamcrest Introducing such a split natively into unittest is definitely attractive, but would currently be difficult due to the way that some features like self.maxDiff and self.subTest work. However, PEP 550's execution contexts may provide a way to track the test state reliably that's independent of being a method on a test case instance, in which case it would become feasible to offer a more procedural interface in addition to the current visibly object-oriented one. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Aug 23 05:00:56 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 23 Aug 2017 19:00:56 +1000 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: On 21 August 2017 at 07:01, Barry wrote: > I'm not clear why there is a new_context_key which seems not to be a key. > It seems that the object is a container for a single value. > > Key.set( value ) does not feel right. It's basically borrowed from procedural thread local APIs, which tend to use APIs like "tss_set(key, value)". That said, in a separate discussion, Caleb Hattingh mentioned C#'s AsyncLocal API, and it occurred to me that "context local" might work well as the name of the context access API: my_implicit_state = sys.new_context_local('my_state') my_implicit_state.set('spam') # Later, to access the value of my_implicit_state: print(my_implicit_state.get()) That way, we'd have 3 clearly defined kinds of local variables: * frame locals (the regular kind) * thread locals (threading.locals() et al) * context locals (PEP 550) The fact contexts can be nested, and a failed lookup in the active implicit context may then query outer namespaces in the current execution context would then be directly analogous to the way name lookups are resolved for frame locals. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Wed Aug 23 11:41:06 2017 From: guido at python.org (Guido van Rossum) Date: Wed, 23 Aug 2017 08:41:06 -0700 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: On Wed, Aug 23, 2017 at 2:00 AM, Nick Coghlan wrote: > On 21 August 2017 at 07:01, Barry wrote: > > I'm not clear why there is a new_context_key which seems not to be a key. > > It seems that the object is a container for a single value. > > > > Key.set( value ) does not feel right. > > It's basically borrowed from procedural thread local APIs, which tend > to use APIs like "tss_set(key, value)". 
> > That said, in a separate discussion, Caleb Hattingh mentioned C#'s > AsyncLocal API, and it occurred to me that "context local" might work > well as the name of the context access API: > > my_implicit_state = sys.new_context_local('my_state') > my_implicit_state.set('spam') > > # Later, to access the value of my_implicit_state: > print(my_implicit_state.get()) > > That way, we'd have 3 clearly defined kinds of local variables: > > * frame locals (the regular kind) > * thread locals (threading.locals() et al) > * context locals (PEP 550) > > The fact contexts can be nested, and a failed lookup in the active > implicit context may then query outer namespaces in the current > execution context would then be directly analogous to the way name > lookups are resolved for frame locals. If we're extending the analogy with thread-locals we should at least consider making each instantiation return a namespace rather than something holding a single value. We have log_state = threading.local() log_state.verbose = False def action(x): if log_state.verbose: print(x) def make_verbose(): log_state.verbose = True It would be nice if we could upgrade this to make it PEP 550-aware so that only the first line needs to change: log_state = sys.AsyncLocal("log state") # The rest is the same We might even support the alternative notation where you can provide default values and suggest a schema, similar to to threading.local: class LogState(threading.local): verbose = False log_state = LogState() (I think that for calls that construct empty instances of various types we should just use the class name rather than some factory function. I also think none of this should live in sys but that's separate.) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Aug 23 12:40:16 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 23 Aug 2017 09:40:16 -0700 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: <599DAFF0.4030506@stoneleaf.us> On 08/23/2017 08:41 AM, Guido van Rossum wrote: > If we're extending the analogy with thread-locals we should at least consider making each instantiation return a > namespace rather than something holding a single value. +1 -- ~Ethan~ From john.torakis at gmail.com Wed Aug 23 12:55:00 2017 From: john.torakis at gmail.com (John Torakis) Date: Wed, 23 Aug 2017 19:55:00 +0300 Subject: [Python-ideas] Remote package/module imports through HTTP/S Message-ID: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> Hello all! Today I opened an issue in bugs.python.org (http://bugs.python.org/issue31264) proposing a module I created for remote package/module imports through standard HTTP/S. The concept is that, if a directory is served through HTTP/S (the way SimpleHTTPServer module serves directories), a Finder/Loader object can fetch Python files from that directory using HTTP requests, and finally load them as modules (or packages) in the running namespace. The repo containing a primitive (but working) version of the Finder/Loader, also contains self explanatory examples (in the README.md): https://github.com/operatorequals/httpimport My proposal is that this module can become a core Python feature, providing a way to load modules even from Github.com repositories, without the need to "git clone - setup.py install" them. Other languages, like golang, provide this functionality from their early days (day one?). 
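To make the mechanics concrete, here is a stripped-down sketch of such a
Finder/Loader -- not the actual httpimport code, top-level modules only,
and strictly opt-in:

    import importlib.abc
    import importlib.util
    import urllib.request

    class HTTPFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
        def __init__(self, base_url):
            self.base_url = base_url.rstrip('/')

        def _fetch(self, fullname):
            # fetch "<base_url>/<module>.py", or None if it isn't there
            try:
                return urllib.request.urlopen(
                    '%s/%s.py' % (self.base_url, fullname)).read()
            except Exception:
                return None

        def find_spec(self, fullname, path=None, target=None):
            if self._fetch(fullname) is None:
                return None
            return importlib.util.spec_from_loader(fullname, self)

        def exec_module(self, module):
            source = self._fetch(module.__name__)
            exec(compile(source, module.__name__, 'exec'), module.__dict__)

    # nothing touches the network unless you install the finder yourself:
    # import sys
    # sys.meta_path.append(HTTPFinder('https://example.com/modules'))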
Python development can be greatly improved if a "try before pip installing" mechanism gets in place, as it will add a lot to the REPL nature of the testing/experimenting process. Thank you for your time, John Torakis, IT Security Researcher P.S: It is my first time in this mailing list and generally Python contribution. Please be tolerant! From rosuav at gmail.com Wed Aug 23 13:17:17 2017 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 24 Aug 2017 03:17:17 +1000 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> Message-ID: On Thu, Aug 24, 2017 at 2:55 AM, John Torakis wrote: > Hello all! > > Today I opened an issue in bugs.python.org > (http://bugs.python.org/issue31264) proposing a module I created for > remote package/module imports through standard HTTP/S. > > The concept is that, if a directory is served through HTTP/S (the way > SimpleHTTPServer module serves directories), a Finder/Loader object can > fetch Python files from that directory using HTTP requests, and finally > load them as modules (or packages) in the running namespace. > > The repo containing a primitive (but working) version of the > Finder/Loader, also contains self explanatory examples (in the README.md): > > https://github.com/operatorequals/httpimport > > > My proposal is that this module can become a core Python feature, > providing a way to load modules even from Github.com repositories, > without the need to "git clone - setup.py install" them. > > > Other languages, like golang, provide this functionality from their > early days (day one?). Python development can be greatly improved if a > "try before pip installing" mechanism gets in place, as it will add a > lot to the REPL nature of the testing/experimenting process. As a core feature? No no no no no no no no. Absolutely do NOT WANT THIS. This is a security bug magnet; can you imagine trying to ensure that malicious code is not executed, in an arbitrary execution context? As an explicitly-enabled feature, it's a lot less hairy than a permanently-active one (can you IMAGINE how terrifying that would be?), but even so, trying to prove that addRemoteRepo (not a PEP8-compliant name, btw) is getting the correct code is not going to be easy. You have to (a) drop HTTP altogether and mandate SSL and (b) be absolutely sure that your certificate chains are 100% dependable, which - as we've seen recently - is a nontrivial task. The easiest way to add remote code is pip. For most packages, that's what you want to be using: pip install requests will make "import requests" functional. I don't see pip mentioned anywhere in your README, but you do mention the testing of pull requests, so at very least, this wants some explanatory screed. But I'm not entirely sure I want to support this. You're explicitly talking about using this with the creation of backdoors... in what, exactly? What are you actually getting at here? ChrisA From phd at phdru.name Wed Aug 23 13:13:04 2017 From: phd at phdru.name (Oleg Broytman) Date: Wed, 23 Aug 2017 19:13:04 +0200 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> Message-ID: <20170823171304.GA29208@phdru.name> Hi! On Wed, Aug 23, 2017 at 07:55:00PM +0300, John Torakis wrote: > Hello all! 
>
> Today I opened an issue in bugs.python.org
> (http://bugs.python.org/issue31264) proposing a module I created for
> remote package/module imports through standard HTTP/S.

The issue is so big IMO it requires a PEP, not just an issue. Anyway I'm
-1000 for reasons of security, connectivity (not all hosts are
connected), traffic cost and speed.

> The concept is that, if a directory is served through HTTP/S (the way
> SimpleHTTPServer module serves directories), a Finder/Loader object can
> fetch Python files from that directory using HTTP requests, and finally
> load them as modules (or packages) in the running namespace.
>
> The repo containing a primitive (but working) version of the
> Finder/Loader also contains self-explanatory examples (in the README.md):
>
> https://github.com/operatorequals/httpimport
>
> My proposal is that this module can become a core Python feature,
> providing a way to load modules even from Github.com repositories,
> without the need to "git clone - setup.py install" them.
>
> Other languages, like golang, provide this functionality from their

AFAIK Go downloads modules at compile time, not run time. This is a major
distinction from Python.

> early days (day one?). Python development can be greatly improved if a
> "try before pip installing" mechanism gets in place, as it will add a
> lot to the REPL nature of the testing/experimenting process.
>
> Thank you for your time,
>
> John Torakis, IT Security Researcher
>
> P.S: It is my first time in this mailing list and generally Python
> contribution. Please be tolerant!

Oleg.
--
Oleg Broytman http://phdru.name/ phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.

From john.torakis at gmail.com Wed Aug 23 13:37:11 2017
From: john.torakis at gmail.com (John Torakis)
Date: Wed, 23 Aug 2017 20:37:11 +0300
Subject: [Python-ideas] Remote package/module imports through HTTP/S
In-Reply-To: <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com>
References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com>
Message-ID: <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com>

On 23/08/2017 20:36, John Torakis wrote:
> Yeah, I am a security researcher, I am keen on backdoor programming and
> staging and all that! It is my official job and research topic! I go to
> the office and code such stuff! I am not a blackhat, nor a security
> enthusiast, it is my job.
>
> First of all, let's all agree that if someone can run Python code on
> your computer you are 100% hacked! It is irrelevant whether "httpimport"
> is a core python feature or not in that case.
>
> Now, I agree that this can be exploited if used under plain HTTP; it is
> a MiTM -> remote code execution case. I admit that this is not bright.
> But I mention that this can be used in testing.
>
> On the topic of HTTPS, man-in-the-middle is not possible without a
> previous Trusted Certificate compromise. Github can be trusted 100%
> percent for example. A certificate check has to take place in the HTTPS
> remote loading for sure!
>
> When I said a "core feature" I meant that the "httpimport" module would
> deliver with the core modules. Not that the Finder/Loader has to be in
> the list of Finders/Loaders that are used by default! For god's sake, I
> wouldn't like my PC to start probing for modules just because I mistyped
> an import line!
>
> I know that pip works nicely, especially when paired with virtual
> environments, but ad-hoc importing is another thing.
It isn't > meant for delivering real projects. Just for testing modules without the > need to download them, maybe install them, and all. > > > Thank you for your time, > John Torakis > > > On 23/08/2017 20:17, Chris Angelico wrote: >> On Thu, Aug 24, 2017 at 2:55 AM, John Torakis wrote: >>> Hello all! >>> >>> Today I opened an issue in bugs.python.org >>> (http://bugs.python.org/issue31264) proposing a module I created for >>> remote package/module imports through standard HTTP/S. >>> >>> The concept is that, if a directory is served through HTTP/S (the way >>> SimpleHTTPServer module serves directories), a Finder/Loader object can >>> fetch Python files from that directory using HTTP requests, and finally >>> load them as modules (or packages) in the running namespace. >>> >>> The repo containing a primitive (but working) version of the >>> Finder/Loader, also contains self explanatory examples (in the README.md): >>> >>> https://github.com/operatorequals/httpimport >>> >>> >>> My proposal is that this module can become a core Python feature, >>> providing a way to load modules even from Github.com repositories, >>> without the need to "git clone - setup.py install" them. >>> >>> >>> Other languages, like golang, provide this functionality from their >>> early days (day one?). Python development can be greatly improved if a >>> "try before pip installing" mechanism gets in place, as it will add a >>> lot to the REPL nature of the testing/experimenting process. >> As a core feature? No no no no no no no no. Absolutely do NOT WANT >> THIS. This is a security bug magnet; can you imagine trying to ensure >> that malicious code is not executed, in an arbitrary execution >> context? As an explicitly-enabled feature, it's a lot less hairy than >> a permanently-active one (can you IMAGINE how terrifying that would >> be?), but even so, trying to prove that addRemoteRepo (not a >> PEP8-compliant name, btw) is getting the correct code is not going to >> be easy. You have to (a) drop HTTP altogether and mandate SSL and (b) >> be absolutely sure that your certificate chains are 100% dependable, >> which - as we've seen recently - is a nontrivial task. >> >> The easiest way to add remote code is pip. For most packages, that's >> what you want to be using: >> >> pip install requests >> >> will make "import requests" functional. I don't see pip mentioned >> anywhere in your README, but you do mention the testing of pull >> requests, so at very least, this wants some explanatory screed. >> >> But I'm not entirely sure I want to support this. You're explicitly >> talking about using this with the creation of backdoors... in what, >> exactly? What are you actually getting at here? 
>> >> ChrisA >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > From john.torakis at gmail.com Wed Aug 23 13:49:12 2017 From: john.torakis at gmail.com (John Torakis) Date: Wed, 23 Aug 2017 20:49:12 +0300 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> References: <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> Message-ID: Bounced back on list -------- Forwarded Message -------- ????: Re: [Python-ideas] Remote package/module imports through HTTP/S ??????????: Wed, 23 Aug 2017 20:36:19 +0300 ???: John Torakis ????: Chris Angelico Yeah, I am a security researcher, I am keen on backdoor programming and staging and all that! It is my official job and research topic! I go to the office and code such stuff! I am not a blackhat, nor a security enthusiast, it is my job. First of all, let's all agree that if someone can run Python code in your computer you are 100% hacked! It is irrelevant if "httpimport" is a core python feature or not in that case. Now, I agree that this can be exploited if used under plain HTTP, it is a MiTM -> Remote code execution case. I admit that this is not bright. But I mention that this can be used in testing. On the topic of HTTPS, man-in-the-middle is not possible without previous Trusted Certificate compromise. Github can be trusted 100% percent for example. A certificate check has to take place in the HTTPS remote loading for sure! When I said a "core feature" I meant that the "httpimport" module would deliver with the core modules. Not that the Finder/Loader has to be in the list of Finders/Loaders that are used by default! For god sake, I wouldn't like my PC to start probing for modules just because I mistyped an import line! I know that pip works nicely, especially when paired with virtual environments, but ad-hoc importing is another another thing. It isn't meant for delivering real projects. Just for testing modules without the need to download them, maybe install them, and all. Thank you for your time, John Torakis On 23/08/2017 20:17, Chris Angelico wrote: > On Thu, Aug 24, 2017 at 2:55 AM, John Torakis wrote: >> Hello all! >> >> Today I opened an issue in bugs.python.org >> (http://bugs.python.org/issue31264) proposing a module I created for >> remote package/module imports through standard HTTP/S. >> >> The concept is that, if a directory is served through HTTP/S (the way >> SimpleHTTPServer module serves directories), a Finder/Loader object can >> fetch Python files from that directory using HTTP requests, and finally >> load them as modules (or packages) in the running namespace. >> >> The repo containing a primitive (but working) version of the >> Finder/Loader, also contains self explanatory examples (in the README.md): >> >> https://github.com/operatorequals/httpimport >> >> >> My proposal is that this module can become a core Python feature, >> providing a way to load modules even from Github.com repositories, >> without the need to "git clone - setup.py install" them. >> >> >> Other languages, like golang, provide this functionality from their >> early days (day one?). Python development can be greatly improved if a >> "try before pip installing" mechanism gets in place, as it will add a >> lot to the REPL nature of the testing/experimenting process. > As a core feature? No no no no no no no no. 
Absolutely do NOT WANT > THIS. This is a security bug magnet; can you imagine trying to ensure > that malicious code is not executed, in an arbitrary execution > context? As an explicitly-enabled feature, it's a lot less hairy than > a permanently-active one (can you IMAGINE how terrifying that would > be?), but even so, trying to prove that addRemoteRepo (not a > PEP8-compliant name, btw) is getting the correct code is not going to > be easy. You have to (a) drop HTTP altogether and mandate SSL and (b) > be absolutely sure that your certificate chains are 100% dependable, > which - as we've seen recently - is a nontrivial task. > > The easiest way to add remote code is pip. For most packages, that's > what you want to be using: > > pip install requests > > will make "import requests" functional. I don't see pip mentioned > anywhere in your README, but you do mention the testing of pull > requests, so at very least, this wants some explanatory screed. > > But I'm not entirely sure I want to support this. You're explicitly > talking about using this with the creation of backdoors... in what, > exactly? What are you actually getting at here? > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed Aug 23 13:49:17 2017 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 24 Aug 2017 03:49:17 +1000 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> Message-ID: On Thu, Aug 24, 2017 at 3:37 AM, John Torakis wrote: > > > On 23/08/2017 20:36, John Torakis wrote: >> Yeah, I am a security researcher, I am keen on backdoor programming and >> staging and all that! It is my official job and research topic! I go to >> the office and code such stuff! I am not a blackhat, nor a security >> enthusiast, it is my job. >> >> >> First of all, let's all agree that if someone can run Python code in >> your computer you are 100% hacked! It is irrelevant if "httpimport" is a >> core python feature or not in that case. >> >> Now, I agree that this can be exploited if used under plain HTTP, it is >> a MiTM -> Remote code execution case. I admit that this is not bright. >> But I mention that this can be used in testing. >> >> On the topic of HTTPS, man-in-the-middle is not possible without >> previous Trusted Certificate compromise. Github can be trusted 100% >> percent for example. A certificate check has to take place in the HTTPS >> remote loading for sure! Right, but that just pushes the problem one level further out: you need to have a 100% dependable certificate chain. And that means absolutely completely trusting all of your root certificates, and it also means either not needing to add any _more_ root certificates, or being able to configure the cert store. As we've seen elsewhere, this is nontrivial. >> When I said a "core feature" I meant that the "httpimport" module would >> deliver with the core modules. Not that the Finder/Loader has to be in >> the list of Finders/Loaders that are used by default! 
For god sake, I >> wouldn't like my PC to start probing for modules just because I mistyped >> an import line! Glad we agree about that! I have seen people wanting all sorts of things to become core features (usually for the sake of interactive work), and a lot of it is MUCH better handled as a non-core feature. Though a lot of what you're saying here - especially this: >> I know that pip works nicely, especially when paired with virtual >> environments, but ad-hoc importing is another another thing. It isn't >> meant for delivering real projects. Just for testing modules without the >> need to download them, maybe install them, and all. could be equally well handled by pip-installing httpimport itself, and using that to bootstrap your testing procedures. Unless, of course, you're wanting to httpimport httpimport, in which case you're going to run into bootstrapping problems whichever way you do it :) I think we're on the same page here, but it definitely needs some more text in the README to explain this - particularly how this is not a replacement for pip. For example, my first thought on seeing this was "wow, that's going to be abysmally slow unless it has a cache", but the answer to that is: if you need a cache, you probably should be using pip to install things properly. Still -1 on this becoming a stdlib package, as there's nothing I've yet seen that can't be done as a third-party package. But it's less scary than I thought it was :) ChrisA From bruce at leban.us Wed Aug 23 14:04:41 2017 From: bruce at leban.us (Bruce Leban) Date: Wed, 23 Aug 2017 11:04:41 -0700 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> Message-ID: On Wed, Aug 23, 2017 at 10:37 AM, John Torakis wrote: > > Github can be trusted 100% percent for example. This isn't even remotely close to true. While I'd agree with the statement that the SSL cert on github is reasonably trustworthy, the *content* on github is NOT trustworthy and that's where the security risk is. I agree that this is a useful feature and there is no way it should be on by default. The right way IMHO to do this is to have a command line option something like this: python --http-import somelib=https://github.com/someuser/somelib which then redefines the import somelib command to import from that source. Along with your scenario, it allows people, for example, to replace a library with a different version without modifying source or installing a different version. That's pretty useful. --- Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed Aug 23 14:11:26 2017 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 24 Aug 2017 04:11:26 +1000 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> Message-ID: On Thu, Aug 24, 2017 at 4:04 AM, Bruce Leban wrote: > > On Wed, Aug 23, 2017 at 10:37 AM, John Torakis > wrote: >> >> >> Github can be trusted 100% percent for example. > > > This isn't even remotely close to true. 
While I'd agree with the statement > that the SSL cert on github is reasonably trustworthy, the *content* on > github is NOT trustworthy and that's where the security risk is. > > I agree that this is a useful feature and there is no way it should be on by > default. The right way IMHO to do this is to have a command line option > something like this: > > python --http-import somelib=https://github.com/someuser/somelib If you read his README, it's pretty explicit about URLs; the risk is that "https://github.com/someuser/somelib" can be intercepted, not that "someuser" is malicious. If you're worried about the latter, don't use httpimport. ChrisA From john.torakis at gmail.com Wed Aug 23 14:11:32 2017 From: john.torakis at gmail.com (John Torakis) Date: Wed, 23 Aug 2017 21:11:32 +0300 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> Message-ID: <941db7b3-7037-1144-5cdb-5485c30ccb3f@gmail.com> On 23/08/2017 21:04, Bruce Leban wrote: > > On Wed, Aug 23, 2017 at 10:37 AM, John Torakis > wrote: > > > Github can be trusted 100% percent for example. > > > This isn't even remotely close to true. While I'd agree with the > statement that the SSL cert on github is reasonably trustworthy, the > *content* on github is NOT trustworthy and that's where the security > risk is. Do we trust code on github? Do we trust code on PyPI? This is why I **don't** want it ON by default. You have to explicitly point the Finder/Loader to a repo that you created or you trust. And provide a list of available modules/packages to import from that URL too. If the developer isn't sure about the code she/he is importing then it is her/his fault... Same goes for pip installing though... > > I agree that this is a useful feature and there is no way it should be > on by default. The right way IMHO to do this is to have a command line > option something like this: > > python --http-import somelib=https://github.com/someuser/somelib > > > which then redefines the import somelib command to import from that > source. Along with your scenario, it allows people, for example, to > replace a library with a different version without modifying source or > installing a different version. That's pretty useful. That's what I am thinking too! just provide the module so someone can "python -m" it, or start a REPL in the context that some packages/modules are available from a URL. > > --- Bruce John Torakis -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.torakis at gmail.com Wed Aug 23 14:13:33 2017 From: john.torakis at gmail.com (John Torakis) Date: Wed, 23 Aug 2017 21:13:33 +0300 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> Message-ID: On 23/08/2017 20:49, Chris Angelico wrote: > On Thu, Aug 24, 2017 at 3:37 AM, John Torakis wrote: >> >> On 23/08/2017 20:36, John Torakis wrote: >>> Yeah, I am a security researcher, I am keen on backdoor programming and >>> staging and all that! It is my official job and research topic! I go to >>> the office and code such stuff! I am not a blackhat, nor a security >>> enthusiast, it is my job. 
>>> >>> First of all, let's all agree that if someone can run Python code in >>> your computer you are 100% hacked! It is irrelevant if "httpimport" is a >>> core python feature or not in that case. >>> >>> Now, I agree that this can be exploited if used under plain HTTP, it is >>> a MiTM -> Remote code execution case. I admit that this is not bright. >>> But I mention that this can be used in testing. >>> >>> On the topic of HTTPS, man-in-the-middle is not possible without >>> previous Trusted Certificate compromise. Github can be trusted 100% >>> percent for example. A certificate check has to take place in the HTTPS >>> remote loading for sure! > Right, but that just pushes the problem one level further out: you > need to have a 100% dependable certificate chain. And that means > absolutely completely trusting all of your root certificates, and it > also means either not needing to add any _more_ root certificates, or > being able to configure the cert store. As we've seen elsewhere, this > is nontrivial. The centralized PKI as we know it is a pain altogether. Please let me reference this XKCD strip here: https://xkcd.com/1200/ Running code on your computer is among the low-impact things that can happen to you if you have a compromised certificate store. Trust me on that! In other words, if you can't trust your certificate store now, and you are afraid of remote code execution through HTTPS, stop using pip altogether: https://github.com/pypa/pip/issues/1168 And most package managers anyway. >>> When I said a "core feature" I meant that the "httpimport" module would >>> deliver with the core modules. Not that the Finder/Loader has to be in >>> the list of Finders/Loaders that are used by default! For god sake, I >>> wouldn't like my PC to start probing for modules just because I mistyped >>> an import line! > Glad we agree about that! I have seen people wanting all sorts of > things to become core features (usually for the sake of interactive > work), and a lot of it is MUCH better handled as a non-core feature. > > Though a lot of what you're saying here - especially this: > >>> I know that pip works nicely, especially when paired with virtual >>> environments, but ad-hoc importing is another another thing. It isn't >>> meant for delivering real projects. Just for testing modules without the >>> need to download them, maybe install them, and all. > could be equally well handled by pip-installing httpimport itself, and > using that to bootstrap your testing procedures. Unless, of course, > you're wanting to httpimport httpimport, in which case you're going to > run into bootstrapping problems whichever way you do it :) I will never open an issue for 'httpimporting the httpimport itself'. It is a promise! > > I think we're on the same page here, but it definitely needs some more > text in the README to explain this - particularly how this is not a > replacement for pip. For example, my first thought on seeing this was > "wow, that's going to be abysmally slow unless it has a cache", but > the answer to that is: if you need a cache, you probably should be > using pip to install things properly. > > Still -1 on this becoming a stdlib package, as there's nothing I've > yet seen that can't be done as a third-party package. But it's less > scary than I thought it was :) The reason I thought that this could serve greatly as a stdlib package is that it will broaden the horizon for importing arbitrary stuff just to see if it works as expected. "Testing" is the word!
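To make the mechanism being debated concrete, here is a minimal sketch of the kind of Finder/Loader John describes: a meta path finder that fetches module source over HTTP/S and executes it. This is only an illustration of the general importlib approach, not httpimport's actual code; the HTTPFinder name, the example URL, and the module allow-list are invented for the sketch.

```
import sys
import urllib.request
import importlib.abc
import importlib.util


class HTTPFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Load pure-Python modules from a directory served over HTTP/S."""

    def __init__(self, base_url, allowed):
        self.base_url = base_url.rstrip("/")
        self.allowed = set(allowed)  # explicit allow-list; never probe blindly

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.allowed:
            return None  # defer to the normal finders for everything else
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # use the interpreter's default module creation

    def exec_module(self, module):
        url = "%s/%s.py" % (self.base_url, module.__name__.replace(".", "/"))
        source = urllib.request.urlopen(url).read()
        # Executing fetched code is exactly the risk discussed in this
        # thread, which is why registration is explicit and opt-in.
        exec(compile(source, url, "exec"), module.__dict__)


# Nothing is fetched until the finder is explicitly registered:
sys.meta_path.append(HTTPFinder("https://example.com/pymodules", {"mymodule"}))
# "import mymodule" would now fetch https://example.com/pymodules/mymodule.py
```

Served over HTTPS with an explicit allow-list, this is the opt-in behaviour the thread keeps coming back to: no remote import can happen unless the finder is installed on sys.meta_path by hand.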
> ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From john.torakis at gmail.com Wed Aug 23 14:15:29 2017 From: john.torakis at gmail.com (John Torakis) Date: Wed, 23 Aug 2017 21:15:29 +0300 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> Message-ID: On 23/08/2017 21:11, Chris Angelico wrote: > On Thu, Aug 24, 2017 at 4:04 AM, Bruce Leban wrote: >> On Wed, Aug 23, 2017 at 10:37 AM, John Torakis >> wrote: >>> >>> Github can be trusted 100% percent for example. >> >> This isn't even remotely close to true. While I'd agree with the statement >> that the SSL cert on github is reasonably trustworthy, the *content* on >> github is NOT trustworthy and that's where the security risk is. >> >> I agree that this is a useful feature and there is no way it should be on by >> default. The right way IMHO to do this is to have a command line option >> something like this: >> >> python --http-import somelib=https://github.com/someuser/somelib > If you read his README, it's pretty explicit about URLs; the risk is > that "https://github.com/someuser/somelib" can be intercepted, not > that "someuser" is malicious. If you're worried about the latter, > don't use httpimport. Again, if https://github.com/someuser/somelib can be intercepted, https://pypi.python.org/pypi can too. If HTTPS is intercepted so easily (when not used from browsers) we are f**ed... > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From p.f.moore at gmail.com Wed Aug 23 14:24:40 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 23 Aug 2017 19:24:40 +0100 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> Message-ID: On 23 August 2017 at 18:49, Chris Angelico wrote: > Still -1 on this becoming a stdlib package, as there's nothing I've > yet seen that can't be done as a third-party package. But it's less > scary than I thought it was :) IMO, this would make a great 3rd party package (I note that it's not yet published on PyPI). It's possible that it would end up being extremely popular, and recognised as sufficiently secure - at which point it may be worth considering for core inclusion. But it's also possible that it remains niche, and/or people aren't willing to take the security risks that it implies, in which case it's still useful to those who do like it. One aspect that hasn't been mentioned yet - as a 3rd party module, the user (or the organisation's security team) can control whether or not the ability to import over the web is available by controlling whether the module is allowed to be installed - whereas with a core module, it's there, like it or not, and *all* Python code has to be audited on the assumption that it might be used. I could easily imagine cases where the httpimport module was allowed on development machines and CI servers, but forbidden on production (and pre-production) systems. 
That option simply isn't available if the feature is in the core. Paul From guido at python.org Wed Aug 23 14:41:26 2017 From: guido at python.org (Guido van Rossum) Date: Wed, 23 Aug 2017 11:41:26 -0700 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> Message-ID: This isn't ever going to be a standard feature. It's available as a third-party package and that's fine. I'd like to add a historic note -- this was first proposed around 1995 by Michael McLay. (Sorry, I don't have an email sitting around, but I'm sure he brought this up at or around the first Python workshop at NIST in 1995 -- I was his guest at NIST for several months at the time.) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.torakis at gmail.com Wed Aug 23 14:41:57 2017 From: john.torakis at gmail.com (John Torakis) Date: Wed, 23 Aug 2017 21:41:57 +0300 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> Message-ID: <292e25ca-ca97-ee3b-04c9-d812cc43fd60@gmail.com> On 23/08/2017 21:24, Paul Moore wrote: > On 23 August 2017 at 18:49, Chris Angelico wrote: >> Still -1 on this becoming a stdlib package, as there's nothing I've >> yet seen that can't be done as a third-party package. But it's less >> scary than I thought it was :) > IMO, this would make a great 3rd party package (I note that it's not > yet published on PyPI). It's possible that it would end up being > extremely popular, and recognised as sufficiently secure - at which > point it may be worth considering for core inclusion. But it's also > possible that it remains niche, and/or people aren't willing to take > the security risks that it implies, in which case it's still useful to > those who do like it. PyPI upload is scheduled when some more testing and commenting takes place. > One aspect that hasn't been mentioned yet - as a 3rd party module, the > user (or the organisation's security team) can control whether or not > the ability to import over the web is available by controlling whether > the module is allowed to be installed - whereas with a core module, > it's there, like it or not, and *all* Python code has to be audited on > the assumption that it might be used. True! But you can urlopen()->exec() anything out there anyway! A ">>>" prompt is all you need. > I could easily imagine cases > where the httpimport module was allowed on development machines and CI > servers, but forbidden on production (and pre-production) systems. > That option simply isn't available if the feature is in the core. I agree that there are circumstances that this module should not be used (regardless of security implications). In a released product for example. Depending on the UP-ness of a remote repository (e.g github), not to even mention the API backward-compatibility of an upstream package, is just **BAD** for a ready-released-deliverable product! This is why we have virtual environments! But it remains an option to use it or not! I, for example, find myself REPLing more than scripting. When REPLing for something you plan to implement sometime-somehow, this module is really what you need! 
But when I finally create a script, I won't disable its offline functionality just to use httpimport. That would be suicidal! When I finally come up with a working thing, I will land the used packages on disk and in a virtual environment. My argument is that this module will add greatly to Python's ad-hoc testing capabilities! I find it elegant for such a feature to be in the stdlib of a language. I don't doubt that it can survive as a 3rd party module, though. > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From john.torakis at gmail.com Wed Aug 23 14:44:15 2017 From: john.torakis at gmail.com (John Torakis) Date: Wed, 23 Aug 2017 21:44:15 +0300 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> Message-ID: <9dddff31-5d87-2be0-1d97-a662d5a1a784@gmail.com> On 23/08/2017 21:41, Guido van Rossum wrote: > This isn't ever going to be a standard feature. It's available as a > third-party package and that's fine. > > I'd like to add a historic note -- this was first proposed around 1995 > by Michael McLay. (Sorry, I don't have an email sitting around, but > I'm sure he brought this up at or around the first Python workshop at > NIST in 1995 -- I was his guest at NIST for several months at the time.) > Woah! I was 2 years old at that time! Little did I know! Can I ask why it got rejected the first time?
> -- > --Guido van Rossum (python.org/~guido ) > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Aug 23 14:48:57 2017 From: guido at python.org (Guido van Rossum) Date: Wed, 23 Aug 2017 11:48:57 -0700 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: <9dddff31-5d87-2be0-1d97-a662d5a1a784@gmail.com> References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> <9dddff31-5d87-2be0-1d97-a662d5a1a784@gmail.com> Message-ID: For security reasons. AFAIK HTTPS wasn't even invented at the time. On Wed, Aug 23, 2017 at 11:44 AM, John Torakis wrote: > > > On 23/08/2017 21:41, Guido van Rossum wrote: > > This isn't ever going to be a standard feature. It's available as a > third-party package and that's fine. > > I'd like to add a historic note -- this was first proposed around 1995 by > Michael McLay. (Sorry, I don't have an email sitting around, but I'm sure > he brought this up at or around the first Python workshop at NIST in 1995 > -- I was his guest at NIST for several months at the time.) > > Woah! I was 2 years old at that time! Little did I know! > Can I ask why it got rejected the first time? > > -- > --Guido van Rossum (python.org/~guido ) > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.torakis at gmail.com Wed Aug 23 15:04:37 2017 From: john.torakis at gmail.com (John Torakis) Date: Wed, 23 Aug 2017 22:04:37 +0300 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> <9dddff31-5d87-2be0-1d97-a662d5a1a784@gmail.com> Message-ID: Dark times... So is it a "case closed", or is there any improvement that will make it worth it to be a stdlib module? I mean, times have changed from 1995, and I am not referring to the invention of HTTPS. This is the reason that makes httpimport just tolerable security-wise. I'm talking about the need to rapidly test public code. I insist that testing code available on Github (or other repos), without the venv/clone/install hassle is a major improvement in my (and most sec researchers' I know) Python workflow. It makes REPL prototyping a million times smoother. We all have created small scripts that auto load modules from URLs anyway. That's why I thought that this module falls under the second category of 20.2.1 in https://docs.python.org/devguide/stdlibchanges.html (I did my homework before mailing this list). So, if there is something that would make this module acceptable for stdlib, please let me know! I'd more than happily reform it and make it comply with Python stdlib requirements. John Torakis On 23/08/2017 21:48, Guido van Rossum wrote: > For security reasons. AFAIK HTTPS wasn't even invented at the time. > > On Wed, Aug 23, 2017 at 11:44 AM, John Torakis > wrote: > > > > On 23/08/2017 21:41, Guido van Rossum wrote: >> This isn't ever going to be a standard feature. It's available as >> a third-party package and that's fine. >> >> I'd like to add a historic note -- this was first proposed around >> 1995 by Michael McLay. (Sorry, I don't have an email sitting >> around, but I'm sure he brought this up at or around the first >> Python workshop at NIST in 1995 -- I was his guest at NIST for >> several months at the time.) >> > Woah! I was 2 years old at that time! Little did I know! > Can I ask why it got rejected the first time? >> -- >> --Guido van Rossum (python.org/~guido ) >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > _______________________________________________ Python-ideas > mailing list Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > Code of > Conduct: http://python.org/psf/codeofconduct/ > > > -- > --Guido van Rossum (python.org/~guido ) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From rosuav at gmail.com Wed Aug 23 15:06:37 2017 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 24 Aug 2017 05:06:37 +1000 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> <9dddff31-5d87-2be0-1d97-a662d5a1a784@gmail.com> Message-ID: On Thu, Aug 24, 2017 at 5:04 AM, John Torakis wrote: > Dark times... > > So is it a "case closed", or is there any improvement that will make it > worth it to be an stdlib module? > > I mean, times have changed from 1995, and I am not referring to HTTPS > invention. This is the reason that makes httpimport just tolerable > security-wise. > > I'm talking about the need to rapidly test public code. I insist that > testing code available on Github (or other repos), without the > venv/clone/install hassle is a major improvement in my (and most sec > researchers' I know) Python workflow. It makes REPL prototyping million > times smoother. > We all have created small scripts that auto load modules from URLs anyway. > That's why I thought that this modules falls under the second category of > 20.2.1 in https://docs.python.org/devguide/stdlibchanges.html (I did my > homework before getting to mail in this list). > > So, if there is something that would make this module acceptable for stdlib, > please let me know! I'd more than happily reform it and make it comply with > Python stdlib requirements. Why can't people just "pip install httpimport" to make use of it? Why does it need to be in the stdlib? ChrisA From john.torakis at gmail.com Wed Aug 23 15:21:18 2017 From: john.torakis at gmail.com (John Torakis) Date: Wed, 23 Aug 2017 22:21:18 +0300 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> <9dddff31-5d87-2be0-1d97-a662d5a1a784@gmail.com> Message-ID: <6cf767cc-e42f-23c5-3932-653aa285a4df@gmail.com> On 23/08/2017 22:06, Chris Angelico wrote: > On Thu, Aug 24, 2017 at 5:04 AM, John Torakis wrote: >> Dark times... >> >> So is it a "case closed", or is there any improvement that will make it >> worth it to be an stdlib module? >> >> I mean, times have changed from 1995, and I am not referring to HTTPS >> invention. This is the reason that makes httpimport just tolerable >> security-wise. >> >> I'm talking about the need to rapidly test public code. I insist that >> testing code available on Github (or other repos), without the >> venv/clone/install hassle is a major improvement in my (and most sec >> researchers' I know) Python workflow. It makes REPL prototyping million >> times smoother. >> We all have created small scripts that auto load modules from URLs anyway. >> That's why I thought that this modules falls under the second category of >> 20.2.1 in https://docs.python.org/devguide/stdlibchanges.html (I did my >> homework before getting to mail in this list). >> >> So, if there is something that would make this module acceptable for stdlib, >> please let me know! I'd more than happily reform it and make it comply with >> Python stdlib requirements. > Why can't people just "pip install httpimport" to make use of it? Why > does it need to be in the stdlib? It doesn't, strictly speaking, *need* to be in stdlib. Of course it doesn't! Python is good enough without it. 
But, as it seems like it is a very big feature (to me at least), it feels right to be "officially" a Python feature. It feels right to be officially supported and not just another module, as it extends core import functionality. Just like zipimport does. Zipimport could just be a 3rd party module too. But it is in the core, and I can see why. Anyway, I will post it to PyPI when I finalize Github support and extend the testing a little bit. I will then shoot a mail again and repropose the module when it reaches full maturity. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ Thank you all for your time! John Torakis From bruce at leban.us Wed Aug 23 17:19:48 2017 From: bruce at leban.us (Bruce Leban) Date: Wed, 23 Aug 2017 14:19:48 -0700 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> Message-ID: On Wed, Aug 23, 2017 at 11:11 AM, Chris Angelico wrote: > > > If you read his README, it's pretty explicit about URLs; the risk is > that "https://github.com/someuser/somelib" can be intercepted, not > that "someuser" is malicious. If you're worried about the latter, > don't use httpimport. I don't see the word "security" or "risk" in the readme. The risk is not just that someuser is malicious but the risk that they, their github credentials or their code have been compromised. The reason that if this feature were to be implemented, I would want it outside the source code (command line option) is that that puts the control in the hands of the person running the code. This is appropriate for the stated scenarios. There's no possibility of a hidden live github dependency. --- Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Wed Aug 23 18:20:48 2017 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 23 Aug 2017 22:20:48 +0000 Subject: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase In-Reply-To: References: Message-ID: On Wed, Aug 23, 2017 at 3:31 AM Nick Coghlan wrote: > On 23 August 2017 at 08:20, rymg19 at gmail.com wrote: > > TBH you're completely right. Every time I see someone using unittest > > andItsHorriblyUnpythonicNames, I want to kill a camel. > > > > Sometimes, though, I feel like part of the struggle is the alternative. > If > > you dislike unittest, but pytest is too "magical" for you, what do you > use? > > Many Python testing tools like nose are just test *runners*, so you still > > need something else. In the end, many just end up back at unittest, maybe > > with nose on top. > > A snake_case helper API for unittest that I personally like is > hamcrest, since that also separates out the definition of testing > assertions from being part of a test case: > https://pypi.python.org/pypi/PyHamcrest > > Introducing such a split natively into unittest is definitely > attractive, but would currently be difficult due to the way that some > features like self.maxDiff and self.subTest work. 
> > However, PEP 550's execution contexts may provide a way to track the > test state reliably that's independent of being a method on a test > case instance, in which case it would become feasible to offer a more > procedural interface in addition to the current visibly > object-oriented one. > If you have time, could you expand on that a little bit? > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/cF_4IlJq698/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Aug 23 20:14:53 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 23 Aug 2017 17:14:53 -0700 Subject: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase In-Reply-To: References: Message-ID: On Tue, Aug 22, 2017 at 7:05 PM, Neil Girdhar wrote: > Like you, I used nose and then switched to pytest. The reason I proposed > this for unittest is because pytest and nose and (I think) most of the > other testing frameworks inherit from unittest, not really -- they extend unittest -- in the sense that their test runners can be used with unittest TestCases -- but they don't depend on unittest. > so improving unittest has downstream benefits. only to those using unittest -- a lot of folks do use pytest or nose primarily as a test runner, so those folks would benefit. > I may nevertheless propose this to the pytest people if this doesn't make > it into unittest. Anyway, I'm just being a curmudgeon -- if folks think it would be useful and not disruptive, then why not? -CHB > On Tue, Aug 22, 2017 at 8:26 PM Chris Barker > wrote: >> On Tue, Aug 22, 2017 at 5:19 PM, Chris Barker >> wrote: >>> anyway, that's enough ranting..... >>> >> >> Got carried away with the ranting, and didn't flesh out my point. >> >> My point is that unittest is a very static, not very pythonic framework >> -- if you are productive with it, great, but I don't think it's worth >> trying to add more pythonic niceties to. Chances are pytest (Or nose2?) may >> already have them, or, if not, the simpler structure of pytest tests make >> them easier to write yourself.
>> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> -- >> >> --- >> You received this message because you are subscribed to a topic in the >> Google Groups "python-ideas" group. >> To unsubscribe from this topic, visit https://groups.google.com/d/ >> topic/python-ideas/cF_4IlJq698/unsubscribe. >> To unsubscribe from this group and all its topics, send an email to >> python-ideas+unsubscribe at googlegroups.com. >> For more options, visit https://groups.google.com/d/optout. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> -- >> >> --- >> You received this message because you are subscribed to a topic in the >> Google Groups "python-ideas" group. >> To unsubscribe from this topic, visit https://groups.google.com/d/ >> topic/python-ideas/cF_4IlJq698/unsubscribe. >> To unsubscribe from this group and all its topics, send an email to >> python-ideas+unsubscribe at googlegroups.com. >> For more options, visit https://groups.google.com/d/optout. >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Aug 23 20:26:19 2017 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 23 Aug 2017 17:26:19 -0700 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: On Wed, Aug 23, 2017 at 8:41 AM, Guido van Rossum wrote:
> If we're extending the analogy with thread-locals we should at least
> consider making each instantiation return a namespace rather than something
> holding a single value. We have
>
> log_state = threading.local()
> log_state.verbose = False
>
> def action(x):
>     if log_state.verbose:
>         print(x)
>
> def make_verbose():
>     log_state.verbose = True
>
> It would be nice if we could upgrade this to make it PEP 550-aware so that
> only the first line needs to change:
>
> log_state = sys.AsyncLocal("log state")
> # The rest is the same

You can mostly implement this on top of the current PEP 550. Something like:

_tombstone = object()

class AsyncLocal:
    def __getattribute__(self, name):
        # if this raises AttributeError, we let it propagate
        key = object.__getattribute__(self, name)
        value = key.get()
        if value is _tombstone:
            raise AttributeError(name)
        return value

    def __setattr__(self, name, value):
        try:
            key = object.__getattribute__(self, name)
        except AttributeError:
            with some_lock:
                # double-checked locking pattern
                try:
                    key = object.__getattribute__(self, name)
                except AttributeError:
                    key = new_context_key()
                    object.__setattr__(self, name, key)
        key.set(value)

    def __delattr__(self, name):
        self.__setattr__(name, _tombstone)

    def __dir__(self):
        # filter out tombstoned values
        return [name for name in object.__dir__(self)
                if hasattr(self, name)]

Issues: Minor problem: On threading.local you can use .__dict__ to get the dict. That doesn't work here. But this could be done by returning a mapping proxy type, or maybe it's better not to support at all -- I don't think it's a big issue. Major problem: An attribute setting/getting API doesn't give any way to solve the save/restore problem [1]. PEP 550 v3 doesn't have a solution to this yet either, but we know we can do it by adding some methods to context-key. Supporting this in AsyncLocal is kinda awkward, since you can't use methods on the object -- I guess you could have some staticmethods, like AsyncLocal.save_state(my_async_local, name) and AsyncLocal.restore_state(my_async_local, name, value)?
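A minimal sketch of what those staticmethods could look like, added here for concreteness (this is not part of Nathaniel's message; new_context_key() is the draft PEP 550 API his code assumes, so a trivial single-slot stub stands in for it below purely so the example runs):

```
class _StubContextKey:
    # Stand-in for a draft PEP 550 ContextKey: a single get()/set() slot.
    # The real object would resolve values against the current execution
    # context; this placeholder exists only to make the sketch runnable.
    def __init__(self):
        self._value = None

    def get(self):
        return self._value

    def set(self, value):
        self._value = value


def new_context_key():
    return _StubContextKey()


class AsyncLocalWithSaveRestore:
    # One ContextKey per attribute, as in the AsyncLocal sketch above.
    def __setattr__(self, name, value):
        try:
            key = object.__getattribute__(self, name)
        except AttributeError:
            key = new_context_key()
            object.__setattr__(self, name, key)
        key.set(value)

    def __getattribute__(self, name):
        return object.__getattribute__(self, name).get()

    # staticmethods, since instance attribute access is intercepted by
    # __getattribute__ and would return stored values, not bound methods
    @staticmethod
    def save_state(local, name):
        return object.__getattribute__(local, name).get()

    @staticmethod
    def restore_state(local, name, value):
        object.__getattribute__(local, name).set(value)


log_state = AsyncLocalWithSaveRestore()
log_state.verbose = True
saved = AsyncLocalWithSaveRestore.save_state(log_state, "verbose")
log_state.verbose = False
AsyncLocalWithSaveRestore.restore_state(log_state, "verbose", saved)
assert log_state.verbose is True
```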
In any case this kinda spoils the sense of like "oh it's just an object with attributes, I already know how this works". Major problem: There are two obvious implementations. The above uses a separate ContextKey for each entry in the dict; the other way would be to have a single ContextKey that holds a dict. They have subtly different semantics. Suppose you have a generator and inside it you assign to my_async_local.a but not to my_async_local.b, then yield, and then the caller assigns to my_async_local.b. Is this visible inside the generator? In the ContextKey-holds-an-attribute approach, the answer is "yes": each AsyncLocal is a bag of independent attributes. In the ContextKey-holds-a-dict approach, the answer is "no": each AsyncLocal is a single container holding a single piece of (complex) state. It isn't obvious to me which of these semantics is preferable -- maybe it is if you're Dutch :-). But there's a danger that either option leaves a bunch of people confused. (Tangent: in the ContextKey-holds-a-dict approach, currently you have to copy the dict before mutating it every time, b/c PEP 550 currently doesn't provide a way to tell whether the value returned by get() came from the top of the stack, and thus is private to you and can be mutated in place, or somewhere deeper, and thus is shared and shouldn't be mutated. But we should fix that anyway, and anyway copy-then-mutate is a viable approach.) Observation: I don't think there's any simpler way to implement AsyncLocal other than to start with machinery like what PEP 550 already proposes, and then layer something like the above on top of it. We could potentially hide the layers inside the interpreter and only expose AsyncLocal, but I don't think it really simplifies the implementation any. Observation: I feel like many users of threading.local -- possibly the majority -- only put a single attribute on each object anyway, so for those users a raw ContextKey API is actually more natural and faster. For example, looking through the core django repo, I see thread locals in
- django.utils.timezone._active
- django.utils.translation.trans_real._active
- django.urls.base._prefixes
- django.urls.base._urlconfs
- django.core.cache._caches
- django.urls.resolvers.RegexURLResolver._local
- django.contrib.gis.geos.prototypes.threadsafe.thread_context
- django.contrib.gis.geos.prototypes.io.thread_context
- django.db.utils.ConnectionHandler._connections
Of these 9 thread-local objects, 7 of them have only a single attribute; only the last 2 use multiple attributes. For the first 4, that attribute is even called "value", which seems like a pretty clear indication that the authors found the whole local-as-namespace thing a nuisance to work around rather than something helpful. I also looked at asyncio; it has 2 threading.locals, and they each contain 2 attributes. But the two attributes are always read/written together; to me it would feel more natural to model this as a single ContextKey holding a small dict or tuple instead of something like AsyncLocal. So tl;dr: I think PEP 550 should just focus on a single object per key, and the subgroup of users who want to convert that to a more threading.local-style interface can do that themselves as efficiently as we could, once they've decided how they want to resolve the semantic issues. -n [1] https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope-on-top-of-pep-550-draft-2.py -- Nathaniel J.
Smith -- https://vorpus.org From yselivanov.ml at gmail.com Wed Aug 23 20:36:38 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 23 Aug 2017 20:36:38 -0400 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: There's another "major" problem with threading.local()-like API for PEP 550: C API. threading.local() in C right now is PyThreadState_GetDict(), which returns a dictionary for the current thread, that can be queried/modified with PyDict_* functions. For PEP 550 this would not work. The advantage of the current ContextKey solution is that the Python API and C API are essentially the same: [1] Another advantage is that ContextKey implements better caching, because it can have only one value cached in it, see [2] for details. [1] https://www.python.org/dev/peps/pep-0550/#new-apis [2] https://www.python.org/dev/peps/pep-0550/#contextkey-get-cache Yury From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Aug 23 22:13:53 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Thu, 24 Aug 2017 11:13:53 +0900 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> Message-ID: <22942.13921.914647.971532@turnbull.sk.tsukuba.ac.jp> Chris Angelico writes: > If you're worried about the latter, don't use httpimport. I guarantee you that in my (university) environment, if httpimport is in the stdlib, its use will be rampant (and not just by students, but by security-oblivious faculty). I want to be able to walk up to a student, say "may I?" and type "python -m httpimport" to determine if that particular risky behavior is a worry. Because *I'm* liable for my students' PCs' behavior on the network. Personally speaking, +1 on PyPI, -100 on stdlib. Steve From rosuav at gmail.com Wed Aug 23 22:23:31 2017 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 24 Aug 2017 12:23:31 +1000 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: <22942.13921.914647.971532@turnbull.sk.tsukuba.ac.jp> References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> <22942.13921.914647.971532@turnbull.sk.tsukuba.ac.jp> Message-ID: On Thu, Aug 24, 2017 at 12:13 PM, Stephen J. Turnbull wrote: > Chris Angelico writes: > > > If you're worried about the latter, don't use httpimport. > > I guarantee you that in my (university) environment, if httpimport is > in the stdlib, its use will be rampant (and not just by students, but > by security-oblivious faculty). I want to be able to walk up to a > student, say "may I?" and type "python -m httpimport" to determine if > that particular risky behavior is a worry. Because *I'm* liable for > my students' PCs' behavior on the network. > > Personally speaking, +1 on PyPI, -100 on stdlib. Agreed, and a VERY good reason for this to be an explicitly-installed package. By its nature, it won't be a dependency of other packages, so keeping it out of the stdlib pretty much guarantees that it'll only be available if it's been called for by name. ChrisA From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Aug 23 22:26:26 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J.
Turnbull) Date: Thu, 24 Aug 2017 11:26:26 +0900 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: <6cf767cc-e42f-23c5-3932-653aa285a4df@gmail.com> References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> <9dddff31-5d87-2be0-1d97-a662d5a1a784@gmail.com> <6cf767cc-e42f-23c5-3932-653aa285a4df@gmail.com> Message-ID: <22942.14674.486061.623216@turnbull.sk.tsukuba.ac.jp> John Torakis writes: > But, as it seems like it is a very big feature (to me at least), And "pip install httpimport" seems like it is a very small burden (to me at least). I agree with Paul Moore. Putting this in the stdlib seems both unnecessary, given pip, and an attractive nuisance for naive users. From the point of view of the blue team, checking for mere presence of httpimport in the environment is indicative of danger if it's pip-able, useless if it's in the stdlib. With respect to "it just makes exec(urlopen()) easier", any code must be audited for application of exec() to user input anyway, regardless of whether it fetches stuff off the Internet. Adding httpimport use to the checklist adds a little bit of complexity to *every* security check, and a fair amount of danger in security-oblivious environments such as many university labs, and I would imagine many corporate development groups as well. YMMV, but from the point of view of the larger, security-conscious organization, I would say -1. It's an attractive nuisance unless you're a security person, and then pip is not a big deal. Steve From guido at python.org Wed Aug 23 23:42:03 2017 From: guido at python.org (Guido van Rossum) Date: Wed, 23 Aug 2017 20:42:03 -0700 Subject: [Python-ideas] PEP 550 v2 In-Reply-To: References: Message-ID: OK, I get it now. I really liked the analysis of existing uses in Django. So no worries about this. On Wed, Aug 23, 2017 at 5:36 PM, Yury Selivanov wrote: > There's another "major" problem with threading.local()-like API for PEP > 550: C API. > > threading.local() in C right now is PyThreadState_GetDict(), which > returns a dictionary for the current thread, that can be > queried/modified with PyDict_* functions. For PEP 550 this would not > work. > > The advantage of the current ContextKey solution is that the Python > API and C API are essentially the same: [1] > > Another advantage is that ContextKey implements better caching, > because it can have only one value cached in it, see [2] for details. > > [1] https://www.python.org/dev/peps/pep-0550/#new-apis > [2] https://www.python.org/dev/peps/pep-0550/#contextkey-get-cache > > Yury > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Aug 24 05:20:48 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 24 Aug 2017 19:20:48 +1000 Subject: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase In-Reply-To: References: Message-ID: On 24 August 2017 at 08:20, Neil Girdhar wrote: > On Wed, Aug 23, 2017 at 3:31 AM Nick Coghlan wrote: >> However, PEP 550's execution contexts may provide a way to track the >> test state reliably that's independent of being a method on a test >> case instance, in which case it would become feasible to offer a more >> procedural interface in addition to the current visibly >> object-oriented one. > > If you have time, could you expand on that a little bit?
unittest.TestCase provides a few different "config setting" type attributes that affect how failures are reported:
- self.maxDiff (length limit for rich diffs)
- self.failureException (exception used to report errors)
- self.longMessage (whether custom messages replace or supplement the default ones)
There are also introspection methods about the currently running test:
- self.id() (currently running test ID)
- self.shortDescription() (test description)
And some stateful utility functions:
- self.addSubTest() (tracks subtest results)
- self.addCleanup() (tracks resource cleanup requests)
At the moment, these are all passed in to test methods as a piece of explicit context (the "self" attribute), and that's what makes it hard to refactor unittest to support standalone top-level test functions and standalone assertion functions: there's currently no way to make those settings and operations available implicitly instead. That all changes if there's a robust way for the unittest module to track the "active test case" that owns the currently running test method without passing the test case reference around explicitly:
- existing assertion & helper methods can be wrapped with independently importable snake_case functions that look for the currently active test case and call the relevant methods on it
- new assertion functions can be added to separate modules rather than adding yet more methods to TestCase (see https://bugs.python.org/issue18054 for some discussion of that)
- given the above enhancements, the default test loader could usefully gain support for top level function definitions (by wrapping them in autogenerated test case instances)
Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Aug 24 05:47:24 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 24 Aug 2017 19:47:24 +1000 Subject: [Python-ideas] Remote package/module imports through HTTP/S In-Reply-To: References: <58b95461-9c9e-23bf-3167-548545584fdf@gmail.com> <801b5892-8e74-2cd8-93a9-4f082e2cfe3b@gmail.com> <572c0d31-1cfe-ac01-cfd4-0a64dadf88f9@gmail.com> <9dddff31-5d87-2be0-1d97-a662d5a1a784@gmail.com> Message-ID: On 24 August 2017 at 05:04, John Torakis wrote: > Dark times... > > So is it a "case closed", or is there any improvement that will make it > worth it to be a stdlib module? Not really, as even aside from the security concerns, there are simply too many ways that it can fail that are outside of our control, but would potentially lead to folks filing bug reports against CPython without realising that the problem actually lies somewhere else (e.g. with their network configuration). For a third party module, that's not a problem:
- folks have to find out httpimport exists
- folks have to decide "I want this"
- folks have to explicitly install & enable it
- folks still get to keep all the very shiny pieces when it breaks unexpectedly, but they also already know where to go for help :)
Being a third party utility means you can also update it on your own timeline, rather than being limited to the standard library's relatively slow update and rollout cycles.
From a compatibility point of view, we also *like* having sophisticated import system plugins like httpimport out in the wild, as it means:
- it actually makes sense to define & maintain the import plugin APIs that make it possible
- there's additional integration testing of those APIs happening beyond our own test suite
Putting away my import system co-maintainer hat and donning my commercial redistributor hat: it already bothers some of our (and our customers') security folks that we ship package installation tools that access unfiltered third party package repositories by default (e.g. pip defaulting to querying PyPI). As a result, I'm pretty sure that even if upstream said "httpimport is in the Python standard library now!", we'd get explicit requests asking us to take it out of our redistributed version and make it at most an optional install (similar to what we do with IDLE and Tcl/Tk support in general). Cheers, Nick. P.S. As a potentially useful point of reference: "it's hard to debug when it breaks" is the main reason we resisted adding native lazy import support for so long, and that's just a matter of moving import errors away from the import statement and instead raising them as a side effect of an attribute access. It's also why we moved reload() *out* of the builtins in the move to Python 3: while module reloading is a fully supported operation, it also has a lot of subtleties that make it easy to get wrong. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mistersheik at gmail.com Thu Aug 24 05:50:05 2017 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 24 Aug 2017 09:50:05 +0000 Subject: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase In-Reply-To: References: Message-ID: Makes sense. Thanks! On Thu, Aug 24, 2017 at 5:20 AM Nick Coghlan wrote: > On 24 August 2017 at 08:20, Neil Girdhar wrote: > > On Wed, Aug 23, 2017 at 3:31 AM Nick Coghlan wrote: > >> However, PEP 550's execution contexts may provide a way to track the > >> test state reliably that's independent of being a method on a test > >> case instance, in which case it would become feasible to offer a more > >> procedural interface in addition to the current visibly > >> object-oriented one. > > > > If you have time, could you expand on that a little bit? > > unittest.TestCase provides a few different "config setting" type > attributes that affect how failures are reported: > > - self.maxDiff (length limit for rich diffs) > - self.failureException (exception used to report errors) > - self.longMessage (whether custom messages replace or supplement the > default ones) > > There are also introspection methods about the currently running test: > > - self.id() (currently running test ID) > - self.shortDescription() (test description) > > And some stateful utility functions: > > - self.addSubTest() (tracks subtest results) > - self.addCleanup() (tracks resource cleanup requests) > > At the moment, these are all passed in to test methods as a piece of > explicit context (the "self" attribute), and that's what makes it hard > to refactor unittest to support standalone top-level test functions > and standalone assertion functions: there's currently no way to > make those settings and operations available implicitly > instead. > > That all changes if there's a robust way for the unittest module to > track the "active test case" that owns the currently running test > method without passing the test case reference around explicitly: > > - existing assertion & helper methods can be wrapped with > independently importable snake_case functions that look for the > currently active test case and call the relevant methods on it > - new assertion functions can be added to separate modules rather than > adding yet more methods to TestCase (see > https://bugs.python.org/issue18054 for some discussion of that) > - given the above enhancements, the default test loader could usefully > gain support for top level function definitions (by wrapping them in > autogenerated test case instances) > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Thu Aug 24 10:04:58 2017 From: barry at python.org (Barry Warsaw) Date: Thu, 24 Aug 2017 10:04:58 -0400 Subject: [Python-ideas] PEP 550 dumbed down In-Reply-To: References: Message-ID: Jim J. Jewett wrote: > I know I'm not the only one who is confused by at least some of the > alternative terminology choices. I suspect I'm not the only one who > sometimes missed part of the argument because I was distracted > figuring out what the objects were, and forgot to verify what was > being done and why. I also suspect that it could be much simpler to > follow if the API were designed in the abstract, with the > implementation left for later. You're definitely not alone! I think I get the gist of the proposal, and its motivation, but I'm definitely confused by the terminology. As
>
> That all changes if there's a robust way for the unittest module to
> track the "active test case" that owns the currently running test
> method without passing the test case reference around explicitly:
>
> - existing assertion & helper methods can be wrapped with
> independently importable snake_case functions that look for the
> currently active test case and call the relevant methods on it
> - new assertion functions can be added to separate modules rather than
> adding yet more methods to TestCase (see
> https://bugs.python.org/issue18054 for some discussion of that)
> - given the above enhancements, the default test loader could usefully
> gain support for top level function definitions (by wrapping them in
> autogenerated test case instances)
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From barry at python.org  Thu Aug 24 10:04:58 2017
From: barry at python.org (Barry Warsaw)
Date: Thu, 24 Aug 2017 10:04:58 -0400
Subject: [Python-ideas] PEP 550 dumbed down
In-Reply-To: 
References: 
Message-ID: 

Jim J. Jewett wrote:
> I know I'm not the only one who is confused by at least some of the
> alternative terminology choices.  I suspect I'm not the only one who
> sometimes missed part of the argument because I was distracted
> figuring out what the objects were, and forgot to verify what was
> being done and why.  I also suspect that it could be much simpler to
> follow if the API were designed in the abstract, with the
> implementation left for later.

You're definitely not alone!  I think I get the gist of the proposal,
and its motivation, but I'm definitely confused by the terminology.  As
I stated elsewhere, the word "context" has a well-established meaning
in Python, with context managers, their protocols, and contextlib.
When talking with another Pythonista three years from now, I don't
want to have to resolve which context they're talking about based on
context. ;)

I think you have a point too about designing the abstract behavior and
API first, and then worrying about implementation details (in fact,
maybe take implementation discussions out of the PEP for now, and hash
that out in a PR).

I also think you're on to something when you suggest that sys may not
be the best place for these new APIs.  sys is already a mishmash of
lots of random stuff, and the concepts defined in PEP 550 are advanced
enough that many Python developers will never need to worry about
them.  Putting them in sys leads to cognitive overload.  I'm not sure
I'd put them in builtins either, but a new module makes a lot of sense
to me.  Plus, it means that we can choose more natural names for the
APIs since they'll be namespaced away in a separate module.

Cheers,
-Barry

From stefan at bytereef.org  Thu Aug 24 10:53:16 2017
From: stefan at bytereef.org (Stefan Krah)
Date: Thu, 24 Aug 2017 16:53:16 +0200
Subject: [Python-ideas] PEP 550 dumbed down
In-Reply-To: 
References: 
Message-ID: <20170824145316.GA2654@bytereef.org>

On Thu, Aug 24, 2017 at 10:04:58AM -0400, Barry Warsaw wrote:
> Jim J. Jewett wrote:
> > I know I'm not the only one who is confused by at least some of the
> > alternative terminology choices.  I suspect I'm not the only one who
> > sometimes missed part of the argument because I was distracted
> > figuring out what the objects were, and forgot to verify what was
> > being done and why.
> > I also suspect that it could be much simpler to
> > follow if the API were designed in the abstract, with the
> > implementation left for later.
>
> You're definitely not alone!  I think I get the gist of the proposal,
> and its motivation, but I'm definitely confused by the terminology.  As
> I stated elsewhere, the word "context" has a well-established meaning in
> Python, with context managers, their protocols, and contextlib.  When
> talking with another Pythonista three years from now, I don't want to
> have to resolve which context they're talking about based on context. ;)

I'm not happy about "context" either. I'd prefer something more
pedantic, like: TaskLocalStorage, TaskLocalStorageStack, even when
generators aren't tasks. At least that's what people are used to from
ThreadLocalStorage.

The .NET terminology is explained here:

https://blogs.msdn.microsoft.com/pfxteam/2012/06/15/executioncontext-vs-synchronizationcontext/

But that is more of an OO approach --- there are more "subclasses" of
ExecutionContexts like SecurityContext, HostExecutionContext,
CallContext, and there's colorful terminology like "flowing the
Execution Context".

Stefan Krah

From robertc at robertcollins.net  Fri Aug 25 01:28:22 2017
From: robertc at robertcollins.net (Robert Collins)
Date: Fri, 25 Aug 2017 17:28:22 +1200
Subject: [Python-ideas] Please consider adding context manager versions of setUp/tearDown to unittest.TestCase
In-Reply-To: 
References: 
Message-ID: 

So (wearing my maintainer hat for unittest) - very happy to consider
proposals and patches; I'd very much like to fix some structural APIs
in unittest, but I don't have the bandwidth to do so myself at this
point. And what you're asking about is largely a structural issue,
because of the interactions with test reporting and with class/module
setup.

As Ned says though, the specific question asked is best solved by
using the context manager protocol and manually entering and exiting:
addCleanup is ideal (literally designed for this) for managing that
(see also the plain-unittest sketch of that pattern below). The
fixtures library uses this to make use of fixtures (which are merely
enhanced context managers) trivial. We should add an adapter there I
think. If I get time I'll put this on stackexchange but:

```
import unittest
import fixtures

class ContextFixture(fixtures.Fixture):
    def __init__(self, cm):
        super().__init__()
        self._cm = cm

    def _setUp(self):
        self.addCleanup(self._cm.__exit__, None, None, None)
        self._cm.__enter__()

class Example(fixtures.TestWithFixtures):
    def setUp(self):
        super().setUp()
        self._cm_reference_if_I_need_it = self.useFixture(
            ContextFixture(MyContextManager()))

    def test_fred(self):
        1/0
```

should (I haven't tested it :P) do the right thing.

I've written about maintainability in unittest previously [1] [2], and
those experiments have worked very well. Your post has reminded me of
the stalled work in this space. In particular, avoiding inheritance
for code reuse has much better maintenance properties. I think we
learnt enough to sensibly propose it as an evolution for core
unittest, though some discussion is needed: for instance, the MIME
attachment aspect weirds some folk out, though it's very, very useful
in the cases where it matters, and pretty ignorable in the cases where
it doesn't.
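For completeness, the same idea without the fixtures dependency - a
minimal sketch using only unittest's own addCleanup (MyContextManager
is again a stand-in for whatever context manager you want held open
across a test; untested):

```
import unittest

class Example(unittest.TestCase):
    def setUp(self):
        cm = MyContextManager()  # stand-in for any context manager
        # Enter now, and guarantee __exit__ runs even if setUp itself
        # fails later on - this is exactly what addCleanup is for.
        self.resource = cm.__enter__()
        self.addCleanup(cm.__exit__, None, None, None)

    def test_uses_resource(self):
        self.assertIsNotNone(self.resource)
```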
Another related thing is getting testresources' awkward bits fixed so
that it becomes a joy to use - it's a much better approach than class
and module setup, if for no other reason than that it is concurrency
friendly [partition and execute], whereas what was put into unittest
isn't unless you also isolate the modules and class instances, which
effectively requires processes.

Lastly, the broad overarching refactor I'd like to do is twofold:

- completely separate the different users of TestCase: the test
executor, the report, and the test author are in the same namespace
today and it's super awkward. Adding a new executor-only interface at
e.g. case._executor would allow test authors to have much more freedom
about what they override and don't, without worrying about
interactions with the test running framework. Moving all the reporting
back up to the executor as a thunk would decouple the reporting logic
from the internals of the test case, allowing for the elimination of
placeholder objects as glue between different test systems.

- tweaking the existing pseudo-streaming contracts for the executor to
be more purely forward-flow only, aware of concurrency, and more
detailed - e.g. providing a means for tests to emit metrics like
"setting up this database took 10 seconds" and have that
discarded-or-captured-if-the-reporter-supports-it would be very useful
in larger test systems. Right now everyone that does this does it in a
bespoke fashion.

re: hamcrest - love it. That's what testtools.matchers were inspired
by. But we go a bit further, I think, in useful ways.

Lastly, pytest - it's beautiful, great community, some bits that I
will never see eye to eye on :). Use it and enjoy, or not - whatever
works for you :)

-Rob

1: https://rbtcollins.wordpress.com/2010/05/10/maintainable-pyunit-test-suites/
2: https://rbtcollins.wordpress.com/2010/09/18/maintainable-pyunit-test-suites-fixtures/

On 24 August 2017 at 21:50, Neil Girdhar wrote:
> Makes sense.  Thanks!
>
> On Thu, Aug 24, 2017 at 5:20 AM Nick Coghlan wrote:
>>
>> On 24 August 2017 at 08:20, Neil Girdhar wrote:
>> > On Wed, Aug 23, 2017 at 3:31 AM Nick Coghlan wrote:
>> >> However, PEP 550's execution contexts may provide a way to track the
>> >> test state reliably that's independent of being a method on a test
>> >> case instance, in which case it would become feasible to offer a more
>> >> procedural interface in addition to the current visibly
>> >> object-oriented one.
>> >
>> > If you have time, could you expand on that a little bit?
>>
>> unittest.TestCase provides a few different "config setting" type
>> attributes that affect how failures are reported:
>>
>> - self.maxDiff (length limit for rich diffs)
>> - self.failureException (exception used to report errors)
>> - self.longMessage (whether custom messages replace or supplement the
>> default ones)
>>
>> There are also introspection methods about the currently running test:
>>
>> - self.id() (currently running test ID)
>> - self.shortDescription() (test description)
>>
>> And some stateful utility functions:
>>
>> - self.addSubTest() (tracks subtest results)
>> - self.addCleanup() (tracks resource cleanup requests)
>>
>> At the moment, these are all passed in to test methods as a piece of
>> explicit context (the "self" attribute), and that's what makes it hard
>> to refactor unittest to support standalone top-level test functions
>> and standalone assertion functions: there's currently no way to make
>> those settings and operations available implicitly instead.
>>
>> That all changes if there's a robust way for the unittest module to
>> track the "active test case" that owns the currently running test
>> method without passing the test case reference around explicitly:
>>
>> - existing assertion & helper methods can be wrapped with
>> independently importable snake_case functions that look for the
>> currently active test case and call the relevant methods on it
>> - new assertion functions can be added to separate modules rather than
>> adding yet more methods to TestCase (see
>> https://bugs.python.org/issue18054 for some discussion of that)
>> - given the above enhancements, the default test loader could usefully
>> gain support for top level function definitions (by wrapping them in
>> autogenerated test case instances)
>>
>> Cheers,
>> Nick.
>>
>> --
>> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From francismb at email.de  Sat Aug 26 14:25:41 2017
From: francismb at email.de (francismb)
Date: Sat, 26 Aug 2017 20:25:41 +0200
Subject: [Python-ideas] Unittest error message failure context lazy creation
Message-ID: <154eb60a-1a97-8ddd-e6ee-fd3f6d72b997@email.de>

Hi all,
while using `unittest` I see the pattern of creating an error message
with the test context for the case where one of the `assert...`
methods fails (so as to get a good error message). Along the lines of:

class Test...(unittest.TestCase):

    longMessage = True

    def test_(self):
        ...
        for a, b, c, ... in zip(A, B, C, ...):
            # call the function under test and get the result
            msg = "Some headline: {}{} ...".format(a, b, c, ...)
            self.assert...(..., msg)

The `msg` is just used in case the assert fails, but its creation
takes time and adds up.

What is the best practice/pattern you use here? Or are there ideas for
a lazy mechanism that avoids the creation and only incurs it in the
case where the assert failed?

Thanks in advance!
--francis

From rosuav at gmail.com  Sat Aug 26 22:03:57 2017
From: rosuav at gmail.com (Chris Angelico)
Date: Sun, 27 Aug 2017 12:03:57 +1000
Subject: [Python-ideas] Unittest error message failure context lazy creation
In-Reply-To: <154eb60a-1a97-8ddd-e6ee-fd3f6d72b997@email.de>
References: <154eb60a-1a97-8ddd-e6ee-fd3f6d72b997@email.de>
Message-ID: 

On Sun, Aug 27, 2017 at 4:25 AM, francismb wrote:
> Hi all,
> while using `unittest` I see the pattern of creating an error message
> with the test context for the case where one of the `assert...`
> methods fails (so as to get a good error message). Along the lines of:
>
> class Test...(unittest.TestCase):
>
>     longMessage = True
>
>     def test_(self):
>         ...
>         for a, b, c, ... in zip(A, B, C, ...):
>             # call the function under test and get the result
>             msg = "Some headline: {}{} ...".format(a, b, c, ...)
>             self.assert...(..., msg)
>
> The `msg` is just used in case the assert fails, but its creation
> takes time and adds up.

Have you measured it, eg by replacing the message with a constant? By
what percentage does it speed up a successful test run?
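Something along these lines would give a ballpark figure (an
illustrative micro-benchmark only; the format string and values are
stand-ins for whatever your real tests build):

    import timeit

    setup = "a, b, c = 12345, 'spam', 3.14159"
    built = timeit.timeit(
        "msg = 'Some headline: {} {} {}'.format(a, b, c)",
        setup=setup, number=1000000)
    const = timeit.timeit(
        "msg = 'Some headline'",
        setup=setup, number=1000000)
    # The difference between these two bounds the possible win.
    print("built: %.3fs  constant: %.3fs" % (built, const))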
ChrisA

From steve at pearwood.info  Sat Aug 26 22:19:12 2017
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 27 Aug 2017 12:19:12 +1000
Subject: [Python-ideas] Unittest error message failure context lazy creation
In-Reply-To: <154eb60a-1a97-8ddd-e6ee-fd3f6d72b997@email.de>
References: <154eb60a-1a97-8ddd-e6ee-fd3f6d72b997@email.de>
Message-ID: <20170827021911.GC9671@ando.pearwood.info>

On Sat, Aug 26, 2017 at 08:25:41PM +0200, francismb wrote:
> Hi all,
> while using `unittest` I see the pattern of creating an error message
> with the test context for the case where one of the `assert...`
> methods fails (so as to get a good error message). Along the lines of:
[...]
> The `msg` is just used in case the assert fails, but its creation
> takes time and adds up.
>
> What is the best practice/pattern you use here?

I think the best practice here is:

    The Rules of Optimization are simple. Rule 1: Don't do it. Rule 2
    (for experts only): Don't do it yet.
    -- Michael A. Jackson, "Principles of Program Design"

Personally, I doubt that the time spent creating the error message
will be anything more than an insignificant fraction of the total
time. Perhaps as much as 0.1%? But I've just plucked that number out
of thin air, so it's probably wrong. If you want to profile unittest
and see just how much time is spent creating error messages for tests
which pass, go right ahead and I'll be happy to be proven wrong.

Until somebody actually profiles the tests and demonstrates that
delaying creation of the error messages has the potential to speed up
unit testing by, oh, at least 5%, I'm sticking with "don't do it yet".

--
Steve

From francismb at email.de  Sun Aug 27 08:28:09 2017
From: francismb at email.de (francismb)
Date: Sun, 27 Aug 2017 14:28:09 +0200
Subject: [Python-ideas] Unittest error message failure context lazy creation
In-Reply-To: <154eb60a-1a97-8ddd-e6ee-fd3f6d72b997@email.de>
References: <154eb60a-1a97-8ddd-e6ee-fd3f6d72b997@email.de>
Message-ID: 

Hi Chris, Hi Steven,

> Have you measured it, eg by replacing the message with a constant? By
> what percentage does it speed up a successful test run?
How much counts as significant for `one` test? Now replace the zip
call with `itertools.product`, and build some of the messages by
concatenation (using ''.join()) over several levels of test statements
(contexts). It just adds up. But I understand your question/claim.

>> The Rules of Optimization are simple. Rule 1: Don't do it. Rule 2
>> (for experts only): Don't do it yet.
>> -- Michael A. Jackson, "Principles of Program Design"
I already follow those rules and measure first ;-) and yes, maybe I
should have formulated the question in a more general way:

What is the current status quo for lazy evaluation in the language,
and what are the current ideas for avoiding this type of case? Why
should one calculate something that is not going to be needed? Is
there a way to somehow "mark" a calculation plus its context to tell
the interpreter "not now"?

The only thing that comes to mind is to create some class that
captures the values, whose `__str__` method then does the calculation
for that case, but of course that makes the situation more complex.

Thanks for your feedback!
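P.S. For the record, a minimal sketch of that `__str__` idea (a
hypothetical `LazyMessage` helper; untested, and the formatting cost
still has to be paid if the assert ever fails):

    class LazyMessage:
        """Defer building an assertion message until it is shown."""
        def __init__(self, fmt, *args):
            self.fmt = fmt
            self.args = args

        def __str__(self):
            # Only runs if/when the message is actually rendered,
            # i.e. on assertion failure.
            return self.fmt.format(*self.args)

    # usage inside a test method:
    #   self.assertEqual(result, expected,
    #                    LazyMessage("Some headline: {} {}", a, b))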
--francis

From robert at shajil.de  Tue Aug 29 04:38:58 2017
From: robert at shajil.de (Robert Schindler)
Date: Tue, 29 Aug 2017 10:38:58 +0200
Subject: [Python-ideas] argparse.ArgumentParser: include arguments from files with relative paths
Message-ID: <20170829083858.wcznqngks4dfbkzj@efficiosoft.com>

Hello,

I often use ArgumentParser in my projects, as well as its ability to
read argument lists from files. However, the problem is that nested
includes of such argument files have to specify paths relative to
os.getcwd(), no matter where the file containing the include statement
is located.

Currently, this can be circumvented by always using absolute paths,
but imho that is not a practical solution, due to the obvious
portability issues it causes.

I suggest adding a new parameter to argparse.ArgumentParser that
controls the behaviour:

* fromfile_parent_relative - Whether to treat paths of included
  argument files as relative to the location of the file they are
  specified in (``True``) or to the current working directory
  (``False``) (default: ``False``)

Doing so would allow users to choose between the two different
strategies while keeping backwards compatibility.

I made a pull request [1] which adds the functionality + docs to
demonstrate a possible solution.

What do you think about this enhancement?

Please note this is my first contribution to cpython. I now know that
I should have presented it to python-ideas before starting a pull
request. Sorry for doing it the wrong way around.

Best regards
Robert

[1] https://github.com/python/cpython/pull/1698
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: 

From blaine.w.rogers at gmail.com  Tue Aug 29 04:58:51 2017
From: blaine.w.rogers at gmail.com (Blaine Rogers)
Date: Tue, 29 Aug 2017 09:58:51 +0100
Subject: [Python-ideas] Signature Literals
Message-ID: 

The current syntax for Callable types is unwieldy, particularly when
extended to include varargs and keyword args as in
http://mypy.readthedocs.io/en/latest/kinds_of_types.html#extended-callable-types.
Why not introduce a signature literal?
Proposed syntax:

>>> from inspect import Signature, Parameter
>>> () ->
Signature()
>>> (arg0, arg1, arg2=None, arg3=None) ->
Signature(
    [Parameter('arg0', Parameter.POSITIONAL_OR_KEYWORD),
     Parameter('arg1', Parameter.POSITIONAL_OR_KEYWORD),
     Parameter('arg2', Parameter.POSITIONAL_OR_KEYWORD, default=None),
     Parameter('arg3', Parameter.POSITIONAL_OR_KEYWORD, default=None)]
)
>>> (arg0, arg1: int, arg2=None, arg3: float=None) -> str
Signature(
    [Parameter('arg0', Parameter.POSITIONAL_OR_KEYWORD),
     Parameter('arg1', Parameter.POSITIONAL_OR_KEYWORD, annotation=int),
     Parameter('arg2', Parameter.POSITIONAL_OR_KEYWORD, default=None),
     Parameter('arg3', Parameter.POSITIONAL_OR_KEYWORD, annotation=float,
               default=None)],
    return_annotation=str
)
>>> (:, :, :, arg1, *, arg2) ->
Signature(
    [Parameter('', Parameter.POSITIONAL_ONLY),
     Parameter('', Parameter.POSITIONAL_ONLY),
     Parameter('', Parameter.POSITIONAL_ONLY),
     Parameter('arg1', Parameter.POSITIONAL_OR_KEYWORD),
     Parameter('arg2', Parameter.KEYWORD_ONLY)]
)
>>> (:int, :float, *, keyword: complex) -> str
Signature(
    [Parameter('', Parameter.POSITIONAL_ONLY, annotation=int),
     Parameter('', Parameter.POSITIONAL_ONLY, annotation=float),
     Parameter('keyword', Parameter.KEYWORD_ONLY, annotation=complex)],
    return_annotation=str
)

Compare the above to their equivalents using Callable (and the
experimental extension to Mypy):

>>> Callable[[], Any]
>>> Callable[[Arg(Any, 'arg0'), Arg(int, 'arg1'), DefaultArg(Any, 'arg2'),
              DefaultArg(float, 'arg3')], str]
>>> Callable[[Arg(), Arg(), Arg(), Arg(Any, 'arg1'),
              NamedArg(Any, 'arg2')], Any]
>>> Callable[[int, float, NamedArg(complex, 'keyword')], Any]

The proposed signature literal syntax is shorter, just as clear, and
imo nicer to read. Here is what it looks like in annotations:

from typing import TypeVar, Callable

A = TypeVar('A')
def apply_successor(func: Callable[[A], A], init: A,
                    n_applications: int) -> A: ...
def apply_successor(func: (:A) -> A, init: A,
                    n_applications: int) -> A: ...

import tensorflow as tf
import numpy as np

def run(policy: Callable[[np.ndarray,
                          Arg(Dict[tf.Tensor, np.ndarray],
                              'updated_feeds')],
                         np.ndarray]) -> bool: ...
def run(policy: (:np.ndarray, updated_feeds: Dict[tf.Tensor, np.ndarray])
                -> np.ndarray) -> bool: ...
# If Mypy accepted literals for container types (dict, set, list,
# tuple, etc.) this would be nicer still
def run(policy: (:np.ndarray, updated_feeds: {tf.Tensor: np.ndarray})
                -> np.ndarray) -> bool: ...

Initial thoughts:

- () -> is ugly, but the -> would be necessary to distinguish it from
the empty tuple (). Actually, it can be difficult to tell the
difference between the proposed signature literals and tuples,
especially for long signatures with no annotations or defaults. An
alternative would be to prefix the arguments with an @ or other
uncommon symbol (maybe &). () -> becomes @(), and it is clear from the
start that you're reading a signature.
- Supposing the syntax for function definitions was changed to match
the proposed signature literals, one could make something like the
following possible:

>>> def add(:, :):
...     arg0, arg1 = __call_signature__.args
...     return arg0 + arg1
>>> add(1, 2)
3
>>> add('hello', 'world')
'helloworld'

Where __call_signature__ is a magic name that evaluates to an
inspect.BoundArguments instance representing the signature of the
function call.
I'm not sure why you'd want functions with positional-only arguments,
but now you could have them.
- You could further extend the function definition syntax to allow an
expression that evaluates to a signature instead of a literal:

>>> signature = (:, :) ->
>>> def add signature:
...     arg0, arg1 = __call_signature__.args
...     return arg0 + arg1

Again, not sure how useful this would be.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ncoghlan at gmail.com  Tue Aug 29 06:06:43 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 29 Aug 2017 20:06:43 +1000
Subject: [Python-ideas] Signature Literals
In-Reply-To: 
References: 
Message-ID: 

On 29 August 2017 at 18:58, Blaine Rogers wrote:
> The current syntax for Callable types is unwieldy, particularly when
> extended to include varargs and keyword args as in
> http://mypy.readthedocs.io/en/latest/kinds_of_types.html#extended-callable-types.
> Why not introduce a signature literal?

While a more concise spelling for that is desirable, it doesn't need
to be a literal, as it can be handled by updating the Callable item
lookup to accept a string literal that static type checkers know how
to parse.

The standard library already contains a runtime introspection variant
of this to pass function signature details from Argument Clinic up to
inspect.Signature as __text_signature__ attributes:
https://github.com/python/cpython/blob/master/Lib/inspect.py#L1938

The main reason "create signature object from text string" isn't a
public API yet is that it includes support for positional-only and
variable signatures that aren't supported by pure Python function
definitions (while https://www.python.org/dev/peps/pep-0457/ covers
the details of how that works, my recollection is that Guido was wary
of accepting an approved syntax in the absence of actual syntactic
support for them in function definitions):

from inspect import Signature, _signature_fromstr

def str_signature(sig):
    parsed_sig = _signature_fromstr(Signature, (lambda: None), sig)
    for param in parsed_sig.parameters.values():
        print("{}: {}".format(param, param.kind.name))

>>> str_signature("(a, b, /, c, d)")
a: POSITIONAL_ONLY
b: POSITIONAL_ONLY
c: POSITIONAL_OR_KEYWORD
d: POSITIONAL_OR_KEYWORD

Using PEP 457 syntax and a string literal, the MyPy extended callable
syntax would look like:

def func(__a: int,  # This convention is for nameless arguments
         b: int,
         c: int = 0,
         *args: int,
         d: int,
         e: int = 0,
         **kwargs: int) -> int:
    ...

F = Callable[
    """(
        a: int, /,
        b: int, c: int = 0, *args: int,
        d: int, e: int = 0, **kwargs: int
    )""",
    int]

Ideally, the runtime implementation of that would *skip* parsing the
signature, and instead just keep the string around for on-demand
parsing.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From levkivskyi at gmail.com  Wed Aug 30 07:38:55 2017
From: levkivskyi at gmail.com (Ivan Levkivskyi)
Date: Wed, 30 Aug 2017 13:38:55 +0200
Subject: [Python-ideas] Signature Literals
In-Reply-To: 
References: 
Message-ID: 

Hi Blaine,

A similar idea has been discussed at the typing tracker, see
https://github.com/python/typing/issues/239, but finally we went with
the current syntax. It has several advantages, such as:

* It does not require new syntax, i.e. it can be backported to older
Python versions
* Possibility to define generic aliases without too much metaclass
magic (see the sketch after this list)
* Easier to search and ask questions on stackoverflow etc.
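To illustrate the generic-alias point, a small sketch (the Reducer
alias and fold function are made up for illustration, and this matches
my understanding of typing's subscription behaviour, so treat it as an
assumption rather than gospel):

    from typing import Callable, TypeVar

    T = TypeVar('T')

    # A generic alias built with the existing subscription syntax; no
    # new grammar is needed, so it also works on older Python versions.
    Reducer = Callable[[T, T], T]

    def fold(combine: Reducer[int], values, start: int) -> int:
        total = start
        for value in values:
            total = combine(total, value)
        return total

    print(fold(lambda x, y: x + y, [1, 2, 3], 0))  # prints 6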
In general, there is quite a high bar for modifying Python syntax, so
if other options are available, they will be preferred.

--
Ivan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From k7hoven at gmail.com  Wed Aug 30 08:31:05 2017
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Wed, 30 Aug 2017 15:31:05 +0300
Subject: [Python-ideas] Signature Literals
In-Reply-To: 
References: 
Message-ID: 

On Tue, Aug 29, 2017 at 1:06 PM, Nick Coghlan wrote:

> On 29 August 2017 at 18:58, Blaine Rogers wrote:
> > The current syntax for Callable types is unwieldy, particularly when
> > extended to include varargs and keyword args as in
> > http://mypy.readthedocs.io/en/latest/kinds_of_types.html#extended-callable-types.
> > Why not introduce a signature literal?
>
> While a more concise spelling for that is desirable, it doesn't need
> to be a literal, as it can be handled by updating the Callable item
> lookup to accept a string literal that static type checkers know how
> to parse.
>
> The standard library already contains a runtime introspection variant
> of this to pass function signature details from Argument Clinic up to
> inspect.Signature as __text_signature__ attributes:
> https://github.com/python/cpython/blob/master/Lib/inspect.py#L1938
>
> The main reason "create signature object from text string" isn't a
> public API yet is that it includes support for positional-only and
> variable signatures that aren't supported by pure Python function
> definitions (while https://www.python.org/dev/peps/pep-0457/ covers
> the details of how that works, my recollection is that Guido was wary
> of accepting an approved syntax in the absence of actual syntactic
> support for them in function definitions)
>

How about: def func(a: int?, b: str) -> float: ...   ? ;-)

-- Koos

> from inspect import Signature, _signature_fromstr
>
> def str_signature(sig):
>     parsed_sig = _signature_fromstr(Signature, (lambda: None), sig)
>     for param in parsed_sig.parameters.values():
>         print("{}: {}".format(param, param.kind.name))
>
> >>> str_signature("(a, b, /, c, d)")
> a: POSITIONAL_ONLY
> b: POSITIONAL_ONLY
> c: POSITIONAL_OR_KEYWORD
> d: POSITIONAL_OR_KEYWORD
>
> Using PEP 457 syntax and a string literal, the MyPy extended callable
> syntax would look like:
>
> def func(__a: int,  # This convention is for nameless arguments
>          b: int,
>          c: int = 0,
>          *args: int,
>          d: int,
>          e: int = 0,
>          **kwargs: int) -> int:
>     ...
>
> F = Callable[
>     """(
>         a: int, /,
>         b: int, c: int = 0, *args: int,
>         d: int, e: int = 0, **kwargs: int
>     )""",
>     int]
>
> Ideally, the runtime implementation of that would *skip* parsing the
> signature, and instead just keep the string around for on-demand
> parsing.
>
> Cheers,
> Nick.
>

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
-------------- next part --------------
An HTML attachment was scrubbed...
URL: