From flying-sheep at web.de Tue Sep 1 10:00:21 2015
From: flying-sheep at web.de (Philipp A.)
Date: Tue, 01 Sep 2015 08:00:21 +0000
Subject: [Python-ideas] Add appdirs module to stdlib
Message-ID:

When defining a place for config files, cache files, and so on, people usually hack around in an OS-dependent, misinformed, and therefore wrong way.

Thanks to the tempfile API we at least don't see people hardcoding /tmp/ too much.

There is a beautiful little module that does things right and is easy to use: appdirs

I think this is a *really* good candidate for the stdlib, since this functionality is useful for everything that needs a cache or config (so not only GUI and CLI applications, but also scripts that download and cache stuff from the internet for faster re-running).

probably we should build the API around pathlib, since i found myself not touching os.path with a barge pole since pathlib exists.

i'll write a PEP about this soon :)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Tue Sep 1 10:56:17 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 1 Sep 2015 18:56:17 +1000
Subject: [Python-ideas] Add appdirs module to stdlib
In-Reply-To:
References:
Message-ID:

On 1 September 2015 at 18:00, Philipp A. wrote:
> When defining a place for config files, cache files, and so on, people
> usually hack around in an OS-dependent, misinformed, and therefore wrong way.
>
> Thanks to the tempfile API we at least don't see people hardcoding /tmp/ too
> much.
>
> There is a beautiful little module that does things right and is easy to
> use: appdirs
>
> I think this is a *really* good candidate for the stdlib since this
> functionality is useful for everything that needs a cache or config (so not
> only GUI and CLI applications, but also scripts that download and cache
> stuff from the internet for faster re-running)
>
> probably we should build the API around pathlib, since i found myself not
> touching os.path with a barge pole since pathlib exists.
>
> i'll write a PEP about this soon :)

This sounds like a reasonable idea to me, and we can point folks to the original appdirs if they need a version-independent alternative.

Depending on the amount of code involved, we could potentially consider providing this as an API *in* pathlib, rather than needing an entire new module for the standard library version of it.

Regards,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From bussonniermatthias at gmail.com Tue Sep 1 11:09:26 2015
From: bussonniermatthias at gmail.com (Matthias Bussonnier)
Date: Tue, 1 Sep 2015 11:09:26 +0200
Subject: [Python-ideas] Add appdirs module to stdlib
In-Reply-To:
References:
Message-ID: <25F8FDAA-ACEF-48BA-A8D9-DC0FCDD2F197@gmail.com>

>
> This sounds like a reasonable idea to me, and we can point folks to
> the original appdirs if they need a version-independent alternative.
>
> Depending on the amount of code involved, we could potentially
> consider providing this as an API *in* pathlib, rather than needing an
> entire new module for the standard library version of it.
>
> Regards,
> Nick.

+1.

If this gets into Python, it would be nice to have a `python -m ` command that reports the config dirs to the user. One of the most challenging issues we have with users is "where is my config/cache/...?", and it's always hard to start the response with "It depends on...". The "run this command to know" answer works better.
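[To make the "run this command" idea concrete, here is a rough, stdlib-only sketch of the kind of per-platform lookup such a command could print. The branches below are the commonly cited platform conventions (XDG on Linux, %APPDATA% on Windows, ~/Library on OS X); `user_config_dir` is an illustrative stand-in, not appdirs' actual implementation.]

```python
import os
import sys
from pathlib import Path

def user_config_dir(appname, platform=sys.platform, env=os.environ):
    """Per-platform config-dir lookup, in the spirit of appdirs.

    A simplified illustration of the platform conventions, not
    appdirs' real code.
    """
    if platform.startswith("win"):
        # Windows: roaming application data
        base = env.get("APPDATA", str(Path.home() / "AppData" / "Roaming"))
        return Path(base) / appname
    if platform == "darwin":
        # OS X: Application Support
        return Path.home() / "Library" / "Application Support" / appname
    # Linux/BSD: XDG Base Directory spec
    base = env.get("XDG_CONFIG_HOME", str(Path.home() / ".config"))
    return Path(base) / appname

if __name__ == "__main__":
    # the kind of one-liner Matthias asks for
    print("config dir for MyApp:", user_config_dir("MyApp"))
```

[A `python -m`-style entry point would then just print these values for each category: config, cache, data, logs.]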
--
M

>
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From p.f.moore at gmail.com Tue Sep 1 11:26:50 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 1 Sep 2015 10:26:50 +0100
Subject: [Python-ideas] Add appdirs module to stdlib
In-Reply-To: <25F8FDAA-ACEF-48BA-A8D9-DC0FCDD2F197@gmail.com>
References: <25F8FDAA-ACEF-48BA-A8D9-DC0FCDD2F197@gmail.com>
Message-ID:

On 1 September 2015 at 10:09, Matthias Bussonnier wrote:
>> This sounds like a reasonable idea to me, and we can point folks to
>> the original appdirs if they need a version-independent alternative.
>>
>> Depending on the amount of code involved, we could potentially
>> consider providing this as an API *in* pathlib, rather than needing an
>> entire new module for the standard library version of it.
>>
>> Regards,
>> Nick.
>
> +1.
>
> If this gets into Python, it would be nice to have a `python -m ` command that reports
> the config dirs to the user. One of the most challenging issues we have with users is "where is my config/cache/...?",
> and it's always hard to start the response with "It depends on...". The "run this command to know" answer works better.

+1 to all of the above.
Paul

From rosuav at gmail.com Tue Sep 1 11:29:59 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 1 Sep 2015 19:29:59 +1000
Subject: [Python-ideas] Add appdirs module to stdlib
In-Reply-To:
References:
Message-ID:

On Tue, Sep 1, 2015 at 6:00 PM, Philipp A. wrote:
> When defining a place for config files, cache files, and so on, people
> usually hack around in an OS-dependent, misinformed, and therefore wrong way.
>
> There is a beautiful little module that does things right and is easy to
> use: appdirs
>
> I think this is a *really* good candidate for the stdlib...

Who maintains appdirs?
Is s/he willing to maintain it on the stdlib's release schedule? If so, I'd be +1 on this; Python has a strong precedent for papering over OS differences and providing a consistent platform.

ChrisA

From abarnert at yahoo.com Tue Sep 1 11:42:50 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 1 Sep 2015 02:42:50 -0700
Subject: [Python-ideas] Add appdirs module to stdlib
In-Reply-To:
References:
Message-ID:

On Sep 1, 2015, at 01:00, Philipp A. wrote:
>
> When defining a place for config files, cache files, and so on, people usually hack around in an OS-dependent, misinformed, and therefore wrong way.
>
> Thanks to the tempfile API we at least don't see people hardcoding /tmp/ too much.
>
> There is a beautiful little module that does things right and is easy to use: appdirs

Is appdirs compatible with the OS X recommendations (as required by the App Store)? Apple only gives you cache and app data directories; prefs are supposed to use the NSDefaults API or emulate the file names and formats properly, and you have to be sensitive to the sandbox.

If so, definitely +1, because that's a pain to do with anything but Qt (or of course PyObjC). If not, -0.5, because making it easier to do it wrong is probably not beneficial, even if that's what many *nix apps end up writing a lot of code to get wrong on Mac...

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ian.team.python at gmail.com Tue Sep 1 12:24:15 2015
From: ian.team.python at gmail.com (Ian)
Date: Tue, 1 Sep 2015 20:24:15 +1000
Subject: [Python-ideas] ideas for type hints for variable: beyond comments
Message-ID: <55E57CCF.9030909@gmail.com>

mypy currently inspects the comment on the line of first assignment for the variables to be type hinted.

It is logical that at some time the Python language will add support to allow these type hints to move from comments to the code, as has happened for 'def' signatures.
One logical syntax would be to move from

    i = 1  # Infer type int for i

to

    i:int = 1  # no comment needed, but does not look attractive

The first question that arises is 'is the type inference legal for the additional uses?'. Having a 'second use' flagged by warning or error by either an external typechecker or even the language itself could pick up on accidental reuse of a name, but in practice accidentally creating a new variable through a typo can be more common. In Python today the first use is the same as every other, so this change just does not feel comfortable.

The other question is 'what about globals and nonlocals?'. Currently globals and nonlocals need a 'global' or 'nonlocal' statement to allow assignment, but what if these values are not assigned in scope?

What if we allowed

    global i:int

or

    nonlocal i:int

and even

    local i:int

Permitting a new keyword 'local' to me might bring far more symmetry between different cases. It would also allow type hinting to be collected near the function definition and keep the type hinting clear of the main code.

Use of the 'local' keyword in the global namespace could indicate a value not accessible in other namespaces.

Personally I would like to go even further and allow some syntax to allow (or disable) flagging the use of new variables without type hinting as possible typos. I have a syntax in mind, but the idea is the discussion point, not the specific syntax.

Possibly what is here already is too much of a change of direction to consider for ideas already in progress?

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rosuav at gmail.com Tue Sep 1 13:19:29 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 1 Sep 2015 21:19:29 +1000
Subject: [Python-ideas] ideas for type hints for variable: beyond comments
In-Reply-To: <55E57CCF.9030909@gmail.com>
References: <55E57CCF.9030909@gmail.com>
Message-ID:

On Tue, Sep 1, 2015 at 8:24 PM, Ian wrote:
> mypy currently inspects the comment on the line of first assignment for the
> variables to be type hinted.
>
> It is logical that at some time python language will add support to allow
> these type hints to move from comments to the code as has happened for 'def'
> signatures.

Potential problem: Function annotations are supported all the way back to Python 3.0, but any new syntax would be 3.6+ only. That's going to severely limit its value for quite some time. That doesn't mean new syntax can't be added (otherwise none ever would), but the bar is that much higher - you'll need an extremely compelling justification.

> The other question is 'what about globals and nonlocals?'. Currently
> globals and nonlocals need a 'global' or 'nonlocal' statement to allow
> assignment, but what if these values are not assigned in scope?

Not sure what you're talking about here. If they're not assigned in this scope, then presumably they have the same value they had from some other scope. You shouldn't need to declare that "len" is a function, inside every function that calls it. Any type hints should go where it's assigned, and nowhere else.

> What if we allowed
> global i:int
>
> or
>
> nonlocal i:int
>
> and even
>
> local i:int
>
> Permitting a new keyword 'local' to me might bring far more symmetry between
> different cases.
Hey, if you want C, you know where to find it :) > Use of the 'local' keyword in the global namespace could indicate a value > not accessible in other namespaces. I'm not sure what "not accessible" would mean. If someone imports your module, s/he gains access to all your globals. Do you mean that it's "not intended for external access" (normally notated with a single leading underscore)? Or is this a new feature - some way of preventing other modules from using these? That might be useful, but that's a completely separate proposal. > Personally I would like to go even further and allow some syntax to allow > (or disable) flagging the use of new variables without type hinting as > possible typos If you're serious about wanting all your variables to be declared, then I think you want a language other than Python. There are such languages around (and maybe even compiling to Python byte-code, I'm not sure), but Python isn't built that way. Type hinting is NOT variable declaration, and never will be. (Though that's famous last words, I know, and I'm not the BDFL or even anywhere close to that. If someone pulls up this email in ten years and laughs in my face, so be it. It'd not be the first time I've been utterly confidently wrong!) ChrisA From steve at pearwood.info Tue Sep 1 15:03:54 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 1 Sep 2015 23:03:54 +1000 Subject: [Python-ideas] ideas for type hints for variable: beyond comments In-Reply-To: References: <55E57CCF.9030909@gmail.com> Message-ID: <20150901130354.GE19373@ando.pearwood.info> On Tue, Sep 01, 2015 at 09:19:29PM +1000, Chris Angelico wrote: > On Tue, Sep 1, 2015 at 8:24 PM, Ian wrote: > > > > mypy currently inspects the comment on the line of first assignment for the > > variables to be type hinted. > > > > It is logical that at some time python language will add support to allow > > these type hints to move from comments to the code as has happened for 'def' > > signatures. 
>
> Potential problem: Function annotations are supported all the way back
> to Python 3.0, but any new syntax would be 3.6+ only. That's going to
> severely limit its value for quite some time. That doesn't mean new
> syntax can't be added (otherwise none ever would), but the bar is that
> much higher - you'll need an extremely compelling justification.

PEP 484 says:

"No first-class syntax support for explicitly marking variables as being of a specific type is added by this PEP. To help with type inference in complex cases, a comment of the following format may be used: ..."

https://www.python.org/dev/peps/pep-0484/

I recall that in the discussions prior to the PEP, I got the strong impression that Guido was open to the concept of annotating variables in principle, but didn't think it was very important (for the most part, the type checker should be able to infer the variable type), and he didn't want to delay the PEP for the sake of agreement on a variable declaration syntax when a simple comment will do the job.

So in principle, if we agree that type declarations for variables should look like (let's say) `str s = some_function(arg)` then the syntax may be added in the future, but it's a low priority.

> > The other question is 'what about globals and nonlocals?'. Currently
> > globals and nonlocals need a 'global' or 'nonlocal' statement to allow
> > assignment, but what if these values are not assigned in scope?
>
> Not sure what you're talking about here. If they're not assigned in
> this scope, then presumably they have the same value they had from
> some other scope.

But they will be assigned in the scope, otherwise there's no need to declare them global.

    def spam(*args):
        global eggs
        eggs = len(args)
        process(something, eggs)

That's a case where the type-checker should be able to infer that eggs will be an int. But what if the type inference engine cannot work that out? The developer may choose to add a hint.
    eggs = len(args)  # type: int

will work according to PEP 484 (although, I guess that's a quality of implementation issue for the actual type checker). Or we could steal syntax from some other language and make it "official" that type checkers have to look at this:

    eggs:int     # (Pascal, Swift, Ada, F#, Scala)
    int eggs     # (Java, C, Perl6)
    eggs int     # (Go)
    eggs as int  # (RealBasic)

Hence, for example:

    global eggs:int
    cheese:int, ham:str = 23, "foo"

A big question would be, what runtime effect (if any) would this have? If the default Python compiler ignored the type hint at both compile-time and run-time, it would be hard to justify making it syntax. But perhaps the current namespace could get a magic variable

    __annotations__ = {name: hint}

similar to the __annotations__ attribute of functions. Again, the default compiler would simply record the annotation and ignore it, the same as for functions, leaving any actual type-checking to third-party tools.

[...]
> > Use of the 'local' keyword in the global namespace could indicate a value
> > not accessible in other namespaces.

That won't work without a *major* change to Python's design. Currently, module namespaces are regular dicts, and there is no way to prevent others from looking up names in that dict/namespace. If you (Ian) want to change that, you should raise it as a completely separate PEP.
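[For reference, the existing function-level behaviour Steven points to can be seen in a few lines; the proposed module-level __annotations__ is hypothetical, but this is how the function attribute already works: recorded by the compiler, never enforced at run time.]

```python
# Function annotations today: the compiler records them in
# __annotations__ and otherwise ignores them -- the existing behaviour
# that a module-level __annotations__ for variables would mirror.
def spam(eggs: int, ham: str = "foo") -> str:
    return ham * 2

# Tools can inspect the recorded hints:
annotations = spam.__annotations__  # maps 'eggs'/'ham'/'return' to the classes

# Nothing is enforced at run time; a "wrongly" typed call still runs,
# because checking is left to external tools such as mypy:
result = spam("not an int")

# And the PEP 484 comment form is the current spelling for variables:
cheese = len("abc")  # type: int
```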
-- Steve From rosuav at gmail.com Tue Sep 1 15:13:51 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 1 Sep 2015 23:13:51 +1000 Subject: [Python-ideas] ideas for type hints for variable: beyond comments In-Reply-To: <20150901130354.GE19373@ando.pearwood.info> References: <55E57CCF.9030909@gmail.com> <20150901130354.GE19373@ando.pearwood.info> Message-ID: On Tue, Sep 1, 2015 at 11:03 PM, Steven D'Aprano wrote: > On Tue, Sep 01, 2015 at 09:19:29PM +1000, Chris Angelico wrote: > >> On Tue, Sep 1, 2015 at 8:24 PM, Ian wrote: >> > >> > mypy currently inspects the comment on the line of first assignment for the >> > variables to be type hinted. >> > >> > It is logical that at some time python language will add support to allow >> > these type hints to move from comments to the code as has happened for 'def' >> > signatures. >> >> Potential problem: Function annotations are supported all the way back >> to Python 3.0, but any new syntax would be 3.6+ only. That's going to >> severely limit its value for quite some time. That doesn't mean new >> syntax can't be added (otherwise none ever would), but the bar is that >> much higher - you'll need an extremely compelling justification. > > PEP 484 says: > > "No first-class syntax support for explicitly marking variables as being > of a specific type is added by this PEP. To help with type inference in > complex cases, a comment of the following format may be used: ..." > > https://www.python.org/dev/peps/pep-0484/ > > I recall that in the discussions prior to the PEP, I got the strong > impression that Guido was open to the concept of annotating variables in > principle, but didn't think it was very important (for the most part, > the type checker should be able to infer the variable type), and he > didn't want to delay the PEP for the sake of agreement on a variable > declaration syntax when a simple comment will do the job. 
>
> So in principle, if we agree that type declarations for variables should
> look like (let's say) `str s = some_function(arg)` then the syntax may
> be added in the future, but it's a low priority.

Right, it's low priority, and a non-backward-compatible one. Backporting typing.py to any 3.x Python will make all the annotations "succeed" (given that success, at run time, doesn't require any sort of actual checking); it's not possible to backport a syntax change. It's like using 'yield from' for coroutines - it instantly stops you from running on anything older than 3.3. Maybe that'll be worthwhile, but the complaint that "comments are ugly" isn't enough justification IMO.

If there were some serious run-time value for these annotations, then I could see more reason for adding them. At the moment, though, I'm distinctly -1.

ChrisA

From ian.team.python at gmail.com Tue Sep 1 17:01:29 2015
From: ian.team.python at gmail.com (Ian)
Date: Wed, 2 Sep 2015 01:01:29 +1000
Subject: [Python-ideas] ideas for type hints for variable: beyond comments
Message-ID: <55E5BDC9.7020807@gmail.com>

Chris Angelico wrote:
"
> It is logical that at some time python language will add support to allow
> these type hints to move from comments to the code as has happened for 'def'
> signatures.

Potential problem: Function annotations are supported all the way back to Python 3.0, but any new syntax would be 3.6+ only. That's going to severely limit its value for quite some time. That doesn't mean new syntax can't be added (otherwise none ever would), but the bar is that much higher - you'll need an extremely compelling justification.
"

My intent must not have been clear. I am not suggesting changing function annotations. I think function annotations as they are represent an addition that has been well thought out and is a very useful step which extends what is possible in a very useful way. PEP 484 as introduced in 3.5 allows this to be taken further.
I am suggesting building on PEP 484 with complementary extensions for variables. If extensions are made in the same manner as function annotations, then the actual Python code simply has hints added. Generating warnings or other steps is the domain of separate off-line checkers.

I feel it is clear that at some time an extension to allow type-hints for variables, complementing current function annotations, will be added. I am just providing food for thought on how annotations for variables can be added.

> The other question is 'what about globals and nonlocals?'. Currently
> globals and nonlocals need a 'global' or 'nonlocal' statement to allow
> assignment, but what if these values are not assigned in scope?

"Not sure what you're talking about here. If they're not assigned in this scope, then presumably they have the same value they had from some other scope. You shouldn't need to declare that "len" is a function, inside every function that calls it. Any type hints should go where it's assigned, and nowhere else. "

These are hints. Not a 'need'. The type hints may be desired in the code referencing the 'globals' or 'nonlocals', but not desired in the original context. The idea is to allow this, NOT to require or need a declaration.

Hope this helps clarify what I am trying to suggest.

> What if we allowed
> global i:int
>
> or
>
> nonlocal i:int
>
> and even
>
> local i:int
>
> Permitting a new keyword 'local' to me might bring far more symmetry between
> different cases.

Hey, if you want C, you know where to find it :)

> Use of the 'local' keyword in the global namespace could indicate a value
> not accessible in other namespaces.

"I'm not sure what "not accessible" would mean. If someone imports your module, s/he gains access to all your globals. Do you mean that it's "not intended for external access" (normally notated with a single leading underscore)? Or is this a new feature - some way of preventing other modules from using these?
That might be useful, but that's a completely separate proposal. "

Good point, the single _ already does what I was thinking. I never think of using it in this specific case. I tend to associate it with hinting that an identifier is for internal use within a class, not with getting warnings about use from outside the global namespace of globals.

> Personally I would like to go even further and allow some syntax to allow
> (or disable) flagging the use of new variables without type hinting as
> possible typos

"If you're serious about wanting all your variables to be declared, then I think you want a language other than Python. There are such languages around (and maybe even compiling to Python byte-code, I'm not sure), but Python isn't built that way. Type hinting is NOT variable declaration, and never will be. (Though that's famous last words, I know, and I'm not the BDFL or even anywhere close to that. If someone pulls up this email in ten years and laughs in my face, so be it. It'd not be the first time I've been utterly confidently wrong!) "

No, I am not serious about wanting all variables to be declared under normal circumstances. Again, as you say, this is about hinting and getting warnings. I think there are circumstances where hinting may be sufficiently useful that getting a warning for a missing hint would be desirable. This is not a suggestion to change Python, but a suggestion to allow for specific situations without a change of how things normally happen.

Thank you for taking the time to comment. It is appreciated and I hope I have been able to use your feedback to clarify.

From gokoproject at gmail.com Tue Sep 1 18:19:21 2015
From: gokoproject at gmail.com (John Wong)
Date: Tue, 1 Sep 2015 12:19:21 -0400
Subject: [Python-ideas] Add appdirs module to stdlib
In-Reply-To:
References:
Message-ID:

But is appdirs only useful if you are running something that's more toward system package / desktop application?
A lot of projects today create their own directory to save data; many use $HOME/DOTCUSTOM_DIR. So the use case of appdirs should be addressed.

On Tue, Sep 1, 2015 at 5:42 AM, Andrew Barnert via Python-ideas < python-ideas at python.org> wrote:
> On Sep 1, 2015, at 01:00, Philipp A. wrote:
>
> When defining a place for config files, cache files, and so on, people
> usually hack around in an OS-dependent, misinformed, and therefore wrong way.
>
> Thanks to the tempfile API we at least don't see people hardcoding /tmp/
> too much.
>
> There is a beautiful little module that does things right and is easy to
> use: appdirs
>
>
> Is appdirs compatible with the OS X recommendations (as required by the
> App Store)? Apple only gives you cache and app data directories; prefs are
> supposed to use the NSDefaults API or emulate the file names and formats
> properly, and you have to be sensitive to the sandbox.
>
> If so, definitely +1, because that's a pain to do with anything but Qt (or
> of course PyObjC). If not, -0.5, because making it easier to do it wrong is
> probably not beneficial, even if that's what many *nix apps end up writing
> a lot of code to get wrong on Mac...
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rosuav at gmail.com Tue Sep 1 18:58:49 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 2 Sep 2015 02:58:49 +1000
Subject: [Python-ideas] ideas for type hints for variable: beyond comments
In-Reply-To: <55E5BDC9.7020807@gmail.com>
References: <55E5BDC9.7020807@gmail.com>
Message-ID:

On Wed, Sep 2, 2015 at 1:01 AM, Ian wrote:
> Chris Angelico wrote:
> Potential problem: Function annotations are supported all the way back
> to Python 3.0, but any new syntax would be 3.6+ only.
That's going to
> severely limit its value for quite some time. That doesn't mean new
> syntax can't be added (otherwise none ever would), but the bar is that
> much higher - you'll need an extremely compelling justification. "
>
> My intent must not have been clear. I am not suggesting changing function
> annotations. I think function annotations as they are represent an addition
> that has been well thought out and is a very useful step which extends what
> is possible in a very useful way. PEP 484 as introduced in 3.5 allows this
> to be taken further.

I understand that, but the difference here is that PEP 484 adds meaning to something that's already been syntactically valid. If you pull up a Python 3.1 and run this code, it will work:

    def do_nothing() -> None:
        pass

The special names List and Optional and so on are not available by default, but they're imported from typing.py anyway; it's easy enough to make sure that typing.py works on older Pythons (maybe as a pypi dependency). In contrast, you're suggesting completely new syntax. That means that any program that uses them will simply *fail to run* on any Python older than their introduction (same as those using function annotations can't run on Python 2).

As a general rule, the bar for new syntax is a lot higher than the bar for a new function, module, etc, that can be implemented with existing syntax. It's certainly possible; you just need to convince everyone that it's worth adding syntax for.

>> The other question is 'what about globals and nonlocals?'. Currently
>> globals and nonlocals need a 'global' or 'nonlocal' statement to allow
>> assignment, but what if these values are not assigned in scope?
>
> "Not sure what you're talking about here. If they're not assigned in
> this scope, then presumably they have the same value they had from
> some other scope. You shouldn't need to declare that "len" is a
> function, inside every function that calls it.
Any type hints should
> go where it's assigned, and nowhere else. "
>
> These are hints. Not a 'need'. The type hints may be desired in the code
> referencing the 'globals' or 'nonlocals',
> but not desired in the original context. The idea is to allow this, NOT to
> require or need a declaration.

Okay, I think I understand you here. It's for cases like this:

    # big_module.py
    _cache = {}

    # way further down
    def function_with_annotations(thing: str) -> str:
        if thing not in _cache:
            _cache[thing] = frobnicate(thing)
        return _cache[thing]

Inside this brand new function, you want to tell the type hinter that _cache is a dict, even though you don't declare it, don't assign to it, or anything like that.

That's reasonable, but it isn't all that common a use case; generally, if you're adding code to a module somewhere, you can edit other places in the module to add those type hints, or else you can simply forego the type hint for that one thing.

> Hope this helps clarify what I am trying to suggest.

Yes, thank you. I think I get what you're saying there.

>> Use of the 'local' keyword in the global namespace could indicate a value
>> not accessible in other namespaces.
>
> "I'm not sure what "not accessible" would mean. If someone imports your
> module, s/he gains access to all your globals. Do you mean that it's
> "not intended for external access" (normally notated with a single
> leading underscore)? Or is this a new feature - some way of preventing
> other modules from using these? That might be useful, but that's a
> completely separate proposal. "
>
> Good point, the single _ already does what I was thinking. I never think of
> using it in this specific case. I tend to associate it with hinting that an
> identifier is for internal use within a class. Not to get warnings about
> use from outside the global namespace of globals.

Yeah, it comes to the same thing though.
I'm not sure if any linters would pick up on "module._identifier" usages, but code reviewers certainly could. >> Personally I would like to go even further and allow some syntax to allow >> (or disable) flagging the use of new variables without type hinting as >> possible typos > > "If you're serious about wanting all your variables to be declared, > then I think you want a language other than Python. There are such > languages around (and maybe even compiling to Python byte-code, I'm > not sure), but Python isn't built that way. Type hinting is NOT > variable declaration, and never will be. (Though that's famous last > words, I know, and I'm not the BDFL or even anywhere close to that. If > someone pulls up this email in ten years and laughs in my face, so be > it. It'd not be the first time I've been utterly confidently wrong!) " > > No, I am not serious about wanting all variables to be declared under normal > circumstances. Even under abnormal circumstances, requiring all variables to be declared would not be Python's way. There are plenty of ways of handling the global vs local problem. PHP says "declare all your globals, apart from functions and magic stuff the compiler gives you for free"; C says "declare all your locals, anything undeclared will be searched for in progressively larger scopes - everything has to be declared somewhere"; Python says "declare all the globals that you assign to, anything else assigned to is local, and anything not assigned to is searched for at run time". I don't know of *any* language that says "declare everything", and it certainly wouldn't be Python. Even "declare everything you assign to" would be unnecessary overhead. Still -1 on this proposal. ChrisA From storchaka at gmail.com Tue Sep 1 19:55:29 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 1 Sep 2015 20:55:29 +0300 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: Message-ID: On 01.09.15 11:00, Philipp A. 
wrote: > When defining a place for config files, cache files, and so on, people > usually hack around in an OS-dependent, misinformed, and therefore wrong way. > > Thanks to the tempfile API we at least don't see people hardcoding /tmp/ > too much. > > There is a beautiful little module that does things right and is easy to > use: appdirs > > I think this is a *really* good candidate for the stdlib since this > functionality is useful for everything that needs a cache or config (so > not only GUI and CLI applications, but also scripts that download and > cache stuff from the internet for faster re-running) > > probably we should build the API around pathlib, since i found myself > not touching os.path with a barge pole since pathlib exists. > > i'll write a PEP about this soon :) site_data_dir() returns a string. It contains multiple paths separated with the path delimiter if multipath=True. I think that a function that returns a list of paths, including the user dir, would be more helpful and Pythonic. See also PyXDG (http://www.freedesktop.org/wiki/Software/pyxdg/). From p.f.moore at gmail.com Tue Sep 1 23:04:41 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 1 Sep 2015 22:04:41 +0100 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: Message-ID: On 1 September 2015 at 17:19, John Wong wrote: > But is appdirs only useful if you are running something that's more toward > system package / desktop application? A lot of projects today create their > own directory to save data, many use $HOME/DOTCUSTOM_DIR. So the use case of > appdirs should be addressed. But that is not appropriate on Windows. Appdirs gives the above on Unix, but %APPDATA%\Appname on Windows, which conforms properly to platform standards.
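Serhiy's suggestion above - return a real list instead of a delimiter-joined string - could look roughly like this (a sketch for the Linux/XDG case only; `site_data_dirs` is a hypothetical name for such a list-returning variant):

```python
import os

def site_data_dirs(appname):
    """Return system-wide data directories for appname as a list.

    Covers only the Linux/XDG case: $XDG_DATA_DIRS is colon-delimited,
    with "/usr/local/share:/usr/share" as the spec's default.
    """
    raw = os.environ.get("XDG_DATA_DIRS") or "/usr/local/share:/usr/share"
    return [os.path.join(d, appname) for d in raw.split(":") if d]
```

A caller can then iterate the result directly instead of splitting a string themselves.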
Paul From donald at stufft.io Tue Sep 1 23:12:28 2015 From: donald at stufft.io (Donald Stufft) Date: Tue, 1 Sep 2015 17:12:28 -0400 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: Message-ID: On September 1, 2015 at 5:05:14 PM, Paul Moore (p.f.moore at gmail.com) wrote: > On 1 September 2015 at 17:19, John Wong wrote: > > But is appdirs only useful if you are running something that's more toward > > system package / desktop application? A lot of projects today create their > > own directory to save data, many use $HOME/DOTCUSTOM_DIR. So the use case of > > appdirs should be addressed. > > But that is not appropriate on Windows. Appdirs gives the above on > Unix, but %APPDATA%\Appname on Windows, which conforms properly to > platform standards. > > I forget why, but we forked appdirs when we added it to pip because of something about how it treated Windows, I think. Appdirs also is opinionated in situations where there isn't a platform standard, so we'd want to make sure that we agree with those opinions on all platforms. I'm +1 on it though. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From p.f.moore at gmail.com Tue Sep 1 23:15:18 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 1 Sep 2015 22:15:18 +0100 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: Message-ID: On 1 September 2015 at 22:12, Donald Stufft wrote: > I forget why, but we forked appdirs when we added it to pip because of something about how it treated Windows, I think. Appdirs also is opinionated in situations where there isn't a platform standard, so we'd want to make sure that we agree with those opinions on all platforms. Certainly. I think the key point here is "let's have something in the stdlib that makes deciding where your app stores its files work correctly by default".
Paul From njs at pobox.com Tue Sep 1 23:22:23 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 1 Sep 2015 14:22:23 -0700 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: Message-ID: On Tue, Sep 1, 2015 at 2:42 AM, Andrew Barnert via Python-ideas wrote: > On Sep 1, 2015, at 01:00, Philipp A. wrote: > > When defining a place for config files, cache files, and so on, people > usually hack around in an OS-dependent, misinformed, and therefore wrong way. > > Thanks to the tempfile API we at least don't see people hardcoding /tmp/ too > much. > > There is a beautiful little module that does things right and is easy to > use: appdirs > > > Is appdirs compatible with the OS X recommendations (as required by the App > Store)? (Apple only gives you cache and app data directories; prefs are > supposed to use NSDefaults API or emulate the file names and formats > properly, and you have to be sensitive to the sandbox.) No, AFAICT it doesn't get this right -- it just hard-codes the OS X directories. It also didn't quite implement the XDG spec correctly (there's some fallback behavior you're supposed to do if the magic envvars don't make sense that it skips -- very unusual that this will matter). And Windows I'm not sure about -- the logic in appdirs looked reasonable to me when I was reviewing this a few months ago, but there seem to be a bunch of semi-contradictory standards and so it's hard to know what's even "correct" in the tricky cases. All of this is probably as much an argument *for* providing the correct functionality as a standard thing as it is against, but any PEP here probably needs to be thorough about citing the research to show that it's actually getting the various platform standards correct.
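The XDG fallback behavior Nathaniel refers to is small but easy to skip: an unset, empty, or relative $XDG_CONFIG_HOME must be ignored in favor of ~/.config. A minimal sketch:

```python
import os

def xdg_config_home():
    """$XDG_CONFIG_HOME when set to an absolute path, else ~/.config.

    The XDG Base Directory spec says non-absolute values are invalid
    and should be treated as if the variable were unset.
    """
    value = os.environ.get("XDG_CONFIG_HOME", "")
    if not os.path.isabs(value):  # unset, empty, or relative
        return os.path.expanduser("~/.config")
    return value
```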
What makes it particularly difficult is that if you "fix a bug" in a library like appdirs, so that it starts suddenly returning different results on some computer somewhere, then what it looks like to the end user is that their data/settings/whatever have suddenly evaporated and whatever disk space was being used for caches never gets cleaned up and so forth. Generally when applications change how they compute these directories, they also include tricky migration logic to check both the old and new names, move stuff over if needed, but I'm not sure how a low-level library like this can support that usefully... -n -- Nathaniel J. Smith -- http://vorpus.org From yselivanov.ml at gmail.com Tue Sep 1 23:25:26 2015 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 1 Sep 2015 17:25:26 -0400 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: Message-ID: <55E617C6.9000708@gmail.com> On 2015-09-01 5:22 PM, Nathaniel Smith wrote: [..] > All of this is probably as much an argument *for* providing the correct > functionality as a standard thing as it is against, but any PEP here > probably needs to be thorough about citing the research to show that > it's actually getting the various platform standards correct. > > What makes it particularly difficult is that if you "fix a bug" in a > library like appdirs, so that it starts suddenly returning different > results on some computer somewhere, then what it looks like to the end > user is that their data/settings/whatever have suddenly evaporated and > whatever disk space was being used for caches never gets cleaned up > and so forth. Generally when applications change how they compute > these directories, they also include tricky migration logic to check > both the old and new names, move stuff over if needed, but I'm not > sure how a low-level library like this can support that usefully... +1 on all points. We really need a PEP for this kind of functionality in the standard library.
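The migration logic described above is necessarily app-specific, but the simplest case - data exists only at the old location, so move it wholesale - can be sketched like this (`migrate_app_dir` is a hypothetical helper, not part of appdirs):

```python
import os
import shutil

def migrate_app_dir(old_path, new_path):
    """Move data from old_path to new_path if it only exists at old_path.

    Returns True if a migration happened. Merging partially-populated
    directories or renaming individual files needs app-specific logic.
    """
    if os.path.isdir(old_path) and not os.path.exists(new_path):
        parent = os.path.dirname(new_path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        shutil.move(old_path, new_path)
        return True
    return False
```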
Yury From abarnert at yahoo.com Tue Sep 1 23:47:22 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 1 Sep 2015 14:47:22 -0700 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: Message-ID: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com> Responses below, but first, another issue: Things like app data and prefs aren't a single directory. XDG has a notion of a search path rather than a single directory; Windows has a notion of separate search domains; OS X makes things as fun as possible by having both. For writing a new file, you're usually fine just writing to the first path in the default domain (as long as it exists or you can create it), but for reading files you're supposed to look in /etc or All Users or whatever if it's not found there. Most cross-platform wrappers I've used in the past didn't deal with this automatically, and a lot of them didn't even make it easy to do manually. On Sep 1, 2015, at 14:19, Nathaniel Smith wrote: > > On Sep 1, 2015 02:45, "Andrew Barnert via Python-ideas" > wrote: >> >>> On Sep 1, 2015, at 01:00, Philipp A. wrote: >>> >>> When defining a place for config files, cache files, and so on, people usually hack around in an OS-dependent, misinformed, and therefore wrong way. >>> >>> Thanks to the tempfile API we at least don't see people hardcoding /tmp/ too much. >>> >>> There is a beautiful little module that does things right and is easy to use: appdirs >> >> >> Is appdirs compatible with the OS X recommendations (as required by the App Store)? (Apple only gives you cache and app data directories; prefs are supposed to use NSDefaults API or emulate the file names and formats properly, and you have to be sensitive to the sandbox.) > > No, AFAICT it doesn't get this right -- it just hard-codes the OS X > directories. The biggest problem with most of the cross-platform libraries I've seen is that they assume there is a prefs directory, and on OS X, that really isn't true.
If your app explicitly opens the exact same file that NSDefaults would have opened, you're breaking the rules. (Since the mandatory sandbox went into effect, this usually doesn't get you rejected from the App Store anymore, but before that it did.) And taking a quick look at appdirs, it has a user_config_dir that seems to mean exactly that. So, how can a stdlib library handle that? Meanwhile, it looks like appdirs expects you to give it an app name and company name to construct the paths. What happens if you give it names that don't match the ones in your bundle? You're opening files that belong to another app. Which is again violating Apple's rules. One more thing: I don't know if it's guaranteed that the right way of doing things on OS X (whether via Cocoa or CoreFoundation) won't spawn a background thread for you. After all, the APIs can talk to the sandbox service and sometimes even the iCloud service. Is that a problem for something in the stdlib? > It also didn't quite implement the XDG spec correctly > (there's some fallback behavior you're supposed to do if the magic > envvars don't make sense that it skips -- very unusual that this will > matter). And windows I'm not sure about -- the logic in appdirs looked > reasonable to me when I was reviewing this a few months ago, but there > seem to be a bunch of semi-contradictory standards and so it's hard to > know what's even "correct" in the tricky cases. > > All of this is probably as much an argument *for* providing the > functionality as a standard thing as it is against, > but any PEP here > probably needs to be thorough about citing the research to show that > it's actually getting the various platform standards correct. 
> > What makes it particularly difficult is that if you "fix a bug" in a > library like appdirs, so that it starts suddenly returning different > results on some computer somewhere, then what it looks like to the end > user is that their data/settings/whatever have suddenly evaporated and > whatever disk space was being used for caches never gets cleaned up > and so forth. Generally when applications change how they compute > these directories, they also include tricky migration logic to check > both the old and new names, move stuff over if needed, but I'm not > sure how a low-level library like this can support that usefully... And that's especially true in the case of Apple's standards, which also include specific rules about how you're supposed to do such a migration, and doing so requires stuff that can't be done entirely from inside the code. From p.f.moore at gmail.com Wed Sep 2 01:05:20 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 2 Sep 2015 00:05:20 +0100 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com> References: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com> Message-ID: On 1 September 2015 at 22:47, Andrew Barnert via Python-ideas wrote: > Things like app data and prefs aren't a single directory. XDG has a notion of a search path rather than a single directory; Windows has a notion of separate search domains; OS X makes things as fun as possible by having both. For writing a new file, you're usually fine just writing to the first path in the default domain (as long as it exists or you can create it), but for reading files you're supposed to look in /etc or All Users or whatever if it's not found there. Most cross-platform wrappers I've used in the past didn't deal with this automatically, and a lot of them didn't even make it easy to do manually. This is a fair point.
But it's also worth noting that the current state of affairs for many apps is to just bung stuff in ~/whatever. While appdirs may not get things totally right, at least it improves things. And if it (or something similar) were in the stdlib, it would at least provide a level of uniformity. So, in my view: 1. We should have something that provides the functionality of appdirs in the stdlib. 2. It probably needs a PEP to get the corner cases right. 3. The behaviour of appdirs is a good baseline default - even if it isn't 100% compliant with platform standards it'll be better than what someone unfamiliar with the platform will invent. 4. We shouldn't abandon the idea just because a perfect solution is unattainable. There are complex cases to consider (search paths, for example, and even worse how search paths interact with the app writing config data rather than just reading it, or migration when a scheme changes). The PEP should at least mention these cases, but it's not unreasonable to simply declare them out of scope of the module (most applications don't need anything this complex). Paul From rosuav at gmail.com Wed Sep 2 02:47:43 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 2 Sep 2015 10:47:43 +1000 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com> Message-ID: On Wed, Sep 2, 2015 at 9:05 AM, Paul Moore wrote: > So, in my view: > > 1. We should have something that provides the functionality of appdirs > in the stdlib. > 2. It probably needs a PEP to get the corner cases right. > 3. The behaviour of appdirs is a good baseline default - even if it > isn't 100% compliant with platform standards it'll be better than what > someone unfamiliar with the platform will invent. > 4. We shouldn't abandon the idea just because a perfect solution is > unattainable. 
+1 > There are complex cases to consider (search paths, for example, and > even worse how search paths interact with the app writing config data > rather than just reading it, or migration when a scheme changes). The > PEP should at least mention these cases, but it's not unreasonable to > simply declare them out of scope of the module (most applications > don't need anything this complex). Might be worth starting with something simple: ask for one directory (the default or most obvious place), or ask for a full list of plausible directories to try. Then a config manager could be built on top of that which would handle write location selection, migration, etc, and that would be a separate proposal that makes use of the appdata module for the cross-platform stuff. ChrisA From ncoghlan at gmail.com Wed Sep 2 06:01:25 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 2 Sep 2015 14:01:25 +1000 Subject: [Python-ideas] ideas for type hints for variable: beyond comments In-Reply-To: <20150901130354.GE19373@ando.pearwood.info> References: <55E57CCF.9030909@gmail.com> <20150901130354.GE19373@ando.pearwood.info> Message-ID: On 1 September 2015 at 23:03, Steven D'Aprano wrote: > PEP 484 says: > > "No first-class syntax support for explicitly marking variables as being > of a specific type is added by this PEP. To help with type inference in > complex cases, a comment of the following format may be used: ..." > > https://www.python.org/dev/peps/pep-0484/ > > I recall that in the discussions prior to the PEP, I got the strong > impression that Guido was open to the concept of annotating variables in > principle, but didn't think it was very important (for the most part, > the type checker should be able to infer the variable type), and he > didn't want to delay the PEP for the sake of agreement on a variable > declaration syntax when a simple comment will do the job. 
> > So in principle, if we agree that type declarations for variables should > look like (let's say) `str s = some_function(arg)` then the syntax may > be added in the future, but it's a low priority. The main case where it's potentially useful is when we want to initialise a variable to None, but constrain permitted rebindings (from a typechecker's perspective) to a particular type. When we initialise a variable to an actual value, then type inference can usually handle it. Using the typing module as it exists today, I believe this should work for that purpose (although I haven't actually tried it with mypy or any other typechecker):

    from typing import TypeVar, Generic, Optional

    T = TypeVar("T")

    class Var(Generic[T]):
        def __new__(cls, value: Optional[T] = None) -> Optional[T]:
            return None

    i = Var[int]()

Unless I've misunderstood the likely outcome of type inference completely, the value of i here will be None, but its inferred type would be Optional[int]. At runtime, you could still rebind "i" to whatever you want, but a typechecker would complain if it was rebound to anything other than None or an integer. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Sep 2 06:05:12 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 2 Sep 2015 14:05:12 +1000 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: Message-ID: On 2 September 2015 at 07:22, Nathaniel Smith wrote: > All of this is probably as much an argument *for* providing the correct > functionality as a standard thing as it is against, but any PEP here > probably needs to be thorough about citing the research to show that > it's actually getting the various platform standards correct. We'd also want to state up front that non-compliance with the relevant platform standards *is* considered a bug, so it may change in maintenance releases in order to support changes in the platform standards. Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From me at the-compiler.org Wed Sep 2 06:18:46 2015 From: me at the-compiler.org (Florian Bruhin) Date: Wed, 2 Sep 2015 06:18:46 +0200 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: Message-ID: <20150902041846.GH10941@tonks> * Philipp A. [2015-09-01 08:00:21 +0000]: > There is a beautiful little module that does things right and is easy to > use: appdirs > > I think this is a *really* good candidate for the stdlib since this > functionality is useful for everything that needs a cache or config (so not > only GUI and CLI applications, but also scripts that download and cache > stuff from the internet for faster re-running) +1 from me as well. Another source of inspiration might be the QStandardPaths class from the Qt library (which is C++, but I'm using QStandardPaths in my PyQt application): http://doc.qt.io/qt-5/qstandardpaths.html They have a QStandardPaths::writableLocation which gives you exactly one path to write to, a QStandardPaths::standardLocations which gives you a list of paths, and a QStandardPaths::locate which locates your config based on a name. They also had the issue with changing standards such as Local/Roaming appdata on Windows, and solved it by introducing more enum values to the StandardLocation enum:

QStandardPaths::DataLocation
    Returns the same value as AppLocalDataLocation. This enumeration value is deprecated. Using AppDataLocation is preferable since on Windows, the roaming path is recommended.

QStandardPaths::AppDataLocation
    Returns a directory location where persistent application data can be stored. This is an application-specific directory. To obtain a path to store data to be shared with other applications, use QStandardPaths::GenericDataLocation. The returned path is never empty. On the Windows operating system, this returns the roaming path. This enum value was added in Qt 5.4.
QStandardPaths::AppLocalDataLocation Returns the local settings path on the Windows operating system. On all other platforms, it returns the same value as AppDataLocation. This enum value was added in Qt 5.4. Florian -- http://www.the-compiler.org | me at the-compiler.org (Mail/XMPP) GPG: 916E B0C8 FD55 A072 | http://the-compiler.org/pubkey.asc I love long mails! | http://email.is-not-s.ms/ From tjreedy at udel.edu Wed Sep 2 06:31:33 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 2 Sep 2015 00:31:33 -0400 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: Message-ID: On 9/1/2015 5:04 PM, Paul Moore wrote: > On 1 September 2015 at 17:19, John Wong wrote: >> But is appdirs only useful if you are running something that's more toward >> system package / desktop application? A lot of projects today create their >> own directory to save data, many use $HOME/DOTCUSTOM_DIR. So the use case of >> appdirs should be addressed. > > But that is not appropriate on Windows. Appdirs gives the above on > Unix, but %APPDATA%\Appname on Windows, which conforms properly to > platform standards. The problem with Windows is that the standard is to put things in an invisible directory, which makes it difficult to tell people, especially non-experts, to edit a file in the directory. Games that expect people to edit .ini files put them in the game directory.
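One way to soften the hidden-directory problem Terry describes is for the app to open the folder itself rather than asking users to find it. A sketch using the usual platform openers (os.startfile on Windows, `open` on OS X, `xdg-open` on freedesktop systems):

```python
import os
import subprocess
import sys

def reveal_dir(path):
    """Open path in the platform's file manager, so users never have
    to locate a hidden directory like %APPDATA% by hand."""
    if sys.platform.startswith("win"):
        os.startfile(path)  # Explorer; Windows-only stdlib call
    elif sys.platform == "darwin":
        subprocess.check_call(["open", path])  # Finder
    else:
        subprocess.check_call(["xdg-open", path])  # freedesktop opener
```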
-- Terry Jan Reedy From robertc at robertcollins.net Wed Sep 2 06:38:05 2015 From: robertc at robertcollins.net (Robert Collins) Date: Wed, 2 Sep 2015 16:38:05 +1200 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com> Message-ID: On 2 September 2015 at 11:05, Paul Moore wrote: > On 1 September 2015 at 22:47, Andrew Barnert via Python-ideas > wrote: >> Things like app data and prefs aren't a single directory. XDG has a notion of a search path rather than a single directory; Windows has a notion of separate search domains; OS X makes things as fun as possible by having both. For writing a new file, you're usually fine just writing to the first path in the default domain (as long as it exists or you can create it), but for reading files you're supposed to look in /etc or All Users or whatever if it's not found there. Most cross-platform wrappers I've used in the past didn't deal with this automatically, and a lot of them didn't even make it easy to do manually. > > This is a fair point. But it's also worth noting that the current > state of affairs for many apps is to just bung stuff in ~/whatever. > While appdirs may not get things totally right, at least it improves > things. And if it (or something similar) were in the stdlib, it would > at least provide a level of uniformity. In about 5 years time. Maybe, The adoption curve for something that works on all Pythons is able to be much much higher than that for something which is only in the stdlib 6 months (or more) from now. Unless we do a rolling backport of it. And if we're going to do that... why? Why not just provide a documentation link to the thing and say 'pip install this' and/or 'setuptools install_require this'. 
-Rob -- Robert Collins Distinguished Technologist HP Converged Cloud From random832 at fastmail.us Wed Sep 2 07:21:56 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 02 Sep 2015 01:21:56 -0400 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: Message-ID: <1441171316.956058.372400801.57FECD89@webmail.messagingengine.com> On Wed, Sep 2, 2015, at 00:31, Terry Reedy wrote: > The problem with Windows is that the standard is to put things in an > invisible directory, which makes it difficult to tell people, especially > non-experts, to edit a file in the directory. I'm not sure you _should_ be telling non-experts to find a file to edit. Why doesn't your app provide a UI for it, or at least a button that pops up the file in the text editor (Minecraft, for example, has a button to pop up the folder you're expected to drop downloaded texture packs into), if editing it as free form text is something that end users _really_ should be expected to do? Plus, it's not really any harder to find than a "Hidden" directory beginning with a dot - in either case you have to either type the name or enable showing hidden files, and neither platform makes this easier than the other. From ncoghlan at gmail.com Wed Sep 2 07:57:18 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 2 Sep 2015 15:57:18 +1000 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com> Message-ID: On 2 September 2015 at 14:38, Robert Collins wrote: > On 2 September 2015 at 11:05, Paul Moore wrote: >> This is a fair point. But it's also worth noting that the current >> state of affairs for many apps is to just bung stuff in ~/whatever. >> While appdirs may not get things totally right, at least it improves >> things. And if it (or something similar) were in the stdlib, it would >> at least provide a level of uniformity. > > In about 5 years time. 
Maybe, > > The adoption curve for something that works on all Pythons is able to > be much much higher than that for something which is only in the > stdlib 6 months (or more) from now. Unless we do a rolling backport of > it. > > And if we're going to do that... why? Why not just provide a > documentation link to the thing and say 'pip install this' and/or > 'setuptools install_require this'. My perspective on that has been shifting in recent years, to the point where I view this kind of standard library module primarily as a tool to help folks learn how the computing ecosystem works in practice. Consider PEP 3144, for example, and the differences between ipaddress, and its inspiration, ipaddr. The standard library one is significantly stricter about correctly using networking terminology, so you can actually study the ipaddress module as a way of learning how IP addressing works. The original ipaddr, by contrast, is only easy to use if you already know all the terms, and can say "Oh, OK, they're using that term somewhat loosely, but I can see what they mean". I think this is a case where a similar approach would make sense - like ipaddr before it, appdirs represents an actual cross-version production module, put together by a company (in this case ActiveState rather than Google) for their own use, but made available to the wider Python community through PyPI. As such, we know its feature coverage is likely to be good, but the API design is likely to be optimised for experienced developers that already understand the concepts, and just want a library to handle the specific technical details.
The statistics module is another example of being able to use the Python standard library as a teaching tool to help learn something else: for many production use cases, you'd reach for something more sophisticated (like the NumPy stack), but if what you're aiming to do is to learn (or teach) basic statistical concepts, it covers the essentials. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Wed Sep 2 09:46:48 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 2 Sep 2015 08:46:48 +0100 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: Message-ID: On 2 September 2015 at 06:57, Nick Coghlan wrote: > A standard library API would shift the emphasis slightly, and take > into account the perspective of the *beginning* programmer, who may > have only first learned about the command line and files and > directories in the course of learning Python, and is now venturing > into the realm of designing full desktop (and mobile?) applications. Agreed. In this case, the focus should be on providing "correct" cross-platform defaults, and assisting (and encouraging) users to understand the choices and constraints applicable to other platforms, which may not be relevant on theirs. This may feel like a nuisance for single-platform developers (a Unix-only program doesn't need to care about local or roaming preferences, because Unix doesn't have the concept of a domain account), so it's important to get the defaults right so that people who don't need to care, don't have to. But the options should be accessible, so that people can learn to make the right choices (for example, pip puts preferences in the roaming profile, but the cache in the local profile, because you don't want to bloat the roaming profile with a cache, but you do want the user's preferences to be available on all the machines they use).
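The roaming/local split Paul describes maps directly onto two Windows environment variables; a sketch ("MyApp" is a hypothetical application name, and the fallback literals are purely illustrative):

```python
import os

# Per-user config that should follow a roaming/domain profile between
# machines goes under %APPDATA%; machine-local data such as caches goes
# under %LOCALAPPDATA%.
appdata = os.environ.get("APPDATA", r"C:\Users\me\AppData\Roaming")
local_appdata = os.environ.get("LOCALAPPDATA", r"C:\Users\me\AppData\Local")

config_dir = os.path.join(appdata, "MyApp")
cache_dir = os.path.join(local_appdata, "MyApp", "Cache")
```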
Paul From p.f.moore at gmail.com Wed Sep 2 09:52:06 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 2 Sep 2015 08:52:06 +0100 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: Message-ID: On 2 September 2015 at 05:31, Terry Reedy wrote: > The problem with Windows is that the standard is to put things in an > invisible directory, which makes it difficult to tell people, especially > non-experts, to edit a file in the directory. As you say, that's an issue with Windows, not with the library (if indeed it *is* an issue with Windows - the users I've dealt with don't have huge problems with things in appdata, although they would usually expect the program to offer a config dialog rather than making them edit the file by hand - but that's off-topic for this thread). > Games that expect people to edit .ini files put them in the game directory. That's a common choice on Windows, certainly (relating to historical issues where the official standards used to be a lot more user-hostile). It may well be that the appdirs module should offer an "app-local" (or "portable", if you prefer) scheme in addition to the default scheme. Paul From flying-sheep at web.de Wed Sep 2 10:10:29 2015 From: flying-sheep at web.de (Philipp A.) Date: Wed, 02 Sep 2015 08:10:29 +0000 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com> Message-ID: OK, original poster here. thanks for the positive reception! so there are some issues which i'll address 1. *appdirs doesn't get everything right* - in order not to have inconsistencies, we could abandon the 'appdirs as a fallback' approach and create our own API which returns more correct results. this also frees us to make other changes where we see fit, e.g. see next point 2. *there is a search path instead of a single directory sometimes* - appdirs provides a multipath keyword argument which returns a (colon-delimited) 'list' of paths.
we should provide sth. similar, only with python lists. maybe also convenience functions for getting a list of all files matching some subpath similar to [d / subpath for d in site_data_dir(appname, appauthor, all=True) if (d / subpath).exists()] 3. *some platforms don't have some specific subset of the functionality (e.g. no config dir on OSX)* - provide a warning in the stdlib docs about that and refer to another settings API. unfortunately i can't find a good NSDefaults library in python right now. i think the API should still return some directory that works for storing settings on OSX in case people want to use platform-independent config files. 4. *it's hard to tell newbies where their files are* - unfortunately that's how things are. the existing standards are confusing, but should be honored nonetheless. we can provide an api like python -m stddirs [--config|--data|...] which prints a selection of standard directories. did i miss anything? and yeah: i also think that something that's a little opinionated in case of ambiguities is vastly better than everyone hacking their own impromptu platform-dependent alternative. ~/.appname stopped being right on linux long ago and never was right on other platforms, which we should teach people. best, phil Nick Coghlan wrote on Wed, 2 Sep 2015 at 07:57: > On 2 September 2015 at 14:38, Robert Collins > wrote: > > On 2 September 2015 at 11:05, Paul Moore wrote: > >> This is a fair point. But it's also worth noting that the current > >> state of affairs for many apps is to just bung stuff in ~/whatever. > >> While appdirs may not get things totally right, at least it improves > >> things. And if it (or something similar) were in the stdlib, it would > >> at least provide a level of uniformity. > >
Maybe, > > > > The adoption curve for something that works on all Pythons is able to > > be much much higher than that for something which is only in the > > stdlib 6 months (or more) from now. Unless we do a rolling backport of > > it. > > > > And if we're going to do that... why? Why not just provide a > > documentation link to the thing and say 'pip install this' and/or > > 'setuptools install_require this'. > > My perspective on that has been shifting in recent years, to the point > where I view this kind of standard library module primarily as a tool > to help folks learn how the computing ecosystem works in practice. > Consider PEP 3144, for example, and the differences between ipaddress, > and its inspiration, ipaddr. The standard library one is significantly > stricter about correctly using networking terminology, so you can > actually study the ipaddress module as a way of learning how IP > addressing works. The original ipaddr, by contrast, is only easy to > use if you already know all the terms, and can say "Oh, OK, they're > using that term somewhat loosely, but I can see what they mean". > > I think this is a case where a similar approach would make sense - > like ipaddr before it, appdirs represents an actual cross-version > production module, put together by a company (in this case ActiveState > rather than Google) for their own use, but made available to the wider > Python community through PyPI. As such, we know its feature coverage > is likely to be good, but the API design is likely to be optimised for > experienced developers that already understand the concepts, and just > want a library to handle the specific technical details.
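The strictness described here is easy to see in practice; a minimal sketch using the stdlib ipaddress module:

```python
import ipaddress

# "192.168.0.1/24" names a host address plus netmask, not a network,
# so strict parsing (the default) rejects it outright.
try:
    ipaddress.ip_network("192.168.0.1/24")
except ValueError as err:
    print(err)  # 192.168.0.1/24 has host bits set

# Relaxing strictness masks off the host bits, giving the containing network.
print(ipaddress.ip_network("192.168.0.1/24", strict=False))  # 192.168.0.0/24

# The precise term for "address plus network" is an interface.
iface = ipaddress.ip_interface("192.168.0.1/24")
print(iface.ip, iface.network)  # 192.168.0.1 192.168.0.0/24
```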
> > A standard library API would shift the emphasis slightly, and take > into account the perspective of the *beginning* programmer, who may > have only first learned about the command line and files and > directories in the course of learning Python, and is now venturing > into the realm of designing full desktop (and mobile?) applications. > > Regards, > Nick. > > P.S. The statistics module is another example of being able to use the > Python standard library as a teaching tool to help learn something else: for > many production use cases, you'd reach for something more > sophisticated (like the NumPy stack), but if what you're aiming to do > is to learn (or teach) basic statistical concepts, it covers the > essentials. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From p.f.moore at gmail.com Wed Sep 2 10:19:41 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 2 Sep 2015 09:19:41 +0100 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com> Message-ID: On 2 September 2015 at 05:38, Robert Collins wrote: >> This is a fair point. But it's also worth noting that the current >> state of affairs for many apps is to just bung stuff in ~/whatever. >> While appdirs may not get things totally right, at least it improves >> things. And if it (or something similar) were in the stdlib, it would >> at least provide a level of uniformity. > > In about 5 years time. Maybe, Most of the programs I write are for Python 3.4 at the moment, and will be for Python 3.5 as soon as it comes out. They won't be distributed outside of my group at work, if indeed anyone but me uses them.
They won't care about older versions of Python. I don't even care about cross-platform. But I do want to set up a cache directory, or save some settings, without thinking *too* hard about where to put them. I don't want to have an external dependency because I'm forever running these things from whatever virtualenv I currently have active (sloppy work habits, I know, but that's the point - not everything is a nicely structured development project). For programs that need to support older versions of Python, a backport should be trivial - I can't see that this would need any particularly "modern" Python features. Paul From cs at zip.com.au Wed Sep 2 10:27:53 2015 From: cs at zip.com.au (Cameron Simpson) Date: Wed, 2 Sep 2015 18:27:53 +1000 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: <55E617C6.9000708@gmail.com> References: <55E617C6.9000708@gmail.com> Message-ID: <20150902082753.GA10839@cskk.homeip.net> On 01Sep2015 17:25, Yury Selivanov wrote: >>What makes it particularly difficult is that if you "fix a bug" in a >>library like appdirs, so that it starts suddenly returning different >>results on some computer somewhere, then what it looks like to the end >>user is that their data/settings/whatever have suddenly evaporated [...] >>Generally when applications change how they compute >>these directories, they also include tricky migration logic to check >>both the old and new names, move stuff over if needed, but I'm not >>sure how a low-level library like this can support that usefully... If it were me I'd want to keep a little state recording what choices were made for these things and whether those were the defaults. Then you can check that on next run: if the choice was a default and the default no longer matches, migrate. Of course, if the default location for this state changes... 
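Cameron's state-tracking idea might be sketched roughly like this (illustrative only: `resolve_config_dir` and the JSON state format are invented here, not an actual appdirs API):

```python
import json
import shutil
from pathlib import Path

def resolve_config_dir(current_default: Path, state_file: Path) -> Path:
    """Return the config dir to use, migrating the old one if a previously
    recorded *default* choice no longer matches the current default."""
    if state_file.exists():
        state = json.loads(state_file.read_text())
        old = Path(state["path"])
        if (state["was_default"] and old != current_default
                and old.exists() and not current_default.exists()):
            current_default.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(old), str(current_default))  # carry data over
    # Record this run's choice so the *next* default change can migrate too.
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps({"path": str(current_default),
                                      "was_default": True}))
    return current_default
```

Explicit user overrides would be recorded with `was_default: false` and left alone on later runs.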
Not pretending I'm offering a comprehensive solution, I remain, Cameron Simpson "waste cycles drawing trendy 3D junk" - Mac Eudora v3 config option From mal at egenix.com Wed Sep 2 11:30:33 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 02 Sep 2015 11:30:33 +0200 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com> Message-ID: <55E6C1B9.2030506@egenix.com> On 02.09.2015 10:10, Philipp A. wrote: > ~/.appname stopped being right on linux long ago and never was right on > other platforms, which we should teach people. Looking at my home dir on Linux, there doesn't seem to be one standard, but rather a whole set of them and the good old ~/.appname is still a popular one (e.g. pip and ansible from Python land still use it; as do many other non-Python applications such as ncftp, emacs, svn, git, gpg, etc.). ~/.config/ does get some use, but mostly for GUI applications, not so much for command line ones. ~/.local/lib/ only appears to be used by Python :-) ~/.local/share/ is mostly used by desktops to register application shortcuts ~/.cache/ is being used by just a handful of tools, pip being one of them. appdirs seems to rely on the XDG Base Directory Specification for a lot of things on Linux (http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html). That's probably fine for desktop GUI apps (the standard was apparently built for this use case), but doesn't apply at all for command line tools or applications like daemons or services which don't interact with the desktop, e.g. you typically won't find global config files for command line tools under /etc/xdg/, but instead under /etc/.
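For reference, the user-level lookup rules of the XDG spec referred to here boil down to a few lines of Python (the function names are invented for this sketch; each variable has a hard-coded fallback when unset or empty):

```python
import os
from pathlib import Path

def xdg_config_home() -> Path:
    # $XDG_CONFIG_HOME, falling back to ~/.config when unset or empty
    return Path(os.environ.get("XDG_CONFIG_HOME") or Path.home() / ".config")

def xdg_cache_home() -> Path:
    # $XDG_CACHE_HOME, falling back to ~/.cache
    return Path(os.environ.get("XDG_CACHE_HOME") or Path.home() / ".cache")

def xdg_config_dirs() -> list:
    # System-wide config search path; the spec's default is /etc/xdg
    raw = os.environ.get("XDG_CONFIG_DIRS") or "/etc/xdg"
    return [Path(p) for p in raw.split(":") if p]

def config_file(appname: str, filename: str) -> Path:
    """First matching config file on the user-then-system search path,
    else the (possibly nonexistent) user location."""
    candidates = [xdg_config_home() / appname / filename] + [
        d / appname / filename for d in xdg_config_dirs()]
    for c in candidates:
        if c.exists():
            return c
    return candidates[0]
```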
For Windows, the CSIDL_* values have also been replaced with new ones under FOLDERID_* (the APIs have also evolved): Values: https://msdn.microsoft.com/en-us/library/windows/desktop/bb762494%28v=vs.85%29.aspx https://msdn.microsoft.com/en-us/library/windows/desktop/dd378457%28v=vs.85%29.aspx APIs: https://msdn.microsoft.com/en-us/library/windows/desktop/bb762181%28v=vs.85%29.aspx https://msdn.microsoft.com/en-us/library/windows/desktop/bb762188%28v=vs.85%29.aspx BTW: I wonder why the Windows functions in appdirs don't use the environment for much easier access to e.g. APPDATA and USERPROFILE. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 02 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2015-08-27: Released eGenix mx Base 3.2.9 ... http://egenix.com/go83 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From donald at stufft.io Wed Sep 2 12:58:07 2015 From: donald at stufft.io (Donald Stufft) Date: Wed, 2 Sep 2015 06:58:07 -0400 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: <55E6C1B9.2030506@egenix.com> References: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com> <55E6C1B9.2030506@egenix.com> Message-ID: > On Sep 2, 2015, at 5:30 AM, M.-A. Lemburg wrote: > > Looking at the my home dir on Linux, there doesn't seem to be > one standard, but rather a whole set of them and the good old > ~/.appname is still a popular one (e.g. 
pip and ansible from > Python land still use it; as do many other non-Python applications > such as ncftp, emacs, svn, git, gpg, etc.). Just to be clear: pip supports and prefers the XDG spec, the old locations are just still supported for backwards compatibility. Though we did deviate from XDG in that we use /etc/ instead of /etc/xdg/. From mal at egenix.com Wed Sep 2 13:14:37 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 02 Sep 2015 13:14:37 +0200 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com> <55E6C1B9.2030506@egenix.com> Message-ID: <55E6DA1D.1030105@egenix.com> I just commented on the ticket that references this discussion: http://bugs.python.org/issue7175 In essence, Python already has an installation scheme which is defined in sysconfig.py (see the top of the file) and has had one ever since distutils got added to the stdlib. It just lacks explicit entries for "config" and "cache" files, so adding those would be more in line with the existing scheme than coming up with yet another standard, e.g. for posix_user:

    'posix_user': {
        'stdlib': '{userbase}/lib/python{py_version_short}',
        'platstdlib': '{userbase}/lib/python{py_version_short}',
        'purelib': '{userbase}/lib/python{py_version_short}/site-packages',
        'platlib': '{userbase}/lib/python{py_version_short}/site-packages',
        'include': '{userbase}/include/python{py_version_short}',
        'scripts': '{userbase}/bin',
        'config': '{userbase}/etc',
        'cache': '{userbase}/var',
        'data': '{userbase}',
        },

({userbase} is set by looking at PYTHONUSERBASE and defaults to ~/.local/) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 02 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/ ________________________________________________________________________ 2015-08-27: Released eGenix mx Base 3.2.9 ... http://egenix.com/go83 ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From steve.dower at python.org Wed Sep 2 15:19:15 2015 From: steve.dower at python.org (Steve Dower) Date: Wed, 2 Sep 2015 06:19:15 -0700 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: <55E6C1B9.2030506@egenix.com> References: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com> <55E6C1B9.2030506@egenix.com> Message-ID: "BTW: I wonder why the Windows functions in appdirs don't use the environment for much easier access to e.g. APPDATA and USERPROFILE." The environment can become corrupted more easily and it's really difficult to diagnose that from a bug report that says "my config is corrupt". I assume appdirs is using ctypes now, but I'd be happy to add the call into the os module to avoid that. Cheers, Steve Top-posted from my Windows Phone From random832 at fastmail.us Wed Sep 2 15:36:27 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 02 Sep 2015 09:36:27 -0400 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com> Message-ID: <1441200987.2912104.372747265.63D1E82F@webmail.messagingengine.com> On Wed, Sep 2, 2015, at 04:10, Philipp A. wrote: > *there is a search path instead of a single directory sometimes* > → appdirs provides a multipath keyword argument which returns a > (colon-delimited) "list" of paths. Why isn't it a real list? Paths on any platform can contain a colon; paths on Windows commonly do. From flying-sheep at web.de Wed Sep 2 16:30:31 2015 From: flying-sheep at web.de (Philipp A.) Date: Wed, 02 Sep 2015 14:30:31 +0000 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: <55E6C1B9.2030506@egenix.com> References: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com> <55E6C1B9.2030506@egenix.com> Message-ID: hi marc-andre, you seem to have some misconceptions and a *very* different experience from mine: M.-A. Lemburg mal at egenix.com wrote on Wed., 2 Sep 2015 at 11:30: Looking at my home dir on Linux, there doesn't seem to be > one standard, but rather a whole set of them can you please link to the document from a standards authority describing those other standards? and the good old ~/.appname is still a popular one (e.g.
pip and ansible > from > Python land still use it; as do many other non-Python applications > such as ncftp, emacs, svn, git, gpg, etc.). > as hinted at by my tongue-in-cheek comment from above: that?s not a standard but an old convention. git uses ${XDG_CONFIG_DIR-$HOME/.config}/git/config (try it!), as does pip. the other ones are old as dirt so this is excusable. regarding newly developed programs i?ll only approve it for some rare exceptions like shells, where ~/.${SHELL}rc really is the only expected place for a config file. (and not that me approving it means something) ~/.config/ does get some use, but mostly for GUI applications, > not so much for command line ones. > where do you get this figure from? the xdg standard doesn?t say anything about only a kind of application being targeted by the standard, and as my fontconfig example shows, even libraries follow it. ~/.local/lib/ only appears to be used by Python :-) > yeah, on my system it doesn?t even exist! but that has a reson which you can read in the standard: only ~/.local/share is pointed at by the default for a standard dir. ~/.local/lib is probably an invention by whoever included that path in the default $PYTHONPATH ~/.local/share/ is mostly used by desktops to register application shortcuts > that?s a big understatement, there?s all kinds of stuff in there. i have 56 dirs and files in there. ~/.cache/ is being used by just a handful of tools, pip being one of them. > hah! various parts of KDE, matplotlib, gstreamer, chromium, fontconfig, atom, shall i go on? -- > Marc-Andre Lemburg > eGenix.com > i hope i could convince you that this isn?t ?some contender among many?, but really *the* standard for some directories on linux. best, phil ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From flying-sheep at web.de Wed Sep 2 16:40:42 2015 From: flying-sheep at web.de (Philipp A.) 
Date: Wed, 02 Sep 2015 14:40:42 +0000 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: <1441200987.2912104.372747265.63D1E82F@webmail.messagingengine.com> References: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com> <1441200987.2912104.372747265.63D1E82F@webmail.messagingengine.com> Message-ID: wrote on Wed., 2 Sep 2015 at 15:36: > On Wed, Sep 2, 2015, at 04:10, Philipp A. wrote: > > *there is a search path instead of a single directory sometimes* > > → appdirs provides a multipath keyword argument which returns a > > (colon-delimited) "list" of paths. > > Why isn't it a real list? Paths on any platform can contain a colon; > paths on Windows commonly do. no idea why, but once we start defining our own API, this will be the most important change from appdirs. maybe they use semicolons on windows, but i really don't see why we shouldn't just use python lists. best, philipp From ericfahlgren at gmail.com Wed Sep 2 20:31:04 2015 From: ericfahlgren at gmail.com (Eric Fahlgren) Date: Wed, 2 Sep 2015 11:31:04 -0700 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com> Message-ID: <012901d0e5ad$87a64c70$96f2e550$@gmail.com> > ~/.appname stopped being right on linux long ago and never was right on other platforms, which we should teach people. Ah, yes. I count 17 of those on my Windows machine (!) right now, including .idlerc, .ipython, .matplotlib, .pylint.d etc., so we've got a ways to go.
:) From tjreedy at udel.edu Wed Sep 2 20:34:08 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 2 Sep 2015 14:34:08 -0400 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: <1441171316.956058.372400801.57FECD89@webmail.messagingengine.com> References: <1441171316.956058.372400801.57FECD89@webmail.messagingengine.com> Message-ID: On 9/2/2015 1:21 AM, random832 at fastmail.us wrote: > On Wed, Sep 2, 2015, at 00:31, Terry Reedy wrote: >> The problem with Windows is that the standard is to put things in an >> invisible directory, which makes it difficult to tell people, especially >> non-experts, to edit a file in the directory. > > I'm not sure you _should_ be telling non-experts to find a file to edit. > Why doesn't your app provide a UI for it, I added one, mostly written by Tal Einat, a year ago, but older versions of Idle have not disappeared (and the user config files are global to all versions, for a particular user). And there is not yet an installer for 3rd party extensions. > Plus, it's not really any harder to find than a "Hidden" directory > beginning with a dot Quite the contrary. Files beginning with a '.' are not hidden on Windows Explorer (or Command Prompt dir, for that matter). I do not know of any way to enable showing hidden files with Explorer. (If you know of one, tell me.) The secret to getting to one is to click the directory sequence bar to the right of the last entry to get a directory path string, click again to unselect it, then add the name. In this particular case, enter 'AppData/Roaming' or '%APPDATA%'. It is intentionally difficult for users to access these files.
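The '%APPDATA%' lookup can also be done programmatically. A sketch of both routes discussed in this thread, with invented function names: the environment variable, and a shell-API call via ctypes similar to what appdirs does (the ctypes call only works on Windows):

```python
import os
import sys

def appdata_from_env() -> str:
    """Roaming AppData via the environment (simple, but the environment
    can be corrupted or overridden, as Steve points out)."""
    return os.environ.get("APPDATA", "")

def appdata_from_shell() -> str:
    """Roaming AppData via the Windows shell API."""
    import ctypes
    CSIDL_APPDATA = 26  # roaming application data folder
    buf = ctypes.create_unicode_buffer(260)  # MAX_PATH
    ctypes.windll.shell32.SHGetFolderPathW(None, CSIDL_APPDATA, None, 0, buf)
    return buf.value

if sys.platform == "win32":
    print(appdata_from_shell())
```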
-- Terry Jan Reedy From ckaynor at zindagigames.com Wed Sep 2 20:54:16 2015 From: ckaynor at zindagigames.com (Chris Kaynor) Date: Wed, 2 Sep 2015 11:54:16 -0700 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: <1441171316.956058.372400801.57FECD89@webmail.messagingengine.com> Message-ID: On Wed, Sep 2, 2015 at 11:34 AM, Terry Reedy wrote: >> Plus, it's not really any harder to find than a "Hidden" directory >> beginning with a dot > > Quit the contrary. Files beginning with a '.' are not hidden on Windows > Explorer (or Command Prompt dir, for that matter). I do not know of any way > to enable showing hidden files with Explorer. (If you know of one, tell me.) > The secret to getting to one is to click the directory sequence bar to the > right of the last entry to get a directory path string, click again to > unselect it, then add the name. In this particular case, enter > 'AppData/Roaming' or '%APPDATA%'. It is intentionally difficult for user to > access these files. There is an option in the GUI for it: On Windows 7, from an explorer window: Tools->Folder Options->View Tab then in the Advanced Settings list: "Show hidden files, folders, and drives". On Windows 7, the menus are hidden by default, and you need to hold Alt for them to show up. I think Vista uses the same options, and the menus are shown by default, while XP uses a slightly different layout, but it's in roughly the same location. I do not know where it is on Windows 8 or 10, as I have never really used either of those. Chris From srkunze at mail.de Wed Sep 2 22:01:25 2015 From: srkunze at mail.de (Sven R.
Kunze) Date: Wed, 02 Sep 2015 22:01:25 +0200 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: <1441171316.956058.372400801.57FECD89@webmail.messagingengine.com> Message-ID: <55E75595.1010807@mail.de> On 02.09.2015 20:54, Chris Kaynor wrote: > On Wed, Sep 2, 2015 at 11:34 AM, Terry Reedy wrote: >>> Plus, it's not really any harder to find than a "Hidden" directory >>> beginning with a dot >> Quit the contrary. Files beginning with a '.' are not hidden on Windows >> Explorer (or Command Prompt dir, for that matter). I do not know of any way >> to enable showing hidden files with Explorer. (If you know of one, tell me.) >> The secret to getting to one is to click the directory sequence bar to the >> right of the last entry to get a directory path string, click again to >> unselect it, then add the name. In this particular case, enter >> 'AppData/Roaming' or '%APPDATA%'. It is intentionally difficult for user to >> access these files. > There is an option in the GUI for it: > On Windows 7, from an explorer window: Tools->Folder Options->View Tab > then in the Advanced Settings list: "Show hidden files, folders, and > drives". On Windows 7, the menus are hidden by default, and you need > to hold Alt for them to show up. I think Vista uses the same options, > and the menus are shown by default, while XP uses a slightly different > layout, but its in roughly the same location. > > I do not know where it is on Windows 8 or 10, as I have never really > used either of those. Same as Win 7. 
> Chris From python at mrabarnett.plus.com Wed Sep 2 22:15:38 2015 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 2 Sep 2015 21:15:38 +0100 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: <1441171316.956058.372400801.57FECD89@webmail.messagingengine.com> Message-ID: <55E758EA.8000607@mrabarnett.plus.com> On 2015-09-02 19:54, Chris Kaynor wrote: > On Wed, Sep 2, 2015 at 11:34 AM, Terry Reedy wrote: >>> Plus, it's not really any harder to find than a "Hidden" directory >>> beginning with a dot >> >> Quit the contrary. Files beginning with a '.' are not hidden on Windows >> Explorer (or Command Prompt dir, for that matter). I do not know of any way >> to enable showing hidden files with Explorer. (If you know of one, tell me.) >> The secret to getting to one is to click the directory sequence bar to the >> right of the last entry to get a directory path string, click again to >> unselect it, then add the name. In this particular case, enter >> 'AppData/Roaming' or '%APPDATA%'. It is intentionally difficult for user to >> access these files. > > There is an option in the GUI for it: > On Windows 7, from an explorer window: Tools->Folder Options->View Tab > then in the Advanced Settings list: "Show hidden files, folders, and > drives". On Windows 7, the menus are hidden by default, and you need > to hold Alt for them to show up. I think Vista uses the same options, > and the menus are shown by default, while XP uses a slightly different > layout, but its in roughly the same location. > > I do not know where it is on Windows 8 or 10, as I have never really > used either of those.
> On Windows 10 it's on the File menu: File->Change folder options and search options From random832 at fastmail.us Wed Sep 2 23:40:47 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 02 Sep 2015 17:40:47 -0400 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: <1441171316.956058.372400801.57FECD89@webmail.messagingengine.com> Message-ID: <1441230047.3965673.373219713.06790441@webmail.messagingengine.com> On Wed, Sep 2, 2015, at 14:34, Terry Reedy wrote: > I do not know of any > way to enable showing hidden files with Explorer. (If you know of one, > tell me.) Didn't notice this on my first reply. Go into the Folder Options window, go to the "View" tab, and you will see an option labeled "Show hidden files, folders, and drives". This option is specific to a folder, there is a button to apply it to all folders. The Folder Options dialog box is available from the "Tools" menu in the classic UI (until Windows XP or maybe Vista), and [still hidden there, but more visible in] the "Organize" dropdown as "Folder and search options". My point stands that putting a dot in front of a filename shows _intent_ to have it hidden from ordinary users, since it has that effect on Unix, and makes it no easier to find it on Unix than finding the AppData folder is on Windows. From random832 at fastmail.us Wed Sep 2 23:36:23 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 02 Sep 2015 17:36:23 -0400 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: <1441171316.956058.372400801.57FECD89@webmail.messagingengine.com> Message-ID: <1441229783.3964747.373218457.1E5AF003@webmail.messagingengine.com> On Wed, Sep 2, 2015, at 14:34, Terry Reedy wrote: > Quit the contrary. Files beginning with a '.' are not hidden on Windows > Explorer I was talking about dot files *on Unix* vs AppData on Windows. Being hidden from the normal view is clearly desired, or people wouldn't use the dot at all. 
From tjreedy at udel.edu Thu Sep 3 00:33:30 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 2 Sep 2015 18:33:30 -0400 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: <1441171316.956058.372400801.57FECD89@webmail.messagingengine.com> Message-ID: On 9/2/2015 2:54 PM, Chris Kaynor wrote: > On Wed, Sep 2, 2015 at 11:34 AM, Terry Reedy wrote: >> I do not know of any way >> to enable showing hidden files with Explorer. (If you know of one, tell me.) > There is an option in the GUI for it: > On Windows 7, from an explorer window: Tools->Folder Options->View Tab > then in the Advanced Settings list: "Show hidden files, folders, and > drives". On Windows 7, the menus are hidden by default, and you need > to hold Alt for them to show up. Thank you. I am still using Win 7. They show up on Alt-release. -- Terry Jan Reedy From tjreedy at udel.edu Thu Sep 3 00:38:28 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 2 Sep 2015 18:38:28 -0400 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: <1441230047.3965673.373219713.06790441@webmail.messagingengine.com> References: <1441171316.956058.372400801.57FECD89@webmail.messagingengine.com> <1441230047.3965673.373219713.06790441@webmail.messagingengine.com> Message-ID: On 9/2/2015 5:40 PM, random832 at fastmail.us wrote: > On Wed, Sep 2, 2015, at 14:34, Terry Reedy wrote: >> I do not know of any >> way to enable showing hidden files with Explorer. (If you know of one, >> tell me.) > > Didn't notice this on my first reply. Go into the Folder Options window, > go to the "View" tab, and you will see an option labeled "Show hidden > files, folders, and drives". This option is specific to a folder, there > is a button to apply it to all folders. > > The Folder Options dialog box is available from the "Tools" menu in the > classic UI (until Windows XP or maybe Vista), and [still hidden there, > but more visible in] the "Organize" dropdown as "Folder and search > options". Thanks. 
The latter might be the easiest. > My point stands that putting a dot in front of a filename shows _intent_ > to have it hidden from ordinary users, since it has that effect on Unix, > and makes it no easier to find it on Unix than finding the AppData > folder is on Windows. What I remember from 26 years ago is 'ls -a'. Still correct in a console? (No idea how to do the same on GUIs.) This was part of any Unix intro as one was expected to edit the shell configs to one's taste. -- Terry Jan Reedy From stephen at xemacs.org Thu Sep 3 03:21:43 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 03 Sep 2015 10:21:43 +0900 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: <1441171316.956058.372400801.57FECD89@webmail.messagingengine.com> <1441230047.3965673.373219713.06790441@webmail.messagingengine.com> Message-ID: <8737yw47rs.fsf@uwakimon.sk.tsukuba.ac.jp> Terry Reedy writes: > What I remember from 26 years ago is 'ls -a'. Still correct in a > console? (No idea how to do the same on GUIs.) This was part of any > Unix intro as one was expected to edit the shell configs to one's taste. Yes, "ls -a" (or "ls -A", which omits "." and ".."). This applies to Mac OS X as well, and as of Yosemite I can't find any way to show hidden files in the Finder. (But there may be a way: I use Mac OS X because it's pretty and I can use traditional *nix commands in the terminal, not because I'm a Mac GUI wonk.) From random832 at fastmail.us Thu Sep 3 03:29:33 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 02 Sep 2015 21:29:33 -0400 Subject: [Python-ideas] Add appdirs module to stdlib In-Reply-To: References: <1441171316.956058.372400801.57FECD89@webmail.messagingengine.com> <1441230047.3965673.373219713.06790441@webmail.messagingengine.com> Message-ID: <1441243773.415824.373361338.24A125B2@webmail.messagingengine.com> On Wed, Sep 2, 2015, at 18:38, Terry Reedy wrote: > What I remember from 26 years ago is 'ls -a'. 
Still correct in a
> console? (No idea how to do the same on GUIs.) This was part of any
> Unix intro as one was expected to edit the shell configs to one's taste.

What is and should be expected of a non-technical user on a modern desktop Unix system is very different from what it was 26 years ago. This is so obvious it should go without saying. And, anyway, Windows has got dir /a too. People who learned DOS 26 years ago probably likewise know it.

Anyway, in summary: you have to either use a special option (not *hard* to discover, but not in your face either) to enable hidden files (ls -a, if they use the terminal *at all*, or whatever checkbox performs the same function in gnome/kde/xfce/whatever), or type the exact filename knowing it exists. Neither is much of a burden, but neither is any "better" than on Windows.

From j.wielicki at sotecware.net Thu Sep 3 10:43:59 2015
From: j.wielicki at sotecware.net (Jonas Wielicki)
Date: Thu, 3 Sep 2015 10:43:59 +0200
Subject: [Python-ideas] Add appdirs module to stdlib
In-Reply-To:
References:
Message-ID: <55E8084F.5020405@sotecware.net>

On 01.09.2015 23:22, Nathaniel Smith wrote:
> What makes it particularly difficult is that if you "fix a bug" in
> a library like appdirs, so that it starts suddenly returning
> different results on some computer somewhere, then what it looks
> like to the end user is that their data/settings/whatever have
> suddenly evaporated and whatever disk space was being used for
> caches never gets cleaned up and so forth. Generally when
> applications change how they compute these directories, they also
> include tricky migration logic to check both the old and new names,
> move stuff over if needed, but I'm not sure how a low-level library
> like this can support that usefully...

A low-level library could provide an API to return "legacy" directories. On Freedesktop-compliant systems, for example, these could be the fallback directories which are used when the environment variables are not present.
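[Editor's note: the Freedesktop fallback behaviour described here -- honour the XDG environment variable when set, otherwise fall back to a fixed default under the home directory -- can be sketched in a few lines of pathlib. This is an illustration only; the function name is made up and is not the appdirs API:]

```python
import os
from pathlib import Path

def user_config_dir(appname):
    """Per-user config directory on a Freedesktop-compliant system:
    honour $XDG_CONFIG_HOME when it is set and non-empty, otherwise
    fall back to the spec's default of ~/.config."""
    base = os.environ.get("XDG_CONFIG_HOME")
    root = Path(base) if base else Path.home() / ".config"
    return root / appname

print(user_config_dir("myapp"))
```

A "legacy" API of the kind proposed above would simply return the fallback path unconditionally, so an application could check both locations and migrate its data.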
And after bugfixes (see what Nick said), the old, incorrect behaviour could still be exposed through that API. An application could, after not finding its data in the non-legacy directories, query the legacy directories and start a migration process.

regards,
jwi

From flying-sheep at web.de Thu Sep 3 13:40:16 2015
From: flying-sheep at web.de (Philipp A.)
Date: Thu, 03 Sep 2015 11:40:16 +0000
Subject: [Python-ideas] Add appdirs module to stdlib
In-Reply-To: <012901d0e5ad$87a64c70$96f2e550$@gmail.com>
References: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com> <012901d0e5ad$87a64c70$96f2e550$@gmail.com>
Message-ID:

Eric Fahlgren wrote on Wed, 2 Sep 2015 at 20:31:

> > ~/.appname stopped being right on linux long ago and never was right on
> > other platforms, which we should teach people.
>
> Ah, yes. I count 17 of those on my Windows machine (!) right now,
> including .idlerc, .ipython, .matplotlib, .ipylint.d etc., so we've got a
> ways to go. :)

"on windows"? wat. oh my god this is horrible. so wrong! isn't this the definitive argument for why we needed this API yesterday?

From asweigart at gmail.com Thu Sep 3 22:53:45 2015
From: asweigart at gmail.com (Al Sweigart)
Date: Thu, 3 Sep 2015 13:53:45 -0700
Subject: [Python-ideas] Non-English names in the turtle module.
Message-ID:

I've opened an issue for adding non-English names to the turtle module's function names: https://bugs.python.org/issue24990

This would effectively take this code:

import turtle
t = turtle.Pen()
t.pencolor('green')
t.forward(100)

...and have this code in French be completely equivalent:

import turtle
t = turtle.Plume()
t.couleurplume('vert')
t.avant(100)

(Pardon my google-translate French.) This, of course, is a terrible way for a software module to implement internationalization, which usually does not apply to source code names themselves. But turtle is used as a teaching tool.
While professional developers are expected to obtain proficiency with English, the same does not apply to school kids who are just taking a small computer programming unit. Having the turtle module available in their native language (even if Python keywords are not) would remove a large barrier and let them focus on the core programming concepts that turtle provides. The popular Scratch tool has a similar internationalized setup and also has LOGO-style commands, so most of the translation work is already done.

Are there any design or technical issues I should be aware of before doing this? It seems like a straightforward "Tortuga = Turtle" assignment of names, though I would set it up so that it is easy to add languages to the source. I have a Google-translated set of translations here: https://github.com/asweigart/idle-reimagined/wiki/Turtle-Translations But of course, a native speaker would have to sign off on it before making it part of the turtle module API.

-Al

From random832 at fastmail.us Thu Sep 3 23:27:02 2015
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Thu, 03 Sep 2015 17:27:02 -0400
Subject: [Python-ideas] Non-English names in the turtle module.
In-Reply-To:
References:
Message-ID: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com>

On Thu, Sep 3, 2015, at 16:53, Al Sweigart wrote:
> https://github.com/asweigart/idle-reimagined/wiki/Turtle-Translations

A couple of downsides I noticed:

"The names will later be formatted to fit the code style of the turtle module." is a bit handwavy, to be honest. Are things like "de la" required or optional? Should an apostrophe/space/hyphen become an underscore or be omitted? What can be abbreviated? What case/inflection/conjugation should be used? These are decisions that have to be made by native speakers based on what will be understandable to beginners who know each language.
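[Editor's note: the normalization questions raised here can be made concrete with a small helper that turns a translated phrase into a turtle-style identifier. The function and the rule it encodes are hypothetical, not anything proposed in the thread:]

```python
import unicodedata

def to_identifier(phrase):
    # One possible normalization rule: strip accents, lower-case, and
    # drop spaces, hyphens, and apostrophes entirely, matching the
    # pencolor/penup style of the existing turtle API.  As the thread
    # notes, the real choices belong to native speakers.
    decomposed = unicodedata.normalize("NFKD", phrase)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.lower() if c.isalnum())

print(to_identifier("couleur du stylo"))  # couleurdustylo
print(to_identifier("Avancé"))            # avance
```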
Your names aren't very consistent - I don't know French that well, but I doubt the module would benefit from mixing "plume", "crayon", and "stylo". What to call a pen needs to be decided at a high level and used globally. Also, things like "set...", "get...", "on...", need to be translated to something consistent throughout the module.

For that matter, the naming conventions in the *English* ones are a bit questionable and inconsistent. Maybe this is an opportunity to clean up that API.

Also, a bug to report: I know enough Spanish to know that "pantalla clara" means "clear screen" as a noun [screen that is clear], not a verb [clearing the screen]. The English is ambiguous, but most other languages are not.

For design questions: Is it important to hide the English names? Is it important to be able to use the classes interchangeably with code that is written to use a different language's method names? Can the same name in one language ever translate to two different names in another language depending on context?

From encukou at gmail.com Fri Sep 4 00:55:10 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Fri, 4 Sep 2015 00:55:10 +0200
Subject: [Python-ideas] Non-English names in the turtle module.
In-Reply-To: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com>
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com>
Message-ID:

On Thu, Sep 3, 2015 at 11:27 PM, wrote:
> On Thu, Sep 3, 2015, at 16:53, Al Sweigart wrote:
>> https://github.com/asweigart/idle-reimagined/wiki/Turtle-Translations
>
> A couple of downsides I noticed:
>
> "The names will later be formatted to fit the code style of the turtle
> module." is a bit handwavy, to be honest. Are things like "de la"
> required or optional? Should an apostrophe/space/hyphen become an
> underscore or be omitted? What can be abbreviated? What
> case/inflection/conjugation should be used?
These are decisions that
> have to be made by native speakers based on what will be understandable
> to beginners who know each language.
>
> Your names aren't very consistent - I don't know French that well, but I
> doubt the module would benefit from mixing "plume", "crayon", and
> "stylo". What to call a pen needs to be decided at a high level and used
> globally. Also, things like "set...", "get...", "on...", need to be
> translated to something consistent throughout the module.
>
> For that matter, the naming conventions in the *English* ones are a bit
> questionable and inconsistent. Maybe this is an opportunity to clean up
> that API.
>
> Also, a bug to report: I know enough Spanish to know that "pantalla
> clara" means "clear screen" as a noun [screen that is clear], not a verb
> [clearing the screen]. The English is ambiguous, but most other
> languages are not.
>
> For design questions: Is it important to hide the English names? Is it
> important to be able to use the classes interchangeably with code that is
> written to use a different language's method names? Can the same name in
> one language ever translate to two different names in another language
> depending on context?

I would not be surprised if, among all languages in the world, there'd be a clash in one of turtle's attribute names.

Why put everything in the same namespace? It might be better to use more of those -- perhaps something like "from turtle.translations import tortuga" ("tortuga" being Spanish for turtle). That is probably too long, so why not just "import tortuga"? That "tortuga" module could live on PyPI for a while, before it's considered for addition to either the stdlib, or just the installers.

From asweigart at gmail.com Fri Sep 4 01:12:51 2015
From: asweigart at gmail.com (Al Sweigart)
Date: Thu, 3 Sep 2015 16:12:51 -0700
Subject: [Python-ideas] Non-English names in the turtle module.
In-Reply-To:
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com>
Message-ID:

Just to reply to both Petr and Random832 at once:

By "formatted to fit the code style of the turtle module" I meant that the names would be pushed together without camelcase or underscores (as is already done in the turtle module). But you do bring up good points. Whether things like "de la" are omitted or how things can be abbreviated is left entirely up to the native speaker doing the translation. I don't see any way around it. But, as a rough guide, the Scratch tool has already done lots of translation work and translators can piggyback off their wording choices.

The turtle module has been around long enough that I already see its API as set in stone, for better or worse. If the English names should be changed, I see that as a separate issue.

I don't see a reason to hide the English names. I want the names to be available without additional setup, i.e. there is no "language" setting that needs to be specified first. The fewer setup steps the better. All names in all languages would be available. I don't see being able to mix, say, English and Spanish names in the same program as a big enough problem that we have to force a fix to prevent it.

I'd like to keep the setup as simple and as similar to existing code as possible. There is a slight difference between "import x" and "from x import x" and I'd want to avoid that.

From a technical perspective, I don't think the additional names will hinder the maintenance of the turtle module (which rarely changes itself). Though I'll know for sure what the technical issues are, if any, once I produce the patch.

The idea for putting these modules on PyPI is interesting. My only hesitation is I don't want "but it's already on PyPI" as an excuse not to include these changes into the standard library turtle module.
-Al

On Thu, Sep 3, 2015 at 3:55 PM, Petr Viktorin wrote:
> On Thu, Sep 3, 2015 at 11:27 PM, wrote:
> > On Thu, Sep 3, 2015, at 16:53, Al Sweigart wrote:
> >> https://github.com/asweigart/idle-reimagined/wiki/Turtle-Translations
> >
> > A couple of downsides I noticed:
> >
> > "The names will later be formatted to fit the code style of the turtle
> > module." is a bit handwavy, to be honest. Are things like "de la"
> > required or optional? Should an apostrophe/space/hyphen become an
> > underscore or be omitted? What can be abbreviated? What
> > case/inflection/conjugation should be used? These are decisions that
> > have to be made by native speakers based on what will be understandable
> > to beginners who know each language.
> >
> > Your names aren't very consistent - I don't know French that well, but I
> > doubt the module would benefit from mixing "plume", "crayon", and
> > "stylo". What to call a pen needs to be decided at a high level and used
> > globally. Also, things like "set...", "get...", "on...", need to be
> > translated to something consistent throughout the module.
> >
> > For that matter, the naming conventions in the *English* ones are a bit
> > questionable and inconsistent. Maybe this is an opportunity to clean up
> > that API.
> >
> > Also, a bug to report: I know enough Spanish to know that "pantalla
> > clara" means "clear screen" as a noun [screen that is clear], not a verb
> > [clearing the screen]. The English is ambiguous, but most other
> > languages are not.
> >
> > For design questions: Is it important to hide the English names? Is it
> > important to be able to use the classes interchangeably with code that is
> > written to use a different language's method names? Can the same name in
> > one language ever translate to two different names in another language
> > depending on context?
>
> I would not be surprised if, among all languages in the world, there'd
> be a clash in one of turtle's attribute names.
>
> Why put everything in the same namespace? It might be better to use
> more of those -- perhaps something like "from turtle.translations
> import tortuga" ("tortuga" being Spanish for turtle). That is probably
> too long, so why not just "import tortuga"? That "tortuga" module
> could live on PyPI for a while, before it's considered for addition to
> either the stdlib, or just the installers.

From steve at pearwood.info Fri Sep 4 03:43:02 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 4 Sep 2015 11:43:02 +1000
Subject: [Python-ideas] Add appdirs module to stdlib
In-Reply-To:
References: <796B0953-FC26-4AC2-AEE1-4BCA5C6F26BF@yahoo.com>
Message-ID: <20150904014301.GJ19373@ando.pearwood.info>

On Wed, Sep 02, 2015 at 04:38:05PM +1200, Robert Collins wrote:
> And if we're going to do that... why? Why not just provide a
> documentation link to the thing and say 'pip install this' and/or
> 'setuptools install_require this'.

I can think of two reasons, one minor, one major:

1. Many people behind corporate or school firewalls cannot just "pip install this". By "firewall" I'm talking more figuratively than literally. Of course there may be an actual firewall blocking access to PyPI, but that's typically very easy to bypass (download the library at home and bring it in on a USB stick). Less easy to bypass is corporate/school policy which prohibits the installation of unapproved software, often being a firing or expulsion offense. Getting approval may be difficult, slow or downright impossible.

However, what makes this a minor reason is that software written under those conditions probably won't be distributed outside of the organisation itself, so who cares whether it complies with the standard locations for application data?

2. More important is the stdlib itself.
As someone has pointed out, the stdlib already drops config files in completely inappropriate locations on Windows, e.g. $HOME/.idlelib. It would be good if the stdlib itself could consistently use the standard locations for files.

-- 
Steve

From rosuav at gmail.com Fri Sep 4 03:51:11 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 4 Sep 2015 11:51:11 +1000
Subject: [Python-ideas] Non-English names in the turtle module.
In-Reply-To:
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com>
Message-ID:

On Fri, Sep 4, 2015 at 8:55 AM, Petr Viktorin wrote:
> Why put everything in the same namespace? It might be better to use
> more of those -- perhaps something like "from turtle.translations
> import tortuga" ("tortuga" being Spanish for turtle). That is probably
> too long, so why not just "import tortuga"? That "tortuga" module
> could live on PyPI for a while, before it's considered for addition to
> either the stdlib, or just the installers.

+1. These modules could simply import a boatload of stuff from "turtle" under new names, which would make them fairly slim. Question: Should they start with "from turtle import *" so the English names are always available? It'd ensure that untranslated names don't get missed out, but it might be confusing.

ChrisA

From stephen at xemacs.org Fri Sep 4 04:05:51 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 04 Sep 2015 11:05:51 +0900
Subject: [Python-ideas] Non-English names in the turtle module.
In-Reply-To:
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com>
Message-ID: <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp>

Al Sweigart writes:

> The idea for putting these modules on PyPI is interesting. My only
> hesitation is I don't want "but it's already on PyPI" as an excuse
> not to include these changes into the standard library turtle
> module.
Exactly backwards, as the first objection is going to be "if it could be on PyPI but isn't, there's no evidence it's ready for the stdlib."

From stephen at xemacs.org Fri Sep 4 04:34:41 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 04 Sep 2015 11:34:41 +0900
Subject: [Python-ideas] Non-English names in the turtle module.
In-Reply-To:
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com>
Message-ID: <87twra3oam.fsf@uwakimon.sk.tsukuba.ac.jp>

Chris Angelico writes:

> +1. These modules could simply import a boatload of stuff from
> "turtle" under new names, which would make them fairly slim. Question:
> Should they start with "from turtle import *" so the English names are
> always available? It'd ensure that untranslated names don't get missed
> out, but it might be confusing.

That would be pretty horrible, and contrary to the point of allowing the new user to learn algorithmic thinking in a small world using intuitively named commands.

I would think the sensible thing to do is to invite participation from traditional translation volunteers with something like

import turtle
from i18n import _

_translations = { 'turtle' : _('turtle'), ... }
for name in dir(turtle):
    if name in _translations:
        # import statements must go through exec(), not eval()
        exec("from turtle import {} as {}".format(name, _translations[name]))
    elif english_fallbacks_please:
        exec("from turtle import {}".format(name))

From rosuav at gmail.com Fri Sep 4 04:43:38 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 4 Sep 2015 12:43:38 +1000
Subject: [Python-ideas] Non-English names in the turtle module.
In-Reply-To: <87twra3oam.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87twra3oam.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Fri, Sep 4, 2015 at 12:34 PM, Stephen J. Turnbull wrote:
> Chris Angelico writes:
>
> > +1. These modules could simply import a boatload of stuff from
> > "turtle" under new names, which would make them fairly slim.
Question:
> > Should they start with "from turtle import *" so the English names are
> > always available? It'd ensure that untranslated names don't get missed
> > out, but it might be confusing.
>
> That would be pretty horrible, and contrary to the point of allowing
> the new user to learn algorithmic thinking in a small world using
> intuitively named commands.
>
> I would think the sensible thing to do is to invite participation
> from traditional translation volunteers with something like
>
> import turtle
> from i18n import _
>
> _translations = { 'turtle' : _('turtle'), ... }
> for name in dir(turtle):
>     if name in _translations:
>         # import statements must go through exec(), not eval()
>         exec("from turtle import {} as {}".format(name, _translations[name]))
>     elif english_fallbacks_please:
>         exec("from turtle import {}".format(name))

Yeah, that'd be better than including all the original English names. And _translations can be generated easily enough:

import turtle
print("{" + ", ".join("%r: _(%r)" % (n, n) for n in dir(turtle)) + "}")

Though the diffs would be disgusting any time anything in turtle changes.

ChrisA

From steve at pearwood.info Fri Sep 4 04:45:53 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 4 Sep 2015 12:45:53 +1000
Subject: [Python-ideas] Non-English names in the turtle module.
In-Reply-To: <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20150904024552.GL19373@ando.pearwood.info>

On Fri, Sep 04, 2015 at 11:05:51AM +0900, Stephen J. Turnbull wrote:
> Al Sweigart writes:
>
> > The idea for putting these modules on PyPI is interesting. My only
> > hesitation is I don't want "but it's already on PyPI" as an excuse
> > not to include these changes into the standard library turtle
> > module.
>
> Exactly backwards, as the first objection is going to be "if it could
> be on PyPI but isn't, there's no evidence it's ready for the stdlib."
*cough typing cough*

The turtle module has been in Python for many, many years. This proposal doesn't change the functionality, it merely offers a localised API to the same functionality. A bunch of alternate names, nothing more.

I would argue that if you consider the user-base of turtle, putting it on PyPI is a waste of time:

- Beginners aren't going to know to "pip install whatever". Some of us here seem to think that pip is the answer to everything, but if you look on the python-list mailing list, you will see plenty of evidence that people have trouble using pip.

- Schools may have policies against the installation of unapproved software on their desktops, and getting approval to "pip install *" may be difficult, time-consuming or outright impossible. If they are using Python, we know they have approval to use what is in the standard library. Everything else is, at best, a theoretical possibility.

One argument against this proposal is that Python is not really designed as a kid-friendly learning language, and we should just abandon that space to languages that do it better, like Scratch. I'd hate to see that argument win, but given our limited resources perhaps we should know when we're beaten. Compared to what Scratch can do, turtle graphics are so very 1970s.

But if we think that there is still a place in the Python infrastructure for turtle graphics, then I'm +1 on localising the turtle module.

-- 
Steve

From asweigart at gmail.com Fri Sep 4 05:52:30 2015
From: asweigart at gmail.com (Al Sweigart)
Date: Thu, 3 Sep 2015 20:52:30 -0700
Subject: [Python-ideas] Non-English names in the turtle module.
In-Reply-To: <20150904024552.GL19373@ando.pearwood.info>
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info>
Message-ID:

Thinking about it some more, yeah, having a separate module on PyPI would just be a waste of time.
This isn't changing functionality or experimenting with new features, it's just adding new names to existing functions. And installing stuff with pip is going to be an insurmountable barrier for a lot of computer labs.

I'd say Python is very much a kid-friendly language. It's definitely much friendlier than BASIC.

I'd advise against using the _() function in gettext. That function is for string tables, which is set up to be easily changed and expanded. The turtle API is pretty much set in stone, and dealing with separate .po files and gettext in general would be more of a maintenance headache. It is also dependent on the machine's localization settings.

I believe some simple code at the end of turtle.py like this would be good enough:

_spanish = {'forward': 'adelante'}  # ...and the rest of the translated terms
_languages = {'spanish': _spanish}  # ...and the rest of the languages

def forward():  # this is the original turtle forward() function
    print('Blah blah blah, this is the forward() function.')

for language in _languages:
    for englishTerm, nonEnglishTerm in _languages[language].items():
        locals()[nonEnglishTerm] = locals()[englishTerm]

Plus the diff wouldn't look too bad.

This doesn't prohibit someone from mixing both English and non-English names in the same program, but I don't see that as a big problem. I think it's best to have all the languages available without having to set up localization settings.

-Al

On Thu, Sep 3, 2015 at 7:45 PM, Steven D'Aprano wrote:
> On Fri, Sep 04, 2015 at 11:05:51AM +0900, Stephen J. Turnbull wrote:
> > Al Sweigart writes:
> >
> > > The idea for putting these modules on PyPI is interesting. My only
> > > hesitation is I don't want "but it's already on PyPI" as an excuse
> > > not to include these changes into the standard library turtle
> > > module.
> >
> > Exactly backwards, as the first objection is going to be "if it could
> > be on PyPI but isn't, there's no evidence it's ready for the stdlib."
>
> *cough typing cough*
>
> The turtle module has been in Python for many, many years. This proposal
> doesn't change the functionality, it merely offers a localised API to
> the same functionality. A bunch of alternate names, nothing more.
>
> I would argue that if you consider the user-base of turtle, putting it
> on PyPI is a waste of time:
>
> - Beginners aren't going to know to "pip install whatever". Some of us
> here seem to think that pip is the answer to everything, but if you look
> on the python-list mailing list, you will see plenty of evidence that
> people have trouble using pip.
>
> - Schools may have policies against the installation of unapproved
> software on their desktops, and getting approval to "pip install *" may
> be difficult, time-consuming or outright impossible. If they are using
> Python, we know they have approval to use what is in the standard
> library. Everything else is, at best, a theoretical possibility.
>
> One argument against this proposal is that Python is not really designed
> as a kid-friendly learning language, and we should just abandon that
> space to languages that do it better, like Scratch. I'd hate to see that
> argument win, but given our limited resources perhaps we should know
> when we're beaten. Compared to what Scratch can do, turtle graphics are
> so very 1970s.
>
> But if we think that there is still a place in the Python infrastructure
> for turtle graphics, then I'm +1 on localising the turtle module.
>
> --
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
From abarnert at yahoo.com Fri Sep 4 09:18:52 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 4 Sep 2015 00:18:52 -0700
Subject: [Python-ideas] Non-English names in the turtle module.
In-Reply-To: <20150904024552.GL19373@ando.pearwood.info>
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info>
Message-ID: <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com>

First, does this proposal actually come from a non-English teacher, or someone who's talked to them, or is it just a guess that they might find it nice?

Meanwhile:

On Sep 3, 2015, at 19:45, Steven D'Aprano wrote:
>
> - Beginners aren't going to know to "pip install whatever". Some of us
> here seem to think that pip is the answer to everything, but if you look
> on the python-list mailing list, you will see plenty of evidence that
> people have trouble using pip.

Of course a sizable chunk of those say "my Python didn't come with pip" and then after a bit of exploration you find that they're using Python 2.7.3 or something, so any feature added to Python 3.6 isn't likely to help them anyway. And that seems like a good argument to add it to PyPI even if it's also added to the stdlib.

Sure, for some teachers it'll be easier to just require 3.6 than to require a particular package. But I'm guessing both will be problematic in different cases. For example, if the school is issuing students linux laptops that come with Python 3.4, explaining apt-get, and getting permission from the IT department for it, is probably harder, not easier, than pip.

From stephen at xemacs.org Fri Sep 4 09:23:21 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 04 Sep 2015 16:23:21 +0900
Subject: [Python-ideas] Non-English names in the turtle module.
In-Reply-To: <20150904024552.GL19373@ando.pearwood.info>
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info>
Message-ID: <87si6u3axi.fsf@uwakimon.sk.tsukuba.ac.jp>

Steven D'Aprano writes:
> On Fri, Sep 04, 2015 at 11:05:51AM +0900, Stephen J. Turnbull wrote:
> > Exactly backwards, as the first objection is going to be "if it could
> > be on PyPI but isn't, there's no evidence it's ready for the stdlib."
>
> *cough typing cough*

And? The objection will still be made. And I doubt Guido will agree that typing is a precedent that can be used to justify inclusion of turtle localizations. He might very well be in favor AFAIK, I just doubt he would base that on the precedent of typing.

The rest of your post I don't really agree with, but I have no strong counterarguments, either. Here I just wanted to point out that the way these discussions have gone in the past is that without special support from the BDFL, the usual path for these things is through PyPI. Especially since AFAICT we don't actually have an implementation yet.

Steve

From bussonniermatthias at gmail.com Fri Sep 4 09:26:08 2015
From: bussonniermatthias at gmail.com (Matthias Bussonnier)
Date: Fri, 4 Sep 2015 09:26:08 +0200
Subject: [Python-ideas] Non-English names in the turtle module.
In-Reply-To:
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info>
Message-ID: <35076B57-2F85-4632-B522-47BDE868567C@gmail.com>

Hi all,

Personal opinion, based on a bit of experience: there is one thing worse than programming in a foreign language IMHO (I'm a native French speaker). It's programming in an environment which is half translated and/or mixes English and the native language.
The cognitive load and the context switching the brain is forced to do when two languages are present are absolutely astronomical, and I guess translating the turtle module will not translate the control-flow structures, the docstrings, etc., if you do a simple `Tortue = Turtle` assignment. So while it looks nice in a two-liner example, you hit this problem pretty quickly. Taking the example given earlier:

import turtle           # "import" is English; it should translate to "importer" or "importez", and "turtle" should be "tortue" too
t = turtle.Plume()
t.couleurplume('vert')  # "plume" is feminine, so the colour should be "verte"; "crayon" would be masculine, hence "vert"
t.avant(100)            # avance/avancer

I can perfectly imagine a menu entry "insérer une boucle `pour ...`" ("insert a `pour ...` loop") that inserts a `for ...` in applications, which is confusing to explain.

I also find it much easier to attach a programming meaning to a word that has no previous meaning for a kid (for, range, if, else, print are blank slates for French children) than to shoehorn another concept, biased by previous experience, into it.

This in particular makes me think of Gibiane[1], which is basically "Hey, Fortran is great, let's make it in French" -- which was a really bad idea[2]. No, it's not a joke, and yes, people do nuclear physics using this language.

While I appreciate the translation effort in general, most of the translated side of things (MDN, Microsoft help pages, Apple's) is much worse than trying to understand the English originals.

So just a warning that the best is the enemy of the good, and despite good intentions[3], trying to translate the turtle module might not be the right thing to do.

Thanks,
-- 
Matthias

[1]: https://fr.wikipedia.org/wiki/Gibiane
[2]: but not the worst, IMHO.
[3]: http://www.bloombergview.com/articles/2015-08-18/how-a-ban-on-plastic-bags-can-go-wrong > On Sep 4, 2015, at 05:52, Al Sweigart wrote: > > Thinking about it some more, yeah, having a separate module on PyPI would just be a waste of time. This isn't changing functionality or experimenting with new features, it's just adding new names to existing functions. And installing stuff with pip is going to be an insurmountable barrier for a lot of computer labs. > > I'd say Python is very much a kid-friendly language. It's definitely much friendlier than BASIC. > > I'd advise against using the _() function in gettext. That function is for string tables, which are set up to be easily changed and expanded. The turtle API is pretty much set in stone, and dealing with separate .po files and gettext in general would be more of a maintenance headache. It is also dependent on the machine's localization settings. > > I believe some simple code at the end of turtle.py like this would be good enough: > > _spanish = {'forward': 'adelante'} # ...and the rest of the translated terms > _languages = {'spanish': _spanish} # ...and the rest of the languages > > def forward(): # this is the original turtle forward() function > print('Blah blah blah, this is the forward() function.') > > for language in _languages: > for englishTerm, nonEnglishTerm in _languages[language].items(): > locals()[nonEnglishTerm] = locals()[englishTerm] > > Plus the diff wouldn't look too bad. > > This doesn't prohibit someone from mixing both English and Non-English names in the same program, but I don't see that as a big problem. I think it's best to have all the languages available without having to set up localization settings. > > -Al > > On Thu, Sep 3, 2015 at 7:45 PM, Steven D'Aprano > wrote: > On Fri, Sep 04, 2015 at 11:05:51AM +0900, Stephen J. Turnbull wrote: > > Al Sweigart writes: > > > > > The idea for putting these modules on PyPI is interesting. 
My only > > > hesitation is I don't want "but it's already on PyPI" as an excuse > > > not to include these changes into the standard library turtle > > > module. > > > > Exactly backwards, as the first objection is going to be "if it could > > be on PyPI but isn't, there's no evidence it's ready for the stdlib." > > *cough typing cough* > > > The turtle module has been in Python for many, many years. This proposal > doesn't change the functionality, it merely offers a localised API to > the same functionality. A bunch of alternate names, nothing more. > > I would argue that if you consider the user-base of turtle, putting it > on PyPI is a waste of time: > > - Beginners aren't going to know to "pip install whatever". Some of us > here seem to think that pip is the answer to everything, but if you look > on the python-list mailing list, you will see plenty of evidence that > people have trouble using pip. > > - Schools may have policies against the installation of unapproved > software on their desktops, and getting approval to "pip install *" may > be difficult, time-consuming or outright impossible. If they are using > Python, we know they have approval to use what is in the standard > library. Everything else is, at best, a theorectical possibility. > > One argument against this proposal is that Python is not really designed > as a kid-friendly learning language, and we should just abandon that > space to languages that do it better, like Scratch. I'd hate to see that > argument win, but given our limited resources perhaps we should know > when we're beaten. Compared to what Scratch can do, turtle graphics are > so very 1970s. > > But if we think that there is still a place in the Python infrastructure > for turtle graphics, then I'm +1 on localising the turtle module. 
> > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From encukou at gmail.com Fri Sep 4 10:17:36 2015 From: encukou at gmail.com (Petr Viktorin) Date: Fri, 4 Sep 2015 10:17:36 +0200 Subject: [Python-ideas] Non-English names in the turtle module. In-Reply-To: <35076B57-2F85-4632-B522-47BDE868567C@gmail.com> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <35076B57-2F85-4632-B522-47BDE868567C@gmail.com> Message-ID: On Fri, Sep 4, 2015 at 9:26 AM, Matthias Bussonnier wrote: > Hi all, > > Personal opinion, base by a bit of experience: > > There is one thing worse than programming in a foreign language IMHO (I'm > native french) > It's programming in an environment which is half translated and/or mix > english and native language. > > The cognitive load and the context switching it forces the brain to do when > 2 languages > are present is absolutely astronomical, and i guess translating the Turtle > module > will not allow to translate the control-flow structure, the > docstrings....etc and so on, > and so forth if you do simple `Tortue = Turtle` assignment, > So while it looks nice 2 liners example you hit this problem pretty > quickly. > > Taking fort given example: > > import turtle # import is english, should translate to importer, ou > importez. turtle should be tortue also. > t = turtle.Plume() > t.couleurplume('vert') # plume is a female, couleur should be 'verte', > 'crayon' would be male, so 'vert' > t.avant(100) # avance/avancer > > > I can perfectly imagine a menu 'insérer use boucle `pour ...`', that insert > a `for ....` in applications, > which is confusing is confusing to explain. > > I also find it much easier to attach a programming meaning to a word that > have no previous meaning for a kid (for, range, if, else, print are blank > slate > for French children), than shoehorn another concept biased by previous > experience into it. > > This in particular make me think of Gibiane[1], which is basically: > "Hey fortran is great let's make it in french", which was a really bad > idea[2], > no it's not a joke, and yes people do nuclear physics using this language. > > While I appreciate in general the translation effort, in general most of the > translated side of things (MDN, microsoft help pages, Apples ones) are much > worse than trying to understand the english originals. > > > So just a warning that the best is the enemy of the good, and despite good > intentions[3], > trying to translate Turtle module might not be the right thing to do. > Another opinion based on some experience: I use local-language names when teaching beginners. It gives a nice distinction between names provided by Python or a library (in English) and things that can be named arbitrarily. I haven't actually measured if this helps learning, though; and to the turtle module it might not apply at all. From gmludo at gmail.com Fri Sep 4 13:34:16 2015 From: gmludo at gmail.com (Ludovic Gasc) Date: Fri, 4 Sep 2015 13:34:16 +0200 Subject: [Python-ideas] Non-English names in the turtle module. 
In-Reply-To: <35076B57-2F85-4632-B522-47BDE868567C@gmail.com> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <35076B57-2F85-4632-B522-47BDE868567C@gmail.com> Message-ID: I agree with Matthias: the IT world is mostly English-based. It's "sad" for non-native English speakers like me, because you must learn English before working in IT. However, the positive side effect is that we can all speak together in a common language. Ludovic Gasc (GMLudo) http://www.gmludo.eu/ On 4 Sep 2015 09:26, "Matthias Bussonnier" wrote: > Hi all, > > Personal opinion, base by a bit of experience: > > There is one thing worse than programming in a foreign language IMHO (I'm > native french) > It's programming in an environment which is half translated and/or mix > english and native language. > > The cognitive load and the context switching it forces the brain to do > when 2 languages > are present is absolutely astronomical, and i guess translating the Turtle > module > will not allow to translate the control-flow structure, the > docstrings....etc and so on, > and so forth if you do simple `Tortue = Turtle` assignment, > So while it looks nice 2 liners example you hit this problem pretty > quickly. > > Taking fort given example: > > import turtle # import is english, should translate to importer, ou > importez. turtle should be tortue also. > t = turtle.Plume() > t.couleurplume('vert') # plume is a female, couleur should be 'verte', > 'crayon' would be male, so 'vert' > t.avant(100) # avance/avancer > > > I can perfectly imagine a menu 'insérer use boucle `pour ...`', that > insert a `for ....` in applications, > which is confusing is confusing to explain. 
> > I also find it much easier to attach a programming meaning to a word that > have no previous meaning for a kid (for, range, if, else, print are blank > slate > for French children), than shoehorn another concept biased by previous > experience into it. > > This in particular make me think of Gibiane[1], which is basically: > "Hey fortran is great let's make it in french", which was a really bad > idea[2], > no it's not a joke, and yes people do nuclear physics using this language. > > While I appreciate in general the translation effort, in general most of > the > translated side of things (MDN, microsoft help pages, Apples ones) are > much > worse than trying to understand the english originals. > > > So just a warning that the best is the enemy of the good, and despite good > intentions[3], > trying to translate Turtle module might not be the right thing to do. > > Thanks, > -- > > Matthias > > [1]: https://fr.wikipedia.org/wiki/Gibiane > [2]: but not the worse IMHO. > [3]: > http://www.bloombergview.com/articles/2015-08-18/how-a-ban-on-plastic-bags-can-go-wrong > > > On Sep 4, 2015, at 05:52, Al Sweigart wrote: > > Thinking about it some more, yeah, having a separate module on PyPI would > just be a waste of time. This isn't changing functionality or experimenting > with new features, it's just adding new names to existing functions. And > installing stuff with pip is going to be insurmountable barrier for a lot > of computer labs. > > I'd say Python is very much a kid-friendly language. It's definitely much > friendlier than BASIC. > > I'd advise against using the _() function in gettext. That function is for > string tables, which is set up to be easily changed and expanded. The > turtle API is pretty much set in stone, and dealing with separate .po files > and gettext in general would be more of a maintenance headache. It is also > dependent on the machine's localization settings. 
> > I believe some simple code at the end of turtle.py like this would be good > enough: > > _spanish = {'forward': 'adelante'} # ...and the rest of the translated > terms > _languages = {'spanish': _spanish} # ...and the rest of the languages > > def forward(): # this is the original turtle forward() function > print('Blah blah blah, this is the forward() function.') > > for language in _languages: > for englishTerm, nonEnglishTerm in _languages[language].items(): > locals()[nonEnglishTerm] = locals()[englishTerm] > > Plus the diff wouldn't look too bad. > > This doesn't prohibit someone from mixing both English and Non-English > names in the same program, but I don't see that as a big problem. I think > it's best to have all the languages available without having to setup > localization settings. > > -Al > > On Thu, Sep 3, 2015 at 7:45 PM, Steven D'Aprano > wrote: > >> On Fri, Sep 04, 2015 at 11:05:51AM +0900, Stephen J. Turnbull wrote: >> > Al Sweigart writes: >> > >> > > The idea for putting these modules on PyPI is interesting. My only >> > > hesitation is I don't want "but it's already on PyPI" as an excuse >> > > not to include these changes into the standard library turtle >> > > module. >> > >> > Exactly backwards, as the first objection is going to be "if it could >> > be on PyPI but isn't, there's no evidence it's ready for the stdlib." >> >> *cough typing cough* >> >> >> The turtle module has been in Python for many, many years. This proposal >> doesn't change the functionality, it merely offers a localised API to >> the same functionality. A bunch of alternate names, nothing more. >> >> I would argue that if you consider the user-base of turtle, putting it >> on PyPI is a waste of time: >> >> - Beginners aren't going to know to "pip install whatever". Some of us >> here seem to think that pip is the answer to everything, but if you look >> on the python-list mailing list, you will see plenty of evidence that >> people have trouble using pip. 
>> >> - Schools may have policies against the installation of unapproved >> software on their desktops, and getting approval to "pip install *" may >> be difficult, time-consuming or outright impossible. If they are using >> Python, we know they have approval to use what is in the standard >> library. Everything else is, at best, a theorectical possibility. >> >> One argument against this proposal is that Python is not really designed >> as a kid-friendly learning language, and we should just abandon that >> space to languages that do it better, like Scratch. I'd hate to see that >> argument win, but given our limited resources perhaps we should know >> when we're beaten. Compared to what Scratch can do, turtle graphics are >> so very 1970s. >> >> But if we think that there is still a place in the Python infrastructure >> for turtle graphics, then I'm +1 on localising the turtle module. >> >> >> >> >> -- >> Steve >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Fri Sep 4 14:18:37 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 4 Sep 2015 22:18:37 +1000 Subject: [Python-ideas] Non-English names in the turtle module. 
In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <35076B57-2F85-4632-B522-47BDE868567C@gmail.com> Message-ID: <20150904121837.GN19373@ando.pearwood.info> On Fri, Sep 04, 2015 at 01:34:16PM +0200, Ludovic Gasc wrote: > I'm agree with Matthias: IT world is mostly English based. Fortunately for the 95% of the world who speak English as a second language, or not at all, that is changing. For example, StackOverflow has a very successful Brazilian site, and they make the case for non-English speakers well: https://blog.stackexchange.com/2014/02/cant-we-all-be-reasonable-and-speak-english/ Rather than just repeat what they say there, I will just ask everyone to read it. -- Steve From humbert at uni-wuppertal.de Fri Sep 4 14:19:19 2015 From: humbert at uni-wuppertal.de (Prof. Dr. L. Humbert) Date: Fri, 4 Sep 2015 14:19:19 +0200 Subject: [Python-ideas] Non-English names in the turtle module. In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <35076B57-2F85-4632-B522-47BDE868567C@gmail.com> Message-ID: <55E98C47.1070604@uni-wuppertal.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 04.09.2015 13:34, Ludovic Gasc wrote: > I'm agree with Matthias: IT world is mostly English based. It's "sad" > for non native English speakers like me because you must learn > English before to work in IT. However, the positive side effect is > that we can speak together in the common language. This argument does not fit for students at K4-level so programming4all will not work. 
When constructing Ponto.py, (remote control for OpenOffice.org and LibreOffice via Python) we decided to write PontoE.py to enable students with English to use it, but also PontoD.py to use those classes, which are German-based - because of the bavarian textbooks for Informatik at the age of 11, which use German identifiers for classes, attributes and methods. You may take a look at to get a glimpse, how we dealt in automating the process to get the http://www.ham.nw.schule.de/pub/bscw.cgi/100606 The README.txt points out how the process it managed. Our actual approach without ?internationalisierung? for Python3: http://www.ham.nw.schule.de/pub/bscw.cgi/2131956 TNX Ludger - -- https://twitter.com/n770 http://ddi.uni-wuppertal.de/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEARECAAYFAlXpjEYACgkQJQsN9FQ+jJ9gggCgiCO4V7oDF9QSFcoMkhd3GarW 1S8Ani7a5F7TlPe982q7ggWlGOTy5z0h =MpyW -----END PGP SIGNATURE----- From steve at pearwood.info Fri Sep 4 19:27:10 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 5 Sep 2015 03:27:10 +1000 Subject: [Python-ideas] Non-English names in the turtle module. In-Reply-To: <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> Message-ID: <20150904172710.GO19373@ando.pearwood.info> On Fri, Sep 04, 2015 at 12:18:52AM -0700, Andrew Barnert wrote: > On Sep 3, 2015, at 19:45, Steven D'Aprano wrote: > > > > - Beginners aren't going to know to "pip install whatever". Some of us > > here seem to think that pip is the answer to everything, but if you look > > on the python-list mailing list, you will see plenty of evidence that > > people have trouble using pip. 
> > Of course a sizable chunk of those say "my Python didn't come with > pip" and the after a bit of exploration you find that they're using > Python 2.7.3 or something, so any feature added to Python 3.6 isn't > likely to help them anyway. You say "of course", but did you actually look at the python-list archives? If you do, you will see posts like these two within the last 24 hours: [quote] I am running Python 3.4 on Windows 7 and is facing [Error 13] Permission Denied while installing Python packages... [end quote] and: [quote] Well I have certainly noted more than once that pip is contained in Python 3.4. But I am having the most extreme problems with simply typing "pip" into my command prompt and then getting back the normal information on pip! [end quote] And a random selection of other issues which I just happen to still have visible in my news reader: [quote] Python 2.7.9 and later (on the python2 series), and Python 3.4 and later include pip by default. But I can not find it in python2.7.10 package. What's the matter? How can i install pip on my Embedded device? [end quote] [quote] I've installed a fresh copy of Python 3.5.0b2 and - as recommended - upgraded pip. I don't understand the reason for the permission errors as I am owner and have full control for the temporary directory created. [end quote] [quote] I was fed up with trying to install from pypi to Windows. Setup.py more often than not wouldn't be able to find the VS compiler. So I thought I'd try the direct route to the excellent Christoph Gohlke site at http://www.lfd.uci.edu/~gohlke/pythonlibs/ which is all whl files these days. However as you can see below despite my best efforts I'm still processing the tar.gz file, so what am I doing wrong? [end quote] (Some spelling errors and obvious typos corrected.) Please don't dismiss out of hand the actual experience of real users with pip. At least one of those quotes above is from a long-time Python regular who knows his way around the command line. 
This is not meant as an anti-pip screed, so please don't read it as such. But it is meant as a reminder that pip is not perfect, and that even experienced Python developers can have trouble installing packages. Children with no experience with the command line or Python can not be expected to install packages from PyPI without assistance, and if they are using school computers, they simply may not be permitted to run "pip install" even if it worked flawlessly. -- Steve From gmludo at gmail.com Fri Sep 4 21:56:13 2015 From: gmludo at gmail.com (Ludovic Gasc) Date: Fri, 4 Sep 2015 21:56:13 +0200 Subject: [Python-ideas] Non-English names in the turtle module. In-Reply-To: <20150904121837.GN19373@ando.pearwood.info> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <35076B57-2F85-4632-B522-47BDE868567C@gmail.com> <20150904121837.GN19373@ando.pearwood.info> Message-ID: Thank you for the link, it's interesting. However, my remark is mainly about the source code: even if I sincerely think it's better to use English as much as possible, I see no issue with discussing source code in your native language: I myself speak French when I interact only with French developers in my company. Nevertheless, for the content of the source code or the database structure, at least to me, you must write in English: I've analysed source code in Dutch before, and it was a lot more complicated to understand; I lost a lot of time for nothing. The world is now global, and dev resources are rare enough that you should avoid locking your source code content into a local language. See for example the big effort LibreOffice made to translate its German comments: https://wiki.documentfoundation.org/Development/Easy_Hacks/Translation_Of_Comments With a localized turtle, you would instil a bad habit from the beginning. 
-- Ludovic Gasc (GMLudo) http://www.gmludo.eu/ 2015-09-04 14:18 GMT+02:00 Steven D'Aprano : > On Fri, Sep 04, 2015 at 01:34:16PM +0200, Ludovic Gasc wrote: > > I'm agree with Matthias: IT world is mostly English based. > > Fortunately for the 95% of the world who speak English as a second > language, or not at all, that is changing. For example, StackOverflow > has a very successful Brazilian site, and they make the case for > non-English speakers well: > > > https://blog.stackexchange.com/2014/02/cant-we-all-be-reasonable-and-speak-english/ > > Rather than just repeat what they say there, I will just ask everyone to > read it. > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephane at wirtel.be Fri Sep 4 22:02:46 2015 From: stephane at wirtel.be (=?utf-8?q?St=C3=A9phane?= Wirtel) Date: Fri, 04 Sep 2015 22:02:46 +0200 Subject: [Python-ideas] Non-English names in the turtle module. In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <35076B57-2F85-4632-B522-47BDE868567C@gmail.com> <20150904121837.GN19373@ando.pearwood.info> Message-ID: I do agree with the comments of Ludovic about the source code: Python is in English, the code of an open source project is international, and in this case, I prefer English. In the past, I have seen some databases in French with accented characters in the column names; that's really ugly :/ because if you get the encoding wrong in the database, you will have a problem. Yesterday, I read some source code in Dutch; sorry, but if you don't know this language, good luck if you want to change the code. 
And another example: the comments in the code of the EuroPython site are in Italian. OK, the project was developed for PyCon Italia, but now there are a lot of international developers on this code, and sincerely, I can speak Italian but not everybody can. Sincerely: English for the code and the database! On 4 Sep 2015, at 21:56, Ludovic Gasc wrote: > Thank you for the link, it's interesting. > > However, my remark it's mainly for the source code: Even if I > sincerely > think it's better to handle English the most possible you can, I see > no > issues to discuss about source code in your native language: I speak > myself > in French when I interact with French developers only in my company. > > Nevertheless, for the content of the source code or database > structure, at > least to me, you must write in English: I've already analysed source > code > in Dutch, it was a lot more complicated to understand the code, I've > lost a > lot of time for nothing. > The world is now global and dev resources are enough rare to avoid to > lock > your source code content in a local language. > > See for example the big work of LibreOffice to translate German > comments: > https://wiki.documentfoundation.org/Development/Easy_Hacks/Translation_Of_Comments > > With a localized turtle, you should give a bad habit at the beginning. > > -- > Ludovic Gasc (GMLudo) > http://www.gmludo.eu/ > > 2015-09-04 14:18 GMT+02:00 Steven D'Aprano : > >> On Fri, Sep 04, 2015 at 01:34:16PM +0200, Ludovic Gasc wrote: >>> I'm agree with Matthias: IT world is mostly English based. >> >> Fortunately for the 95% of the world who speak English as a second >> language, or not at all, that is changing. For example, StackOverflow >> has a very successful Brazilian site, and they make the case for >> non-English speakers well: >> >> >> https://blog.stackexchange.com/2014/02/cant-we-all-be-reasonable-and-speak-english/ >> >> Rather than just repeat what they say there, I will just ask everyone >> to >> read it. 
>> >> >> -- >> Steve >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- St?phane Wirtel - http://wirtel.be - @matrixise From asweigart at gmail.com Fri Sep 4 22:31:24 2015 From: asweigart at gmail.com (Al Sweigart) Date: Fri, 4 Sep 2015 13:31:24 -0700 Subject: [Python-ideas] Non-English names in the turtle module. In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <35076B57-2F85-4632-B522-47BDE868567C@gmail.com> <20150904121837.GN19373@ando.pearwood.info> Message-ID: I completely agree that Python and codebases already in English should remain in English. And I want the source code for turtle.py to stay in English as well. This is where the gray area for Turtle begins though. Turtle is not for professional developers, where the English expectation is there. It is used for school kids to program in, and this code will most likely be forgotten about a week after the programming assignment is done. And in a sense, turtle.py is not used as a module by kids so much as an app (albeit a scriptable one) that moves the turtle around and draws shapes. The language barrier is a very real one for non-technical instructors, parents, and students. If we could minimize it down to less than a dozen Python keywords & names (import turtle, for, in, range, while, if, else) that would be a significant gain for Python's reach. And I don't think it would be much technical debt for turtle.py. 
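The aliasing approach sketched in Al's quoted code earlier in the thread can be written as a short, self-contained example. This is only an illustrative sketch: the `forward()` body and the translation tables below are placeholders, not the real turtle code or any agreed vocabulary.

```python
# Illustrative sketch of adding translated aliases at the end of a
# module, as discussed in this thread. The translations and the
# forward() body are stand-ins, not the actual turtle.py code.

def forward(distance):
    """Stand-in for the real turtle.forward()."""
    return "moved %d" % distance

# language -> {English name: translated name}
_translations = {
    'spanish': {'forward': 'adelante'},
    'french': {'forward': 'avance'},
}

# Bind each translated name to the same function object as the
# English original, so both names always refer to the same code.
for _terms in _translations.values():
    for _english, _translated in _terms.items():
        globals()[_translated] = globals()[_english]

print(adelante(100))  # -> moved 100, the same function as forward(100)
```

Note that this sketch uses `globals()` where the earlier draft used `locals()`: at module level the two return the same dictionary, so the draft happens to work, but mutating `globals()` is the well-defined operation and states the intent.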
I hope to have a complete translated set soon so I can submit a patch that shows how light of a change this would be. -Al On Fri, Sep 4, 2015 at 1:02 PM, St?phane Wirtel wrote: > I do agree with the comments of Ludovic about the source code, Python is > in English, the code for an open source project is international, and in > this case, I prefer English. > > In the past, I have seen some databases in french, with the accents ?, ?, > ? in the columns of the database, that?s really ugly :/ because if you > forgot the encoding in the database, you will have a problem. > > Yesterday, I have read a source code in Dutch, sorry but if you don?t know > this language, good luck if you want to change the code. > > And an other example, the comments in the code of the EuroPython site is > in Italian, ok, the project has been developed for PyCon Italia, but now, > we are a lot of international developers on this code, and sincerely, I can > speak Italian but not everybody. > > Sincerely, English for the code and the database! > > > On 4 Sep 2015, at 21:56, Ludovic Gasc wrote: > > Thank you for the link, it's interesting. >> >> However, my remark it's mainly for the source code: Even if I sincerely >> think it's better to handle English the most possible you can, I see no >> issues to discuss about source code in your native language: I speak >> myself >> in French when I interact with French developers only in my company. >> >> Nevertheless, for the content of the source code or database structure, at >> least to me, you must write in English: I've already analysed source code >> in Dutch, it was a lot more complicated to understand the code, I've lost >> a >> lot of time for nothing. >> The world is now global and dev resources are enough rare to avoid to lock >> your source code content in a local language. 
>> >> See for example the big work of LibreOffice to translate German comments: >> >> https://wiki.documentfoundation.org/Development/Easy_Hacks/Translation_Of_Comments >> >> With a localized turtle, you should give a bad habit at the beginning. >> >> -- >> Ludovic Gasc (GMLudo) >> http://www.gmludo.eu/ >> >> 2015-09-04 14:18 GMT+02:00 Steven D'Aprano : >> >> On Fri, Sep 04, 2015 at 01:34:16PM +0200, Ludovic Gasc wrote: >>> >>>> I'm agree with Matthias: IT world is mostly English based. >>>> >>> >>> Fortunately for the 95% of the world who speak English as a second >>> language, or not at all, that is changing. For example, StackOverflow >>> has a very successful Brazilian site, and they make the case for >>> non-English speakers well: >>> >>> >>> >>> https://blog.stackexchange.com/2014/02/cant-we-all-be-reasonable-and-speak-english/ >>> >>> Rather than just repeat what they say there, I will just ask everyone to >>> read it. >>> >>> >>> -- >>> Steve >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > -- > St?phane Wirtel - http://wirtel.be - @matrixise > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Sep 4 23:05:05 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 4 Sep 2015 14:05:05 -0700 Subject: [Python-ideas] Non-English names in the turtle module. 
In-Reply-To: <20150904172710.GO19373@ando.pearwood.info>
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info>
Message-ID:

I find it really annoying when people pick one sentence out of a post to
argue against at length, out of context, while entirely ignoring the actual
substance of the post.

Are you sincerely arguing that no children out there will have Python 3.5,
3.3, or 2.7, or that for all such students, upgrading to 3.6 will be easier
and face fewer permissions problems than using pip? If not, then how does
this answer my point that some people will want this on PyPI even if it's
in the 3.6 stdlib?

Sent from my iPhone

> On Sep 4, 2015, at 10:27, Steven D'Aprano wrote:
>
>> On Fri, Sep 04, 2015 at 12:18:52AM -0700, Andrew Barnert wrote:
>>
>>> On Sep 3, 2015, at 19:45, Steven D'Aprano wrote:
>>>
>>> - Beginners aren't going to know to "pip install whatever". Some of us
>>> here seem to think that pip is the answer to everything, but if you look
>>> on the python-list mailing list, you will see plenty of evidence that
>>> people have trouble using pip.
>>
>> Of course a sizable chunk of those say "my Python didn't come with
>> pip", and then after a bit of exploration you find that they're using
>> Python 2.7.3 or something, so any feature added to Python 3.6 isn't
>> likely to help them anyway.
>
> You say "of course", but did you actually look at the python-list
> archives? If you do, you will see posts like these two within the last
> 24 hours:
>
> [quote]
> I am running Python 3.4 on Windows 7 and is facing [Error 13]
> Permission Denied while installing Python packages...
> [end quote]
>
> and:
>
> [quote]
> Well I have certainly noted more than once that pip is contained in
> Python 3.4.
But I am having the most extreme problems with simply typing > "pip" into my command prompt and then getting back the normal > information on pip! > [end quote] > > And a random selection of other issues which I just happen to still > have visible in my news reader: > > [quote] > Python 2.7.9 and later (on the python2 series), and Python 3.4 and > later include pip by default. But I can not find it in python2.7.10 > package. What's the matter? How can i install pip on my Embedded device? > [end quote] > > [quote] > I've installed a fresh copy of Python 3.5.0b2 and - as recommended - > upgraded pip. I don't understand the reason for the permission errors as > I am owner and have full control for the temporary directory created. > [end quote] > > [quote] > I was fed up with trying to install from pypi to Windows. Setup.py more > often than not wouldn't be able to find the VS compiler. So I thought > I'd try the direct route to the excellent Christoph Gohlke site at > http://www.lfd.uci.edu/~gohlke/pythonlibs/ which is all whl files these > days. However as you can see below despite my best efforts I'm still > processing the tar.gz file, so what am I doing wrong? > [end quote] > > > (Some spelling errors and obvious typos corrected.) > > Please don't dismiss out of hand the actual experience of real users > with pip. At least one of those quotes above is from a long-time Python > regular who knows his way around the command line. > > This is not meant as an anti-pip screed, so please don't read it as > such. But it is meant as a reminder that pip is not perfect, and that > even experienced Python developers can have trouble installing packages. > Children with no experience with the command line or Python can not be > expected to install packages from PyPI without assistence, and if they > are using school computers, they simply may not be permitted to run "pip > install" even if it worked flawlessly. 
> > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From asweigart at gmail.com Fri Sep 4 23:43:32 2015 From: asweigart at gmail.com (Al Sweigart) Date: Fri, 4 Sep 2015 14:43:32 -0700 Subject: [Python-ideas] Non-English names in the turtle module. In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> Message-ID: I see your point. I think there are two different arguments here: It would be good to have non-English turtle modules of PyPI for older versions of Python. But it would also be good to have non-English names added to the turtle module in the 3.6 stdlib. My main concern was that if these modules were on PyPI, they would be left out of the standard library. Then the "install from PyPI headache" arguments would apply. -Al On Fri, Sep 4, 2015 at 2:05 PM, Andrew Barnert via Python-ideas < python-ideas at python.org> wrote: > I find it really annoying when people pick one sentence out of a post to > argue against at length, out of context. while entirely ignoring the actual > substance of the post. > > Are you sincerely arguing that no children out there will have Python 3.5, > 3.3, or 2.7, or that for all such student upgrading to 3.6 will be easier > and face fewer permissions problems than using pip? If not, then how does > this answer my point that some people will want this on PyPI even if it's > in the 3.6 stdlib? 
> > Sent from my iPhone > > > On Sep 4, 2015, at 10:27, Steven D'Aprano wrote: > > > >> On Fri, Sep 04, 2015 at 12:18:52AM -0700, Andrew Barnert wrote: > >> > >>> On Sep 3, 2015, at 19:45, Steven D'Aprano wrote: > >>> > >>> - Beginners aren't going to know to "pip install whatever". Some of us > >>> here seem to think that pip is the answer to everything, but if you > look > >>> on the python-list mailing list, you will see plenty of evidence that > >>> people have trouble using pip. > >> > >> Of course a sizable chunk of those say "my Python didn't come with > >> pip" and the after a bit of exploration you find that they're using > >> Python 2.7.3 or something, so any feature added to Python 3.6 isn't > >> likely to help them anyway. > > > > You say "of course", but did you actually look at the python-list > > archives? If you do, you will see posts like these two within the last > > 24 hours: > > > > [quote] > > I am running Python 3.4 on Windows 7 and is facing [Error 13] > > Permission Denied while installing Python packages... > > [end quote] > > > > and: > > > > [quote] > > Well I have certainly noted more than once that pip is contained in > > Python 3.4. But I am having the most extreme problems with simply typing > > "pip" into my command prompt and then getting back the normal > > information on pip! > > [end quote] > > > > And a random selection of other issues which I just happen to still > > have visible in my news reader: > > > > [quote] > > Python 2.7.9 and later (on the python2 series), and Python 3.4 and > > later include pip by default. But I can not find it in python2.7.10 > > package. What's the matter? How can i install pip on my Embedded device? > > [end quote] > > > > [quote] > > I've installed a fresh copy of Python 3.5.0b2 and - as recommended - > > upgraded pip. I don't understand the reason for the permission errors as > > I am owner and have full control for the temporary directory created. 
> > [end quote] > > > > [quote] > > I was fed up with trying to install from pypi to Windows. Setup.py more > > often than not wouldn't be able to find the VS compiler. So I thought > > I'd try the direct route to the excellent Christoph Gohlke site at > > http://www.lfd.uci.edu/~gohlke/pythonlibs/ which is all whl files these > > days. However as you can see below despite my best efforts I'm still > > processing the tar.gz file, so what am I doing wrong? > > [end quote] > > > > > > (Some spelling errors and obvious typos corrected.) > > > > Please don't dismiss out of hand the actual experience of real users > > with pip. At least one of those quotes above is from a long-time Python > > regular who knows his way around the command line. > > > > This is not meant as an anti-pip screed, so please don't read it as > > such. But it is meant as a reminder that pip is not perfect, and that > > even experienced Python developers can have trouble installing packages. > > Children with no experience with the command line or Python can not be > > expected to install packages from PyPI without assistence, and if they > > are using school computers, they simply may not be permitted to run "pip > > install" even if it worked flawlessly. > > > > > > > > -- > > Steve > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abarnert at yahoo.com Sat Sep 5 04:59:42 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 4 Sep 2015 19:59:42 -0700 Subject: [Python-ideas] Non-English names in the turtle module. In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> Message-ID: <1D1D750F-C7D2-47B9-8915-255DAEB09FB3@yahoo.com> On Sep 4, 2015, at 14:43, Al Sweigart wrote: > > I see your point. I think there are two different arguments here: It would be good to have non-English turtle modules of PyPI for older versions of Python. But it would also be good to have non-English names added to the turtle module in the 3.6 stdlib. > > My main concern was that if these modules were on PyPI, they would be left out of the standard library. Then the "install from PyPI headache" arguments would apply. I understand, but I think that concern is misplaced. Having something on PyPI generally makes it easier, not harder, to get it into the stdlib. And it's also useful on its own, because not everyone has 3.6. The problem is that it seems like the obvious way to design the PyPI version and the stdlib version would be pretty different (unless you want to explain to novices how to install backports packages and import things conditionally). But hopefully someone can come up with a good solution to that? From stephen at xemacs.org Sat Sep 5 08:01:12 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 05 Sep 2015 15:01:12 +0900 Subject: [Python-ideas] Non-English names in the turtle module. 
In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> Message-ID: <87oahh2ymv.fsf@uwakimon.sk.tsukuba.ac.jp> Al Sweigart writes: > Thinking about it some more, yeah, having a separate module on PyPI would > just be a waste of time. Python 2.7. 'nuff said, I hope. From rosuav at gmail.com Sat Sep 5 08:32:48 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 5 Sep 2015 16:32:48 +1000 Subject: [Python-ideas] Non-English names in the turtle module. In-Reply-To: <87oahh2ymv.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <87oahh2ymv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sat, Sep 5, 2015 at 4:01 PM, Stephen J. Turnbull wrote: > Al Sweigart writes: > > > Thinking about it some more, yeah, having a separate module on PyPI would > > just be a waste of time. > > Python 2.7. 'nuff said, I hope. If someone's just learning to program, surely s/he can learn on Python 3 rather than Python 2. If localized names aren't available on Py2, so be it. (For the record, I would be in favour of it being on PyPI. I just don't think that "Python 2.7" is sufficient argument for that.) ChrisA From stephen at xemacs.org Sat Sep 5 08:46:09 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 05 Sep 2015 15:46:09 +0900 Subject: [Python-ideas] Non-English names in the turtle module. 
In-Reply-To:
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <35076B57-2F85-4632-B522-47BDE868567C@gmail.com> <20150904121837.GN19373@ando.pearwood.info>
Message-ID: <87mvx12wjy.fsf@uwakimon.sk.tsukuba.ac.jp>

Al Sweigart writes:

 > If we could minimize it down to less than a dozen Python keywords &
 > names (import turtle, for, in, range, while, if, else) that would
 > be a significant gain for Python's reach.

I don't see why you would want any non-localized identifiers (modules,
functions) at all in the base feature set. So you can (and I think
should) drop range and turtle.

I don't see any point in discussing the keywords here -- they are what
they are. If a student decides to use something weird like "continue"
or "try ... finally" it will work. Of course, the recommended set of
syntaxes and their associated keywords matter a lot pedagogically, but
we can leave that discussion to the pedagogues. When you're wearing
your pedagogue hat and actually writing the style guide for teaching
programming using turtle, then those interested can talk about that.

Or maybe that should be left to experimentation. Some teachers may
prefer to avoid "while condition: suite" in favor of "for i in iterable:
if condition: suite". (Normally that would be nuts, of course, but here
you could reduce the base set of keywords and syntaxes by one each.)

There may be reasons why advanced users (and teachers) might want to
use "non-base" facilities. There it's possible to do things like

>>> from builtins import range as interval, print as output
>>> for i in interval(2): output(i)
0
1
>>>

which allows teachers to add any extensions they like conveniently,
albeit verbosely. I still think an i18n-based architecture is the way to
go, to minimize such boilerplate (error-prone for translators) among
other reasons.
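To make the shape of that concrete, here is one way a facade over localized names might be generated from a single translation table. This is a sketch only: the helper name and the German fragment below are hypothetical illustrations, not part of any existing turtle API.

```python
import types

def localized_namespace(module, table):
    """Build a namespace exposing `module`'s callables under translated
    names.  `table` maps each local name to the module's English name."""
    facade = types.SimpleNamespace()
    for local_name, english_name in table.items():
        setattr(facade, local_name, getattr(module, english_name))
    return facade

# Hypothetical German fragment (illustrative, not an official translation):
GERMAN_TURTLE = {"vorwaerts": "forward", "links": "left", "rechts": "right"}

# A teacher-facing module could then do:
#   import turtle
#   schildkroete = localized_namespace(turtle, GERMAN_TURTLE)
#   schildkroete.vorwaerts(100)
```

The same table could just as easily be fed into a gettext-style catalog; the point is that translators only ever touch the name mapping, never the wiring.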
Neither users nor translators need to see it, unless they want to see
"how'd they do that?" In which case, isn't it more "educational" to show
them the way it's done in the real world?

From stephen at xemacs.org Sat Sep 5 09:08:24 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 05 Sep 2015 16:08:24 +0900
Subject: [Python-ideas] High time for a builtin function to manage packages (simply)?
In-Reply-To: <20150904172710.GO19373@ando.pearwood.info>
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info>
Message-ID: <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp>

Steven D'Aprano writes:

 > You say "of course", but did you actually look at the python-list
 > archives? If you do, you will see posts like these two within the last
 > 24 hours:

So let's fix it, already![1] Now that we have a blessed package
management module, why not have a builtin that handles the simple
cases? Say

    def installer(package, command='install'):
        ...

where command would also take values 'status' (in which case package
could be None, meaning list all installed packages, and 'status' might
check for available upgrades as well as stating whether the package is
known to this python instance), 'upgrade', 'install' (which might error
if the package is already installed, since I envision installations
taking place in the user's space which won't work for upgrading stdlib
packages in a system Python, at least on Windows), and maybe 'remove'.

I'm not real happy with the name "installer", but I chose it to imply
that there is a command argument, and that it can do more than just
install new packages. In general, I would say installer() should
fail-safe (to the point of fail-annoying), and point to the pip (and
maybe venv) docs.
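As a rough sketch of what such a helper might look like (illustrative only: the command names follow the proposal above, the pip flags chosen are one possible policy, and everything is routed through "python -m pip" so that the right interpreter's pip is always used):

```python
import subprocess
import sys

def installer(package=None, command="install"):
    """Sketch of the proposed helper: map high-level commands onto
    invocations of this interpreter's own pip via 'python -m pip'."""
    pip_args = {
        "status": ["list"] if package is None else ["show", package],
        "install": ["install", "--user", package],
        "upgrade": ["install", "--user", "--upgrade", package],
        "remove": ["uninstall", "-y", package],
    }
    if command not in pip_args:
        raise ValueError("unknown command: %r" % (command,))
    if command != "status" and package is None:
        raise ValueError("%r needs a package name" % (command,))
    # Using sys.executable avoids the PATH problems that plague a bare 'pip'.
    return subprocess.run([sys.executable, "-m", "pip"] + pip_args[command]).returncode
```

A real version would also need to fail safe on distribution-managed or otherwise locked-down interpreters, which is exactly the "fail-annoying" behaviour suggested above.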
It should also be verbose (eg, explaining that it only knows how to install for the current user and things like that). Footnotes: [1] This really is not relevant to the "localized turtle" thread. If the current situation is acceptable in general, it's not an argument for putting turtle localizations in the stdlib. If it's not acceptable, well, let's fix it. From rustompmody at gmail.com Sat Sep 5 09:09:08 2015 From: rustompmody at gmail.com (Rustom Mody) Date: Sat, 5 Sep 2015 00:09:08 -0700 (PDT) Subject: [Python-ideas] Packaging systems (was Non-English names in the turtle module) In-Reply-To: <20150904172710.GO19373@ando.pearwood.info> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> Message-ID: On Friday, September 4, 2015 at 10:57:42 PM UTC+5:30, Steven D'Aprano wrote: > > On Fri, Sep 04, 2015 at 12:18:52AM -0700, Andrew Barnert wrote: > > > On Sep 3, 2015, at 19:45, Steven D'Aprano > wrote: > > > > > > - Beginners aren't going to know to "pip install whatever". Some of us > > > here seem to think that pip is the answer to everything, but if you > look > > > on the python-list mailing list, you will see plenty of evidence that > > > people have trouble using pip. > > > > Of course a sizable chunk of those say "my Python didn't come with > > pip" and the after a bit of exploration you find that they're using > > Python 2.7.3 or something, so any feature added to Python 3.6 isn't > > likely to help them anyway. > > You say "of course", but did you actually look at the python-list > archives? If you do, you will see posts like these two within the last > 24 hours: > > [quote] > I am running Python 3.4 on Windows 7 and is facing [Error 13] > Permission Denied while installing Python packages... 
> [end quote] > > and: > > [quote] > Well I have certainly noted more than once that pip is contained in > Python 3.4. But I am having the most extreme problems with simply typing > "pip" into my command prompt and then getting back the normal > information on pip! > [end quote] > > And a random selection of other issues which I just happen to still > have visible in my news reader: > > [quote] > Python 2.7.9 and later (on the python2 series), and Python 3.4 and > later include pip by default. But I can not find it in python2.7.10 > package. What's the matter? How can i install pip on my Embedded device? > [end quote] > > [quote] > I've installed a fresh copy of Python 3.5.0b2 and - as recommended - > upgraded pip. I don't understand the reason for the permission errors as > I am owner and have full control for the temporary directory created. > [end quote] > > [quote] > I was fed up with trying to install from pypi to Windows. Setup.py more > often than not wouldn't be able to find the VS compiler. So I thought > I'd try the direct route to the excellent Christoph Gohlke site at > http://www.lfd.uci.edu/~gohlke/pythonlibs/ which is all whl files these > days. However as you can see below despite my best efforts I'm still > processing the tar.gz file, so what am I doing wrong? > [end quote] > > > (Some spelling errors and obvious typos corrected.) > > Please don't dismiss out of hand the actual experience of real users > with pip. At least one of those quotes above is from a long-time Python > regular who knows his way around the command line. > > This is not meant as an anti-pip screed, so please don't read it as > such. But it is meant as a reminder that pip is not perfect, and that > even experienced Python developers can have trouble installing packages. 
> Children with no experience with the command line or Python cannot be
> expected to install packages from PyPI without assistance, and if they
> are using school computers, they simply may not be permitted to run "pip
> install" even if it worked flawlessly.
>

Packaging systems suffer from a Law: The quality of the packaging system
is inversely proportional to the quality of the language being packaged
[Corollary to the invariable NIH syndrome that programmers suffer from]

Notice:
Haskell's Hackage: Terrible
Python's pip: Bad
Ruby's gems: OK
Perl's CPAN: Good
Debian's apt (a mishmash of Perl, shell, and other unspeakables): Superb

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Sat Sep 5 09:31:24 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 5 Sep 2015 17:31:24 +1000
Subject: [Python-ideas] Non-English names in the turtle module.
In-Reply-To: <20150904024552.GL19373@ando.pearwood.info>
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info>
Message-ID:

On 4 September 2015 at 12:45, Steven D'Aprano wrote:
> One argument against this proposal is that Python is not really designed
> as a kid-friendly learning language, and we should just abandon that
> space to languages that do it better, like Scratch. I'd hate to see that
> argument win, but given our limited resources perhaps we should know
> when we're beaten. Compared to what Scratch can do, turtle graphics are
> so very 1970s.

Block-based languages are to text-based ones as picture books are to the
written word - to get the combinatorial power of language into play, you
need to be learning systems that have the capacity to be self-hosting.
You can write a Python interpreter in Python, but you can't write a
Scratch environment in Scratch.
This is reflected in the way primary schools' digital environment
curricula are now being designed - initial concepts of algorithms and
flow control can be introduced without involving a computer at all (e.g.
through games like Robot Turtles), then block-based programming in
environments like Scratch introduces the use of computers in a way that
doesn't require particularly fine motor control or spelling skills.

However, a common aspect I've seen talking to teachers from Australia,
the US and the UK is that the aim is always to introduce kids to the full
combinatorial power of a text-based programming environment like Python,
since that's what unlocks the ability to use computers to manipulate real
world data and interfaces, rather than just a local constrained
environment like the one in Scratch.

Cheers,
Nick.

From rustompmody at gmail.com Sat Sep 5 08:42:54 2015
From: rustompmody at gmail.com (Rustom Mody)
Date: Fri, 4 Sep 2015 23:42:54 -0700 (PDT)
Subject: [Python-ideas] Non-English names in the turtle module.
In-Reply-To: <35076B57-2F85-4632-B522-47BDE868567C@gmail.com>
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <35076B57-2F85-4632-B522-47BDE868567C@gmail.com>
Message-ID: <9816c9aa-0358-4d2a-b67b-e58ace40a0ac@googlegroups.com>

On Friday, September 4, 2015 at 12:56:33 PM UTC+5:30, Matthias Bussonnier wrote:
>
> Hi all,
>
> Personal opinion, based on a bit of experience:

And my personal experience from the (an) other side:
Some students of mine worked to add devanagari to python:
https://github.com/rusimody/l10Python

[Re-copying something here from a post on the dev list.
What I would wish to add is in the tl;dr at bottom.]

Here's a REPL session to demo: [Note १२३४५६७८९०
is the devanagari equivalent of 1234567890]

--------------------------------------------------
Python 3.5.0b2 (default, Jul 30 2015, 19:32:42)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> १२
12
>>> 23 == २३
True
>>> १२ + ३४
46
>>> १२ + 34
46
>>> "12" == "१२"
False
>>> 2 ≤ 3
True
>>> 2 ≠ 3
True
>>> (λ x: x+3)(4)
7
>>> # as a result of which this doesn't work... I did say they are kids!
...
>>> λ = 3
  File "<stdin>", line 1
    λ = 3
    ^
SyntaxError: invalid syntax
>>> {1,2,3} ∩ {2,3,4}
{2, 3}
>>> {1,2,3} ∪ {2,3,4}
{1, 2, 3, 4}
>>> ¬ True
False
>>> ∑([1,2,3,4])
10
>>>
----------------------------------------------

The last is actually more an embarrassment than the λ breaking, since
they've *changed the lexer* to read the ∑ when all that was required was
∑ = sum !! In short... Kids!

tl;dr For me (yes, an educated Indian) English is a natural first
language. However, the idea that English is the *only* language is about
as quaint as the idea that the extent of the universe is a mere 100
kilometers, with the Garden of Eden in the center.

Our tiny experience with internationalizing Python showed us that the
lexer (at least) is terribly ASCII-centric.

My immediate wish-list: modularize the lexer into a pre-lexer converting
UTF-8 to Unicode codepoints, followed by a pure Unicode 32-bit
codepoint-based lexer.

My long-term wishlist (yeah, somewhat unrealistic and utopian) for Python
4000 is the increased awareness that the only reasonable international
language is mathematics. Or
http://blog.languager.org/2014/04/unicoded-python.html

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Sat Sep 5 10:12:45 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 5 Sep 2015 18:12:45 +1000
Subject: [Python-ideas] Non-English names in the turtle module.
In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> Message-ID: On 5 September 2015 at 07:43, Al Sweigart wrote: > I see your point. I think there are two different arguments here: It would > be good to have non-English turtle modules of PyPI for older versions of > Python. But it would also be good to have non-English names added to the > turtle module in the 3.6 stdlib. > > My main concern was that if these modules were on PyPI, they would be left > out of the standard library. Then the "install from PyPI headache" arguments > would apply. The last major upgrade to turtle was the adoption of Gregor Lindl's xturtle for 2.6 as the standard turtle implementation, and he iterated on that as an external project for a while first. I think this is another case where a similar approach would work well - you could create a new "eduturtle" project as a fork of the current turtle module to allow more rapid iteration and feedback from educators unconstrained by the standard library's release cycle, and then propose it for default inclusion in 3.6. Another potentially desirable thing that could be explored within such a "turtle upgrade" project is switching it to using a HTML5 canvas as its drawing surface, rather than relying on Tkinter. We made a similar change a while ago with PyDoc, and while the pages generated by the local web service could definitely use some TLC from a front-end designer, I think it was a good call. That "HTML5-compatible-browser-as-GUI-framework" model is also the way IPython Notebook went for data analysis, and it unlocks an incredibly rich world of visualisation capabilities, that are not only useful in full browsers, but also in HTML widgets in desktop and mobile GUI frameworks. Regards, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Sep 5 10:22:48 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 5 Sep 2015 18:22:48 +1000 Subject: [Python-ideas] Non-English names in the turtle module. In-Reply-To: <87mvx12wjy.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <35076B57-2F85-4632-B522-47BDE868567C@gmail.com> <20150904121837.GN19373@ando.pearwood.info> <87mvx12wjy.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5 September 2015 at 16:46, Stephen J. Turnbull wrote: > I still think an i18n-based architecture is the way to go, to minimize > such boilerplate (error-prone for translators) among other reasons. > Neither users nor translators need to see it, unless they want to see > "how'd they do that?" In which case, isn't more "educational" to show > them the way it's done in the real world? Sorting out such procedural questions *iteratively* is one of the reasons I think it's desirable to pursue this externally first. There are a *lot* of possible options here, including using a full translation platform like Zanata or Pootle to manage the interaction with translators, and also a need to allow teachers of students that aren't native English speakers to easily try out the revised module *before* we commit to adding it to CPython. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Sep 5 10:30:21 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 5 Sep 2015 18:30:21 +1000 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? 
In-Reply-To: <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5 September 2015 at 17:08, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > You say "of course", but did you actually look at the python-list > > archives? If you do, you will see posts like these two within the last > > 24 hours: > > So let's fix it, already![1] Now that we have a blessed package > management module, why not have a builtin that handles the simple > cases? Running "python -m pip" instead of "pip" already avoids many of the issues with PATH configuration, which is one of the reasons why that's what I recommend in the main Python docs at https://docs.python.org/3/installing/ & https://docs.python.org/2/installing/ Unfortunately, I've yet to convince the rest of PyPA (let alone the community at large) that telling people to call "pip" directly is *bad advice* (as it breaks in too many cases that beginners are going to encounter), so it would be helpful if folks helping beginners on python-list and python-tutor could provide feedback supporting that perspective by filing an issue against https://github.com/pypa/python-packaging-user-guide Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Sat Sep 5 10:46:20 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 05 Sep 2015 17:46:20 +0900 Subject: [Python-ideas] Non-English names in the turtle module. 
In-Reply-To:
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <87oahh2ymv.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <87egid2qzn.fsf@uwakimon.sk.tsukuba.ac.jp>

Chris Angelico writes:

 > If someone's just learning to program, surely s/he can learn on
 > Python 3 rather than Python 2.

In any case, I could replace "2.7" with the (not yet released!) "3.5".
'nuff said, yet?

Anyway, "learn with Python 3" is what we advocate, but the reality is
otherwise in some places, and on many platforms (eg, Mac). None of my
students download Python 3 until I tell them to. They thought the system
Python would be as "up to date" as the "Retina" display! I wouldn't be
surprised if there aren't a lot of Mac systems in schools where many
teachers just use the Python that is already there, and are far more
likely to "pip install tortuga" than they are to install a whole parallel
Python 3.

From bussonniermatthias at gmail.com Sat Sep 5 15:22:02 2015
From: bussonniermatthias at gmail.com (Matthias Bussonnier)
Date: Sat, 5 Sep 2015 15:22:02 +0200
Subject: [Python-ideas] High time for a builtin function to manage packages (simply)?
In-Reply-To:
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

I do have this package[1] that allows you to do `pip install <....>` from
within an IPython session; it calls the pip of the current Python by
importing pip instead of calling a subprocess.

One of the things I would like is for that to actually wrap pip on
python.org-installed python, and conda on conda-installed python.
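A rough sketch of how such dispatch might guess the managing tool (heuristic and illustrative only; a real hook would presumably be declared by the distributor rather than guessed, and the conda-meta check below is just one common convention of conda environments):

```python
import os
import sys

def guess_package_manager():
    """Guess whether this interpreter is conda-managed or pip-managed.
    Conda environments carry a conda-meta/ directory under sys.prefix;
    anything else falls back to plain pip."""
    if os.path.isdir(os.path.join(sys.prefix, "conda-meta")):
        return "conda"
    return "pip"
```

A generic "install" front end could then pick the matching command line, hiding the underlying tool from the user.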
So if such a proposal is integrated into Python, it would be nice to have hooks that allow one to "hide" which package manager is used under the hood. -- M [1]: https://pypi.python.org/pypi/pip_magic On Sat, Sep 5, 2015 at 10:30 AM, Nick Coghlan wrote: > On 5 September 2015 at 17:08, Stephen J. Turnbull wrote: >> Steven D'Aprano writes: >> >> > You say "of course", but did you actually look at the python-list >> > archives? If you do, you will see posts like these two within the last >> > 24 hours: >> >> So let's fix it, already![1] Now that we have a blessed package >> management module, why not have a builtin that handles the simple >> cases? > > Running "python -m pip" instead of "pip" already avoids many of the > issues with PATH configuration, which is one of the reasons why that's > what I recommend in the main Python docs at > https://docs.python.org/3/installing/ & > https://docs.python.org/2/installing/ > > Unfortunately, I've yet to convince the rest of PyPA (let alone the > community at large) that telling people to call "pip" directly is *bad > advice* (as it breaks in too many cases that beginners are going to > encounter), so it would be helpful if folks helping beginners on > python-list and python-tutor could provide feedback supporting that > perspective by filing an issue against > https://github.com/pypa/python-packaging-user-guide > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From steve at pearwood.info Sat Sep 5 17:24:02 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 6 Sep 2015 01:24:02 +1000 Subject: [Python-ideas] Non-English names in the turtle module.
In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> Message-ID: <20150905152402.GU19373@ando.pearwood.info> On Fri, Sep 04, 2015 at 02:05:05PM -0700, Andrew Barnert wrote: > I find it really annoying when people pick one sentence out of a post > to argue against at length, out of context, while entirely ignoring > the actual substance of the post. Your post was three rather short paragraphs. I ignored the first paragraph because it had nothing to do with me, and I don't know the answer. I didn't respond to the third paragraph because I thought the conclusion (that getting permission to install pip would be easier than getting the most up-to-date version of Python installed) was unlikely at best, but regardless, you used enough weasel words ("seems like ... I'm guessing ... is probably ...") that it would be churlish to argue. Who knows? Yes, there could be some teachers who get permission for their students to install anything they like with pip but aren't allowed to upgrade to the latest version of Python. It's a big world and IT departments sometimes appear to choose their policies at random. I focused on the second paragraph because that was the comment you made that I wanted to respond to, namely that a sizeable chunk of problems with pip is that pip isn't installed. To reiterate, I don't believe that is the case, based on what I see on the python-list mailing list. Judging by the comments in the "packaging" subthread, this may have struck a chord with at least some others. > Are you sincerely arguing that no children out there will have Python > 3.5, 3.3, or 2.7, No. > or that for all such student upgrading to 3.6 will > be easier and face fewer permissions problems than using pip? For "all" of them? Probably not.
> If not, > then how does this answer my point that some people will want this on > PyPI even if it's in the 3.6 stdlib? I didn't respond to that point. If you want me to respond, I'll say that I consider it unlikely that putting it on PyPI will be of much practical utility, given the user-base for turtle, but if people want to do both, it's not likely to do much harm either. -- Steve From steve at pearwood.info Sat Sep 5 17:38:01 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 6 Sep 2015 01:38:01 +1000 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20150905153801.GV19373@ando.pearwood.info> On Sat, Sep 05, 2015 at 04:08:24PM +0900, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > You say "of course", but did you actually look at the python-list > > archives? If you do, you will see posts like these two within the last > > 24 hours: > > So let's fix it, already![1] Now that we have a blessed package > management module, why not have a builtin that handles the simple > cases? Say > > def installer(package, command='install'): > ... Python competes strongly with R in the scientific software area, and R supports a built-in to do just that: https://stat.ethz.ch/R-manual/R-devel/library/utils/html/install.packages.html -- Steve From donald at stufft.io Sat Sep 5 18:38:29 2015 From: donald at stufft.io (Donald Stufft) Date: Sat, 5 Sep 2015 12:38:29 -0400 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? 
In-Reply-To: <20150905153801.GV19373@ando.pearwood.info> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150905153801.GV19373@ando.pearwood.info> Message-ID: On September 5, 2015 at 11:40:17 AM, Steven D'Aprano (steve at pearwood.info) wrote: > On Sat, Sep 05, 2015 at 04:08:24PM +0900, Stephen J. Turnbull wrote: > > Steven D'Aprano writes: > > > > > You say "of course", but did you actually look at the python-list > > > archives? If you do, you will see posts like these two within the last > > > 24 hours: > > > > So let's fix it, already![1] Now that we have a blessed package > > management module, why not have a builtin that handles the simple > > cases? Say > > > > def installer(package, command='install'): > > ... > > Python competes strongly with R in the scientific software area, and R > supports a built-in to do just that: > > https://stat.ethz.ch/R-manual/R-devel/library/utils/html/install.packages.html > I don't know anything about R, but a built in function is a bad idea. It'll be a pretty big footgun I believe. For instance, if you already have requests 2.x installed and imported, and then you run the builtin and install something that triggers requests 1.x to be installed you'll end up with your Python in an inconsistent state. You might even end up importing something from requests and ending up with modules from two different versions of requests ending up in sys.modules. In addition, the standard library is not really enough to accurately install packages from PyPI. 
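The staleness hazard described above is easy to simulate without running any installer at all; the fake "requests" module below is a stand-in for an already-imported package whose files change on disk mid-session (an illustrative sketch, not real installer behaviour):

```python
import sys
import types

# Pretend requests 1.0 was imported before an in-place "upgrade".
old = types.ModuleType("requests")
old.__version__ = "1.0"
sys.modules["requests"] = old

# ...an in-process installer would replace the files on disk here...

import requests  # the cached module object wins; nothing is re-read from disk

print(requests.__version__)  # still "1.0", regardless of what is now installed
```

Because `import` consults sys.modules first, the long-lived interpreter keeps serving the pre-upgrade module object, which is exactly the inconsistent state described above.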
You need a real HTML parser that can handle malformed input safely, an implementation of PEP 440 versions and specifiers (currently implemented in the "packaging" library on PyPI), you also need some mechanism for inspecting the currently installed set of packages, so you need something like pkg_resources available to properly support that. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From brett at python.org Sat Sep 5 19:05:21 2015 From: brett at python.org (Brett Cannon) Date: Sat, 05 Sep 2015 17:05:21 +0000 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: <20150905153801.GV19373@ando.pearwood.info> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150905153801.GV19373@ando.pearwood.info> Message-ID: On Sat, 5 Sep 2015 at 08:40 Steven D'Aprano wrote: > On Sat, Sep 05, 2015 at 04:08:24PM +0900, Stephen J. Turnbull wrote: > > Steven D'Aprano writes: > > > > > You say "of course", but did you actually look at the python-list > > > archives? If you do, you will see posts like these two within the last > > > 24 hours: > > > > So let's fix it, already![1] Now that we have a blessed package > > management module, why not have a builtin that handles the simple > > cases? Say > > > > def installer(package, command='install'): > > ... 
> > Python competes strongly with R in the scientific software area, and R > supports a built-in to do just that: > > > https://stat.ethz.ch/R-manual/R-devel/library/utils/html/install.packages.html The reason R has a built-in for this is that it's used the vast majority of the time from a REPL to do data analytics in an exploratory manner (think Jupyter notebook type of data exploration). Python does not have the same typical usage style and so I don't think we should follow R in this instance (although I have had R users say that packaging in R is far superior to Python's due to the ease of getting extensions installed, period, and not because of the lack of a function, but that's another discussion). -------------- next part -------------- An HTML attachment was scrubbed... URL: From cannatag at gmail.com Sat Sep 5 20:03:34 2015 From: cannatag at gmail.com (Giovanni Cannata) Date: Sat, 05 Sep 2015 20:03:34 +0200 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150905153801.GV19373@ando.pearwood.info> Message-ID: I'm reading this discussion with interest. pip and PyPI are what make the Python ecosystem live and vital. Especially PyPI is what surprises any Python newbie: a single, freely available repository where you can find valuable ready-to-use software, accessible from anywhere. It's hard to believe if you are, for example, a Java or .NET developer. But are you aware of the problems that PyPI is currently suffering? Its search engine has been faulty for about two weeks and doesn't find many packages, even if they are available.
This is a very bad thing because PyPI is the front gate to the Python ecosystem for many people, more than the python.org site itself. I think that PyPI deserves special attention for the sake of the whole Python community. -------- Original message -------- From: Donald Stufft Sent: Sat, 05 Sep 2015 18:38:29 +0200 To: Steven D'Aprano, python-ideas at python.org Subject: Re: [Python-ideas] High time for a builtin function to manage packages (simply)? >On September 5, 2015 at 11:40:17 AM, Steven D'Aprano (steve at pearwood.info) wrote: >> On Sat, Sep 05, 2015 at 04:08:24PM +0900, Stephen J. Turnbull wrote: >> > Steven D'Aprano writes: >> > >> > > You say "of course", but did you actually look at the python-list >> > > archives? If you do, you will see posts like these two within the last >> > > 24 hours: >> > >> > So let's fix it, already![1] Now that we have a blessed package >> > management module, why not have a builtin that handles the simple >> > cases? Say >> > >> > def installer(package, command='install'): >> > ... >> >> Python competes strongly with R in the scientific software area, and R >> supports a built-in to do just that: >> >> https://stat.ethz.ch/R-manual/R-devel/library/utils/html/install.packages.html >> > >I don't know anything about R, but a built in function is a bad idea. It'll be >a pretty big footgun I believe. For instance, if you already have requests 2.x >installed and imported, and then you run the builtin and install something >that triggers requests 1.x to be installed you'll end up with your Python in >an inconsistent state. You might even end up importing something from requests >and ending up with modules from two different versions of requests ending up >in sys.modules. In addition, the standard library is not really enough to >accurately install packages from PyPI.
You need a real HTML parser that can >handle malformed input safely, an implementation of PEP 440 versions and >specifiers (currently implemented in the "packaging" library on PyPI), you also >need some mechanism for inspecting the currently installed set of packages, so >you need something like pkg_resources available to properly support that. > > >----------------- >Donald Stufft >PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From donald at stufft.io Sat Sep 5 20:05:53 2015 From: donald at stufft.io (Donald Stufft) Date: Sat, 5 Sep 2015 14:05:53 -0400 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150905153801.GV19373@ando.pearwood.info> Message-ID: On September 5, 2015 at 2:03:46 PM, Giovanni Cannata (cannatag at gmail.com) wrote: > > But are you aware of the problems that PyPI is currently suffering? It's about two weeks > that its searching engine is faulty and doesn't find many packages, even if they are available. > I'm aware. I'm only one person and my plate is extremely full. I've been poking at the problem to try and figure it out.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From anandkrishnakumar123 at gmail.com Sat Sep 5 21:33:00 2015 From: anandkrishnakumar123 at gmail.com (Anand Krishnakumar) Date: Sat, 05 Sep 2015 19:33:00 +0000 Subject: [Python-ideas] Desperate need for enhanced print function Message-ID: Hi! This is the first time I'm sending an email to the python-ideas mailing list. I've got an enhancement idea for the built-in print function and I hope it is as good as I think it is. Imagine you have a trial.py file like this: a = 4 b = "Anand" print("Hello, I am " + b + ". My favorite number is " + str(a) + ".") OR print("Hello, I am ", b, ". My favorite number is ", a, ".") Well I've got an idea for a function named "print_easy" (the only valid name I could come up with right now). So print_easy will be a function which will be used like this (instead of the current print statement in trial.py): print_easy("Hello, I am", b, ". My favorite number is", a, ".") Which gives out: Hello, I am Anand. My favorite number is 4. The work it does is that it casts the variables and it also formats the sentences it is provided with. It is exclusively for beginners. I'm 14 and I came up with this idea after seeing my fellow classmates at school struggling to do something like this with the standard print statement. Sure, you can use the format method but won't that be a bit too much for beginners? (Also, casting is inevitable in every programmer's career.) Please let me know how this sounds. If it gains some traction, I'll work on it a bit more and clearly list out the features. Thanks, Anand. -- Anand. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From rymg19 at gmail.com Sat Sep 5 21:39:35 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sat, 05 Sep 2015 14:39:35 -0500 Subject: [Python-ideas] Desperate need for enhanced print function In-Reply-To: References: Message-ID: <6961A279-D649-4F37-9D5C-85476BE6F789@gmail.com> On September 5, 2015 2:33:00 PM CDT, Anand Krishnakumar wrote: >Hi! > >This is my first time I'm sending an email to the python-ideas mailing >list. I've got an enhancement idea for the built-in print function and >I >hope it is as good as I think it is. > >Imagine you have a trial.py file like this: > >a = 4 >b = "Anand" > >print("Hello, I am " + b + ". My favorite number is " + str(a) + ".") >OR >print("Hello, I am ", b, ". My favorite number is ", a, ".") > >Well I've got an idea for a function named "print_easy" (The only valid >name I could come up with right now). > >So print_easy will be a function which will be used like this (instead >of >the current print statement in trial.py) : > >print_easy("Hello, I am", b, ". My favorite number is", a ".") I'm sorry...but I can't see the difference. Aren't the two calls exactly the same?? > >Which gives out: > >Hello, I am Anand. My favorite number is 4 > >The work it does is that it casts the variables and it also formats the >sentences it is provided with. It is exclusively for beginners. > >I'm 14 and I came up with this idea after seeing my fellow classmates >at >school struggling to do something like this with the standard print >statement. >Sure, you can use the format method but won't that be a bit too much >for >beginners? (Also, casting is inevitable in every programmer's career) > >Please let me know how this sounds. If it gains some traction, I'll >work on >it a bit more and clearly list out the features. > >Thanks, >Anand. -- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. 
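For the record, the two print() spellings in the original trial.py really do produce different output: with the default sep=" ", the comma form inserts a space around every argument, which appears to be exactly the problem print_easy was meant to smooth over. A quick check, using the values from the original post:

```python
a = 4
b = "Anand"

# Concatenation: the author controls every character of the output.
concat = "Hello, I am " + b + ". My favorite number is " + str(a) + "."

# What print("Hello, I am ", b, ...) actually emits: each argument is
# passed through str() and joined with the default sep=" ", which adds
# stray spaces around the name and the number.
commas = " ".join(map(str, ("Hello, I am ", b, ". My favorite number is ", a, ".")))

print(concat)  # Hello, I am Anand. My favorite number is 4.
print(commas)  # Hello, I am  Anand . My favorite number is  4 .
```

Passing sep='' to print() (or stripping the trailing spaces inside the string literals) removes the difference, as discussed later in the thread.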
From cannatag at gmail.com Sat Sep 5 21:49:01 2015 From: cannatag at gmail.com (Giovanni Cannata) Date: Sat, 05 Sep 2015 21:49:01 +0200 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150905153801.GV19373@ando.pearwood.info> Message-ID: Hi Donald, you mean you're the only one in charge of maintaining PyPI? I'm sorry for this, I thought that a critical service like PyPI was supported by a team. I (and, I presume, other developers) rely heavily on it. Maybe this should be brought to the attention of the PSF. -------- Original message -------- From: Donald Stufft Sent: Sat, 05 Sep 2015 20:05:53 +0200 To: python-ideas at python.org, Steven D'Aprano, Giovanni Cannata Subject: Re: [Python-ideas] High time for a builtin function to manage packages (simply)? >On September 5, 2015 at 2:03:46 PM, Giovanni Cannata (cannatag at gmail.com) wrote: >> >> But are you aware of the problems that PyPI is currently suffering? It's about two weeks >> that its searching engine is faulty and doesn't find many packages, even if they are available. >> > >I'm aware. I'm only one person and my plate is extremely full. I've been poking at the problem to try and figure it out. > >----------------- >Donald Stufft >PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sat Sep 5 22:13:28 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 5 Sep 2015 16:13:28 -0400 Subject: [Python-ideas] Desperate need for enhanced print function In-Reply-To: References: Message-ID: On 9/5/2015 3:33 PM, Anand Krishnakumar wrote: > Hi!
> > This is my first time I'm sending an email to the python-ideas mailing > list. I've got an enhancement idea for the built-in print function and I > hope it is as good as I think it is. > > Imagine you have a trial.py file like this: > > a = 4 > b = "Anand" > > print("Hello, I am " + b + ". My favorite number is " + str(a) + ".") > OR > print("Hello, I am ", b, ". My favorite number is ", a, ".") This prints Hello, I am Anand . My favorite number is 4 . because the sep parameter defaults to ' '. If you want 'ease', leave out end spaces within quotes and don't worry about spaces before periods. To get what you want, add ", sep=''" before the closing parenthesis. print("Hello, I am ", b, ". My favorite number is ", a, ".", sep='') Hello, I am Anand. My favorite number is 4. > Well I've got an idea for a function named "print_easy" (The only valid > name I could come up with right now). When you want a mix of '' and ' ' separators, learn to use templates. print("Hello, I am {}. My favorite number is {}.".format(b, a)) Hello, I am Anand. My favorite number is 4. This ends up being easier to type and read because it does not have all the extraneous unprinted commas and quotes in the middle. The formatting could also be written print("Hello, I am {name}. My favorite number is {favnum}." .format(name=b, favnum=a)) -- Terry Jan Reedy From donald at stufft.io Sat Sep 5 22:38:25 2015 From: donald at stufft.io (Donald Stufft) Date: Sat, 5 Sep 2015 16:38:25 -0400 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)?
In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150905153801.GV19373@ando.pearwood.info> Message-ID: On September 5, 2015 at 3:49:07 PM, Giovanni Cannata (cannatag at gmail.com) wrote: > Hi Donald, you mean you're the only one in charge of maintaining PyPI? I'm sorry for this, > I thought that a critical service like PyPI was supported by a team. I (and presume other > developers) rely heavily on it. Maybe this should be brought to the attention of the PSF. > Yes and no. I'm the primary developer/administrator for it now and it doesn't get much contribution from others. It is also supported by the Python infrastructure team, but there are only a handful of us and I'm the only person on that team who has paid time to work on it, and the Infrastructure team is also responsible for many other python.org services. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From tjreedy at udel.edu Sat Sep 5 23:03:36 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 5 Sep 2015 17:03:36 -0400 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 9/5/2015 3:08 AM, Stephen J. Turnbull wrote: > So let's fix it, already![1] Now that we have a blessed package > management module, why not have a builtin that handles the simple > cases?
Say > > def installer(package, command='install'): > ... Because new builtins have a high threshold to reach, and this doesn't reach it? Installation is a specialized and rare operation. Because pip must be installed anyway, so a function should be in the package and imported? from pip import main (I realized that PM independence is part of the proposal. See below.) I think a gui frontend is an even better idea. The tracker has a proposal to make such, once written, available from Idle. https://bugs.python.org/issue23551 I was thinking that the gui code should be in pip itself and not idlelib, so as to be available to any Python shell or IDE. If it covered multiple PMs, then it might go somewhere in the stdlib. -- Terry Jan Reedy From rosuav at gmail.com Sun Sep 6 01:41:38 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 6 Sep 2015 09:41:38 +1000 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sun, Sep 6, 2015 at 7:03 AM, Terry Reedy wrote: > On 9/5/2015 3:08 AM, Stephen J. Turnbull wrote: > >> So let's fix it, already![1] Now that we have a blessed package >> management module, why not have a builtin that handles the simple >> cases? Say >> >> def installer(package, command='install'): >> ... > > > Because new builtins have a high threshold to reach, and this doesn't reach > it? Installation is a specialized and rare operation. If there's a simple entry-point like that in an importable module, anyone who wants it as a builtin can simply pre-import it (this would be used interactively anyway). All it'd take would be a known function to import.
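Concretely, a minimal sketch of such an importable entry point: the installer name comes from Stephen's earlier sketch, the "python -m pip" spelling follows Nick's advice upthread, and the pip_argv helper is purely illustrative rather than any existing API (pip's in-process entry point has never been a stable interface, so the sketch shells out instead):

```python
import subprocess
import sys

def pip_argv(package, command="install"):
    # sys.executable -m pip targets the pip belonging to *this*
    # interpreter, sidestepping the PATH issues described upthread.
    return [sys.executable, "-m", "pip", command, package]

def installer(package, command="install"):
    """Run pip for the current interpreter; returns pip's exit code."""
    return subprocess.call(pip_argv(package, command))
```

Pre-importing installer into an interactive session would then give beginners the one-liner being discussed, without adding a new builtin.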
ChrisA From abarnert at yahoo.com Sun Sep 6 01:59:35 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 5 Sep 2015 16:59:35 -0700 Subject: [Python-ideas] Desperate need for enhanced print function In-Reply-To: References: Message-ID: <0636FF1F-9677-41CC-9B17-ED0B056829C5@yahoo.com> On Sep 5, 2015, at 12:33, Anand Krishnakumar wrote: > > The work it does is that it casts the variables and it also formats the sentences it is provided with. It is exclusively for beginners. What do you mean by "casts"? The print function already calls str on each of its arguments. Do you want print_easy to do something different? If so, what? And why do you call it "casting"? More generally, how do you want the output of print_easy to differ from the output of print, given the same arguments? If you're hoping it can automatically figure out where to put spaces and where not to, what rule do you want it to use? Obviously it can't be some complicated DWIM AI that figures out whether you're writing English sentences, German sentences, a columnar table, or source code, but maybe there's something simple you can come up with that's still broadly useful. (If you can figure out how to turn that into code, you can put print_easy up on PyPI and let people get some experience using it and increase the chances of getting buy-in to the idea, and even if everyone rejects it as a builtin, you and other students can still use it.) From tjreedy at udel.edu Sun Sep 6 06:04:42 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 6 Sep 2015 00:04:42 -0400 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? 
In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 9/5/2015 5:03 PM, Terry Reedy wrote: > I think a gui frontend is an even better idea. The tracker has a > proposal to make such, once written, available from Idle. > https://bugs.python.org/issue23551 > I was thinking that the gui code should be in pip itself and not > idlelib, so as to be available to any Python shell or IDE. If it covered > multiple PMs, then it might go somewhere in the stdlib. Inspired by this thread, I did some experiments and am fairly confident that pip.main can be imported and used directly, bypassing paths, subprocesses, and pipes. -- Terry Jan Reedy From russell at keith-magee.com Sun Sep 6 10:57:38 2015 From: russell at keith-magee.com (Russell Keith-Magee) Date: Sun, 6 Sep 2015 16:57:38 +0800 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sun, Sep 6, 2015 at 12:04 PM, Terry Reedy wrote: > On 9/5/2015 5:03 PM, Terry Reedy wrote: > >> I think a gui frontend is an even better idea. The tracker has a >> proposal to make such, once written, available from Idle. >> https://bugs.python.org/issue23551 >> I was thinking that the gui code should be in pip itself and not >> idlelib, so as to be available to any Python shell or IDE. If it covered >> multiple PMs, then it might go somewhere in the stdlib. 
> > > Inspired by this thread, I did some experiments and am fairly confident that > pip.main can be imported and used directly, bypassing paths, subprocesses, > and pipes. I can confirm that this is, indeed, possible. I use this exact technique in my tool Briefcase to simplify the process of packaging code as an app bundle. https://github.com/pybee/briefcase/blob/master/briefcase/app.py#L108 Yours, Russ Magee %-) From trent at snakebite.org Sun Sep 6 17:19:54 2015 From: trent at snakebite.org (Trent Nelson) Date: Sun, 6 Sep 2015 11:19:54 -0400 Subject: [Python-ideas] PyParallel update Message-ID: <20150906151954.GB1069@trent.me> [CC'ing python-dev@ for those that are curious; please drop and keep follow-up discussion to python-ideas@] Hi folks, I've made a lot of progress on PyParallel since the PyCon dev summit (https://speakerdeck.com/trent/pyparallel-pycon-2015-language-summit); I fixed the outstanding breakage with generators, exceptions and whatnot. I got the "instantaneous Wiki search server" working[1] and implemented the entire TechEmpower Frameworks Benchmark Suite[2], including a PyParallel-friendly pyodbc module, allowing database connections and querying in parallel. [1]: https://github.com/pyparallel/pyparallel/blob/branches/3.3-px/examples/wiki/wiki.py [2]: https://github.com/pyparallel/pyparallel/blob/branches/3.3-px/examples/tefb/tefb.py I set up a landing page for the project: http://pyparallel.org And there was some good discussion on reddit earlier this week: https://www.reddit.com/r/programming/comments/3jhv80/pyparallel_an_experimental_proofofconcept_fork_of/ I've put together some documentation on the project, its aims, and the key parts of the solution regarding the parallelism through simple client/server paradigms. 
This documentation is available directly on the github landing page for the project: https://github.com/pyparallel/pyparallel Writing that documentation forced me to formalize (or at least commit) to the restrictions/trade-offs that PyParallel would introduce, and I'm pretty happy I was basically able to boil it down into a single rule: Don't persist parallel objects. That keeps the mental model very simple. You don't need to worry about locking or ownership or races or anything like that. Just don't persist parallel objects, that's the only thing you have to remember. It's actually really easy to convert existing C code or Python code into something that is suitable for calling from within a parallel callback by just ensuring that rule isn't violated. It took about four hours to figure out how NumPy allocated stuff and add in the necessary PyParallel-aware tweaks, and not that much longer for pyodbc. (Most stuff "just works", though.) (The ABI changes would mean this is a Python 4.x type of thing; there are fancy ways we could avoid ABI changes and get this working on Python 3.x, but, eh, I like the 4.x target. It's realistic.) The other thing that clicked is that asyncio and PyParallel would actually work really well together for exploiting client-driven parallelism (PyParallel really is only suited to server-oriented parallelism at the moment, i.e. serving HTTP requests in parallel). With asyncio, though, you could keep the main-thread/single-thread client-drives-computation paradigm, but have it actually dispatch work to parallel.server() objects behind the scenes. For example, in order to process all files in a directory in parallel, asyncio would request a directory listing (i.e. issue a GET /) which the PyParallel HTTP server would return, it would then create non-blocking client connections to the same server and invoke whatever HTTP method is desired to do the file processing. 
You can either choose to write the new results from within the parallel context (which could then be accessed as normal files via HTTP), or you could have PyParallel return json/bytes, which could then be aggregated by asyncio. Everything is within the same process, so you get all the benefits that provides (free access to anything within scope, like large data structures, from within parallel contexts). You can synchronously call back into the main thread from a parallel thread, too, if you wanted to update a complex data structure directly. The other interesting thing that documentation highlights is the advantage of the split brain "main thread vs parallel thread" GC and non-GC allocators. I'm not sure if I've ever extolled the virtue of such an approach on paper or in e-mail. It's pretty neat though and allows us to avoid a whole raft of problems that need to be solved when you have a single GC/memory model. Next steps: once 3.5 is tagged, I'm going to bite the bullet and rebase. That'll require a bit of churn, so if there's enough interest from others, I figured we'd use the opportunity to at least get it building again on POSIX (Linux/OSX/FreeBSD). From there people can start implementing the missing bits for implementing the parallel machinery behind the scenes. The parallel interpreter thread changes I made are platform agnostic, the implementation just happens to be on Windows at the moment; don't let the Windows-only thing detract from what's actually being pitched: a (working, demonstrably-performant) solution to "Python's GIL problem". Regards, Trent. From srkunze at mail.de Sun Sep 6 19:33:29 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Sun, 6 Sep 2015 19:33:29 +0200 Subject: [Python-ideas] Wheels For ... Message-ID: <55EC78E9.1050300@mail.de> Hi folks, currently, I came across http://pythonwheels.com/ during researching how to make a proper Python distribution for PyPI. 
I thought it would be a great idea to tell other maintainers to upload their content as wheels, so I approached a couple of them. Some of them already provided wheels.

Happy to have been able to build my own distribution, I discussed the issue at hand with some people, and I would like to share my findings and propose some ideas:

1) documentation is weirdly split up/distributed and references old material
2) once up and running (setup.cfg, setup.py etc. etc.) it works, but everybody needs to do it on their own
3) more than one way to do it (upload, wheel, source/binary etc.) (sigh)
4) making contact to propose wheels is easy on GitHub or per email, otherwise almost impossible or very tedious
5) reactions were evenly split among "none", "yes", "when ready" and "nope"

None: well, okay
yes: that's good
when ready: well, okay
nope: what a pity for wheels; example: https://github.com/simplejson/simplejson/issues/122

I personally find the situation unsatisfying. Someone proposed the following solution in the form of a question:

Why do developers need to build their distributions themselves?

I had no real answer for him, but after pondering it a while, I found it really insightful. Viewed from a different angle, packaging your own distribution is actually a waste of time. It is a tedious, error-prone task involving no creativity whatsoever. Developers, on the other hand, are people with very little time and a lot of creativity at hand, which they should spend better. The logical conclusion would be that PyPI should build wheels for the developers for every Python/platform combination necessary.

With this post, I would like to raise awareness among the people in charge of the Python infrastructure.

Best,
Sven

From tjreedy at udel.edu Sun Sep 6 19:54:17 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 6 Sep 2015 13:54:17 -0400
Subject: [Python-ideas] High time for a builtin function to manage packages (simply)?
In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 9/6/2015 4:57 AM, Russell Keith-Magee wrote: > On Sun, Sep 6, 2015 at 12:04 PM, Terry Reedy wrote: >> On 9/5/2015 5:03 PM, Terry Reedy wrote: >> >>> I think a gui frontend is an even better idea. The tracker has a >>> proposal to make such, once written, available from Idle. >>> https://bugs.python.org/issue23551 >>> I was thinking that the gui code should be in pip itself and not >>> idlelib, so as to be available to any Python shell or IDE. If it covered >>> multiple PMs, then it might go somewhere in the stdlib. >> >> >> Inspired by this thread, I did some experiments and am fairly confident that >> pip.main can be imported and used directly, bypassing paths, subprocesses, >> and pipes. > > I can confirm that this is, indeed, possible. I use this exact > technique in my tool Briefcase to simplify the process of packaging > code as an app bundle. > > https://github.com/pybee/briefcase/blob/master/briefcase/app.py#L108 There *is*, however, a potential gotcha suggested on the issue by Donald Stufft and verified by me. pip is designed for one command (main call) per invocation, not for repeated commands. When started, it finds and caches a list of installed packages. The install command does not update the cached list. So 'show new_or_upgraded_package' will not work. Your series of installs do not run into this. -- Terry Jan Reedy From bussonniermatthias at gmail.com Sun Sep 6 20:29:59 2015 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Sun, 6 Sep 2015 11:29:59 -0700 Subject: [Python-ideas] Wheels For ... 
In-Reply-To: <55EC78E9.1050300@mail.de>
References: <55EC78E9.1050300@mail.de>
Message-ID: 

Hi Sven,

Just adding a few comments inline:

On Sun, Sep 6, 2015 at 7:33 PM, Sven R. Kunze wrote:
> 3) more than one way to do (upload, wheel, source/binary etc.) it (sigh)

And most are uploading/registering over http (sigh)

> nope: what a pity for wheels; example:
> https://github.com/simplejson/simplejson/issues/122

But that's for non-pure-Python wheels; wheels can be universal, in which case they are easy to build.

> Why do developers need to build their distribution themselves?

Historical reasons. On GitHub, at least, it is pretty easy to make Travis-CI build your wheels; some scientific packages (which are not the easiest to build) have done that, so automation is possible. And these cases need really particular environments where all aspects of the builds are controlled.

> I had not real answer to him, but pondering a while over it, I found it
> really insightful. Viewing this from a different angle, packaging your own
> distribution is actually a waste of time. It is a tedious, error-prone task
> involving no creativity whatsoever. Developers on the other hand are
> actually people with very little time and a lot of creativity at hand which
> should spend better. The logical conclusion would be that PyPI should build
> wheels for the developers for every python/platform combination necessary.

I think that some of that could be done by Warehouse at some point: https://github.com/pypa/warehouse

But you will never be able to cover everything; otherwise, I'm sure people will ask PyPI to build for the Windows 98 server version. Personally, for pure Python packages I now use https://pypi.python.org/pypi/flit which is one of the only packaging tools for which I can remember all the steps to get a package on PyPI without reading the docs.
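[For reference: declaring a pure-Python package's wheels universal, with the setuptools + wheel toolchain of this era, is a one-line setting in setup.cfg, after which `python setup.py bdist_wheel` produces a single py2.py3-none-any wheel:]

```ini
[bdist_wheel]
; build one wheel usable on Python 2 and 3, any platform
universal = 1
```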
--
M

[Sven, sorry for duplicate :-) ]

From steve.dower at python.org Sun Sep 6 21:22:19 2015
From: steve.dower at python.org (Steve Dower)
Date: Sun, 6 Sep 2015 12:22:19 -0700
Subject: [Python-ideas] Wheels For ...
In-Reply-To: <55EC78E9.1050300@mail.de>
References: <55EC78E9.1050300@mail.de>
Message-ID: <55EC926B.5050708@python.org>

On 06Sep2015 1033, Sven R. Kunze wrote:
> The logical conclusion
> would be that PyPI should build wheels for the developers for every
> python/platform combination necessary.

This would be a wonderful situation to end up in, but the problem is that many wheels have difficult source dependencies to configure. It is much easier for the developers, who should already have working systems, to build the wheel themselves than it would be either for them to provide/configure a remote system to do it, or for the end-user to configure their own system. (And if it can't be tested on a particular system, then the developer probably shouldn't release wheels for that system anyway.)

What I would rather see is a way to delegate building to other people by explicitly allowing someone to add wheels to PyPI for existing releases without necessarily being able to make a new release or delete old ones. There is some trust involved, but it could also enable more ad-hoc systems of building wheels through Travis/Jenkins/VSO/etc. automation without needing to reveal login information through your repo (i.e. you give the "Jenkins-wheel" user permission to publish wheels for your package).

Not sure how feasible this is, but I'd guess it's easier than trying to run our own build servers.

Cheers,
Steve

From lac at openend.se Sun Sep 6 21:54:51 2015
From: lac at openend.se (Laura Creighton)
Date: Sun, 06 Sep 2015 21:54:51 +0200
Subject: [Python-ideas] Wheels For ...
In-Reply-To: 
References: <55EC78E9.1050300@mail.de>
Message-ID: <201509061954.t86Jspjg011546@fido.openend.se>

In a message of Sun, 06 Sep 2015 15:31:16 -0400, Terry Reedy writes:
>On 9/6/2015 1:33 PM, Sven R. Kunze wrote:
>>
>> With this post, I would like raise awareness of the people in charge of
>> the Python infrastructure.
>
>pypa is in charge of packaging. https://github.com/pypa
>I believe the google groups link is their discussion forum.

They have one -- https://groups.google.com/forum/#!forum/pypa-dev but you can also reach them at the distutils mailing list. https://mail.python.org/mailman/listinfo/distutils-sig

I think, rather than discussion, it is 'people willing to write code' that they are short of ...

Laura

From ncoghlan at gmail.com Mon Sep 7 01:48:21 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 7 Sep 2015 09:48:21 +1000
Subject: [Python-ideas] Desperate need for enhanced print function
In-Reply-To: 
References: 
Message-ID: 

On 6 September 2015 at 05:33, Anand Krishnakumar wrote:
> print("Hello, I am ", b, ". My favorite number is ", a, ".")
>
> I'm 14 and I came up with this idea after seeing my fellow classmates at
> school struggling to do something like this with the standard print
> statement.
> Sure, you can use the format method but won't that be a bit too much for
> beginners? (Also, casting is inevitable in every programmer's career)

Hi Anand,

Your feedback reflects a common point of view on the surprising difficulty of producing nicely formatted messages from Python code. As such, it currently appears likely that Python 3.6 will allow you and your peers to write output messages like this:

print(f"Hello, I am {b}. My favorite number is {a}.")

as a simpler alternative to the current options:

print("Hello, I am ", b, ". My favorite number is ", a, ".", sep="")
print("Hello, I am " + b + ". My favorite number is " + str(a) + ".")
print("Hello, I am {}. My favorite number is {}.".format(b, a))
print("Hello, I am {b}.
My favorite number is {a}.".format_map(locals())) print("Hello, I am %s. My favorite number is %s." % (b, a)) Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Sep 7 02:20:51 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 7 Sep 2015 10:20:51 +1000 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150905153801.GV19373@ando.pearwood.info> Message-ID: On 6 September 2015 at 05:49, Giovanni Cannata wrote: > Hi Donald, you mean you're the only one in charge of maintaining PyPI? I'm > sorry for this, I thought that a critical service like PyPI was supported by > a team. I (and presume other developers) rely heavily on it. Maybe this > should be brought to the attention of the PSF. We're aware. PyPI is currently the *only* python.org service with a dedicated full time developer (and Donald's time is actually contributed by the OpenStack group at HP rather than being funded directly by the PSF). There are also a number of organisations that donate resources to operating the python.org infrastructure (e.g. the Fastly CDN, and hosting services from Heroku, Rackspace and the Open Source Lab at Oregon State University). Bringing paid development in to support community driven projects, while also ensuring financial sustainability and fiscally responsible management of contributor's funds is an interesting challenge, and one the PSF continues to work to get better at. Regards, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Sep 7 02:26:26 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 7 Sep 2015 10:26:26 +1000 Subject: [Python-ideas] Wheels For ... In-Reply-To: <201509061954.t86Jspjg011546@fido.openend.se> References: <55EC78E9.1050300@mail.de> <201509061954.t86Jspjg011546@fido.openend.se> Message-ID: On 7 September 2015 at 05:54, Laura Creighton wrote: > I think, rather than discussion, it is 'people willing to write code' > that they are short of ... For the build farm idea, it's not just writing the code initially, it's operating the resulting infrastructure, and that's a much bigger ongoing commitment. Automatically building wheels for source uploads is definitely on the wish list, there are just a large number of other improvements needed before it's feasible. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rymg19 at gmail.com Mon Sep 7 03:22:27 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sun, 06 Sep 2015 20:22:27 -0500 Subject: [Python-ideas] Wheels For ... In-Reply-To: <55EC78E9.1050300@mail.de> References: <55EC78E9.1050300@mail.de> Message-ID: <3B3CDE41-BB58-447D-BB86-9994EE9587FA@gmail.com> On September 6, 2015 12:33:29 PM CDT, "Sven R. Kunze" wrote: >Hi folks, > >currently, I came across http://pythonwheels.com/ during researching >how >to make a proper Python distribution for PyPI. I thought it would be >great idea to tell other maintainers to upload their content as wheels >so I approached a couple of them. Some of them already provided wheels. > >Happy being able to have built my own distribution, I discussed the >issue at hand with some people and I would like to share my findings >and >propose some ideas: > >1) documentation is weirdly split up/distributed and references old >material >2) once up and running (setup.cfg, setup.py etc. etc.) 
it works but >everybody needs to do it on their own >3) more than one way to do (upload, wheel, source/binary etc.) it >(sigh) >4) making contact to propose wheels on github or per email is easy >otherwise almost impossible or very tedious >5) reactions went evenly split from "none", "yes", "when ready" to >"nope" > >None: well, okay >yes: that's good >when ready: well, okay >nope: what a pity for wheels; example: >https://github.com/simplejson/simplejson/issues/122 > >I personally find the situation not satisfying. Someone proposes the >following solution in form of a question: > >Why do developers need to build their distribution themselves? > >I had not real answer to him, but pondering a while over it, I found it > >really insightful. Viewing this from a different angle, packaging your >own distribution is actually a waste of time. It is a tedious, >error-prone task involving no creativity whatsoever. Developers on the >other hand are actually people with very little time and a lot of >creativity at hand which should spend better. The logical conclusion >would be that PyPI should build wheels for the developers for every >python/platform combination necessary. > You can already do this with CI services. I wrote a post about doing that with AppVeyor: http://kirbyfan64.github.io/posts/using-appveyor-to-distribute-python-wheels.html but the idea behind it should apply easily to Travis and others. In reality, you're probably using a CI service to run your tests anyway, so it might as well build your wheels, too! > >With this post, I would like raise awareness of the people in charge of > >the Python infrastructure. > > >Best, >Sven >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. 
From steve at pearwood.info Mon Sep 7 03:26:45 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 7 Sep 2015 11:26:45 +1000 Subject: [Python-ideas] Wheels For ... In-Reply-To: <55EC78E9.1050300@mail.de> References: <55EC78E9.1050300@mail.de> Message-ID: <20150907012645.GX19373@ando.pearwood.info> On Sun, Sep 06, 2015 at 07:33:29PM +0200, Sven R. Kunze wrote: > Why do developers need to build their distribution themselves? > > I had not real answer to him, but pondering a while over it, I found it > really insightful. Viewing this from a different angle, packaging your > own distribution is actually a waste of time. It is a tedious, > error-prone task involving no creativity whatsoever. Developers on the > other hand are actually people with very little time and a lot of > creativity at hand which should spend better. The logical conclusion > would be that PyPI should build wheels for the developers for every > python/platform combination necessary. Over on the python-list mailing list, Ned Batchelder asked a question. I haven't seen an answer there, and as far as I know he isn't subscribed here, so I'll take the liberty of copying his question here: Ned says: "As a developer of a Python package, I don't see how this would be better. The developer would still have to get their software into some kind of uniform configuration, so the central authority could package it. You've moved the problem from, "everyone has to make wheels" to "everyone has to make a tree that's structured properly." But if we can do the second thing, the first thing is really easy. Maybe I've misunderstood?" -- Steve From stephen at xemacs.org Mon Sep 7 03:51:23 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 07 Sep 2015 10:51:23 +0900 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? 
In-Reply-To: 
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <87d1xv2e04.fsf@uwakimon.sk.tsukuba.ac.jp>

Terry Reedy writes:
 > On 9/5/2015 3:08 AM, Stephen J. Turnbull wrote:
 > > So let's fix it, already![1] Now that we have a blessed package
 > > management module, why not have a builtin that handles the simple
 > > cases? Say
 > >
 > > def installer(package, command='install'):
 > > ...
 >
 > Because new builtins have a high threshold to reach, and this doesn't
 > reach it?

In your opinion. And in mine! Personally, I don't have a problem with remembering python -m pip, nor do I have a problem with explaining it as frequently as necessary to my students. But there are only 20 of them, rather than the thousands that folks like Steven (and you?) are dealing with on python-list -- and there's the rub. I'm suggesting this because of the vehemence with which Steven (among others) objects to any suggestion that packages belong on PyPI, and the fact that he can back that up with fairly distressing anecdotes about the number of beginner posts asking about pip problems. I would really like to see that put to rest.

 > Installation is a specialized and rare operation.

help(), quit(), and quit are builtins. I never use quit or quit() (Ctrl-D works on all the systems I use), so I guess they are "specialized and rare" in some sense, and I'm far more likely to use dir() and a pydoc HTML server than help(). More to the point, the trouble packaging causes beginners and Steven d'Aprano on python-list is apparently widespread and daily. At some point beginner-friendliness has enough value to make it into the stdlib and even builtins.

 > Because pip must be installed anyway, so a function should be in the
 > package and imported?
> from pip import main I don't pronounce that "install". Discoverability matters a lot for the users in question (which is why I'm not happy with "installer", but it's somewhat more memorable than "pip"). > I think a gui frontend is an even better idea. I think it's a great idea in itself. But IMO it doesn't address this issue because the almost but not really universally-available GUI is Tcl/Tk, which isn't even available in any of the four packaged Python instances I have installed (Mac OS X system Python 2.6 and 2.7, MacPorts Python 2.7 and 3.4, although IIRC MacPorts offers a tk variant you can enable, but it's off by default). > The tracker has a proposal to make such, once written, available > from Idle. > https://bugs.python.org/issue23551 > I was thinking that the gui code should be in pip itself Obviously; it doesn't address the present issue if it's not ensured by ensure_pip. Which further suggests something like ensure_pyqt as well. (Or ensure_tk, if you think that perpetuating Tcl/Tk is acceptable.) I think that's a huge mess, given the size and messiness of the dependencies. I suppose a browser-based interface like that of pydoc could deal with the "universality" issue, but I don't know how fragile it is. From steve at pearwood.info Mon Sep 7 04:18:02 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 7 Sep 2015 12:18:02 +1000 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20150907021802.GY19373@ando.pearwood.info> On Sat, Sep 05, 2015 at 05:03:36PM -0400, Terry Reedy wrote: > On 9/5/2015 3:08 AM, Stephen J. 
Turnbull wrote:
>
> >So let's fix it, already![1] Now that we have a blessed package
> >management module, why not have a builtin that handles the simple
> >cases? Say
> >
> > def installer(package, command='install'):
> > ...
>
> Because new builtins have a high threshold to reach, and this doesn't
> reach it? Installation is a specialized and rare operation.

You're right about the first part, but as Chris has already suggested, this need not be *literally* a built-in. Like help(), it could be imported at REPL startup.

And I'm not really so sure about how rare it is. Sure, installing a single package only happens once... unless you're installing it to multiple installations. Or upgrading the package. Or installing more than one package. Looking at questions on various programming forums, including Python but other languages as well, "how do I install X?" is an extremely common question. And, with the general reluctance to add new packages to the stdlib and the emphasis on putting them onto PyPI first, I think that it will become even more common in the future.

> I think a gui frontend is an even better idea. The tracker has a
> proposal to make such, once written, available from Idle.
> https://bugs.python.org/issue23551
> I was thinking that the gui code should be in pip itself and not
> idlelib, so as to be available to any Python shell or IDE. If it covered
> multiple PMs, then it might go somewhere in the stdlib.

As I see it, there are three high-level steps to an awesome installer:

1. Have an excellent repository of software to install;
2. have a powerful interactive interface to the repo that Just Works;
3. add a GUI interface.

I think that with PyPI we certainly have #1 covered, but I don't think we have #2 yet; there are still too many ways that things can "Not Work". Number 3 is icing on the cake - it makes a great system even better.
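[A minimal sketch of step #2 as a REPL helper. The name and signature follow Stephen's proposed installer() upthread; the dry_run flag is an illustrative addition, not part of any proposal.]

```python
import subprocess
import sys

def installer(package, command='install', dry_run=False):
    # Equivalent to typing "python -m pip <command> <package>" at the
    # shell, but callable from the Python prompt.
    args = [sys.executable, '-m', 'pip', command, package]
    if dry_run:
        return args  # show what would be run instead of running it
    return subprocess.call(args)  # pip's exit status: 0 on success

print(installer('requests', dry_run=True)[1:])  # ['-m', 'pip', 'install', 'requests']
```

[Driving pip in a fresh child process each time also sidesteps the cached installed-package list that repeated in-process pip.main() calls can trip over, as Terry noted earlier in this thread.]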
I didn't specify whether the interactive interface should be a stand-alone application like pip, or a command in the REPL like R uses, or even both. I like the idea of being able to install packages directly from the Python prompt. It works well within R, and I don't see why it wouldn't work in Python either. But it isn't much of an imposition to run "python -m pip ..." at the shell.

--
Steve

From abarnert at yahoo.com Mon Sep 7 05:07:44 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 6 Sep 2015 20:07:44 -0700
Subject: [Python-ideas] High time for a builtin function to manage packages (simply)?
In-Reply-To: <87d1xv2e04.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <87d1xv2e04.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

On Sep 6, 2015, at 18:51, Stephen J. Turnbull wrote:
>
> But IMO it doesn't address this issue because the almost but not
> really universally-available GUI is Tcl/Tk, which isn't even available
> in any of the four packaged Python instances I have installed (Mac OS
> X system Python 2.6 and 2.7, MacPorts Python 2.7 and 3.4, although
> IIRC MacPorts offers a tk variant you can enable, but it's off by
> default).

Tcl/Tk, and Tkinter for all pre-installed Pythons but 2.3, have been included with every OS X since they started pre-installing 2.5. Here's a brand new laptop that I've done nothing to but run the 10.10.4 update and other recommended updates from the App Store app:

$ /usr/bin/python2.6
Python 2.6.9 (unknown, Sep 9 2014, 15:05:12)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.391)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import Tkinter
>>> root = Tkinter.Tk()
>>>

That works, and pops up an empty root window. And it works with all python.org installs for 10.6 or later, all Homebrew default installs, standard source builds... just about anything besides MacPorts (which seems to want to build Tkinter against its own Tcl/Tk instead of Apple's).

Also, why do you think Qt would be less of a problem? Apple has at various times bundled Qt into OS X and/or Xcode, but not consistently, and even when it's there it's often set up in a way that you can't use it, and of course Apple has never included PyQt or PySide. So, if pip used Qt, a user would have to go to qt.io, register an account, figure out what they need to download and install, figure out how to make it install system-wide instead of per-user, and then repeat for PySide against each copy of Python they want to use. Either that, or pip would have to include its own complete copy of Qt and PySide.

From abarnert at yahoo.com Mon Sep 7 05:17:53 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 6 Sep 2015 20:17:53 -0700
Subject: [Python-ideas] High time for a builtin function to manage packages (simply)?
In-Reply-To: <20150907021802.GY19373@ando.pearwood.info>
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150907021802.GY19373@ando.pearwood.info>
Message-ID: <5647B0DF-5034-4736-9F02-B3E106DD8434@yahoo.com>

On Sep 6, 2015, at 19:18, Steven D'Aprano wrote:
>
> I didn't specify whether the interactive interface should be a
> stand-alone application like pip, or a command in the REPL like R uses,
> or even both. I like the idea of being able to install packages directly
> from the Python prompt.
It works well within R, and I don't see why it > wouldn't work in Python either. But it isn't much of an imposition to > run "python -m pip ..." at the shell. Personally, I've never found ^Zpython -m pip spam && fg too hard (or just using ! from IPython), but I can understand why novices might. :) Anyway, the problem comes when you upgrade (directly or indirectly) a module that's already imported. Reloading is neither easy (especially if you need to reload a module that you only imported indirectly and upgraded indirectly) nor fool-proof. When I run into problems, I usually don't have much trouble stashing any costly intermediate objects, exiting the REPL, re-launching, and restoring, but I don't think novices would have as much fun. Is there a way the installer could, after working out the requirements, tell you something like "This command will upgrade 'spam' from 1.3.2 to 1.4.1, and you have imported 'spam' and 'spam.eggs' from the package, so you may need to restart after the upgrade. Continue?" That might be good enough. It's not exactly an everyday problem, so as long as it's visible when it's happened and obvious how to work around it so users who run into it for the first time don't just decide Python or pip or spam is "broken" and give up, that might be sufficient. (And a GUI installer integrated into IDLE would presumably have no additional problems, and could make the experience even nicer--especially since it's already got a "Restart Shell" option built in.) From rosuav at gmail.com Mon Sep 7 05:25:32 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 7 Sep 2015 13:25:32 +1000 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? 
In-Reply-To: <5647B0DF-5034-4736-9F02-B3E106DD8434@yahoo.com> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150907021802.GY19373@ando.pearwood.info> <5647B0DF-5034-4736-9F02-B3E106DD8434@yahoo.com> Message-ID: On Mon, Sep 7, 2015 at 1:17 PM, Andrew Barnert via Python-ideas wrote: > Anyway, the problem comes when you upgrade (directly or indirectly) a module that's already imported. Reloading is neither easy (especially if you need to reload a module that you only imported indirectly and upgraded indirectly) nor fool-proof. When I run into problems, I usually don't have much trouble stashing any costly intermediate objects, exiting the REPL, re-launching, and restoring, but I don't think novices would have as much fun. > > Is there a way the installer could, after working out the requirements, tell you something like "This command will upgrade 'spam' from 1.3.2 to 1.4.1, and you have imported 'spam' and 'spam.eggs' from the package, so you may need to restart after the upgrade. Continue?" That might be good enough. It's not exactly an everyday problem, so as long as it's visible when it's happened and obvious how to work around it so users who run into it for the first time don't just decide Python or pip or spam is "broken" and give up, that might be sufficient. > How often does pip actually need to upgrade an already-installed package in order to install something you've just requested? Maybe the rule could be simpler: if there are any upgrades at all, regardless of whether you've imported from those packages, recommend a restart. The use-case I'd be most expecting is this: >>> import spam Traceback (most recent call last): File "", line 1, in ImportError: No module named 'spam' >>> install("spam") ... 
... chuggity chug chug ...
>>> import spam
>>>

In the uncommon case where spam depends on ham v1.4.7 or newer *AND* you already have ham <1.4.7 installed, a simple message should suffice. (Oh, and you also have to not have any version of spam installed already, else you won't be able to use install() anyway.)

ChrisA

From donald at stufft.io Mon Sep 7 06:01:34 2015
From: donald at stufft.io (Donald Stufft)
Date: Mon, 7 Sep 2015 00:01:34 -0400
Subject: [Python-ideas] High time for a builtin function to manage packages (simply)?
In-Reply-To: 
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150905153801.GV19373@ando.pearwood.info>
Message-ID: 

On September 6, 2015 at 8:20:54 PM, Nick Coghlan (ncoghlan at gmail.com) wrote:
> On 6 September 2015 at 05:49, Giovanni Cannata wrote:
> > Hi Donald, you mean you're the only one in charge of maintaining PyPI? I'm
> > sorry for this, I thought that a critical service like PyPI was supported by
> > a team. I (and presume other developers) rely heavily on it. Maybe this
> > should be brought to the attention of the PSF.
>
> We're aware. PyPI is currently the *only* python.org service with a
> dedicated full time developer (and Donald's time is actually
> contributed by the OpenStack group at HP rather than being funded
> directly by the PSF). There are also a number of organisations that
> donate resources to operating the python.org infrastructure (e.g. the
> Fastly CDN, and hosting services from Heroku, Rackspace and the Open
> Source Lab at Oregon State University).
>
> Bringing paid development in to support community driven projects,
> while also ensuring financial sustainability and fiscally responsible
> management of contributor's funds is an interesting challenge, and one
> the PSF continues to work to get better at.
>

I'm not exactly full time on PyPI either, though I am full time on packaging. I split my time between PyPI, pip, and any other related work that I need to. So PyPI (as important as it is) really only has part of my attention.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From donald at stufft.io Mon Sep 7 06:05:31 2015
From: donald at stufft.io (Donald Stufft)
Date: Mon, 7 Sep 2015 00:05:31 -0400
Subject: [Python-ideas] High time for a builtin function to manage packages (simply)?
In-Reply-To: 
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150907021802.GY19373@ando.pearwood.info> <5647B0DF-5034-4736-9F02-B3E106DD8434@yahoo.com>
Message-ID: 

On September 6, 2015 at 11:26:04 PM, Chris Angelico (rosuav at gmail.com) wrote:
> How often does pip actually need to upgrade an already-installed
> package in order to install something you've just requested? Maybe the
> rule could be simpler: if there are any upgrades at all, regardless of
> whether you've imported from those packages, recommend a restart. The
> use-case I'd be most expecting is this:

Due to the nature of ``pip install --upgrade``, it's fairly common. At this time ``pip install --upgrade`` is "greedy" and will try to upgrade the named package and all of its dependencies, even if there is already a version of the dependency installed that satisfies the version constraints.
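To make the "greedy" vs. non-greedy distinction concrete, here is a toy sketch (not pip's actual resolver; the function name and the version tuples are purely illustrative):

```python
def needs_upgrade(installed, minimum, greedy=False):
    """Decide whether an upgrade run should touch a dependency.

    `installed` and `minimum` are version tuples such as (1, 4, 7).
    greedy=True mirrors pip's current --upgrade behaviour: always try
    to upgrade; greedy=False only acts when the installed version no
    longer satisfies the constraint.
    """
    if greedy:
        return True
    return installed < minimum

# spam needs ham >= 1.4.7 and ham 1.4.6 is installed: upgrade either way.
assert needs_upgrade((1, 4, 6), (1, 4, 7))
# ham 1.5.0 already satisfies the constraint: only the greedy mode touches it.
assert not needs_upgrade((1, 5, 0), (1, 4, 7))
assert needs_upgrade((1, 5, 0), (1, 4, 7), greedy=True)
```

A non-recursive upgrade command, as discussed later in this thread, would effectively reserve the unconditional branch for the package actually named on the command line and use the constraint check for its dependencies.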
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From rosuav at gmail.com Mon Sep 7 06:09:46 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 7 Sep 2015 14:09:46 +1000 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150907021802.GY19373@ando.pearwood.info> <5647B0DF-5034-4736-9F02-B3E106DD8434@yahoo.com> Message-ID: On Mon, Sep 7, 2015 at 2:05 PM, Donald Stufft wrote: > On September 6, 2015 at 11:26:04 PM, Chris Angelico (rosuav at gmail.com) wrote: >> > How often does pip actually need to upgrade an already-installed >> package in order to install something you've just requested? >> Maybe the >> rule could be simpler: if there are any upgrades at all, regardless >> of >> whether you've imported from those packages, recommend a restart. >> The >> use-case I'd be most expecting is this: > > Due to the nature of ``pip install --upgrade``, it's fairly common. At this > time ``pip install --upgrade`` is "greedy" and will try to upgrade the named > package and all of it's dependencies, even if their is already a version of the > dependency installed that satisfies the version constraints. Okay. What if "--upgrade" isn't the default when it's being called from within an interactive session? Would that work? ChrisA From donald at stufft.io Mon Sep 7 06:20:23 2015 From: donald at stufft.io (Donald Stufft) Date: Mon, 7 Sep 2015 00:20:23 -0400 Subject: [Python-ideas] Wheels For ... 
In-Reply-To: <20150907012645.GX19373@ando.pearwood.info>
References: <55EC78E9.1050300@mail.de> <20150907012645.GX19373@ando.pearwood.info>
Message-ID: 

On September 6, 2015 at 9:27:32 PM, Steven D'Aprano (steve at pearwood.info) wrote:
> On Sun, Sep 06, 2015 at 07:33:29PM +0200, Sven R. Kunze wrote:
> > > Why do developers need to build their distribution themselves?
> >
> > I had not real answer to him, but pondering a while over it, I found it
> > really insightful. Viewing this from a different angle, packaging your
> > own distribution is actually a waste of time. It is a tedious,
> > error-prone task involving no creativity whatsoever. Developers on the
> > other hand are actually people with very little time and a lot of
> > creativity at hand which should spend better. The logical conclusion
> > would be that PyPI should build wheels for the developers for every
> > python/platform combination necessary.
>
> Over on the python-list mailing list, Ned Batchelder asked a question. I
> haven't seen an answer there, and as far as I know he isn't subscribed
> here, so I'll take the liberty of copying his question here:
>
> Ned says:
>
> "As a developer of a Python package, I don't see how this would be
> better. The developer would still have to get their software into some
> kind of uniform configuration, so the central authority could package
> it. You've moved the problem from, "everyone has to make wheels" to
> "everyone has to make a tree that's structured properly." But if we can
> do the second thing, the first thing is really easy.
>
> Maybe I've misunderstood?"
>

A PyPI build farm for authors is something I plan on getting to if someone doesn't beat me to it. It won't be mandatory to use it, and it's not going to cover every corner case where you need some crazy obscure library installed, but it will ideally try to make things better.

As for why it's better, it's actually pretty simple. Let's take lxml for example which binds against libxml2.
It needs to be built on Windows, on OS X, and on various Linux distributions just to cover the spread of the common cases. If we want to start to get into uncommon platforms we're looking at various BSDs, Solaris, etc. That's a lot of infrastructure to maintain for the common cases, let alone the uncommon ones, and centralizing the maintenance in one location means it only has to be done once.

In addition to all of that, it turns out you pretty much need to get most of the way to defining that configuration in a central location anyway, since pip already needs to know how to build your project; the only things it doesn't know are what platforms you support and what, if any, extra libraries you require to be installed. There are some PEPs in the works that may make that second part known ahead of time, and for the first one we can simply ask when a project enables the build farm.

Even if we only support things that don't require additional libraries to be installed, that's still a pretty big win: a large number of projects whose authors were unwilling or unable to manage the overhead of building on all of those platforms could then be installed from a binary distribution, without requiring a compiler toolchain.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From njs at pobox.com Mon Sep 7 07:07:35 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 6 Sep 2015 22:07:35 -0700
Subject: [Python-ideas] High time for a builtin function to manage packages (simply)?
In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150907021802.GY19373@ando.pearwood.info> <5647B0DF-5034-4736-9F02-B3E106DD8434@yahoo.com> Message-ID: On Sep 6, 2015 9:09 PM, "Chris Angelico" wrote: > > On Mon, Sep 7, 2015 at 2:05 PM, Donald Stufft wrote: > > On September 6, 2015 at 11:26:04 PM, Chris Angelico (rosuav at gmail.com) wrote: > >> > How often does pip actually need to upgrade an already-installed > >> package in order to install something you've just requested? > >> Maybe the > >> rule could be simpler: if there are any upgrades at all, regardless > >> of > >> whether you've imported from those packages, recommend a restart. > >> The > >> use-case I'd be most expecting is this: > > > > Due to the nature of ``pip install --upgrade``, it's fairly common. At this > > time ``pip install --upgrade`` is "greedy" and will try to upgrade the named > > package and all of it's dependencies, even if their is already a version of the > > dependency installed that satisfies the version constraints. > > Okay. What if "--upgrade" isn't the default when it's being called > from within an interactive session? Would that work? 
FWIW the recursive behaviour of --upgrade is perhaps the single most hated feature of pip (almost all scientific packages find it so annoying that they refuse to provide dependency metadata at all), and AFAIK everyone has agreed to deprecate it in general and replace it with a non-recursive upgrade command, just no-one has gotten around to it: https://github.com/pypa/pip/issues/59 So I wouldn't worry about defining special interactive semantics in particular, someone just has to make the patch to change it in general :-) The trickier bit is that I'm not sure there's actually any way right now to know what python packages were affected by a given install or upgrade command, because it can be the case that after 'pip install X' you then do 'import Y' -- the wheel and module names don't have to match, and in practice it's not uncommon for there to be discrepancies. (For example, after 'pip install matplotlib' you can do both 'import matplotlib' and 'import pylab'.) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Sep 7 07:09:03 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 7 Sep 2015 15:09:03 +1000 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: <20150907021802.GY19373@ando.pearwood.info> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150907021802.GY19373@ando.pearwood.info> Message-ID: On 7 September 2015 at 12:18, Steven D'Aprano wrote: > On Sat, Sep 05, 2015 at 05:03:36PM -0400, Terry Reedy wrote: >> On 9/5/2015 3:08 AM, Stephen J. 
Turnbull wrote:
>> >So let's fix it, already![1] Now that we have a blessed package
>> >management module, why not have a builtin that handles the simple
>> >cases? Say
>> >
>> > def installer(package, command='install'):
>> > ...
>>
>> Because new builtins have a high threshold to reach, and this doesn't
>> reach it? Installation is a specialized and rare operation.
>
> You're right about the first part, but as Chris has already suggested,
> this need not be *literally* a built-in. Like help() it could be
> imported at REPL startup.

Technically it's "import site" that injects those - you have to run with "-S" to prevent them from being installed:

$ python3 -c "quit()"
$ python3 -Sc "quit()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
NameError: name 'quit' is not defined

Regardless, I agree a "site builtin" like help() or quit() is a better option here than a true builtin, and I also think it's a useful idea. I'd make it simpler than the proposed API though, and instead just offer an "install(specifier)" API that was a thin shell around pip.main:

    import sys

    try:
        import pip
    except ImportError:
        pass
    else:
        def install(specifier):
            cmd = ["install"]
            if sys.prefix == sys.base_prefix:
                cmd.append("--user")  # User installs only when outside a venv
            cmd.append(specifier)
            # TODO: throw exception when there's a problem
            pip.main(cmd)

If folks want more flexibility, then they'll need to access (and understand) the underlying installer.
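The argument-building part of that sketch can be exercised separately from pip itself; a minimal sketch (the `pip_args` helper name is illustrative, not part of the proposal):

```python
import sys

def pip_args(specifier):
    # Same decision as the install() sketch above: outside a virtual
    # environment, sys.prefix == sys.base_prefix, so a --user install
    # avoids writing into the system site-packages.
    cmd = ["install"]
    if sys.prefix == sys.base_prefix:
        cmd.append("--user")
    cmd.append(specifier)
    return cmd

args = pip_args("requests")
assert args[0] == "install" and args[-1] == "requests"
```

Note that sys.base_prefix only exists on Python 3.3+, and environments created with the third-party virtualenv tool signal themselves differently (sys.real_prefix), which is one reason a real implementation would need more care than this.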
As far as other possible objections go: * the pkg_resources global state problem we should be able to work around just by reloading pkg_resources (if already loaded) after installing new packages (I've previously tried to address some aspects of that particular problem upstream, but doing so poses significant backwards compatibility challenges) * I believe integration with systems like conda, PyPM, and the Enthought installer should be addressed through a plugin model in pip, rather than directly in the standard library * providing a standard library API for querying the set of installed packages independently of pip is a separate question Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Sep 7 07:18:18 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 7 Sep 2015 15:18:18 +1000 Subject: [Python-ideas] Wheels For ... In-Reply-To: <3B3CDE41-BB58-447D-BB86-9994EE9587FA@gmail.com> References: <55EC78E9.1050300@mail.de> <3B3CDE41-BB58-447D-BB86-9994EE9587FA@gmail.com> Message-ID: On 7 September 2015 at 11:22, Ryan Gonzalez wrote: > On September 6, 2015 12:33:29 PM CDT, "Sven R. Kunze" wrote: >>really insightful. Viewing this from a different angle, packaging your >>own distribution is actually a waste of time. It is a tedious, >>error-prone task involving no creativity whatsoever. Developers on the >>other hand are actually people with very little time and a lot of >>creativity at hand which should spend better. The logical conclusion >>would be that PyPI should build wheels for the developers for every >>python/platform combination necessary. >> > > You can already do this with CI services. I wrote a post about doing that with AppVeyor: > > http://kirbyfan64.github.io/posts/using-appveyor-to-distribute-python-wheels.html > > but the idea behind it should apply easily to Travis and others. In reality, you're probably using a CI service to run your tests anyway, so it might as well build your wheels, too! 
Right, Appveyor also has the most well-defined CI instructions on packaging.python.org: https://packaging.python.org/en/latest/appveyor.html It doesn't do auto-upload, as many projects only release occasionally rather than for every commit. However, it may be desirable to go into more detail about how to do that, if you'd be interested in sending a PR based on your post. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Mon Sep 7 07:12:08 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 6 Sep 2015 22:12:08 -0700 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150907021802.GY19373@ando.pearwood.info> <5647B0DF-5034-4736-9F02-B3E106DD8434@yahoo.com> Message-ID: <248D6F33-6FC0-4239-A15D-98529962DF5A@yahoo.com> On Sep 6, 2015, at 20:25, Chris Angelico wrote: > > On Mon, Sep 7, 2015 at 1:17 PM, Andrew Barnert via Python-ideas > wrote: >> Anyway, the problem comes when you upgrade (directly or indirectly) a module that's already imported. Reloading is neither easy (especially if you need to reload a module that you only imported indirectly and upgraded indirectly) nor fool-proof. When I run into problems, I usually don't have much trouble stashing any costly intermediate objects, exiting the REPL, re-launching, and restoring, but I don't think novices would have as much fun. >> >> Is there a way the installer could, after working out the requirements, tell you something like "This command will upgrade 'spam' from 1.3.2 to 1.4.1, and you have imported 'spam' and 'spam.eggs' from the package, so you may need to restart after the upgrade. Continue?" 
That might be good enough. It's not exactly an everyday problem, so as long as it's visible when it's happened and obvious how to work around it so users who run into it for the first time don't just decide Python or pip or spam is "broken" and give up, that might be sufficient.
> > How often does pip actually need to upgrade an already-installed > package in order to install something you've just requested?

Not that often, which is why I said "It's not exactly an everyday problem"; just often enough that some novices are going to run into it once or twice, so it can't be ignored.

> Maybe the > rule could be simpler: if there are any upgrades at all, regardless of > whether you've imported from those packages, recommend a restart.

I suppose that's possible too. It's overzealous, but it still won't happen _that_ often, so if my suggestion is too much work, this one seems fine to me. Especially if the message made the issue clear, something about "After an upgrade, you should restart, because any packages you already imported may be unchanged or inconsistent".

> The > use-case I'd be most expecting is this:
>
> >>> import spam
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ImportError: No module named 'spam'
> >>> install("spam")
> ... chuggity chug chug ...
> >>> import spam
>
> In the uncommon case where spam depends on ham v1.4.7 or newer *AND* > you already have ham <1.4.7 installed, a simple message should > suffice. (Oh, and you also have to not have any version of spam > installed already, else you won't be able to use install() anyway.)
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From abarnert at yahoo.com Mon Sep 7 07:25:36 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 6 Sep 2015 22:25:36 -0700
Subject: [Python-ideas] Wheels For ...
In-Reply-To: 
References: <55EC78E9.1050300@mail.de> <20150907012645.GX19373@ando.pearwood.info>
Message-ID: 

On Sep 6, 2015, at 21:20, Donald Stufft wrote:
> > Let's take lxml for
> example which binds against libxml2.

IIRC, Apple included ancient versions (even at the time) of libxml2 up to around 10.7, and at one point they even included one of the broken 2.7.x versions. So a build farm building for 10.6+ (which I think is what python.org builds still target?) is going to build against an ancient libxml2, meaning some features of lxml will be disabled, and others may even be broken. Even if I'm remembering wrong about Apple, I'm sure there are linux distros with similar issues.

Fortunately, lxml has a built-in option (triggered by an env variable) for dealing with this, by downloading the source, building a local copy of the libs, and statically linking them into lxml, but that means you need some way for a package to specify env variables to be set on the build server. And can you expect most libraries with similar issues to do the same?

(I don't know how many packages actually have similar problems, but since you specifically mentioned lxml as your example, and I had headaches building it for a binary-distributed app supporting 10.6-10.9 a few years ago, I happened to remember this problem.)

From njs at pobox.com Mon Sep 7 07:39:39 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 6 Sep 2015 22:39:39 -0700
Subject: [Python-ideas] Wheels For ...
In-Reply-To: 
References: <55EC78E9.1050300@mail.de> <20150907012645.GX19373@ando.pearwood.info>
Message-ID: 

On Sep 6, 2015 10:28 PM, "Andrew Barnert via Python-ideas" < python-ideas at python.org> wrote:
> > On Sep 6, 2015, at 21:20, Donald Stufft wrote:
> > > > Let's take lxml for
> > example which binds against libxml2.
> > It needs built on Windows, it needs built
> > on OSX, it needs built on various Linux distributions in order to cover the
> > spread of just the common cases.
>
> IIRC, Apple included ancient versions (even at the time) of libxml2 up to around 10.7, and at one point they even included one of the broken 2.7.x versions. So a build farm building for 10.6+ (which I think is what python.org builds still target?) is going to build against an ancient libxml2, meaning some features of lxml will be disabled, and others may even be broken. Even if I'm remembering wrong about Apple, I'm sure there are linux distros with similar issues.
>
> Fortunately, lxml has a built-in option (triggered by an env variable) for dealing with this, by downloading the source, building a local copy of the libs, and statically linking them into lxml, but that means you need some way for a package to specify env variables to be set on the build server. And can you expect most libraries with similar issues to do the same?

Yes, you can! :-) I mean, not everyone will necessarily use it, but adding code like

    if "PYPI_BUILD_SERVER" in os.environ:
        do_static_link = True

to your setup.py is *wayyyy* easier than buying an OS X machine and maintaining it and doing manual builds at every release. Or finding a volunteer who has an OS X box and nagging them at every release and dealing with trust hassles.

And there are a lot of packages out there that just have some cython files in them for speedups with no external dependencies, or whatever. A build farm wouldn't have to be perfect to be extremely useful.

-n
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From srkunze at mail.de Mon Sep 7 07:46:44 2015
From: srkunze at mail.de (Sven R.
Kunze) Date: Mon, 07 Sep 2015 07:46:44 +0200 Subject: [Python-ideas] One way to do format and print (was: Desperate need for enhanced print function) In-Reply-To: References: Message-ID: <55ED24C4.9000205@mail.de> On 07.09.2015 01:48, Nick Coghlan wrote: > As such, it currently appears likely that Python 3.6 will allow you and > your peers to write output messages like this: > > print(f"Hello, I am {b}. My favorite number is {a}.") > > as a simpler alternative to the current options: > > print("Hello, I am ", b, ". My favorite number is ", a, ".", sep="") > print("Hello, I am " + b + ". My favorite number is " + str(a) + ".") > print("Hello, I am {}. My favorite number is {}.".format(b, a)) > print("Hello, I am {b}. My favorite number is {a}.".format_map(locals())) > print("Hello, I am %s. My favorite number is %s." % (b, a)) Wow, that is awesome and awkward at the same time. Shouldn't Python 3.7 deprecate at least some of them? (Just looking at the Zen of Python and https://xkcd.com/927/ ) Best, Sven From stephen at xemacs.org Mon Sep 7 09:26:11 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 07 Sep 2015 16:26:11 +0900 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <87d1xv2e04.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <878u8i3d2k.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > Tcl/Tk, and Tkinter for all pre-installed Pythons but 2.3, have > been included with every OS X since they started pre-installing > 2.5. My mistake, it's only MacPorts where I don't have it. I used MacPorts' all-lowercase spelling, which doesn't work in the system Python. 
(The capitalized spelling doesn't work in MacPorts.) > And it works with all python.org installs for 10.6 or later, all > Homebrew default installs, standard source builds... Just about > anything besides MacPorts (which seems to want to build Tkinter > against its own Tcl/Tk instead of Apple's) I recall having problems with trying to build and run against the system Tcl/Tk in both source and MacPorts, but that was a *long* time ago (2.6-ish). Trying it now, on my Mac OS X Yosemite system python 2.7.10, "root=Tkinter.Tk()" creates and displays a window, but doesn't pop it up. In fact, "root.tkraise()" doesn't, either. Oops. On this system, IDLE has the same problem with its initial window, and furthermore complains that Tcl/Tk 8.5.9 is unstable. Quite possibly this window-raising issue is Just Me. But based on my own experience, it is not at all obvious that ensuring availability of a GUI is possible in the same way we can ensure pip. > Also, why do you think Qt would be less of a problem? I don't. I think "ensure PyQt" would be a huge burden, much greater than Tkinter. Bottom line: IMO, at this point in time, if it has to Just Work, it has to Work Without GUI. (Modulo the possibility that we can use an HTML server and borrow the display engine from the platform web browser. I think I already mentioned that, and I think it's really the way to go. People who *don't* have a web browser probably can handle "python -m pip ..." without StackOverflow.) From rosuav at gmail.com Mon Sep 7 10:23:13 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 7 Sep 2015 18:23:13 +1000 Subject: [Python-ideas] One way to do format and print (was: Desperate need for enhanced print function) In-Reply-To: <55ED24C4.9000205@mail.de> References: <55ED24C4.9000205@mail.de> Message-ID: On Mon, Sep 7, 2015 at 3:46 PM, Sven R. 
Kunze wrote: > On 07.09.2015 01:48, Nick Coghlan wrote: >> >> As such, it currently appears likely that Python 3.6 will allow you and >> your peers to write output messages like this: >> >> print(f"Hello, I am {b}. My favorite number is {a}.") >> >> as a simpler alternative to the current options: >> >> print("Hello, I am ", b, ". My favorite number is ", a, ".", sep="") >> print("Hello, I am " + b + ". My favorite number is " + str(a) + ".") >> print("Hello, I am {}. My favorite number is {}.".format(b, a)) >> print("Hello, I am {b}. My favorite number is >> {a}.".format_map(locals())) >> print("Hello, I am %s. My favorite number is %s." % (b, a)) > > > Wow, that is awesome and awkward at the same time. > > Shouldn't Python 3.7 deprecate at least some of them? (Just looking at the > Zen of Python and https://xkcd.com/927/ ) Which would you deprecate? print("Hello, I am ", b, ". My favorite number is ", a, ".", sep="") The print function stringifies all its arguments and outputs them, joined by a separator. Aside from the 2/3 compatibility requirement for single-argument print calls, there's no particular reason to deprecate this. In any case, this isn't "yet another way to format strings", it's a feature of print. print("Hello, I am " + b + ". My favorite number is " + str(a) + ".") String concatenation is definitely not going away; but even without PEP 498, I would prefer to use percent formatting or .format() above this. Its main advantage over those is that the expressions are in the right place, which PEP 498 also offers; if it lands, I fully expect 3.6+ code to use it rather than this. But the _functionality_ can't be taken away. print("Hello, I am {}. My favorite number is {}.".format(b, a)) This one is important for non-literals. It's one of the two main ways of formatting strings... print("Hello, I am %s. My favorite number is %s." % (b, a)) ... and this is the other. 
Being available for non-literals means they can be used with i18n, string tables, and other transformations. Percent formatting is similar to what other C-derived languages have, and .format() has certain flexibilities, so neither is likely to be deprecated any time soon. print("Hello, I am {b}. My favorite number is {a}.".format_map(locals())) This one, though, is a bad idea for several reasons. Using locals() for formatting is restricted - no globals, no expressions, and no nonlocals that aren't captured in some other way. If this one, and this one alone, can be replaced by f-string usage, it's done its job. ChrisA From tjreedy at udel.edu Mon Sep 7 10:53:00 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 7 Sep 2015 04:53:00 -0400 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: <878u8i3d2k.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <87d1xv2e04.fsf@uwakimon.sk.tsukuba.ac.jp> <878u8i3d2k.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 9/7/2015 3:26 AM, Stephen J. Turnbull wrote: > Andrew Barnert writes: > > > Tcl/Tk, and Tkinter for all pre-installed Pythons but 2.3, have > > been included with every OS X since they started pre-installing > > 2.5. > > My mistake, it's only MacPorts where I don't have it. I used > MacPorts' all-lowercase spelling, which doesn't work in the system > Python. (The capitalized spelling doesn't work in MacPorts.) > > > And it works with all python.org installs for 10.6 or later, all > > Homebrew default installs, standard source builds... 
Just about > > anything besides MacPorts (which seems to want to build Tkinter > > against its own Tcl/Tk instead of Apple's) My impression is that MacParts builds Tkinter 8.6 instead of 8.5. > I recall having problems with trying to build and run against the > system Tcl/Tk in both source and MacPorts, but that was a *long* time > ago (2.6-ish). Trying it now, on my Mac OS X Yosemite system python > 2.7.10, "root=Tkinter.Tk()" creates and displays a window, but doesn't > pop it up. In fact, "root.tkraise()" doesn't, either. Oops. On this > system, IDLE has the same problem with its initial window, and > furthermore complains that Tcl/Tk 8.5.9 is unstable. Mac users who download the PSF Mac installer and want to use tkinter should read https://www.python.org/download/mac/tcltk/ Before the redesign, there was a link to this from the download page, but the redesign seems to have removed it. The page mentions that there may be a window update problem with the apple tk. -- Terry Jan Reedy From mal at egenix.com Mon Sep 7 11:01:12 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 07 Sep 2015 11:01:12 +0200 Subject: [Python-ideas] Desperate need for enhanced print function In-Reply-To: References: Message-ID: <55ED5258.2020804@egenix.com> On 07.09.2015 01:48, Nick Coghlan wrote: > On 6 September 2015 at 05:33, Anand Krishnakumar > wrote: >> print("Hello, I am ", b, ". My favorite number is ", a, ".") >> >> I'm 14 and I came up with this idea after seeing my fellow classmates at >> school struggling to do something like this with the standard print >> statement. >> Sure, you can use the format method but won't that be a bit too much for >> beginners? (Also, casting is inevitable in every programmer's career) > > Hi Anand, > > Your feedback reflects a common point of view on the surprising > difficulty of producing nicely formatted messages from Python code. 
As > such, it currently appears likely that Python 3.6 will allow you and > your peers to write output messages like this:
>
>     print(f"Hello, I am {b}. My favorite number is {a}.")
>
> as a simpler alternative to the current options:
>
>     print("Hello, I am ", b, ". My favorite number is ", a, ".", sep="")
>     print("Hello, I am " + b + ". My favorite number is " + str(a) + ".")
>     print("Hello, I am {}. My favorite number is {}.".format(b, a))
>     print("Hello, I am {b}. My favorite number is {a}.".format_map(locals()))
>     print("Hello, I am %s. My favorite number is %s." % (b, a))

No need to wait for Python 3.6. Since print is a function, you can easily override it using your own little helper to make things easier for you. And this works in all Python versions starting with Python 2.6:

"""
# For Python 2 you need to make print a function first:
from __future__ import print_function
import sys

_orig_print = print

# Use .format() as basis for print()
def fprint(template, *args, **kws):
    caller = sys._getframe(1)
    context = caller.f_locals
    _orig_print(template.format(**context), *args, **kws)

# Use C-style %-formatting as basis for print()
def printf(template, *args, **kws):
    caller = sys._getframe(1)
    context = caller.f_locals
    _orig_print(template % context, *args, **kws)

# Examples:
a = 1
fprint('a = {a}')
printf('a = %(a)s')

# Let's use fprint() as standard print() in this module:
print = fprint
b = 3
print('b = {b}')
"""

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 07 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2015-08-27: Released eGenix mx Base 3.2.9 ... http://egenix.com/go83 2015-09-18: PyCon UK 2015 ... 11 days to go ::::: Try our mxODBC.Connect Python Database Interface for free !
:::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From p.f.moore at gmail.com Mon Sep 7 12:57:10 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 7 Sep 2015 11:57:10 +0100 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 5 September 2015 at 09:30, Nick Coghlan wrote: > Unfortunately, I've yet to convince the rest of PyPA (let alone the > community at large) that telling people to call "pip" directly is *bad > advice* (as it breaks in too many cases that beginners are going to > encounter), so it would be helpful if folks helping beginners on > python-list and python-tutor could provide feedback supporting that > perspective by filing an issue against > https://github.com/pypa/python-packaging-user-guide I would love to see "python -m pip" (or where the launcher is appropriate, the shorter "py -m pip") be the canonical invocation used in all documentation, discussion and advice on running pip. The main problems seem to be (1) "but just typing "pip" is shorter and easier to remember", (2) "I don't understand why pip can't just be a normal command" and sometimes (3) "isn't this just on Windows because you can't update pip in place on Windows" (no it isn't, but it's a common misconception of the issue). But I would agree with Nick, and recommend that anyone advising people on how to use pip, *especially* if you are helping them with issues, to always use "python -m pip" as the canonical command. 
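To make the point concrete, here is a minimal sketch of why the module invocation pins the interpreter (the helper name is invented for illustration):

```python
import sys

def pip_command(*args):
    # Build the pip invocation from the interpreter that is running
    # this code; "python -m pip" can therefore never target a
    # different Python installation, unlike a bare "pip" on PATH.
    return [sys.executable, "-m", "pip"] + list(args)

cmd = pip_command("install", "requests")
print(cmd[1:])  # ['-m', 'pip', 'install', 'requests']
```

Passing such a list to subprocess.run() installs into whichever Python built it, which is exactly the guarantee "python -m pip" gives at the shell.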
If you need to explain why, say that this makes sure that you run pip from the correct Python interpreter, that's the basic point here. Paul From encukou at gmail.com Mon Sep 7 13:09:03 2015 From: encukou at gmail.com (Petr Viktorin) Date: Mon, 7 Sep 2015 13:09:03 +0200 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150907021802.GY19373@ando.pearwood.info> Message-ID: > * I believe integration with systems like conda, PyPM, and the > Enthought installer should be addressed through a plugin model in pip, > rather than directly in the standard library Perhaps integration with RPM/APT could use that as well. If only to require some kind of --im-really-sure flag for "sudo pip". From stephen at xemacs.org Mon Sep 7 13:33:50 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 07 Sep 2015 20:33:50 +0900 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <87d1xv2e04.fsf@uwakimon.sk.tsukuba.ac.jp> <878u8i3d2k.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <874mj631lt.fsf@uwakimon.sk.tsukuba.ac.jp> Terry Reedy writes: > My impression is that MacPorts builds Tkinter 8.6 instead of 8.5. If you mean that MacPorts' current Tcl and Tk ports install Tcl/Tk 8.6, that is correct.
> Mac users who download the PSF Mac installer and want to use tkinter > should read > https://www.python.org/download/mac/tcltk/ > The page mentions that there may be a window update problem with the > apple tk. It also mentions that various Tk versions "in common use" are unsupported by the python.org-installed Python, and in particular not the Cocoa Tk. I suppose it's not hard to do that? Or maybe chances are that the X11 and Cocoa Tks "just work", but aren't tested for the Mac installers? From abarnert at yahoo.com Mon Sep 7 14:03:18 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 7 Sep 2015 05:03:18 -0700 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: <874mj631lt.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <87d1xv2e04.fsf@uwakimon.sk.tsukuba.ac.jp> <878u8i3d2k.fsf@uwakimon.sk.tsukuba.ac.jp> <874mj631lt.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <509A9271-AE8C-4E66-97AF-2518C40E95D3@yahoo.com> On Sep 7, 2015, at 04:33, Stephen J. Turnbull wrote: > > Terry Reedy writes: > >> My impression is that MacParts builds Tkinter 8.6 instead of 8.5. > > If you mean that MacPorts' current Tcl and Tk ports install Tcl/Tk 8.6, > that is correct. > >> Mac users who download the PSF Mac installer and want to use tkinter >> should read >> https://www.python.org/download/mac/tcltk/ > >> The page mentions that there may be a window update problem with the >> apple tk. > > It also mentions that various Tk versions "in common use" are > unsupported by the python.org-installed Python, and in particular not > the Cocoa Tk. I suppose it's not hard to do that? 
Or maybe chances > are that the X11 and Cocoa Tks "just work", but aren't tested for the > Mac installers? It's not just a matter of "not tested"; there are actual glitches with some of the versions, including two pretty serious ones that can lead to a freeze, or to some window management commands being ignored. But once you have some experience with it, and enough test machines of course, it's not actually that hard to build a binary-shippable GUI app that avoids all of these problems and runs against any of Apple's Tk versions from 10.6+ and against the Python.org recommended versions (which I know, because I've done it, at least with Python 2.7 and 3.3). Making it work reliably from the REPL, or for a script that's not wrapped as a .app, is definitely a lot less fun. But people who want to install from within the REPL or the system shell probably don't want the GUI. I don't know about making it work reliably from within IDLE. I don't see any reason IDLE couldn't just launch a .app on Mac if that's a problem, but you have to remember the extra fun bit that the app will get its environment from LaunchServices, not IDLE, so you'd need some other way to tell it to use the current venv. (Possibly this just means the app is linked into the venv?) From steve at pearwood.info Mon Sep 7 14:19:35 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 7 Sep 2015 22:19:35 +1000 Subject: [Python-ideas] One way to do format and print (was: Desperate need for enhanced print function) In-Reply-To: References: <55ED24C4.9000205@mail.de> Message-ID: <20150907121932.GE19373@ando.pearwood.info> On Mon, Sep 07, 2015 at 06:23:13PM +1000, Chris Angelico wrote: > print("Hello, I am {b}. My favorite number is {a}.".format_map(locals())) > > This one, though, is a bad idea for several reasons. Such as what? > Using locals() > for formatting is restricted - no globals, no expressions, and no > nonlocals that aren't captured in some other way. That's a feature, not a bug. 
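The restriction is easy to demonstrate; a minimal sketch (all names here are invented for the example):

```python
GREETING = "Hello"  # module-level global

def demo():
    name = "b"
    number = 42
    # Both placeholders are locals, so format_map(locals()) works:
    ok = "I am {name}. My favorite number is {number}.".format_map(locals())
    # GREETING is a global, so it is missing from locals():
    try:
        "{GREETING}, I am {name}.".format_map(locals())
        missing = False
    except KeyError:
        missing = True
    return ok, missing

print(demo())  # ('I am b. My favorite number is 42.', True)
```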
locals(), by definition, only includes locals. If you want globals, non-locals, built-ins, expressions, or the kitchen sink, you can't get them from locals(). Just because the locals() trick doesn't handle every possible scenario doesn't make it a bad idea for those cases it does handle, any more than it is a bad idea to use 2**5 just because the ** operator doesn't handle the 3-argument form of pow(). I probably wouldn't use the locals() form if the variable names were hard-coded like that, especially for just two of them: "Hello, I am {b}. My favorite number is {a}.".format(a=a, b=b) Where the locals() trick comes in handy is when your template string is not hard-coded:

if greet:
    template = "Hello, I am {name}, and my favourite %s is {%s}."
else:
    template = "My favourite %s is {%s}."
if condition:
    template = template % ("number", "x")
else:
    template = template % ("colour", "c")
print(template.format_map(locals()))

-- Steve From rosuav at gmail.com Mon Sep 7 14:21:47 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 7 Sep 2015 22:21:47 +1000 Subject: [Python-ideas] One way to do format and print (was: Desperate need for enhanced print function) In-Reply-To: <20150907121932.GE19373@ando.pearwood.info> References: <55ED24C4.9000205@mail.de> <20150907121932.GE19373@ando.pearwood.info> Message-ID: On Mon, Sep 7, 2015 at 10:19 PM, Steven D'Aprano wrote: > On Mon, Sep 07, 2015 at 06:23:13PM +1000, Chris Angelico wrote: > >> print("Hello, I am {b}. My favorite number is {a}.".format_map(locals())) >> >> This one, though, is a bad idea for several reasons. > > Such as what? > > >> Using locals() >> for formatting is restricted - no globals, no expressions, and no >> nonlocals that aren't captured in some other way. > > That's a feature, not a bug. > > locals(), by definition, only includes locals. If you want globals, > non-locals, built-ins, expressions, or the kitchen sink, you can't get > them from locals().
Just because the locals() trick doesn't handle every > possible scenario doesn't make it a bad idea for those cases it does > handle, any more than it is a bad idea to use 2**5 just because the ** > operator doesn't handle the 3-argument form of pow(). > > I probably wouldn't use the locals() form if the variable names were > hard-coded like that, especially for just two of them: > > "Hello, I am {b}. My favorite number is {a}.".format(a=a, b=b) > > Where the locals() trick comes in handy is when your template string is > not hard-coded: > > if greet: > template = "Hello, I am {name}, and my favourite %s is {%s}." > else: > template = "My favourite %s is {%s}." > if condition: > template = template % ("number", "x") > else: > template = template % ("colour", "c") > print(template.format_map(locals())) It's still a poor equivalent for the others. In terms of "why do we have so many different ways to do the same thing", the response is "the good things to do with format_map(locals()) are not the things you can do with f-strings". If what you're looking for can be done with either, it's almost certainly not better to use locals(). ChrisA From srkunze at mail.de Mon Sep 7 18:22:12 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 07 Sep 2015 18:22:12 +0200 Subject: [Python-ideas] Wheels For ... In-Reply-To: References: <55EC78E9.1050300@mail.de> <201509061954.t86Jspjg011546@fido.openend.se> Message-ID: <55EDB9B4.7020909@mail.de> On 07.09.2015 02:26, Nick Coghlan wrote: > For the build farm idea, it's not just writing the code initially, > it's operating the resulting infrastructure, and that's a much bigger > ongoing commitment. Automatically building wheels for source uploads > is definitely on the wish list, there are just a large number of other > improvements needed before it's feasible. Could you be more specific on these improvements, Nick? Best, Sven From srkunze at mail.de Mon Sep 7 18:27:42 2015 From: srkunze at mail.de (Sven R.
Kunze) Date: Mon, 07 Sep 2015 18:27:42 +0200 Subject: [Python-ideas] Wheels For ... In-Reply-To: References: <55EC78E9.1050300@mail.de> <20150907012645.GX19373@ando.pearwood.info> Message-ID: <55EDBAFE.1040303@mail.de> On 07.09.2015 07:39, Nathaniel Smith wrote: > > > Fortunately, lxml has a built-in option (triggered by an env > variable) for dealing with this, by downloading the source, building a > local copy of the libs, and statically linking them into lxml, but > that means you need some way for a package to specify env variables to > be set on the build server. And can you expect most libraries with > similar issues to do the same? > > Yes, you can! :-) > > I mean, not everyone will necessarily use it, but adding code like
>
> if "PYPI_BUILD_SERVER" in os.environ:
>     do_static_link = True
>
> to your setup.py is *wayyyy* easier than buying an OS X machine and > maintaining it and doing manual builds at every release. Or finding a > volunteer who has an OS X box and nagging them at every release and > dealing with trust hassles. > That's exactly what I just needed to do. Depending on somebody else's machine is really frustrating. > And there are a lot of packages out there that just have some cython > files in them for speedups with no external dependencies, or whatever. > A build farm wouldn't have to be perfect to be extremely useful. > I agree. Just good enough suffices for 80% of all the packages to be in good shape. Nick mentioned some improvements that are necessary before we can stand up such a build farm (apart from the farm itself). Best, Sven From srkunze at mail.de Mon Sep 7 18:40:33 2015 From: srkunze at mail.de (Sven R.
Kunze) Date: Mon, 07 Sep 2015 18:40:33 +0200 Subject: [Python-ideas] One way to do format and print In-Reply-To: <20150907121932.GE19373@ando.pearwood.info> References: <55ED24C4.9000205@mail.de> <20150907121932.GE19373@ando.pearwood.info> Message-ID: <55EDBE01.7030408@mail.de> On 07.09.2015 14:19, Steven D'Aprano wrote: > I probably wouldn't use the locals() form if the variable names were > hard-coded like that, especially for just two of them: > > "Hello, I am {b}. My favorite number is {a}.".format(a=a, b=b) > > Where the locals() trick comes in handy is when your template string is > not hard-coded: > > if greet: > template = "Hello, I am {name}, and my favourite %s is {%s}." > else: > template = "My favourite %s is {%s}." > if condition: > template = template % ("number", "x") > else: > template = template % ("colour", "c") > print(template.format_map(locals())) Err? I rather think you wouldn't pass code review. Best, Sven From srkunze at mail.de Mon Sep 7 18:48:15 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 07 Sep 2015 18:48:15 +0200 Subject: [Python-ideas] One way to do format and print In-Reply-To: References: <55ED24C4.9000205@mail.de> Message-ID: <55EDBFCF.2030301@mail.de> On 07.09.2015 10:23, Chris Angelico wrote: > Which would you deprecate? Hard to tell. Let me see what you've got here. Remember, I am just asking as I don't know better: > print("Hello, I am ", b, ". My favorite number is ", a, ".", sep="") > > The print function stringifies all its arguments and outputs them, > joined by a separator. Aside from the 2/3 compatibility requirement > for single-argument print calls, there's no particular reason to > deprecate this. In any case, this isn't "yet another way to format > strings", it's a feature of print. Still necessary? > print("Hello, I am " + b + ". 
My favorite number is " + str(a) + ".") > > String concatenation is definitely not going away; but even without > PEP 498, I would prefer to use percent formatting or .format() above > this. Its main advantage over those is that the expressions are in the > right place, which PEP 498 also offers; if it lands, I fully expect > 3.6+ code to use it rather than this. But the _functionality_ can't be > taken away. For sure; however, that shouldn't be used in the official documentation then, right? > print("Hello, I am {}. My favorite number is {}.".format(b, a)) > > This one is important for non-literals. It's one of the two main ways > of formatting strings... > > print("Hello, I am %s. My favorite number is %s." % (b, a)) > > ... and this is the other. Being available for non-literals means they > can be used with i18n, string tables, and other transformations. > Percent formatting is similar to what other C-derived languages have, Still necessary? Really, really necessary? Or just because we can? > and .format() has certain flexibilities, so neither is likely to be > deprecated any time soon. format has its own merits as it works like f-strings but on non-literals. (again this one-way/one-syntax thing) > print("Hello, I am {b}. My favorite number is {a}.".format_map(locals())) > > This one, though, is a bad idea for several reasons. Using locals() > for formatting is restricted - no globals, no expressions, and no > nonlocals that aren't captured in some other way. If this one, and > this one alone, can be replaced by f-string usage, it's done its job. Well sure, we all agree on not using that until f-strings are released.
Best, Sven From ron3200 at gmail.com Mon Sep 7 19:32:22 2015 From: ron3200 at gmail.com (Ron Adam) Date: Mon, 7 Sep 2015 12:32:22 -0500 Subject: [Python-ideas] One way to do format and print In-Reply-To: References: <55ED24C4.9000205@mail.de> <20150907121932.GE19373@ando.pearwood.info> Message-ID: On 09/07/2015 07:21 AM, Chris Angelico wrote: >> Where the locals() trick comes in handy is when your template string is >> >not hard-coded: >> > >> > if greet: >> > template = "Hello, I am {name}, and my favourite %s is {%s}." >> > else: >> > template = "My favourite %s is {%s}." >> > if condition: >> > template = template % ("number", "x") >> > else: >> > template = template % ("colour", "c") >> > print(template.format_map(locals())) > It's still a poor equivalent for the others. In terms of "why do we > have so many different ways to do the same thing", the response is > "the good things to do with format_map(locals()) are not the things > you can do with f-strings". If what you're looking for can be done > with either, it's almost certainly not better to use locals(). The ability for a format string or template to take a mapping is very useful. Whether or not it's ok for that mapping to be from locals() is a separate issue and depends on other factors as well. It may be perfectly fine in some cases, but not so in others. The issue with + concatenation is it doesn't call str on the objects. That can't be changed. A new operator (or methods on str) that does that could work. It's still not as concise as f-strings which I think is a major motivation for having them. Cheers, Ron From skrah at bytereef.org Mon Sep 7 19:39:24 2015 From: skrah at bytereef.org (Stefan Krah) Date: Mon, 7 Sep 2015 17:39:24 +0000 (UTC) Subject: [Python-ideas] One way to do format and print References: <55ED24C4.9000205@mail.de> <55EDBFCF.2030301@mail.de> Message-ID: Sven R. Kunze writes: > > > > print("Hello, I am %s. My favorite number is %s." % (b, a)) > > > > ... and this is the other. 
Being available for non-literals means they > > can be used with i18n, string tables, and other transformations. > > Percent formatting is similar to what other C-derived languages have, > > Still necessary? Really, really necessary? Or just because we can? > Absolutely. For many Python users this is the preferred form. I find that of all variations, this one is the most readable. Stefan Krah From srkunze at mail.de Mon Sep 7 20:57:44 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 7 Sep 2015 20:57:44 +0200 Subject: [Python-ideas] One way to do format and print In-Reply-To: References: <55ED24C4.9000205@mail.de> <55EDBFCF.2030301@mail.de> Message-ID: <55EDDE28.5020308@mail.de> On 07.09.2015 19:39, Stefan Krah wrote: > Sven R. Kunze writes: >>> print("Hello, I am %s. My favorite number is %s." % (b, a)) >>> >>> ... and this is the other. Being available for non-literals means they >>> can be used with i18n, string tables, and other transformations. >>> Percent formatting is similar to what other C-derived languages have, >> Still necessary? Really, really necessary? Or just because we can? >> > Absolutely. For many Python users this is the preferred form. I find > that of all variations, this one is the most readable. Okay, convinced. ;) No, seriously, what would you do when Python would deprecate % syntax? Could you switch to {} ? Best, Sven From skrah at bytereef.org Mon Sep 7 22:29:46 2015 From: skrah at bytereef.org (Stefan Krah) Date: Mon, 7 Sep 2015 20:29:46 +0000 (UTC) Subject: [Python-ideas] One way to do format and print References: <55ED24C4.9000205@mail.de> <55EDBFCF.2030301@mail.de> <55EDDE28.5020308@mail.de> Message-ID: Sven R. Kunze writes: > >>> Percent formatting is similar to what other C-derived languages have, > >> Still necessary? Really, really necessary? Or just because we can? > >> > > Absolutely. For many Python users this is the preferred form. I find > > that of all variations, this one is the most readable. > > Okay, convinced. 
;) > > No, seriously, what would you do when Python would deprecate % syntax? > Could you switch to {} ? There are many conservative Python users who are probably underrepresented on this list. All I can say is that %-formatting never went out of fashion, see e.g. https://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Streams , https://google-styleguide.googlecode.com/svn/trunk/pyguide.html#Strings , https://golang.org/pkg/fmt/ and many others. Fortunately, there are no plans to deprecate %-formatting (latest reference is PEP-498). Stefan Krah From abarnert at yahoo.com Mon Sep 7 22:47:59 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 7 Sep 2015 13:47:59 -0700 Subject: [Python-ideas] One way to do format and print In-Reply-To: <55EDBFCF.2030301@mail.de> References: <55ED24C4.9000205@mail.de> <55EDBFCF.2030301@mail.de> Message-ID: On Sep 7, 2015, at 09:48, Sven R. Kunze wrote: > >> On 07.09.2015 10:23, Chris Angelico wrote: >> Which would you deprecate? > > Hard to tell. Let me see what you've got here. Remember, I am just asking as I don't know better: > >> print("Hello, I am ", b, ". My favorite number is ", a, ".", sep="") >> >> The print function stringifies all its arguments and outputs them, >> joined by a separator. Aside from the 2/3 compatibility requirement >> for single-argument print calls, there's no particular reason to >> deprecate this. In any case, this isn't "yet another way to format >> strings", it's a feature of print. > > Still necessary? Necessary? No. Useful? Yes. For example, in a 5-line script I wrote last night, I've got print(head, *names, sep='\t'). I could have used print('\t'.join(chain([head], names))) instead--in fact, any use of multi-argument print can be replaced by print(sep.join(map(str, args)))--but that's less convenient, less readable, and less likely to occur to novices.
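For concreteness, the equivalence reads like this (head and names are invented sample values):

```python
from itertools import chain

head = "id"
names = ["alice", "bob"]

# print(head, *names, sep='\t') writes exactly this string:
line = "\t".join(map(str, chain([head], names)))
print(line)  # "id\talice\tbob"
```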
And there are plenty of other alternatives, from print('{}\t{}'.format(head, '\t'.join(names))) to print(('%s\t'*(len(names)+1) % ((head,)+names))[:-1]) to that favorite of novices on Stack Overflow, print(str([head]+names)[1:-1].replace(', ', '\t')), but would you really want to use any of these? Of course in a "real program" that I needed to use more than once, I would have used the csv module instead of a 5-line script driven by a 3-line shell script, and there's a limit to how far you want to push the argument for quick&dirty scripting/interactive convenience... but that limit isn't "none at all". When you start trying to mix manual adding of spaces with sep='' to get a sentence formatted exactly right, that's a good sign that you should be using format instead of multi-arg print; when you start trying to add up format strings, that's a good sign you should be using either something simpler or something more complicated. That is an extra thing novices have to get the hang of to become proficient Python programmers that doesn't exist for C or Perl. But the fact that you _can_ use Python like C, but don't have to, isn't really a downside of Python. From abarnert at yahoo.com Mon Sep 7 23:00:19 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 7 Sep 2015 14:00:19 -0700 Subject: [Python-ideas] One way to do format and print In-Reply-To: <55EDDE28.5020308@mail.de> References: <55ED24C4.9000205@mail.de> <55EDBFCF.2030301@mail.de> <55EDDE28.5020308@mail.de> Message-ID: On Sep 7, 2015, at 11:57, Sven R. Kunze wrote: > >> On 07.09.2015 19:39, Stefan Krah wrote: >> Sven R. Kunze writes: >>>> print("Hello, I am %s. My favorite number is %s." % (b, a)) >>>> >>>> ... and this is the other. Being available for non-literals means they >>>> can be used with i18n, string tables, and other transformations. >>>> Percent formatting is similar to what other C-derived languages have, >>> Still necessary? Really, really necessary? Or just because we can? >> Absolutely.
For many Python users this is the preferred form. I find >> that of all variations, this one is the most readable. > > Okay, convinced. ;) > > No, seriously, what would you do when Python would deprecate % syntax? Could you switch to {} ? There's some confusion over this because the str.format proposal originally suggested deprecating %, and there are still some bloggers and StackOverflow users and so on that claim it does (sometimes even citing the PEP, which explicitly says the opposite). But there will always be cases that % is better for, such as: * sharing a table of format strings with code in C or another language * simple formats that need to be done fast in a loop * formatting strings to use as str.format format strings * messages that you've converted from logging to real output * ASCII-based wire protocols or file formats So, even if it weren't for the backward compatibility issue for millions of lines of old code (and thousands of stubborn old coders), I doubt it would ever go away. At most, the tutorial and other docs might change to de-emphasize it and make it seem more like an "expert" feature only useful for cases like the above (as is already true for string.Template, and may become true for both % and str.format after f-strings reach widespread use--but nobody can really predict that until f-strings are actually in practical use). TOOWTDI is a guideline that has to balance against other guidelines, not a strict rule that always trumps everything else, and unless someone can come up with something new that's so much better than both format and % that it's clearly worth overcoming the inertia rather than just being a case of the old standards joke (insert XKCD reference here), there will be two ways to do this. 
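Two of those cases are easy to illustrate (sample values invented; the bytes example relies on PEP 461, so Python 3.5+):

```python
# Generating a str.format template with %: literal {} braces pass
# through untouched, with no escaping needed.
field = "name"
template = "Hello, I am {%s}." % field
print(template.format(name="b"))  # Hello, I am b.

# ASCII wire protocol: bytes support %-formatting (PEP 461),
# while there is no bytes .format() method.
status = b"HTTP/1.1 %d %s\r\n" % (200, b"OK")
print(status)  # b'HTTP/1.1 200 OK\r\n'
```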
From random832 at fastmail.com Mon Sep 7 22:52:52 2015 From: random832 at fastmail.com (Random832) Date: Mon, 07 Sep 2015 16:52:52 -0400 Subject: [Python-ideas] One way to do format and print References: <55ED24C4.9000205@mail.de> Message-ID: Chris Angelico writes: > ... and this is the other. Being available for non-literals means they > can be used with i18n, string tables, and other transformations. > Percent formatting is similar to what other C-derived languages have, > and .format() has certain flexibilities, so neither is likely to be > deprecated any time soon. I've never understood why .format was invented in the first place, rather than extending percent-formatting to have the features that it has over it. From rymg19 at gmail.com Tue Sep 8 00:39:26 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Mon, 07 Sep 2015 17:39:26 -0500 Subject: [Python-ideas] One way to do format and print In-Reply-To: References: <55ED24C4.9000205@mail.de> Message-ID: On September 7, 2015 3:52:52 PM CDT, Random832 wrote: >Chris Angelico writes: >> ... and this is the other. Being available for non-literals means >they >> can be used with i18n, string tables, and other transformations. >> Percent formatting is similar to what other C-derived languages have, >> and .format() has certain flexibilities, so neither is likely to be >> deprecated any time soon. > >I've never understood why .format was invented in the first place, >rather than extending percent-formatting to have the features that it >has over it. >

t = (1, 2, 3)
# 400 lines later...
print '%s' % t # oops!

>_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity.
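The tuple pitfall Ryan alludes to, spelled out as a runnable sketch (Python 3 syntax):

```python
t = (1, 2, 3)

# % treats a bare tuple as the whole argument list, so one %s
# against three values raises TypeError:
try:
    "%s" % t
    raised = False
except TypeError:  # "not all arguments converted ..."
    raised = True

# The % workaround is a 1-tuple; .format has no such ambiguity:
print("%s" % (t,))      # (1, 2, 3)
print("{}".format(t))   # (1, 2, 3)
```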
From rosuav at gmail.com Tue Sep 8 01:07:02 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 8 Sep 2015 09:07:02 +1000 Subject: [Python-ideas] One way to do format and print In-Reply-To: References: <55ED24C4.9000205@mail.de> <20150907121932.GE19373@ando.pearwood.info> Message-ID: On Tue, Sep 8, 2015 at 3:32 AM, Ron Adam wrote: > The ability for a format string or template to take a mapping is very > useful. Weather or not it's ok for that mapping to be from locals() is a > separate issue and depends on other factors as well. It may be perfectly > fine in some cases, but not so in others. I think that's the most we're ever going to have in terms of deprecations. None of the _functionality_ of any of the examples will be going away, but some of them will be non-recommended ways of doing certain things. I definitely agree that taking format values from a mapping is useful. ChrisA From dan at tombstonezero.net Tue Sep 8 01:13:49 2015 From: dan at tombstonezero.net (Dan Sommers) Date: Mon, 7 Sep 2015 23:13:49 +0000 (UTC) Subject: [Python-ideas] One way to do format and print References: <55ED24C4.9000205@mail.de> Message-ID: On Mon, 07 Sep 2015 17:39:26 -0500, Ryan Gonzalez wrote: > On September 7, 2015 3:52:52 PM CDT, Random832 wrote: >> I've never understood why .format was invented in the first place, >> rather than extending percent-formatting to have the features that it >> has over it. > > t = (1, 2, 3) > # 400 lines later... > print '%s' % t # oops! t = (1, 2, 3) # 400 lines later t *= 4 # oops? Why do you (Ryan Gonzalez) have names that are important enough to span over 400 lines of source code but not important enough to call something more interesting than "t"? And why are we conflating the print function with string formatting with natural language translation in the first place? 
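Chris's point above about mapping-driven formatting is worth a sketch with an explicit dict rather than locals() (the message table is invented):

```python
# A tiny string table keyed by message id: translators can edit the
# templates without touching code, and values come from any mapping.
MESSAGES = {
    "greet": "Hello, I am {name}. My favorite number is {number}.",
}

values = {"name": "b", "number": 42}
msg = MESSAGES["greet"].format_map(values)
print(msg)  # Hello, I am b. My favorite number is 42.
```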
From rosuav at gmail.com Tue Sep 8 01:15:05 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 8 Sep 2015 09:15:05 +1000 Subject: [Python-ideas] One way to do format and print In-Reply-To: References: <55ED24C4.9000205@mail.de> <55EDBFCF.2030301@mail.de> <55EDDE28.5020308@mail.de> Message-ID: On Tue, Sep 8, 2015 at 7:00 AM, Andrew Barnert via Python-ideas wrote: > But there will always be cases that % is better for, such as: > > * sharing a table of format strings with code in C or another language > * simple formats that need to be done fast in a loop > * formatting strings to use as str.format format strings > * messages that you've converted from logging to real output > * ASCII-based wire protocols or file formats Supporting this last one is PEP 461. There are no proposals on the cards to add a b"...".format() method (it's not out of the question, but there are problems to be overcome because of the extreme generality of it), yet we have percent formatting for bytestrings. I think that's a strong indication that percent formatting is fully supported and will be for the future. ChrisA From rymg19 at gmail.com Tue Sep 8 01:25:52 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Mon, 07 Sep 2015 18:25:52 -0500 Subject: [Python-ideas] One way to do format and print In-Reply-To: References: <55ED24C4.9000205@mail.de> Message-ID: <14DCC5DA-01F8-4208-A42D-842211DADC22@gmail.com> On September 7, 2015 6:13:49 PM CDT, Dan Sommers wrote: >On Mon, 07 Sep 2015 17:39:26 -0500, Ryan Gonzalez wrote: > >> On September 7, 2015 3:52:52 PM CDT, Random832 > wrote: > >>> I've never understood why .format was invented in the first place, >>> rather than extending percent-formatting to have the features that >it >>> has over it. >> >> t = (1, 2, 3) >> # 400 lines later... >> print '%s' % t # oops! > >t = (1, 2, 3) ># 400 lines later >t *= 4 # oops? 
> >Why do you (Ryan Gonzalez) have names that are important enough to span >over 400 lines of source code but not important enough to call >something >more interesting than "t"? > >And why are we conflating the print function with string formatting >with >natural language translation in the first place? You're blowing this out of proportion. I was simply showing how string formatting can be *weird* when it comes to tuples. -- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. From random832 at fastmail.com Tue Sep 8 01:28:13 2015 From: random832 at fastmail.com (Random832) Date: Mon, 07 Sep 2015 19:28:13 -0400 Subject: [Python-ideas] One way to do format and print References: <55ED24C4.9000205@mail.de> Message-ID: Ryan Gonzalez writes: > t = (1, 2, 3) > # 400 lines later... > print '%s' % t # oops! I always use % (t,) when intending to format a single object. But anyway, my ideal version of it would have a .format method, but using identical format strings. My real question was what the benefit of the {}-format for format strings is, over an extended %-format. From ncoghlan at gmail.com Tue Sep 8 01:45:22 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 8 Sep 2015 09:45:22 +1000 Subject: [Python-ideas] Wheels For ... In-Reply-To: <55EDB9B4.7020909@mail.de> References: <55EC78E9.1050300@mail.de> <201509061954.t86Jspjg011546@fido.openend.se> <55EDB9B4.7020909@mail.de> Message-ID: On 8 September 2015 at 02:22, Sven R. Kunze wrote: > On 07.09.2015 02:26, Nick Coghlan wrote: >> >> For the build farm idea, it's not just writing the code initially, >> it's operating the resulting infrastructure, and that's a much bigger >> ongoing commitment. 
Automatically building wheels for source uploads >> is definitely on the wish list, there are just a large number of other >> improvements needed before it's feasible. > > > Could you be more specific on these improvements, Nick? - PyPI: migrating from the legacy Zope codebase to Warehouse - PyPI: end-to-end content signing (PEPs 458 & 480) - PyPI: automated analytics & dashboards - Tooling: integration with operating systems & other platforms - Python Software Foundation financial sustainability - Python Software Foundation project management capacity - Infrastructure improvements for the CPython workflow Those aren't dependencies of automatic wheel-building per se, but rather are issues that are higher priorities for folks like Donald (in terms of actually getting things done), myself (in terms of collaborating more effectively with other open source ecosystems), and the PSF staff and Board (in terms of ensuring the python.org infrastructure is being appropriately maintained). Running an automated build service is expensive, not primarily in setting it up, but in terms of the ongoing sustaining engineering costs (including security monitoring and response), so before we commit to doing it, we need to know how we're going to fund it. However, most of the PSF's focus at the moment is on getting the things we *already* do [1] on a more sustainable footing, so adding *new* services isn't currently a priority. Cheers, Nick. [1] https://wiki.python.org/moin/PythonSoftwareFoundation/Proposals/StrategicPriorities -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Sep 8 02:31:56 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 8 Sep 2015 10:31:56 +1000 Subject: [Python-ideas] One way to do format and print In-Reply-To: References: <55ED24C4.9000205@mail.de> Message-ID: On 8 September 2015 at 09:28, Random832 wrote: > > Ryan Gonzalez writes: >> t = (1, 2, 3) >> # 400 lines later... >> print '%s' % t # oops! 
> > I always use % (t,) when intending to format a single object. But > anyway, my ideal version of it would have a .format method, but using > identical format strings. My real question was what the benefit of the > {}-format for format strings is, over an extended %-format. It turns out PEP 3101 doesn't really go into this, so I guess it was a case where all of us involved in the discussion knew the reasons a new format was needed, so we never wrote them down. As such, it's worth breaking the problem down into a few different subproblems: 1. Invocation via __mod__ 2. Positional formatting 3. Name based formatting 4. Extending formatting to new types in an extensible, backwards compatible way The problems with formatting dictionaries and tuples relate to the "fmt % values" invocation model, rather than the substitution field syntax. As such, we *could* have designed str.format() and str.format_map() around %-interpolation. The reasons we chose not to do that relate to the other problems. For positional formatting of short strings, %-interpolation actually works pretty well, and it has the advantage of being consistent with printf() style APIs in C and C++. This is the use case where it has proven most difficult to get people to switch away from mod-formatting, and is also the approach we used to restore binary interpolation support in Python 3.5. An illustrative example is to compare formatting a floating point number to use 2 decimal places: >>> x = y = 1.0 >>> "%.2f, %.2f" % (x, y) '1.00, 1.00' >>> "{:.2f}, {:.2f}".format(x, y) '1.00, 1.00' I consider the second example there to be *less* readable than the original mod-formatting. These kinds of cases are why we *changed our mind* from "we'd like to deprecate mod-formatting, but we haven't figured out a practical way to do so" to "mod-formatting and brace-formatting are better at different things, so it's actually useful having both of them available". 
For name based formatting, by contrast, the "%(name)s" syntax is noisy and clumsy compared to the shorter "{name}" format introduced in PEP 3101 (borrowed from C#). There the value has been clear, and so folks have been significantly more amenable to switching away from mod-formatting: >>> "%(x).2f, %(y).2f" % dict(x=x, y=y) '1.00, 1.00' >>> "{x:.2f}, {y:.2f}".format(x=x, y=y) '1.00, 1.00' It's that last example which PEP 498 grants native syntax, with the entire trailing method call being replaced by a simple leading "f": >>> f"{x:.2f}, {y:.2f}" '1.00, 1.00' This gets us back to TOOWTDI (after a long detour away from it), since direct interpolation will clearly be the obvious way to go when interpolating into a literal format string - the other options will only be needed when literal formatting isn't appropriate for some reason. The final reason for introducing a distinct formatting system doesn't relate to syntax, but rather to semantics. Mod-formatting is defined around the builtin types, with "__str__" as the catch-all fallback for interpolating arbitrary objects. PEP 3101 introduced a new *protocol* method (__format__) that allowed classes more control over how their instances were formatted, with the typical example being to allow dates and times to accept strftime formatting strings directly rather than having to make a separate strftime call prior to formatting. Python generally follows a philosophy of "constructs with different semantics should use different syntax" (at least in the core language design), which is reflected in the fact that a new formatting syntax was introduced in conjunction with a new formatting protocol. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Tue Sep 8 03:44:36 2015 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 08 Sep 2015 10:44:36 +0900 Subject: [Python-ideas] One way to do format and print In-Reply-To: References: <55ED24C4.9000205@mail.de> <55EDBFCF.2030301@mail.de> <55EDDE28.5020308@mail.de> Message-ID: <87r3m91y7v.fsf@uwakimon.sk.tsukuba.ac.jp> Chris Angelico writes: > On Tue, Sep 8, 2015 at 7:00 AM, Andrew Barnert via Python-ideas > wrote: > > But there will always be cases that % is better for, such as: [...] > > * ASCII-based wire protocols or file formats > > Supporting this last one is PEP 461. There are no proposals on the > cards to add a b"...".format() method (it's not out of the question, > but there are problems to be overcome Actually, it was proposed and pronounced (immediately on proposal :-). There were no truly difficult technical problems, but Guido decided it was a YAGNI, and often an attractive nuisance. In particular, many of the use cases for bytestring formatting are performance-critical bit- shoveling applications, and adding a few extra method lookups and calls to every formatting operation would be a problem. Many others involve porting Python 2 applications that used str to hold and format "external" strings, and those use %-formatting, not .format. > because of the extreme generality of it), Hm. It seems to me in the PEP 498 discussion that Guido doesn't see generality as a problem to be solved by restricting it, but rather as a characteristic of an implementation that makes it more or less suitable for a given feature. I guess that Guido would insist on having bytes.format be almost identical to str.format, except maybe for a couple of tweaks similar to those added to bytes' % operator. 
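The PEP 461 behaviour Chris and Stephen discuss can be sketched directly (Python 3.5+; the protocol line below is an invented example, not from any real wire format):

```python
# PEP 461 restored %-interpolation for bytes (Python 3.5+):
# %d and %x format numbers as ASCII digits, %b interpolates bytes.
payload = b"hello"
packet = b"LEN=%d DATA=%b\r\n" % (len(payload), payload)
print(packet)

# As discussed above, there is deliberately no bytes.format() method.
assert not hasattr(bytes, "format")
```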
From random832 at fastmail.com Tue Sep 8 03:46:53 2015 From: random832 at fastmail.com (Random832) Date: Mon, 07 Sep 2015 21:46:53 -0400 Subject: [Python-ideas] One way to do format and print References: <55ED24C4.9000205@mail.de> Message-ID: Nick Coghlan writes: > The final reason for introducing a distinct formatting system doesn't > relate to syntax, but rather to semantics. Mod-formatting is defined > around the builtin types, with "__str__" as the catch-all fallback for > interpolating arbitrary objects. PEP 3101 introduced a new *protocol* > method (__format__) that allowed classes more control over how their > instances were formatted, with the typical example being to allow > dates and times to accept strftime formatting strings directly rather > than having to make a separate strftime call prior to formatting. > Python generally follows a philosophy of "constructs with different > semantics should use different syntax" I guess my problem is that I don't consider the fact that %s forces something to string, %f to float, etc, to be desired semantics, I consider it to be a bug that could, and *should*, have been changed by an alternate-universe PEP. There's nothing *good* about the fact that '%.20f' % Decimal('0.1') gives 0.10000000000000000555 instead of 0.10000000000000000000, and that there are no hooks for Decimal to make it do otherwise. There's nothing that would IMO be legitimately broken by allowing it to do so. You could, for example, have object.__format__ fall back on the type conversion semantics, so that it would continue to work with existing types that do not define their own __format__. 
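Random832's Decimal example is easy to verify, and it also shows the __format__ hook already doing exactly what the %-path cannot:

```python
from decimal import Decimal

d = Decimal("0.1")

# %-formatting only knows the builtin types, so Decimal is coerced
# through float and picks up binary floating-point error:
assert "%.20f" % d == "0.10000000000000000555"

# format() dispatches to Decimal.__format__, which formats the
# decimal value itself:
assert format(d, ".20f") == "0.10000000000000000000"

print("%.20f" % d)
print(format(d, ".20f"))
```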
From ncoghlan at gmail.com Tue Sep 8 04:18:40 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 8 Sep 2015 12:18:40 +1000 Subject: [Python-ideas] One way to do format and print In-Reply-To: References: <55ED24C4.9000205@mail.de> Message-ID: On 8 September 2015 at 11:46, Random832 wrote: > Nick Coghlan writes: >> The final reason for introducing a distinct formatting system doesn't >> relate to syntax, but rather to semantics. Mod-formatting is defined >> around the builtin types, with "__str__" as the catch-all fallback for >> interpolating arbitrary objects. PEP 3101 introduced a new *protocol* >> method (__format__) that allowed classes more control over how their >> instances were formatted, with the typical example being to allow >> dates and times to accept strftime formatting strings directly rather >> than having to make a separate strftime call prior to formatting. >> Python generally follows a philosophy of "constructs with different >> semantics should use different syntax" > > I guess my problem is that I don't consider the fact that %s forces > something to string, %f to float, etc, to be desired semantics, I > consider it to be a bug that could, and *should*, have been changed by > an alternate-universe PEP. > > There's nothing *good* about the fact that '%.20f' % Decimal('0.1') > gives 0.10000000000000000555 instead of 0.10000000000000000000, and that > there are no hooks for Decimal to make it do otherwise. Ah, but there *is* something good about it: the fact that percent-formatting is restricted to a constrained set of known types makes it fundamentally more *predictable* and more *portable* than brace-formatting. The flexibility of str.format is wonderful if you're only needing to deal with Python code, and Python's type system. It's substantially less wonderful if you're designing formatting operations that need to span multiple languages that only have the primitive core defined by C in common. 
These characteristics are what make percent-formatting a more suitable approach to binary interpolation than the fully flexible formatting system. Binary interpolation is not only really hard to do right, it's also really hard to *test* - many of the things that can go wrong are driven by the specific data values you choose to test with, rather than being structural errors in the data types you use. These benefits aren't particularly obvious until you try to live without them and figure out why you missed them, but we *have* done that in the 7 years since 2.6 was released, and hence have a good understanding of why brace-formatting wasn't the wholesale replacement for percent-formatting that we originally expected it to be. That said, there *have* been ongoing efforts to improve the numeric formatting capabilities of printf and related operations in C/C++ that we haven't been tracking at the Python level. In relation to decimal support specifically, the C++ write-up at http://open-std.org/JTC1/SC22/WG21/docs/papers/2014/n3871.html also links to the C level proposal and the proposed changes to the interpretation of the floating point codes when working with decimal data types. However, as far as I am aware, there isn't anyone specifically tracking the evolution of printf() formatting codes and reviewing them for applicability to Python's percent-formatting support - it's done more in an ad hoc fashion as folks developing in both Python and C/C++ start using a new formatting code on the C/C++ side of things and request that it also be added to Python. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Tue Sep 8 05:01:44 2015 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 08 Sep 2015 12:01:44 +0900 Subject: [Python-ideas] One way to do format and print In-Reply-To: References: <55ED24C4.9000205@mail.de> Message-ID: <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> Random832 writes: > But anyway, my ideal version of it would have a .format method, but > using identical format strings. "Identical" is impossible, even you immediately admit that an extension is necessary: > My real question was what the benefit of the {}-format for format > strings is, over an extended %-format. One issue is that the "%(name)s" form proves to be difficult for end-users and translators to get right. Eg, it's a FAQ for Mailman, which uses such format strings for interpolatable footers, including personalized user unsubscribe links and the like. This is just a fact. I don't have any similar evidence that "{}" is better. Introspecting, I find having the whole spec enclosed in parentheses far more readable than the very finicky %-specs. It feels more like a replacement field to me. I also find the braces to be far more readable parenthesization than (round) parentheses (TeX influence there, maybe?) In particular, these two attributes of "{}" are why I use .format by preference even in simple cases where % is both sufficient and clearly more compact. Obviously that's a personal preference but I doubt I'm the only one who feels that way. %-formatting provides no way to indicate which positional parameter goes with which format spec. It was hoped that such a facility might be useful in I18N, where the syntax of a translated string often must be rather different from English syntax. That has turned out not to be the case, but that is the way the C world was going at the time. In recent C (well, C89 or C99 ;-) there is a way to do this, but that would require extending the Python %-spec syntax. {}-formatting allows one level of recursion. That is "|{result:{width}d}|".format(result=42, width=10) produces "| 42|". 
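The nested-spec example above runs as stated; the field padding that the archive's whitespace collapsed looks like this:

```python
# {width} is substituted into the format spec before the spec is
# interpreted, so the d-conversion sees the spec '10d' and
# right-aligns 42 in a 10-character field
out = "|{result:{width}d}|".format(result=42, width=10)
print(repr(out))  # '|' + 8 spaces + '42' + '|'
assert out == "|" + "42".rjust(10) + "|"
```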
In %-formatting, a recursive syntax would be rather finicky, and an alternative syntax for formatting the format spec would be shocking to people used to %-formatting, I guess. {}-formatting actually admits an arbitrary string after the ":", to be interpreted by the object's __format__ method, rather than by format. The {arg : sign width.prec type} format is respected by the builtin types, but a type like datetime can (and does!) support the full strftime() syntax, eg, "It is now {0:%H:%M:%S} on {0:%Y/%m/%d}.".format(datetime.now()) produced 'It is now 11:58:53 on 2015/09/08.' for me just now. I don't see how equivalent flexibility could be provided with %-spec syntax without twisting it completely out of shape. These last three, besides presenting (more or less minor) technical difficulties for a %-spec extension, run into Guido's allergy to subtle context-dependent differences in syntax, as we've seen in the discussion of whether expression syntax in f-strings should be restricted as compared to "full" expression syntax. That is, the more natural the extension of %-spec syntax we used, the more confusing it would be to users, especially new users reading old code (and wondering why it does things the hard way). OTOH, if it's not a "natural" extension, you lose many of the benefits of an extension in the first place. From random832 at fastmail.com Tue Sep 8 06:27:43 2015 From: random832 at fastmail.com (Random832) Date: Tue, 08 Sep 2015 00:27:43 -0400 Subject: [Python-ideas] One way to do format and print References: <55ED24C4.9000205@mail.de> <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: "Stephen J. Turnbull" writes: > Random832 writes: > > > But anyway, my ideal version of it would have a .format method, but > > using identical format strings. 
> > "Identical" is impossible, even you immediately admit that an > extension is necessary: By "Identical" I meant there would be a single mechanism, including all new features, used for both str.format and str.__mod__ - i.e. the extensions would also be available using the % operator. The extensions would be implemented, but the % operator would not be left behind. Hence, identical. From tritium-list at sdamon.com Tue Sep 8 11:35:57 2015 From: tritium-list at sdamon.com (Alexander Walters) Date: Tue, 08 Sep 2015 05:35:57 -0400 Subject: [Python-ideas] NuGet/Chocolatey feed for releases Message-ID: <55EEABFD.8040600@sdamon.com> It would be incredibly convenient, especially for users of AppVayor's continuous integration service, if there were a(n official) repository for chocolatey containing recent releases of python. The official Chocolatey gallery contains installers for the latest 2.7 and 3.4 (as of this post). What I am proposing would contain the most commonly used pythons in testing (2.6 2.7 3.3 3.4 and future releases). I am perfectly willing to set up a repo for my own use, but am posting this to see if there is community support...or psf support... for setting up an official repo. From wolfgang.maier at biologie.uni-freiburg.de Tue Sep 8 12:00:57 2015 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Tue, 8 Sep 2015 12:00:57 +0200 Subject: [Python-ideas] new format spec for iterable types Message-ID: Hi, in the parallel "format and print" thread, Andrew Barnert wrote: > For example, in a 5-line script I wrote last night, I've got > print(head, *names, sep='\t'). I could have used > print('\t'.join(chain([head], names)) instead--in fact, any use of > multi-argument print can be replaced by > print(sep.join(map(str, args)))--but that's less convenient, less > readable, and less likely to occur to novices. And there are plenty > of other alternatives, from > print('{}\t{}'.format(head, '\t'.join(names)) to ... 
That last thing, '{}\t{}'.format(head, '\t'.join(names)), is something I find myself writing relatively often - when I do not want to print the result immediately, but store it - but it is ugly to read with its nested method calls and the separators occurring in two very different places. Now Andrew's comment has prompted me to think about alternative syntax for this and I came up with this idea: What if built in iterable types through their __format__ method supported a format spec string of the form "*separator" and interpreted it as join your elements' formatted representations using "separator" ? A quick and dirty illustration in Python: class myList(list): def __format__ (self, fmt=''): if fmt == '': return str(self) if fmt[0] == '*': sep = fmt[1:] or ' ' return sep.join(format(e) for e in self) else: raise TypeError() head = 99 data = myList(range(10)) s = '{}, {:*, }'.format(head, data) # or s2 = '{}{sep}{:*{sep}}'.format(head, data, sep=', ') print(s) print(s2) # 99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 Thoughts? From abarnert at yahoo.com Tue Sep 8 14:24:22 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 8 Sep 2015 05:24:22 -0700 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: References: Message-ID: <8635FF8B-2C17-4016-BDC0-BF5D775C9F0C@yahoo.com> On Sep 8, 2015, at 03:00, Wolfgang Maier wrote: > > Hi, > > in the parallel "format and print" thread, Andrew Barnert wrote: > > > > For example, in a 5-line script I wrote last night, I've got > > print(head, *names, sep='\t'). I could have used > > print('\t'.join(chain([head], names)) instead--in fact, any use of > > multi-argument print can be replaced by > > print(sep.join(map(str, args)))--but that's less convenient, less > > readable, and less likely to occur to novices. And there are plenty > > of other alternatives, from > > print('{}\t{}'.format(head, '\t'.join(names)) to > ... 
> > That last thing, '{}\t{}'.format(head, '\t'.join(names)), is something I find myself writing relatively often - when I do not want to print the result immediately, but store it - but it is ugly to read with its nested method calls and the separators occurring in two very different places. > Now Andrew's comment has prompted me to think about alternative syntax for this and I came up with this idea: > > What if built in iterable types through their __format__ method supported a format spec string of the form "*separator" and interpreted it as join your elements' formatted representations using "separator" ? > A quick and dirty illustration in Python: > > class myList(list): > def __format__ (self, fmt=''): > if fmt == '': > return str(self) > if fmt[0] == '*': > sep = fmt[1:] or ' ' > return sep.join(format(e) for e in self) > else: > raise TypeError() > > head = 99 > data = myList(range(10)) > s = '{}, {:*, }'.format(head, data) > # or > s2 = '{}{sep}{:*{sep}}'.format(head, data, sep=', ') > print(s) > print(s2) > # 99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 Formatting positional argument #2 with *{sep} as the format specifier makes no sense to me. Even knowing what you're trying to do, I can't understand what *(', ') is going to pass to data.__format__, or why it should do what you want. What is the * supposed to mean there? Is it akin to *args in a function call expression, so you get ',' and ' ' as separate positional arguments? If so, how does the fmt[1] do anything useful? It seems like you would be using [' '] as the separator, and I'm not sure what that would do that you'd want. 
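Andrew's question about what actually reaches data.__format__ can be answered empirically: the outer parser substitutes the nested {sep} field into the spec first, so __format__ receives the single string '*, ' and nothing is unpacked. A throwaway class makes this visible:

```python
class Spy:
    # echo the raw format spec instead of formatting anything
    def __format__(self, spec):
        return "spec=" + repr(spec)

# {sep} is filled in before __format__ is called, so the object
# sees one string, '*, '
print("{:*{sep}}".format(Spy(), sep=", "))  # spec='*, '
```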
From wolfgang.maier at biologie.uni-freiburg.de Tue Sep 8 14:55:43 2015 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Tue, 8 Sep 2015 14:55:43 +0200 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: <8635FF8B-2C17-4016-BDC0-BF5D775C9F0C@yahoo.com> References: <8635FF8B-2C17-4016-BDC0-BF5D775C9F0C@yahoo.com> Message-ID: <55EEDACF.7000204@biologie.uni-freiburg.de> On 08.09.2015 14:24, Andrew Barnert via Python-ideas wrote: > > Formatting positional argument #2 with *{sep} as the format specifier makes no sense to me. Even knowing what you're trying to do, I can't understand what *(', ') is going to pass to data.__format__, or why it should do what you want. What is the * supposed to mean there? Is it akin to *args in a function call expression, so you get ',' and ' ' as separate positional arguments? If so, how does the fmt[1] do anything useful? It seems like you would be using [' '] as the separator, and in not sure what that would do that you'd want. > Not sure what happened to the indentation in the posted code. Here's another attempt copy pasting from working code as I thought I had done before (sorry for the inconvenience): class myList(list): def __format__ (self, fmt=''): if fmt == '': return str(self) if fmt[0] == '*': sep = fmt[1:] or ' ' return sep.join(format(e) for e in self) else: raise TypeError() head = 99 data = myList(range(10)) s = '{}, {:*, }'.format(head, data) # or s2 = '{}{sep}{:*{sep}}'.format(head, data, sep=', ') print(s) print(s2) # 99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 Does that make things clearer? 
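With the indentation restored, Wolfgang's sketch runs as posted; a self-contained copy with the claimed output asserted:

```python
class myList(list):
    # quick-and-dirty __format__ from the proposal: a spec starting
    # with '*' means "join the formatted elements, using the rest of
    # the spec as separator"
    def __format__(self, fmt=''):
        if fmt == '':
            return str(self)
        if fmt[0] == '*':
            sep = fmt[1:] or ' '
            return sep.join(format(e) for e in self)
        raise TypeError('unsupported format spec: %r' % fmt)

head = 99
data = myList(range(10))
s = '{}, {:*, }'.format(head, data)
s2 = '{}{sep}{:*{sep}}'.format(head, data, sep=', ')
assert s == s2 == '99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9'
print(s)
```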
From oscar.j.benjamin at gmail.com Tue Sep 8 15:41:39 2015 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 8 Sep 2015 14:41:39 +0100 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: <8635FF8B-2C17-4016-BDC0-BF5D775C9F0C@yahoo.com> References: <8635FF8B-2C17-4016-BDC0-BF5D775C9F0C@yahoo.com> Message-ID: On 8 September 2015 at 13:24, Andrew Barnert via Python-ideas wrote: > Wolfgang wrote: >> A quick and dirty illustration in Python: >> >> class myList(list): >> def __format__ (self, fmt=''): >> if fmt == '': >> return str(self) >> if fmt[0] == '*': >> sep = fmt[1:] or ' ' >> return sep.join(format(e) for e in self) >> else: >> raise TypeError() >> >> head = 99 >> data = myList(range(10)) >> s = '{}, {:*, }'.format(head, data) >> # or >> s2 = '{}{sep}{:*{sep}}'.format(head, data, sep=', ') >> print(s) >> print(s2) >> # 99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 > > Formatting positional argument #2 with *{sep} as the format specifier makes no sense to me. Even knowing what you're trying to do, I can't understand what *(', ') is going to pass to data.__format__, or why it should do what you want. What is the * supposed to mean there? Is it akin to *args in a function call expression, so you get ',' and ' ' as separate positional arguments? If so, how does the fmt[1] do anything useful? It seems like you would be using [' '] as the separator, and in not sure what that would do that you'd want. The *{sep} surprised me until I tried >>> '{x:.{n}f}'.format(x=1.234567, n=2) '1.23' So format uses a two-level pass over the string for nested curly brackets (I tried a third level of nesting but it didn't work). 
So following it through: '{}{sep}{:*{sep}}'.format(head, data, sep=', ') '{}, {:*, }'.format(head, data) '{}, {}'.format(head, format(data, '*, ')) '{}, {}'.format(head, ', '.join(format(e) for e in data)) '99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9' Unfortunately there's no way to also give a format string to the inner format call format(e) if I wanted to e.g. format those numbers in hex. -- Oscar From wolfgang.maier at biologie.uni-freiburg.de Tue Sep 8 16:20:43 2015 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Tue, 8 Sep 2015 16:20:43 +0200 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: References: <8635FF8B-2C17-4016-BDC0-BF5D775C9F0C@yahoo.com> Message-ID: <55EEEEBB.4080203@biologie.uni-freiburg.de> On 08.09.2015 15:41, Oscar Benjamin wrote: > > The *{sep} surprised me until I tried > > >>> '{x:.{n}f}'.format(x=1.234567, n=2) > '1.23' > > So format uses a two-level pass over the string for nested curly > brackets (I tried a third level of nesting but it didn't work). > Yes, this is documented behavior (https://docs.python.org/3/library/string.html#format-string-syntax): "A format_spec field can also include nested replacement fields within it. These nested replacement fields can contain only a field name; conversion flags and format specifications are not allowed. The replacement fields within the format_spec are substituted before the format_spec string is interpreted. This allows the formatting of a value to be dynamically specified." > Unfortunately there's no way to also give a format string to the inner > format call format(e) if I wanted to e.g. format those numbers in hex. Right, that would require a much more complex format_spec definition. 
But the proposed simple version saves me from mistakenly writing: '{}\t{}'.format(head, '\t'.join(data)) when some of the elements in data aren't strings and I should have written: '{}\t{}'.format(head, '\t'.join(str(e) for e in data)) , a mistake that I seem to never learn to avoid :) From rymg19 at gmail.com Tue Sep 8 16:37:36 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Tue, 08 Sep 2015 09:37:36 -0500 Subject: [Python-ideas] NuGet/Chocolatey feed for releases In-Reply-To: <55EEABFD.8040600@sdamon.com> References: <55EEABFD.8040600@sdamon.com> Message-ID: <89BE5DA9-CB1B-481E-9601-96F94057BFA0@gmail.com> Beware: when the Chocolatey devs said "backlog", they *meant* backlog. I submitted an updated PyPy package to them months ago, and it still hasn't been updated yet. On September 8, 2015 4:35:57 AM CDT, Alexander Walters wrote: >It would be incredibly convenient, especially for users of AppVeyor's >continuous integration service, if there were a(n official) repository >for chocolatey containing recent releases of python. The official >Chocolatey gallery contains installers for the latest 2.7 and 3.4 (as >of >this post). What I am proposing would contain the most commonly used >pythons in testing (2.6 2.7 3.3 3.4 and future releases). > >I am perfectly willing to set up a repo for my own use, but am posting >this to see if there is community support...or psf support... for >setting up an official repo. -- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Tue Sep 8 18:27:27 2015 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 09 Sep 2015 01:27:27 +0900 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: <55EEEEBB.4080203@biologie.uni-freiburg.de> References: <8635FF8B-2C17-4016-BDC0-BF5D775C9F0C@yahoo.com> <55EEEEBB.4080203@biologie.uni-freiburg.de> Message-ID: <87lhcg27ww.fsf@uwakimon.sk.tsukuba.ac.jp> Wolfgang Maier writes: > But the proposed simple version saves me from mistakenly writing: > > '{}\t{}'.format(head, '\t'.join(data)) > > when some of the elements in data aren't strings and I should have written: > > '{}\t{}'.format(head, '\t'.join(str(e) for e in data)) > > , a mistake that I seem to never learn to avoid :) (Note: I don't suffer from that particular mistake, so I may be biased.) I think it's a nice trick but doesn't clear the bar for adding to the standard iterables yet. A technical comment: you don't actually need the '*' for myList (although I guess you find it useful to get an error rather than line noise as a separator if it isn't present?) On the basic idea: if this can be generalized a bit so that head = 99 data = range(10) # optimism! s = '{:.1f}, {:.1f*, }'.format(head, data) produces s == '99.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0' then I'd be a lot more attracted to it. I would think the simple version is likely to produce rather ugly output if you have a bunch of floats in data. (BTW, that string was actually generated with '{:.2f}, {}'.format(99, ', '.join('{:.2f}'.format(x) for x in range(10))) which doesn't win any beauty contests.) Bikeshedding in advance, now you pretty much need the '*' (and have to hope that the types in the iterable don't use it themselves!), because '{:.1f, }' really does look like line noise! 
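None of the proposed spellings exist in str.format today; the string generated by hand above can instead be produced by a small helper (a sketch only — the name join_format is invented for illustration):

```python
def join_format(iterable, sep=', ', spec=''):
    """Format each element with `spec`, then join the results with `sep`."""
    return sep.join(format(item, spec) for item in iterable)

head = 99
data = range(10)
s = '{:.1f}, {}'.format(head, join_format(data, ', ', '.1f'))
# s == '99.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0'
```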
I might actually prefer '|' (or '/') which is "heavier" and "looks like a separator" to me: s = '{:.1f}, {:.1f|, }'.format(head, data) Finally, another alternative syntax would be the same in the replacement field, but instead of iterables implementing it, the .format method would (using your syntax and example for easier comparison): s = '{}, {:*, }'.format(head, *data) I'm afraid this won't work unless restricted to be the last replacement field, where it just consumes all remaining positional arguments. I think that restriction deserves a loud "ugh", but maybe it will give somebody a better idea. Steve From srkunze at mail.de Tue Sep 8 19:49:39 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 08 Sep 2015 19:49:39 +0200 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: References: Message-ID: <55EF1FB3.5000407@mail.de> On 08.09.2015 12:00, Wolfgang Maier wrote: > > head = 99 > data = myList(range(10)) > s = '{}, {:*, }'.format(head, data) > # or > s2 = '{}{sep}{:*{sep}}'.format(head, data, sep=', ') > print(s) > print(s2) > # 99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 > > Thoughts? I like it and I agree this is an oft-used pattern. From my experience I can tell patterns are workarounds if a language cannot handle it properly. I cannot tell what a concrete syntax would exactly look like but I would love to see an easy-to-read solution. Best, Sven From rosuav at gmail.com Tue Sep 8 19:58:49 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 9 Sep 2015 03:58:49 +1000 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: <55EF1FB3.5000407@mail.de> References: <55EF1FB3.5000407@mail.de> Message-ID: On Wed, Sep 9, 2015 at 3:49 AM, Sven R. Kunze wrote: > On 08.09.2015 12:00, Wolfgang Maier wrote: >> >> >> head = 99 >> data = myList(range(10)) >> s = '{}, {:*, }'.format(head, data) >> # or >> s2 = '{}{sep}{:*{sep}}'.format(head, data, sep=', ') >> print(s) >> print(s2) >> # 99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 >> >> Thoughts? 
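For reference, the kind of prototype quoted above can be written today as a list subclass whose __format__ treats a leading '*' in the spec as "join the elements" — one possible reading of the proposal, not settled syntax:

```python
class myList(list):
    def __format__(self, spec):
        if spec.startswith('*'):
            # Everything after the leading '*' is taken as the separator.
            sep = spec[1:] or ', '
            return sep.join(format(e) for e in self)
        # Any other spec falls back to the normal behaviour.
        return super().__format__(spec)

head = 99
data = myList(range(10))
s = '{}, {:*, }'.format(head, data)
s2 = '{}{sep}{:*{sep}}'.format(head, data, sep=', ')
# Both give '99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9'
```

Note that the inner elements get the empty format spec here, which is exactly the limitation discussed earlier in the thread.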
> > > I like it and I agree this is an oft-used pattern. From my experience I can > tell patterns are workarounds if a language cannot handle it properly. > > I cannot tell what a concrete syntax would exactly look like but I would > love to see an easy-to-read solution. It looks tempting, but there's a reason Python has join() as a *string* method, not a method on any sort of iterable. For the same reason, I think it'd be better to handle this as a special case inside str.format(), rather than as a format string of the iterables; it would be extremely surprising for code to be able to join a list, a tuple, a ListIterator, or a generator, but not a custom class with __iter__ and __next__ methods. (Even more surprising if it works with some standard library types and not others.) Plus, it'd mean a lot of code duplication across all those types, which is unnecessary. It'd be rather cool if it could be done as a special format string, though, which says "here's a separator, here's a format string, now iterate over the argument and format them with that string, then join them with that sep, and stick it in here". It might get a bit verbose, though. ChrisA From srkunze at mail.de Tue Sep 8 20:03:47 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 08 Sep 2015 20:03:47 +0200 Subject: [Python-ideas] Wheels For ... In-Reply-To: References: <55EC78E9.1050300@mail.de> <201509061954.t86Jspjg011546@fido.openend.se> <55EDB9B4.7020909@mail.de> Message-ID: <55EF2303.8080901@mail.de> On 08.09.2015 01:45, Nick Coghlan wrote: > On 8 September 2015 at 02:22, Sven R. Kunze wrote: >> On 07.09.2015 02:26, Nick Coghlan wrote: >>> For the build farm idea, it's not just writing the code initially, >>> it's operating the resulting infrastructure, and that's a much bigger >>> ongoing commitment. Automatically building wheels for source uploads >>> is definitely on the wish list, there are just a large number of other >>> improvements needed before it's feasible. 
>> >> Could you be more specific on these improvements, Nick? > - PyPI: migrating from the legacy Zope codebase to Warehouse > - PyPI: end-to-end content signing (PEPs 458 & 480) > - PyPI: automated analytics & dashboards > - Tooling: integration with operating systems & other platforms > - Python Software Foundation financial sustainability > - Python Software Foundation project management capacity > - Infrastructure improvements for the CPython workflow Very appreciated. Let's see how they make progress on these. > Those aren't dependencies of automatic wheel-building per se, but > rather are issues that are higher priorities for folks like Donald (in > terms of actually getting things done), myself (in terms of > collaborating more effectively with other open source ecosystems), and > the PSF staff and Board (in terms of ensuring the python.org > infrastructure is being appropriately maintained). > > Running an automated build service is expensive, not primarily in > setting it up, but in terms of the ongoing sustaining engineering > costs (including security monitoring and response), so before we > commit to doing it, we need to know how we're going to fund it. > However, most of the PSF's focus at the moment is on getting the > things we *already* do [1] on a more sustainable footing, so adding > *new* services isn't currently a priority. > > Cheers, > Nick. > > [1] https://wiki.python.org/moin/PythonSoftwareFoundation/Proposals/StrategicPriorities > From oscar.j.benjamin at gmail.com Tue Sep 8 20:15:24 2015 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 8 Sep 2015 19:15:24 +0100 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: <87lhcg27ww.fsf@uwakimon.sk.tsukuba.ac.jp> References: <8635FF8B-2C17-4016-BDC0-BF5D775C9F0C@yahoo.com> <55EEEEBB.4080203@biologie.uni-freiburg.de> <87lhcg27ww.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 8 September 2015 at 17:27, Stephen J. 
Turnbull wrote: > > A technical comment: you don't actually need the '*' for myList > (although I guess you find it useful to get an error rather than line > noise as a separator if it isn't present?) I think Wolfgang wants it to work with any iterable rather than his own custom type (at least that's what I'd want). For that to work it would be better if it was handled by the format method itself rather than every iterable's __format__ method. Then it could work with generators, lists, tuples etc. > On the basic idea: if this can be generalized a bit so that > > head = 99 > data = range(10) # optimism! > s = '{:.1f}, {:.1f*, }'.format(head, data) > > produces > > s == '99.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0' > > then I'd be a lot more attracted to it. ATM the colon separates the part of the format element that is interpreted by the format method to find the formatted object from the part that is passed to the __format__ method of the formatted object. Perhaps an additional colon could be used to separate the separator for when the formatted object is an iterable so that 'foo {name::} bar'.format(name=) could become 'foo {_name} bar'.format(_name = ''.join(format(o, '') for o in )) The example would then be >>> '{:.1f}, {:.1f:, }'.format(99, range(10)) '99.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0' -- Oscar From srkunze at mail.de Tue Sep 8 20:21:53 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 08 Sep 2015 20:21:53 +0200 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: References: <55EF1FB3.5000407@mail.de> Message-ID: <55EF2741.8070507@mail.de> On 08.09.2015 19:58, Chris Angelico wrote: > It'd be rather cool if it could be done as a special format string, > though, which says "here's a separator, here's a format string, now > iterate over the argument and format them with that string, then join > them with that sep, and stick it in here". It might get a bit verbose, > though. 
Most of the time, the "format string" of yours I use is "str". So, defaulting to "str" would suffice at least from my point of view: output = f'Have a look at this comma separated list: {fruits#, }.' Substitute # by any character that you see fit. I mean, seriously, you don't use a full-featured template engine, throw an iterable into it and hope that is just works and provides some readable output. Job done and you can move on. What do you expect? From my point of view, the str + join suffices for once again 80% of the use-cases. Best, Sven From srkunze at mail.de Tue Sep 8 20:39:34 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 08 Sep 2015 20:39:34 +0200 Subject: [Python-ideas] One way to do format and print In-Reply-To: References: <55ED24C4.9000205@mail.de> <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <55EF2B66.4020509@mail.de> On 08.09.2015 06:27, Random832 wrote: > By "Identical" I meant there would be a single mechanism, including all > new features, used for both str.format and str.__mod__ - i.e. the > extensions would also be available using the % operator. The extensions > would be implemented, but the % operator would not be left > behind. Hence, identical. Is it an issue when I think the % should be left behind? Just my personal preference. It only increases the learning curve with no actual benefits. Performance? Make {} faster. 
Best, Sven From oscar.j.benjamin at gmail.com Tue Sep 8 20:42:22 2015 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 8 Sep 2015 19:42:22 +0100 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: References: <8635FF8B-2C17-4016-BDC0-BF5D775C9F0C@yahoo.com> <55EEEEBB.4080203@biologie.uni-freiburg.de> <87lhcg27ww.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 8 September 2015 at 19:15, Oscar Benjamin wrote: > ATM the colon separates the part of the format element that is > interpreted by the format method to find the formatted object from the > part that is passed to the __format__ method of the formatted object. > Perhaps an additional colon could be used to separate the separator > for when the formatted object is an iterable so that > > 'foo {name::} bar'.format(name=) > > could become > > 'foo {_name} bar'.format(_name = ''.join(format(o, '') > for o in )) > > The example would then be > > >>> '{:.1f}, {:.1f:, }'.format(99, range(10)) > '99.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0' Except that obviously that wouldn't work because colon can be part of the string e.g. for datetime: >>> '{:%H:%M}'.format(datetime.datetime.now()) '19:39' So you'd need something before the colon to disambiguate. In which case perhaps 'foo {*name::} bar'.format(name=) meaning that if the * is there then everything after the second colon is the format string. 
Then it would be: >>> '{:.1f}, {*:, :.1f}'.format(99, range(10)) '99.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0' -- Oscar From random832 at fastmail.us Tue Sep 8 21:38:04 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 08 Sep 2015 15:38:04 -0400 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: <87lhcg27ww.fsf@uwakimon.sk.tsukuba.ac.jp> References: <8635FF8B-2C17-4016-BDC0-BF5D775C9F0C@yahoo.com> <55EEEEBB.4080203@biologie.uni-freiburg.de> <87lhcg27ww.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1441741084.1614682.378110905.58940FE2@webmail.messagingengine.com> On Tue, Sep 8, 2015, at 12:27, Stephen J. Turnbull wrote: > I'm afraid this won't work unless restricted to be the last > replacement field, where it just consumes all remaining positional > arguments. I think that restriction deserves a loud "ugh", but maybe > it will give somebody a better idea. So, this is the second time in as many weeks that I've suggested a new !converter, but this seems like the place for it - have something like "!join" which "converts" [wraps] the argument in a class whose __format__ method knows how to join [and call __format__ on the individual members]. So you could make a list of floating point numbers by "List: {0:, |.2f!join}".format([1.2, 3.4, 5.6]) and it will simply call Joiner([1.2, 3.4, 5.6]).__format__(", |.2f") From random832 at fastmail.us Tue Sep 8 21:39:55 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 08 Sep 2015 15:39:55 -0400 Subject: [Python-ideas] One way to do format and print In-Reply-To: <55EF2B66.4020509@mail.de> References: <55ED24C4.9000205@mail.de> <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> <55EF2B66.4020509@mail.de> Message-ID: <1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com> On Tue, Sep 8, 2015, at 14:39, Sven R. Kunze wrote: > Is it an issue when I think the % should be left behind? Just my > personal preference. 
> > It only increases the learning curve with no actual benefits. My take is: Having two format string grammars is worse than having one, even if the %-grammar is worse than the {}-grammar. From tritium-list at sdamon.com Tue Sep 8 22:51:37 2015 From: tritium-list at sdamon.com (Alexander Walters) Date: Tue, 08 Sep 2015 16:51:37 -0400 Subject: [Python-ideas] NuGet/Chocolatey feed for releases In-Reply-To: <89BE5DA9-CB1B-481E-9601-96F94057BFA0@gmail.com> References: <55EEABFD.8040600@sdamon.com> <89BE5DA9-CB1B-481E-9601-96F94057BFA0@gmail.com> Message-ID: <55EF4A59.6080605@sdamon.com> This would be to bypass the chocolatey gallery - users of this would use the sources parameter to choco install. On 9/8/2015 10:37, Ryan Gonzalez wrote: > Beware: when the Chocolatey devs said "backlog", they *meant* backlog. > I submitted an updated PyPy package to them months ago, and it still > hasn't been updated yet. > > On September 8, 2015 4:35:57 AM CDT, Alexander Walters > wrote: > > It would be incredibly convenient, especially for users of AppVayor's > continuous integration service, if there were a(n official) repository > for chocolatey containing recent releases of python. The official > Chocolatey gallery contains installers for the latest 2.7 and 3.4 (as of > this post). What I am proposing would contain the most commonly used > pythons in testing (2.6 2.7 3.3 3.4 and future releases). > > I am perfectly willing to set up a repo for my own use, but am posting > this to see if there is community support...or psf support... for > setting up an official repo. > ------------------------------------------------------------------------ > > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct:http://python.org/psf/codeofconduct/ > > > -- > Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wes.turner at gmail.com Wed Sep 9 00:18:03 2015 From: wes.turner at gmail.com (Wes Turner) Date: Tue, 8 Sep 2015 17:18:03 -0500 Subject: [Python-ideas] NuGet/Chocolatey feed for releases In-Reply-To: <55EEABFD.8040600@sdamon.com> References: <55EEABFD.8040600@sdamon.com> Message-ID: * Do you have a chocolatey nuget build script for [buildbot, jenkins]? Written in Python? * https://www.python.org/dev/buildbot/ * https://github.com/conda/conda-recipes/tree/master/python-2.7.8 * https://github.com/conda/conda-recipes/blob/master/python-3.5/meta.yaml * A pkg repo maintainer could scrape/poll these * https://www.python.org/downloads/windows/ * [ ] (schema.org RDFa/JSONLD for releases would be great) * https://en.wikipedia.org/wiki/NuGet * http://docs.continuum.io/anaconda/install#windows-install (2.7, 3.4) * http://docs.continuum.io/anaconda/pkg-docs On Sep 8, 2015 4:37 AM, "Alexander Walters" wrote: > It would be incredibly convenient, especially for users of AppVayor's > continuous integration service, if there were a(n official) repository for > chocolatey containing recent releases of python. The official Chocolatey > gallery contains installers for the latest 2.7 and 3.4 (as of this post). > What I am proposing would contain the most commonly used pythons in testing > (2.6 2.7 3.3 3.4 and future releases). > > I am perfectly willing to set up a repo for my own use, but am posting > this to see if there is community support...or psf support... for setting > up an official repo. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abarnert at yahoo.com Wed Sep 9 02:03:22 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 8 Sep 2015 17:03:22 -0700 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: <1441741084.1614682.378110905.58940FE2@webmail.messagingengine.com> References: <8635FF8B-2C17-4016-BDC0-BF5D775C9F0C@yahoo.com> <55EEEEBB.4080203@biologie.uni-freiburg.de> <87lhcg27ww.fsf@uwakimon.sk.tsukuba.ac.jp> <1441741084.1614682.378110905.58940FE2@webmail.messagingengine.com> Message-ID: <94C3256D-0380-497F-82CF-98B17C904222@yahoo.com> On Sep 8, 2015, at 12:38, random832 at fastmail.us wrote: > >> On Tue, Sep 8, 2015, at 12:27, Stephen J. Turnbull wrote: >> I'm afraid this won't work unless restricted to be the last >> replacement field, where it just consumes all remaining positional >> arguments. I think that restriction deserves a loud "ugh", but maybe >> it will give somebody a better idea. > > So, this is the second time in as many weeks that I've suggested a new > !converter, but this seems like the place for it - have something like > "!join" which "converts" [wraps] the argument in a class whose > __format__ method knows how to join [and call __format__ on the > individual members]. > > So you could make a list of floating point numbers by "List: {0:, > |.2f!join}".format([1.2, 3.4, 5.6]) > > and it will simply call Joiner([1.2, 3.4, 5.6]).__format__(", |.2f") I like this version. Even without the flexibility, just adding another hardcoded 'j' converter for iterables would be nice, but being able to program it would of course be better. 
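The !join converter itself would need changes to str.format, but the Joiner wrapper it delegates to can be sketched today and used by wrapping the argument explicitly. Splitting the spec on '|' into separator and per-element spec follows the ', |.2f' example above; the exact split character is an open question:

```python
class Joiner:
    """Wrap an iterable so a 'sep|elem_spec' format spec joins its elements."""
    def __init__(self, iterable):
        self.iterable = iterable

    def __format__(self, spec):
        # Text before the first '|' separates; text after formats elements.
        sep, _, elem_spec = spec.partition('|')
        return sep.join(format(e, elem_spec) for e in self.iterable)

s = 'List: {0:, |.2f}'.format(Joiner([1.2, 3.4, 5.6]))
# s == 'List: 1.20, 3.40, 5.60'
```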
From abarnert at yahoo.com Wed Sep 9 02:09:41 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 8 Sep 2015 17:09:41 -0700 Subject: [Python-ideas] One way to do format and print In-Reply-To: <1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com> References: <55ED24C4.9000205@mail.de> <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> <55EF2B66.4020509@mail.de> <1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com> Message-ID: <6DDBD724-714E-40E1-88DF-9BC8484FF240@yahoo.com> On Sep 8, 2015, at 12:39, random832 at fastmail.us wrote: > >> On Tue, Sep 8, 2015, at 14:39, Sven R. Kunze wrote: >> Is it an issue when I think the % should be left behind? Just my >> personal preference. >> >> It only increases the learning curve with no actual benefits. > > My take is: Having two format string grammars is worse than having one, > even if the %-grammar is worse than the {}-grammar. I think it's already been established why % formatting is not going away any time soon. As for de-emphasizing it, I think that's already done pretty well in the current docs. The tutorial has a nice long introduction to str.format, a one-paragraph section on "old string formatting" with a single %5.3f example, and a one-sentence mention of Template. The stdtypes chapter in the library reference explains the difference between the two in a way that makes format sound more attractive for novices, and then has details on each one as appropriate. What else should be done? From stephen at xemacs.org Wed Sep 9 04:37:20 2015 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 09 Sep 2015 11:37:20 +0900 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: References: <8635FF8B-2C17-4016-BDC0-BF5D775C9F0C@yahoo.com> <55EEEEBB.4080203@biologie.uni-freiburg.de> <87lhcg27ww.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87d1xs1fof.fsf@uwakimon.sk.tsukuba.ac.jp> Oscar Benjamin writes: > ATM the colon separates the part of the format element that is > interpreted by the format method to find the formatted object from the > part that is passed to the __format__ method of the formatted object. > Perhaps an additional colon could be used to separate the separator > for when the formatted object is an iterable so that > > 'foo {name::} bar'.format(name=) I thought about a colon, but that loses if the objects are times. I guess that kills '/' and '-', too, since the objects might be dates. Of course there may be a tricky way to use these that I haven't thought of, or they could be escaped for use in . From stephen at xemacs.org Wed Sep 9 05:05:44 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 09 Sep 2015 12:05:44 +0900 Subject: [Python-ideas] One way to do format and print In-Reply-To: <1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com> References: <55ED24C4.9000205@mail.de> <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> <55EF2B66.4020509@mail.de> <1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com> Message-ID: <87bndc1ed3.fsf@uwakimon.sk.tsukuba.ac.jp> random832 at fastmail.us writes: > My take is: Having two format string grammars is worse than having > one, even if the %-grammar is worse than the {}-grammar. The same has been said for having two (or more) loop grammars. Where have you gone, Repeat Until A nation turns its lonely eyes to you What's that you say, Mrs Robinson Pascal itself has turned to Modula 2 (or 3) (how 'bout 4?!) The point is that not all experiments can be contained in a single personal branch posted to GitHub. 
(Kudos to Trent who seems to be pulling off that trick as we controverse. Even if it's possible it's not necessarily easy!) We all agree in isolation with the value you express (more or less TOOWTDI), but it's the balance of many desiderata that makes Python a great language. But that balance clearly is against you: At the time {}-formatting was introduced, there had already been several less-than-wildly-successful experiments (including string.Template), yet the PEP was nevertheless accepted. In string formatting, the consensus is evidently that it's difficult enough to measure improvement objectively that sufficiently plausible experiments will still be admitted into the stdlib (or not, in the case of the much-delayed PEP 461 -- way to go, Ethan! -- the decision *not* to have backward compatible formatting for bytestrings was itself an experiment in this sense). And it's a difficult enough design space that the principle of minimal sufficient change (implicit in what you're saying) was not strictly applied to {}-formatting (or string.Template, for that matter). I'm not just being ornery, string formatting is near and dear to my heart. I'm genuinely curious why you choose a much more conservative balance in this area than Python has. But to my eyes your posts so far amount to an attempted wake-up call: "TOOWTDI is more important in string formatting than you all seem to think!" and no more. Sincere regards, From tritium-list at sdamon.com Wed Sep 9 07:36:23 2015 From: tritium-list at sdamon.com (Alexander Walters) Date: Wed, 09 Sep 2015 01:36:23 -0400 Subject: [Python-ideas] NuGet/Chocolatey feed for releases In-Reply-To: References: <55EEABFD.8040600@sdamon.com> Message-ID: <55EFC557.9090705@sdamon.com> I do not see how a build script (to build python?) would be needed. The existing installers would be sufficient. The packages themselves would have to be XML and powershell (that is the NuGet/Chocolatey infrastructure.) 
As it stands, hosting your own nuget/chocolatey feed required a windows server (not ideal, but workable). I am finding it hard to actually find the api specification. On 9/8/2015 18:18, Wes Turner wrote: > > * Do you have a chocolatey nuget build script for [buildbot, > jenkins]? Written in Python? > * https://www.python.org/dev/buildbot/ > * https://github.com/conda/conda-recipes/tree/master/python-2.7.8 > * > https://github.com/conda/conda-recipes/blob/master/python-3.5/meta.yaml > > * A pkg repo maintainer could > scrape/poll these > * https://www.python.org/downloads/windows/ > * [ ] (schema.org RDFa/JSONLD for releases > would be great) > * https://en.wikipedia.org/wiki/NuGet > * http://docs.continuum.io/anaconda/install#windows-install (2.7, 3.4) > * http://docs.continuum.io/anaconda/pkg-docs > > On Sep 8, 2015 4:37 AM, "Alexander Walters" > wrote: > > It would be incredibly convenient, especially for users of > AppVayor's continuous integration service, if there were a(n > official) repository for chocolatey containing recent releases of > python. The official Chocolatey gallery contains installers for > the latest 2.7 and 3.4 (as of this post). What I am proposing > would contain the most commonly used pythons in testing (2.6 2.7 > 3.3 3.4 and future releases). > > I am perfectly willing to set up a repo for my own use, but am > posting this to see if there is community support...or psf > support... for setting up an official repo. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben+python at benfinney.id.au Wed Sep 9 09:33:57 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 09 Sep 2015 17:33:57 +1000 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? 
References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <85h9n482sa.fsf@benfinney.id.au> Paul Moore writes: > On 5 September 2015 at 09:30, Nick Coghlan wrote: > > Unfortunately, I've yet to convince the rest of PyPA (let alone the > > community at large) that telling people to call "pip" directly is *bad > > advice* (as it breaks in too many cases that beginners are going to > > encounter), so it would be helpful if folks helping beginners on > > python-list and python-tutor could provide feedback supporting that > > perspective by filing an issue against > > https://github.com/pypa/python-packaging-user-guide > > I would love to see "python -m pip" (or where the launcher is > appropriate, the shorter "py -m pip") be the canonical invocation used > in all documentation, discussion and advice on running pip. Contrariwise, I would like to see ?pip? become the canonical invocation used in all documentation, discussion, and advice; and if there are any technical barriers to that least-surprising method, to see those barriers addressed and removed. > The main problems seem to be (1) "but just typing "pip" is shorter and > easier to remember", With the concomitant benefit that it's easier to teach and learn. This is not insignificant. > (2) "I don't understand why pip can't just be a normal command" This is my main objection, but rather stated as: We already have a firmly-established naming convention for user-level commands, that works in a huge number of languages; Python has no good reason to be an exception, especially not in one of the first commands that new users will need to encounter. If something is preventing ?pip? 
from being the command to type to run Pip, then surely the right place to apply pressure is not on everyone who instructs and documents and interfaces with end-users now and indefinitely; but instead on whatever is preventing the One Obvious Way to work. No? -- \ ?Yesterday I saw a subliminal advertising executive for just a | `\ second.? ?Steven Wright | _o__) | Ben Finney From p.f.moore at gmail.com Wed Sep 9 10:56:39 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 9 Sep 2015 09:56:39 +0100 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: <85h9n482sa.fsf@benfinney.id.au> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> Message-ID: On 9 September 2015 at 08:33, Ben Finney wrote: > Contrariwise, I would like to see ?pip? become the canonical invocation > used in all documentation, discussion, and advice; and if there are any > technical barriers to that least-surprising method, to see those > barriers addressed and removed. There is at least one fundamental, technical, and (so far) unsolveable issue with using "pip" as the canonical invocation. pip install -U pip fails on Windows, because the exe wrapper cannot be replaced by a process running that wrapper (the "pip" command runs pip.exe which needs to replace pip.exe, but can't because the OS has it open as the current running process). There have been a number of proposals for fixing this, but none have been viable so far. We'd need someone to provide working code (not just suggestions on things that might work, but actual working code) before we could recommend anything other than "python -m pip install -U pip" as the correct way of upgrading pip. 
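The "python -m pip" spelling relies on the interpreter's standard -m module dispatch, which guarantees the tool runs under exactly the interpreter you invoked. The mechanism can be demonstrated harmlessly with json.tool instead of pip (so nothing gets installed), together with the sys.executable pattern for scripting it:

```python
import subprocess
import sys

# 'python -m <module>' runs the named module under this exact interpreter,
# sidestepping any PATH confusion between competing installations.
result = subprocess.run(
    [sys.executable, '-m', 'json.tool'],
    input='{"pkg": "pip"}',
    capture_output=True,
    text=True,
)
print(result.stdout)
```

From a script, the same pattern gives `subprocess.run([sys.executable, '-m', 'pip', 'install', '-U', 'pip'])` rather than relying on whichever `pip` wrapper happens to be first on PATH.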
And recommending one thing when upgrading pip, but another for "normal use" is also confusing for beginners. (And we have evidence from the pip issue tracker people *do* find this confusing, and not just beginners...) Apart from that issue, which is Windows only (and thus some people find it less compelling) we have also had reported issues of people running pip, and it installs things into the "wrong" Python installation. This is typically because of PATH configuration issues, where "pip" is being found via one PATH element, but "python" is found via a different one. I don't have specifics to hand, so I can't clarify *how* people have managed to construct such breakage, but I can state that it happens, and the relevant people are usually very confused by the results. Again, "python -m pip" avoids any confusion here - that invocation clearly and unambiguously installs to the Python installation you invoked. In actual fact, if it weren't for the backward compatibility issues it would cause, I'd be tempted to argue that pip shouldn't provide any wrapper at all, and *only* offer "python -m pip" as a means of invoking it (precisely because it's so closely tied to the Python interpreter used to invoke it). But that's never going to happen and I don't intend it as a serious proposal. Paul From oscar.j.benjamin at gmail.com Wed Sep 9 13:56:53 2015 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 9 Sep 2015 12:56:53 +0100 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: <87d1xs1fof.fsf@uwakimon.sk.tsukuba.ac.jp> References: <8635FF8B-2C17-4016-BDC0-BF5D775C9F0C@yahoo.com> <55EEEEBB.4080203@biologie.uni-freiburg.de> <87lhcg27ww.fsf@uwakimon.sk.tsukuba.ac.jp> <87d1xs1fof.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 9 September 2015 at 03:37, Stephen J. 
Turnbull wrote:
> Oscar Benjamin writes:
>
>  > ATM the colon separates the part of the format element that is
>  > interpreted by the format method to find the formatted object from the
>  > part that is passed to the __format__ method of the formatted object.
>  > Perhaps an additional colon could be used to separate the separator
>  > for when the formatted object is an iterable so that
>  >
>  > 'foo {name:<sep>:<fmt>} bar'.format(name=<iterable>)
>
> I thought about a colon, but that loses if the objects are times.  I
> guess that kills '/' and '-', too, since the objects might be dates.
> Of course there may be a tricky way to use these that I haven't
> thought of, or they could be escaped for use in <sep>.

You can use the * at the start of the format element (before the first
colon). It can then imply that there will be two colons to separate the
three parts, with any further colons part of fmt, e.g.:

'{*<name>:<sep>:<fmt>}'.format(...)

So then you can have:

>>> '{*numbers:, :.1f}'.format(numbers)
'1.0, 2.0, 3.0'

>>> '{*times:, :%H:%M}'.format(times)
'12:30, 14:50, 22:39'

--
Oscar

From wolfgang.maier at biologie.uni-freiburg.de  Wed Sep  9 15:41:56 2015
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Wed, 9 Sep 2015 15:41:56 +0200
Subject: [Python-ideas] new format spec for iterable types
In-Reply-To: 
References: 
Message-ID:

Thanks for all the feedback!

Just to summarize ideas and to clarify what I had in mind when proposing
this:

1)
Yes, I would like to have this work with any (or at least most)
iterables, not just with my own custom type that I used for illustration.
So having this handled by the format method rather than each object's
__format__ method could make sense. It was just simple to implement it
in Python through the __format__ method.

Why did I propose * as the first character of the new format spec string?
Because I think you really need some token to state unambiguously[1]
that what follows is a format specification that involves going through
the elements of the iterable instead of working on the container object
itself. I thought that * is most intuitive to understand because of its
use in unpacking.

[1] unfortunately, in my original proposal the leading * can still be
ambiguous because *<, *> *= and *^ could mean element joining with <, >,
= or ^ as separators or aligning of the container's formatted string
representation using * as the fill character.

Ideally, the * should be the very first thing inside a replacement field
- pretty much as suggested by Oscar - and should not be part of the
format spec. This is not feasible through a format spec handled by the
__format__ method, but through a modified str.format method, i.e.,
that's another argument for this approach. Examples:

'foo {*name:<fmt>} bar'.format(name=<iterable>)
'foo {*0:<fmt>} bar {1}'.format(x, y)
'foo {*:<fmt>} bar'.format(x)

2)
As for including an additional format spec to apply to the elements of
the iterable:
I decided against including this in the original proposal to keep it
simple and to get feedback on the general idea first.
The problem here is that any solution requires an additional token to
indicate the boundary between the <sep> part and the element
format spec. Since you would not want to have anyone's custom format
spec broken by this, this boils down to disallowing one reserved
character in the <sep> part, like in Oscar's example:

'foo {*name:<sep>:<fmt>} bar'.format(name=<iterable>)

where <sep> cannot contain a colon.

So that character would have to be chosen carefully (both : and | are
quite readable, but also relatively common element separators I guess).
In addition, the <sep> part should be non-optional (though the
empty string should be allowed) to guarantee the presence of the
delimiter token, which avoids accidental splitting of lonely element
format specs into a "" <sep> and <fmt> part:

# format the elements of name using <fmt>, join them using <sep>
'foo {*name:<sep>:<fmt>} bar'.format(name=<iterable>)
# format the elements of name using <fmt>, join them using ''
'foo {*name::<fmt>} bar'.format(name=<iterable>)
# a syntax error
'foo {*name:<fmt>} bar'.format(name=<iterable>)

On the other hand, these restrictions do not look too dramatic given the
flexibility gain in most situations.

So to sum up how this could work:
If str.format encounters a leading * in a replacement field, it splits
the format spec (i.e. everything after the first colon) on the first
occurrence of the | separator (possibly ':' or '|') and does,
essentially:

<sep>.join(format(e, <fmt>) for e in iterable)

Without the *, it just works the current way.

3)
Finally, the alternative idea of having the new functionality handled by
a new !converter, like:

"List: {0!j:,}".format([1.2, 3.4, 5.6])

I considered this idea before posting the original proposal, but, in
addition to requiring a change to str.format (which would need to
recognize the new token), this approach would need either:

- a new special method (e.g., __join__) to be implemented for every type
that should support it, which is worse than for my original proposal or

- the str.format method must react directly to the converter flag, which
is then no different to the above solution just that it uses !j instead
of *. Personally, I find the * syntax more readable, plus, the !j syntax
would then suggest that this is a regular converter (calling a special
method of the object) when, in fact, it is not.
Please correct me, if I misunderstood something about this alternative
proposal.

Best,
Wolfgang

From eric at trueblade.com  Wed Sep  9 16:02:27 2015
From: eric at trueblade.com (Eric V.
Smith)
Date: Wed, 9 Sep 2015 10:02:27 -0400
Subject: [Python-ideas] new format spec for iterable types
In-Reply-To: 
References: 
Message-ID: <55F03BF3.50106@trueblade.com>

At some point, instead of complicating how format works internally, you
should just write a function that does what you want. I realize there's
a continuum between '{}'.format(iterable) and '{...}', and it's hard to
decide where to draw the line. But when the solution is to bake knowledge
of iterables into .format(), I think we've passed the point where we
should switch to a function: '{}'.format(some_function(iterable)).

In any event, if you want to play with this, I suggest you write
some_function(iterable) that does what you want, first.

Eric.

On 9/9/2015 9:41 AM, Wolfgang Maier wrote:
> [...]
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

From wolfgang.maier at biologie.uni-freiburg.de  Wed Sep  9 16:32:08 2015
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Wed, 9 Sep 2015 16:32:08 +0200
Subject: [Python-ideas] new format spec for iterable types
In-Reply-To: <55F03BF3.50106@trueblade.com>
References: <55F03BF3.50106@trueblade.com>
Message-ID: <55F042E8.10509@biologie.uni-freiburg.de>

Well, here it is:

def unpack_format(iterable, format_spec=None):
    if format_spec:
        try:
            sep, element_fmt = format_spec.split('|', 1)
        except ValueError:
            raise TypeError('Invalid format_spec for iterable formatting')
        return sep.join(format(e, element_fmt) for e in iterable)

usage examples:

# '0.00, 1.00, 2.00, 3.00, 4.00, 5.00, 6.00, 7.00, 8.00, 9.00'
'{}'.format(unpack_format(range(10), ', |.2f'))

#
'0.001.002.003.004.005.006.007.008.009.00'
'{}'.format(unpack_format(range(10), '|.2f'))

# invalid syntax
'{}'.format(unpack_format(range(10), '.2f'))

Best,
Wolfgang

On 09.09.2015 16:02, Eric V. Smith wrote:
> At some point, instead of complicating how format works internally, you
> should just write a function that does what you want. I realize there's
> a continuum between '{}'.format(iterable) and '{...}', and it's hard to
> decide where to draw the line. But when the solution is to bake
> knowledge of iterables into .format(), I think we've passed the point
> where we should switch to a function: '{}'.format(some_function(iterable)).
>
> In any event, if you want to play with this, I suggest you write
> some_function(iterable) that does what you want, first.
>
> Eric.
>
> On 9/9/2015 9:41 AM, Wolfgang Maier wrote:
>> [...]
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

From p.f.moore at gmail.com  Wed Sep  9 16:41:19 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 9 Sep 2015 15:41:19 +0100
Subject: [Python-ideas] new format spec for iterable types
In-Reply-To: <55F042E8.10509@biologie.uni-freiburg.de>
References: <55F03BF3.50106@trueblade.com>
	<55F042E8.10509@biologie.uni-freiburg.de>
Message-ID:
Paul From wolfgang.maier at biologie.uni-freiburg.de Wed Sep 9 16:41:26 2015 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Wed, 9 Sep 2015 16:41:26 +0200 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: <55F042E8.10509@biologie.uni-freiburg.de> References: <55F03BF3.50106@trueblade.com> <55F042E8.10509@biologie.uni-freiburg.de> Message-ID: Or with default behavior when there is no format_spec: def unpack_format (iterable, format_spec=None): if format_spec: try: sep, element_fmt = format_spec.split('|', 1) except ValueError: raise TypeError('Invalid format_spec for iterable formatting') return sep.join(format(e, element_fmt) for e in iterable) else: return ' '.join(format(e) for e in iterable) On 09.09.2015 16:32, Wolfgang Maier wrote: > Well, here it is: > > def unpack_format (iterable, format_spec=None): > if format_spec: > try: > sep, element_fmt = format_spec.split('|', 1) > except ValueError: > raise TypeError('Invalid format_spec for iterable formatting') > return sep.join(format(e, element_fmt) for e in iterable) > > usage examples: > > # '0.00, 1.00, 2.00, 3.00, 4.00, 5.00, 6.00, 7.00, 8.00, 9.00' > '{}'.format(unpack_format(range(10), ', |.2f')) > > # '0.001.002.003.004.005.006.007.008.009.00' > '{}'.format(unpack_format(range(10), '|.2f')) > > # invalid syntax > '{}'.format(unpack_format(range(10), '.2f')) > > Best, > Wolfgang > > > On 09.09.2015 16:02, Eric V. Smith wrote: >> At some point, instead of complicating how format works internally, you >> should just write a function that does what you want. I realize there's >> a continuum between '{}'.format(iterable) and >> '{> to draw the line. But when the solution is to bake knowledge of >> iterables into .format(), I think we've passed the point where we should >> switch to a function: '{}'.format(some_function(iterable)). >> >> In any event, If you want to play with this, I suggest you write >> some_function(iterable) that does what you want, first. 
>> >> Eric. >> >> On 9/9/2015 9:41 AM, Wolfgang Maier wrote: >>> Thanks for all the feedback! >>> >>> Just to summarize ideas and to clarify what I had in mind when proposing >>> this: >>> >>> 1) >>> Yes, I would like to have this work with any (or at least most) >>> iterables, not just with my own custom type that I used for >>> illustration. >>> So having this handled by the format method rather than each object's >>> __format__ method could make sense. It was just simple to implement it >>> in Python through the __format__ method. >>> >>> Why did I propose * as the first character of the new format spec >>> string? >>> Because I think you really need some token to state unambiguously[1] >>> that what follows is a format specification that involves going through >>> the elements of the iterable instead of working on the container object >>> itself. I thought that * is most intuitive to understand because of its >>> use in unpacking. >>> >>> [1] unfortunately, in my original proposal the leading * can still be >>> ambiguous because *<, *> *= and *^ could mean element joining with <, >, >>> = or ^ as separators or aligning of the container's formatted string >>> representation using * as the fill character. >>> >>> >>> Ideally, the * should be the very first thing inside a replacement field >>> - pretty much as suggested by Oscar - and should not be part of the >>> format spec. This is not feasible through a format spec handled by the >>> __format__ method, but through a modified str.format method, i.e., >>> that's another argument for this approach. Examples: >>> >>> 'foo {*name:} bar'.format(name=) >>> 'foo {*0:} bar {1}'.format(x, y) >>> 'foo {*:} bar'.format(x) >>> >>> >>> 2) >>> As for including an additional format spec to apply to the elements of >>> the iterable: >>> I decided against including this in the original proposal to keep it >>> simple and to get feedback on the general idea first. 
>>> The problem here is that any solution requires an additional token to >>> indicate the boundary between the part and the element >>> format spec. Since you would not want to have anyone's custom format >>> spec broken by this, this boils down to disallowing one reserved >>> character in the part, like in Oscar's example: >>> >>> 'foo {*name::} bar'.format(name=) >>> >>> where cannot contain a colon. >>> >>> So that character would have to be chosen carefully (both : and | are >>> quite readable, but also relatively common element separators I guess). >>> In addition, the part should be non-optional (though the >>> empty string should be allowed) to guarantee the presence of the >>> delimiter token, which avoids accidental splitting of lonely element >>> format specs into a "" and part: >>> >>> # format the elements of name using , join them using >>> 'foo {*name::} bar'.format(name=) >>> # format the elements of name using , join them using '' >>> 'foo {*name::} bar'.format(name=) >>> # a syntax error >>> 'foo {*name:} bar'.format(name=) >>> >>> On the other hand, these restriction do not look too dramatic given the >>> flexibility gain in most situations. >>> >>> So to sum up how this could work: >>> If str.format encounters a leading * in a replacement field, it splits >>> the format spec (i.e. everything after the first colon) on the first >>> occurrence of the | separator (possibly ':' or '|') and does, >>> essentially: >>> >>> .join(format(e, ) for e in iterable) >>> >>> Without the *, it just works the current way. 
>>> >>> >>> 3) >>> Finally, the alternative idea of having the new functionality handled by >>> a new !converter, like: >>> >>> "List: {0!j:,}".format([1.2, 3.4, 5.6]) >>> >>> I considered this idea before posting the original proposal, but, in >>> addition to requiring a change to str.format (which would need to >>> recognize the new token), this approach would need either: >>> >>> - a new special method (e.g., __join__) to be implemented for every type >>> that should support it, which is worse than for my original proposal or >>> >>> - the str.format method must react directly to the converter flag, which >>> is then no different to the above solution just that it uses !j instead >>> of *. Personally, I find the * syntax more readable, plus, the !j syntax >>> would then suggest that this is a regular converter (calling a special >>> method of the object) when, in fact, it is not. >>> Please correct me, if I misunderstood something about this alternative >>> proposal. >>> >>> Best, >>> Wolfgang >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From p.f.moore at gmail.com Wed Sep 9 16:58:14 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 9 Sep 2015 15:58:14 +0100 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: References: <55F03BF3.50106@trueblade.com> <55F042E8.10509@biologie.uni-freiburg.de> Message-ID: 
On 9 September 2015 at 15:41, Wolfgang Maier wrote:
> def unpack_format(iterable, format_spec=None):
>     if format_spec:
>         try:
>             sep, element_fmt = format_spec.split('|', 1)
>         except ValueError:
>             raise TypeError('Invalid format_spec for iterable formatting')
>         return sep.join(format(e, element_fmt) for e in iterable)
>     else:
>         return ' '.join(format(e) for e in iterable)

From the docs, "The default format_spec is an empty string which
usually gives the same effect as calling str(value)"

So you can just use format_spec='' and avoid the extra conditional logic.

Paul

From srkunze at mail.de  Wed Sep  9 18:05:10 2015
From: srkunze at mail.de (Sven R. Kunze)
Date: Wed, 09 Sep 2015 18:05:10 +0200
Subject: [Python-ideas] One way to do format and print
In-Reply-To: <6DDBD724-714E-40E1-88DF-9BC8484FF240@yahoo.com>
References: <55ED24C4.9000205@mail.de>
	<87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp>
	<55EF2B66.4020509@mail.de>
	<1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com>
	<6DDBD724-714E-40E1-88DF-9BC8484FF240@yahoo.com>
Message-ID: <55F058B6.9000202@mail.de>

On 09.09.2015 02:09, Andrew Barnert via Python-ideas wrote:
> I think it's already been established why % formatting is not going away any time soon.
>
> As for de-emphasizing it, I think that's already done pretty well in the current docs. The tutorial has a nice long introduction to str.format, a one-paragraph section on "old string formatting" with a single %5.3f example, and a one-sentence mention of Template. The stdtypes chapter in the library reference explains the difference between the two in a way that makes format sound more attractive for novices, and then has details on each one as appropriate. What else should be done?

I had difficulty finding what you mean by tutorial. But hey, being a
Python user for years and not knowing where the official tutorial
resides... Anyway, Google presented me with the 2.7 version of the
tutorial.
Thus, the link to the stdtypes documentation does not exhibit the note of, say, 3.5: "Note: The formatting operations described here exhibit a variety of quirks that lead to a number of common errors (such as failing to display tuples and dictionaries correctly). Using the newer str.format() interface helps avoid these errors, and also provides a generally more powerful, flexible and extensible approach to formatting text." So, adding it to the 2.7 docs would be a start. I still don't understand what's wrong with deprecating %, but okay. I think f-strings will push {} to wide-range adoption. Best, Sven From guido at python.org Wed Sep 9 18:35:12 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 9 Sep 2015 09:35:12 -0700 Subject: [Python-ideas] Should our default random number generator be secure? Message-ID: I've received several long emails from Theo de Raadt (OpenBSD founder) about Python's default random number generator. This is the random module, and it defaults to a Mersenne Twister (MT) seeded by 2500 bytes of entropy taken from os.urandom(). Theo's worry is that while the starting seed is fine, MT is not good when random numbers are used for crypto and other security purposes. I've countered that it's not meant for that (you should use random.SystemRandom() or os.urandom() for that) but he counters that people don't necessarily know that and are using the default random.random() setup for security purposes without realizing how wrong that is. There is already a warning in the docs for the random module that it's not suitable for security, but -- as the meme goes -- nobody reads the docs. Theo then went into technicalities that went straight over my head, concluding with a strongly worded recommendation of the OpenBSD version of arc4random() (which IIUC is based on something called "chacha", not on "RC4" despite that being in the name). He says it is very fast (but I don't know what that means). 
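The core of Theo's concern can be shown in a few lines (an illustration of the point, not code from the emails): Mersenne Twister output is fully reproducible from its seed, while random.SystemRandom draws from the OS entropy source via os.urandom() and has no seed to recover.

```python
import random

# Two MT generators with the same seed produce identical streams:
# ideal for simulations, dangerous for secrets an attacker may predict.
a = random.Random(1234)
b = random.Random(1234)
assert [a.random() for _ in range(5)] == [b.random() for _ in range(5)]

# SystemRandom offers the same API but reads os.urandom(),
# so there is no internal state for an observer to reconstruct.
sr = random.SystemRandom()
token = ''.join(sr.choice('0123456789abcdef') for _ in range(32))
```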
I've invited Theo to join this list but he's too busy. The two core Python experts on the random module have given me opinions suggesting that there's not much wrong with MT, so here I am. Who is right? What should we do? Is there anything we need to do? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From skrah at bytereef.org Wed Sep 9 18:35:46 2015 From: skrah at bytereef.org (Stefan Krah) Date: Wed, 9 Sep 2015 16:35:46 +0000 (UTC) Subject: [Python-ideas] One way to do format and print References: <55ED24C4.9000205@mail.de> <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> <55EF2B66.4020509@mail.de> <1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com> <6DDBD724-714E-40E1-88DF-9BC8484FF240@yahoo.com> <55F058B6.9000202@mail.de> Message-ID: Sven R. Kunze writes: > I still don't understand what's wrong with deprecating %, but okay. I > think f-strings will push {} to wide-range adoption. Then it will probably be hard to explain, so I'll be direct: 1) Many Python users are fed up with churn and don't want to do yet another rewrite of their applications (just after migrating to 3.x). 2) Despite many years of officially preferring {}-formatting in the docs (and on Stackoverflow), people *still* use %-formatting. This should be a clue that they actually like it. 3) %-formatting often has better performance and is often easier to read. 4) Yes, in other cases {}-formatting is easier to read. So choose whatever is best. Stefan Krah From donald at stufft.io Wed Sep 9 18:53:33 2015 From: donald at stufft.io (Donald Stufft) Date: Wed, 9 Sep 2015 12:53:33 -0400 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: Message-ID: On September 9, 2015 at 12:36:16 PM, Guido van Rossum (guido at python.org) wrote: > > I've invited Theo to join this list but he's too busy. 
The two core Python
> experts on the random module have given me opinions suggesting that there's
> not much wrong with MT, so here I am. Who is right? What should we do? Is
> there anything we need to do?
>

Everyone is right :)

MT is a fine algorithm for random numbers when you don't need them to be
cryptographically safe; it is a disastrous algorithm if you do need them
to be safe. As long as you only use MT (and the default ``random``)
implementation for things where the fact that the numbers you get aren't
going to be quite random (e.g. they are actually predictable) doesn't
matter, and you use os.urandom/random.SystemRandom for everything where
you need actual randomness, then everything is fine.

The problem boils down to: are people going to accidentally use the
default random module when they really should use os.urandom or
random.SystemRandom? It is my opinion (and I believe Theo's) that they
are going to use the MT-backed random functions in random.py when they
shouldn't be. However, I don't have a great solution to what we should
do about it.

One option is to add a new random.FastInsecureRandom class, and switch
the "bare" random functions in that module over to using
random.SystemRandom by default. Then if people want to opt into a faster
random that isn't cryptographically secure by default they can use that
class. This would essentially be inverting the relationship today, where
it defaults to insecure and you have to opt in to secure.
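Donald's inverted module - secure by default, with an explicit opt-out - could look roughly like this. FastInsecureRandom is his hypothetical name from the email; the rest of the sketch is mine:

```python
import random

class FastInsecureRandom(random.Random):
    """Explicit opt-in to the fast but predictable Mersenne Twister."""

# Module-level functions would delegate to a crypto-quality
# generator by default instead of the shared MT instance.
_secure = random.SystemRandom()
randrange = _secure.randrange
choice = _secure.choice
getrandbits = _secure.getrandbits
```

Callers who need reproducible streams (simulations, tests) would then have to ask for FastInsecureRandom by name, making the insecure choice visible in the code.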
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From ron3200 at gmail.com Wed Sep 9 18:59:38 2015 From: ron3200 at gmail.com (Ron Adam) Date: Wed, 9 Sep 2015 11:59:38 -0500 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: <87d1xs1fof.fsf@uwakimon.sk.tsukuba.ac.jp> References: <8635FF8B-2C17-4016-BDC0-BF5D775C9F0C@yahoo.com> <55EEEEBB.4080203@biologie.uni-freiburg.de> <87lhcg27ww.fsf@uwakimon.sk.tsukuba.ac.jp> <87d1xs1fof.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 09/08/2015 09:37 PM, Stephen J. Turnbull wrote: > Oscar Benjamin writes: > > > ATM the colon separates the part of the format element that is > > interpreted by the format method to find the formatted object from the > > part that is passed to the __format__ method of the formatted object. > > Perhaps an additional colon could be used to separate the separator > > for when the formatted object is an iterable so that > > > > 'foo {name::} bar'.format(name=) > > I thought about a colon, but that loses if the objects are times. I > guess that kills '/' and '-', too, since the objects might be dates. > Of course there may be a tricky way to use these that I haven't > thought of, or they could be escaped for use in . This seems to me to need a nested format spec. An outer one to format the whole list, and an inner one to format each item. f"foo {', '.join(f'{x:inner_spec}' for x in iter):outer_spec}" Actually this is how I'd rather write it. "foo " + l.fmap(inner_spec).join(', ').fstr(outer_spec) But sequences don't have the methods to write it that way. >>> l = range(10) >>> "foo" + format(','.join(map(lambda x: format(x, '>5'), l)), '>50') 'foo 0, 1, 2, 3, 4, 5, 6, 7, 8, 9' It took me a few times to get that right. 
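The nested inner/outer spec idea can also be unpacked into a small helper; ``format_iter`` is a hypothetical name sketching the idea, not an existing or proposed API:

```python
def format_iter(items, inner_spec='', sep=', ', outer_spec=''):
    """Format each item with inner_spec, join with sep,
    then format the joined string with outer_spec."""
    joined = sep.join(format(x, inner_spec) for x in items)
    return format(joined, outer_spec)

# Same data as above: right-align each number in 5 columns,
# then right-align the whole run in 50.
print('foo ' + format_iter(range(10), '>5', ', ', '>50'))
```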
Cheers,
Ron

From guido at python.org Wed Sep 9 19:02:28 2015
From: guido at python.org (Guido van Rossum)
Date: Wed, 9 Sep 2015 10:02:28 -0700
Subject: [Python-ideas] Should our default random number generator be secure?
Message-ID:

I'm just going to forward further missives by Theo.

---------- Forwarded message ----------
From: Theo de Raadt
Date: Wed, Sep 9, 2015 at 9:59 AM
Subject: Re: getentropy, getrandom, arc4random()
To: guido at python.org

> Thanks. And one last thing: unlike Go and Swift, Python has no significant
> corporate resources behind it -- it's mostly volunteers begging their
> employers to let them work on Python for a few hours per week.

i understand because I find myself in the same situation.

however i think you overstate the difficulty involved. high-availability
random is kind of a new issue. so final advice from me; feel free to forward
as you like.

i think arc4random would be a better API to call on the back side than
getentropy/getrandom. arc4random can seed initialize with a single
getentropy/getrandom call at startup. that is done automatically. you can
then use arc4random's results to initialize the MT.

in a system call trace, this will show up as one getentropy/getrandom at
process startup, which gets both subsystems going. really cheap.

in the case of longer run times, the userland arc4random PRNG folding
reduces the system calls required. this helps older kernels with slower
entropy creation, taking pressure off their subsystem.

driving everyone towards this one API which is so high performance is the
right goal. chacha arc4random is really fast.

if you were to create such an API in python, maybe this is how it will go:
say it becomes arc4random in the back end. i am unsure what advice to give
you regarding a python API name. in swift, they chose to use the same prefix
"arc4random" (id = arc4random(), id = arc4random_uniform(1..n)); it is a
little bit different than the C API. google has tended to choose other
prefixes.
we admit the name is a bit strange, but we can't touch the previous attempts
like drand48.... I do suggest you have the _uniform and _buf versions. Maybe
apple chose to stick to arc4random as a name simply because search engines
tend to give above average advice for this search string?

so arc4random is natively available in freebsd, macos, solaris, and other
systems like android libc (bionic). some systems lack it: win32, glibc,
hpux, aix, so we wrote replacements for libressl:

https://github.com/libressl-portable/openbsd/tree/master/src/lib/libcrypto/crypto
https://github.com/libressl-portable/portable/tree/master/crypto/compat

the first is the base openbsd tree where we maintain/develop this code for
other systems, the 2nd part is scaffold in libressl that makes this
available to others. it contains arc4random for those systems, and supplies
getentropy() stubs for challenged systems. we'll admit we haven't got
solutions for every system known to man. we are trying to handle fork
issues, and systems with very bad entropy feeding.

that's free code. the heavy lifting is done, and we'll keep maintaining that
until the end of days.

i hope it helps.

--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tim.peters at gmail.com Wed Sep 9 19:10:38 2015
From: tim.peters at gmail.com (Tim Peters)
Date: Wed, 9 Sep 2015 12:10:38 -0500
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To:
References:
Message-ID:

[Guido]
> ...
> The two core Python experts on the random module
> have given me opinions suggesting that there's not
> much wrong with MT, so here I am.

There is nothing _right_ about MT in a crypto context - it's entirely
unsuitable for any such purpose, and always was. Just to be clear about
that ;-)

But it's an excellent generator for almost all other purposes. So the real
question is: whose use cases do you want to cater to by default?
If you answer "crypto", then realize the Python generator will have to
change every time the crypto community changes its mind about what's
_currently_ "good enough". There's a long history of that already. Indeed,
there are already numerous "chacha" variants.

For a brief overview, scroll down to the ChaCha20 section of this
exceptionally readable page listing pros and cons of various generators:

http://www.pcg-random.org/other-rngs.html

There are no answers to vital pragmatic questions (like "is it possible to
supply a seed to get reproducible results?") without specifying whose
implementation of which chacha variant you're asking about.

I've always thought Python should be a follower rather than a leader in
this specific area. For example, I didn't push for the Twister before it
was well on its way to becoming a de facto standard.

Anyway, it's all moot until someone supplies a patch - and that sure ain't
gonna be me ;-)

On Wed, Sep 9, 2015 at 11:35 AM, Guido van Rossum wrote:
> I've received several long emails from Theo de Raadt (OpenBSD founder) about
> Python's default random number generator. This is the random module, and it
> defaults to a Mersenne Twister (MT) seeded by 2500 bytes of entropy taken
> from os.urandom().
>
> Theo's worry is that while the starting seed is fine, MT is not good when
> random numbers are used for crypto and other security purposes. I've
> countered that it's not meant for that (you should use random.SystemRandom()
> or os.urandom() for that) but he counters that people don't necessarily know
> that and are using the default random.random() setup for security purposes
> without realizing how wrong that is.
>
> There is already a warning in the docs for the random module that it's not
> suitable for security, but -- as the meme goes -- nobody reads the docs.
> > Theo then went into technicalities that went straight over my head,
> concluding with a strongly worded recommendation of the OpenBSD version of
> arc4random() (which IIUC is based on something called "chacha", not on "RC4"
> despite that being in the name). He says it is very fast (but I don't know
> what that means).
>
> I've invited Theo to join this list but he's too busy. The two core Python
> experts on the random module have given me opinions suggesting that there's
> not much wrong with MT, so here I am. Who is right? What should we do? Is
> there anything we need to do?
>
> --
> --Guido van Rossum (python.org/~guido)
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From Stephan.Sahm at gmx.de Wed Sep 9 19:10:18 2015
From: Stephan.Sahm at gmx.de (Stephan Sahm)
Date: Wed, 9 Sep 2015 19:10:18 +0200
Subject: [Python-ideas] BUG in standard while statement
Message-ID:

Dear all

I found a BUG in the standard while statement, which appears both in python
2.7 and python 3.4 on my system.

It usually won't appear because I only stumbled upon it after trying to
implement a nice repeat structure. Look:
```
class repeat(object):
    def __init__(self, n):
        self.n = n

    def __bool__(self):
        self.n -= 1
        return self.n >= 0

    __nonzero__ = __bool__

a = repeat(2)
```
the meaning of the above is that bool(a) returns True 2-times, and after
that always False.

Now executing
```
while a:
    print('foo')
```
will in fact print 'foo' two times. HOWEVER ;-) ....
```
while repeat(2):
    print('foo')
```
will go on and go on, printing 'foo' until I kill it.

Please comment, explain or recommend this further if you also think that
both while statements should behave identically.

hoping for responses,
best,
Stephan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From joejev at gmail.com Wed Sep 9 19:15:04 2015 From: joejev at gmail.com (Joseph Jevnik) Date: Wed, 9 Sep 2015 13:15:04 -0400 Subject: [Python-ideas] BUG in standard while statement In-Reply-To: References: Message-ID: This appears as intended. The body of the while condition is executed each time the condition is checked. In the first case, you are creating a single instance of repeat, and then calling bool on the expression with each iteration of the loop. With the second case, you are constructing a _new_ repeat instance each time. Think about the difference between: while should_stop(): ... and: a = should_stop() while a: ... One would expect should_stop to be called each time in the first case; but, in the second case it is only called once. With all that said, I think you want to use the __iter__ and __next__ protocols to implement this in a more supported way. On Wed, Sep 9, 2015 at 1:10 PM, Stephan Sahm wrote: > Dear all > > I found a BUG in the standard while statement, which appears both in > python 2.7 and python 3.4 on my system. > > It usually won't appear because I only stumbled upon it after trying to > implement a nice repeat structure. Look: > ?```? > class repeat(object): > def __init__(self, n): > self.n = n > > def __bool__(self): > self.n -= 1 > return self.n >= 0 > > __nonzero__=__bool__ > > a = repeat(2) > ``` > the meaning of the above is that bool(a) returns True 2-times, and after > that always False. > > Now executing > ``` > while a: > print('foo') > ``` > will in fact print 'foo' two times. HOWEVER ;-) .... > ``` > while repeat(2): > print('foo') > ``` > will go on and go on, printing 'foo' until I kill it. > > Please comment, explain or recommend this further if you also think that > both while statements should behave identically. 
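Joseph's closing suggestion above -- implementing repeat with the iterator protocol instead of a stateful ``__bool__`` -- could look like this (a sketch, not code from the thread):

```python
class repeat(object):
    """Iterate a fixed number of times: ``for _ in repeat(2): ...``"""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n

    next = __next__  # Python 2 spelling of the same protocol method

for _ in repeat(2):
    print('foo')  # prints 'foo' exactly twice, then the loop ends
```

Unlike the ``__bool__`` version, ``while``'s re-evaluation semantics never come into play here: the for statement calls ``iter()`` once and then drives that single iterator object until it raises StopIteration.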
> > hoping for responses,
> best,
> Stephan
>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From storchaka at gmail.com Wed Sep 9 19:18:39 2015
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 9 Sep 2015 20:18:39 +0300
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To:
References:
Message-ID:

On 09.09.15 19:35, Guido van Rossum wrote:
> I've invited Theo to join this list but he's too busy. The two core
> Python experts on the random module have given me opinions suggesting
> that there's not much wrong with MT, so here I am. Who is right? What
> should we do? Is there anything we need to do?

Entropy is a limited and slowly recoverable resource (especially if there
is no network activity). If you consume it too quickly (for example in a
scientific simulation or in a game), it will not have time to recover, and
that will slow down not only your program, but all consumers of entropy.
The use of random.SystemRandom by default looks dangerous. It is unlikely
that all existing programs will be rewritten to use
random.FastInsecureRandom.

From donald at stufft.io Wed Sep 9 19:20:03 2015
From: donald at stufft.io (Donald Stufft)
Date: Wed, 9 Sep 2015 13:20:03 -0400
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To:
References:
Message-ID:

On September 9, 2015 at 1:11:22 PM, Tim Peters (tim.peters at gmail.com) wrote:
> > So the real question is: whose use cases do you want to cater to
> > by default?
> >
> > If you answer "crypto", then realize the Python generator will have to
> > change every time the crypto community changes its mind about what's
> > _currently_ "good enough". There's a long history of that already.
This is not really true in the sense that Python would need to do anything
if the blessed generator changed. We'd use /dev/urandom, one of the syscalls
that do the same thing, or the CryptGen API on Windows. Python should not
have its own userland CSPRNG. Then it's up to the platform to follow what
generator they are going to provide.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From Stephan.Sahm at gmx.de Wed Sep 9 19:20:53 2015
From: Stephan.Sahm at gmx.de (Stephan Sahm)
Date: Wed, 9 Sep 2015 19:20:53 +0200
Subject: [Python-ideas] BUG in standard while statement
In-Reply-To:
References:
Message-ID:

It is so true! thanks for pointing that out. It makes sense to do it that
way. I probably never used while in different places than ``while 1: pass``
until now.

I admit, I am looking for something alternative to a for structure like
``for _ in range(10)`` -- I don't like the ``_`` ;-)
How can I use the iterator protocol to make a nice repeat syntax?

On 9 September 2015 at 19:15, Joseph Jevnik wrote:

> This appears as intended. The body of the while condition is executed each
> time the condition is checked. In the first case, you are creating a single
> instance of repeat, and then calling bool on the expression with each
> iteration of the loop. With the second case, you are constructing a _new_
> repeat instance each time. Think about the difference between:
>
> while should_stop():
>     ...
>
> and:
> a = should_stop()
> while a:
>     ...
>
> One would expect should_stop to be called each time in the first case;
> but, in the second case it is only called once.
>
> With all that said, I think you want to use the __iter__ and __next__
> protocols to implement this in a more supported way.
>
> On Wed, Sep 9, 2015 at 1:10 PM, Stephan Sahm wrote:
>
>> Dear all
>>
>> I found a BUG in the standard while statement, which appears both in
>> python 2.7 and python 3.4 on my system.
>> >> It usually won't appear because I only stumbled upon it after trying to >> implement a nice repeat structure. Look: >> ?```? >> class repeat(object): >> def __init__(self, n): >> self.n = n >> >> def __bool__(self): >> self.n -= 1 >> return self.n >= 0 >> >> __nonzero__=__bool__ >> >> a = repeat(2) >> ``` >> the meaning of the above is that bool(a) returns True 2-times, and after >> that always False. >> >> Now executing >> ``` >> while a: >> print('foo') >> ``` >> will in fact print 'foo' two times. HOWEVER ;-) .... >> ``` >> while repeat(2): >> print('foo') >> ``` >> will go on and go on, printing 'foo' until I kill it. >> >> Please comment, explain or recommend this further if you also think that >> both while statements should behave identically. >> >> hoping for responses, >> best, >> Stephan >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Wed Sep 9 19:28:29 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 9 Sep 2015 12:28:29 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: Message-ID: [Tim] >> So the real question is: whose use cases do you want to cater to >> by default? >> >> If you answer "crytpo", then realize the Python generator will >> have to change every time the crypto community changes its mind >> about what's _currently_ "good enough". There's a long history of >? that already. [Donald Stufft ] > This is not really true in that sense that Python would need to do anything if > the blessed generator changed. 
I read Guido's message as specifically asking about Theo's "strongly worded recommendation of [Python switching to] the OpenBSD version of arc4random()" as its default generator. In which, case, yes, when that specific implementation falls out of favor, Python would need to change. > We'd use /dev/urandom, one of the syscalls that > do the same thing, or the CryptGen API on Windows. Python should not have it's > own userland CSPRNG. I read Guido's message as asking whether Python should indeed do just that. From dwblas at gmail.com Wed Sep 9 19:30:38 2015 From: dwblas at gmail.com (David Blaschke) Date: Wed, 9 Sep 2015 10:30:38 -0700 Subject: [Python-ideas] BUG in standard while statement In-Reply-To: References: Message-ID: while repeat(2): creates a new repeat instance each time through the loop and initializes the variable as 2 each time through the loop i.e. repeat(2) returns a new, different instance each time. On 9/9/15, Stephan Sahm wrote: > Dear all > > I found a BUG in the standard while statement, which appears both in python > 2.7 and python 3.4 on my system. > > It usually won't appear because I only stumbled upon it after trying to > implement a nice repeat structure. Look: > ?```? > class repeat(object): > def __init__(self, n): > self.n = n > > def __bool__(self): > self.n -= 1 > return self.n >= 0 > > __nonzero__=__bool__ > > a = repeat(2) > ``` > the meaning of the above is that bool(a) returns True 2-times, and after > that always False. > > Now executing > ``` > while a: > print('foo') > ``` > will in fact print 'foo' two times. HOWEVER ;-) .... > ``` > while repeat(2): > print('foo') > ``` > will go on and go on, printing 'foo' until I kill it. > > Please comment, explain or recommend this further if you also think that > both while statements should behave identically. > > hoping for responses, > best, > Stephan > -- With the simplicity of true nature, there shall be no desire. Without desire, one's original nature will be at peace. 
And the world will naturally be in accord with the right Way.
Tao Te Ching

From donald at stufft.io Wed Sep 9 19:31:35 2015
From: donald at stufft.io (Donald Stufft)
Date: Wed, 9 Sep 2015 13:31:35 -0400
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To:
References:
Message-ID:

On September 9, 2015 at 1:19:34 PM, Serhiy Storchaka (storchaka at gmail.com) wrote:
> On 09.09.15 19:35, Guido van Rossum wrote:
> > I've invited Theo to join this list but he's too busy. The two core
> > Python experts on the random module have given me opinions suggesting
> > that there's not much wrong with MT, so here I am. Who is right? What
> > should we do? Is there anything we need to do?
>
> Entropy -- limited and slowly recoverable resource (especially if there
> is no network activity). If you consume it too quickly (for example in a
> scientific simulation or in a game), it will not have time to recover,
> that will slow down not only your program, but all consumers of entropy.
> The use of random.SystemRandom by default looks dangerous. It is
> unlikely that all existing programs will be rewritten to use
> random.FastInsecureRandom.
>

This isn't exactly true. Hardware entropy is limited and slowly recovering,
which is why no sane implementation uses that except to periodically reseed
the CSPRNG, which is typically based on ARC4 or ChaCha. The standard CSPRNGs
that most platforms use are fast enough for most people's use cases.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From guido at python.org Wed Sep 9 19:41:59 2015
From: guido at python.org (Guido van Rossum)
Date: Wed, 9 Sep 2015 10:41:59 -0700
Subject: [Python-ideas] Should our default random number generator be secure?
Message-ID:

---------- Forwarded message ----------
From: Theo de Raadt
Date: Wed, Sep 9, 2015 at 10:36 AM
Subject: Re: getentropy, getrandom, arc4random()
To: guido at python.org

> Yet another thing.
Where do you see that Go and Swift have secure random as
> a keyword? Searching for "golang random" gives the math/rand package as the
> first hit, which has a note reminding the reader to use crypto/rand for
> security work.

yes, well, look at the other phrase it uses...

    that produces a deterministic sequence of values each time a program is run

it documents itself as being decidedly non-random. that documentation
change happened soon after this event:

https://lwn.net/Articles/625506/

these days, the one people are using is found using "go secure random"

https://golang.org/pkg/crypto/rand/

that opens /dev/urandom or uses the getrandom system call depending on
system. it also has support for the windows entropy API.

it pulls data into a large buffer, a cache. then each subsequent call, it
consumes some, until it runs out, and has to do a fresh read. it appears to
not clean the buffer behind itself, probably for performance reasons, so
the memory is left active. (forward secrecy violated)

i don't think they are doing the best they can... i think they should get
forward secrecy and higher performance by having an in-process chacha. but
you can sense the trend.

here's an example of the fallout..

https://github.com/golang/go/issues/9205

> For Swift it's much the same -- there's an arc4random() in
> the Darwin package but nothing in the core language.

that is what people are led to use.

--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From donald at stufft.io Wed Sep 9 19:43:53 2015
From: donald at stufft.io (Donald Stufft)
Date: Wed, 9 Sep 2015 13:43:53 -0400
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To:
References:
Message-ID:

On September 9, 2015 at 1:28:46 PM, Tim Peters (tim.peters at gmail.com) wrote:
> > I read Guido's message as specifically asking about Theo's "strongly
> worded recommendation of [Python switching to] the OpenBSD version of
> arc4random()" as its default generator. In which case, yes, when that
> specific implementation falls out of favor, Python would need to change.

arc4random changes as the underlying implementation changes too, the name
is a historical accident really. arc4random no longer uses arc4, it uses
chacha, and when/if chacha needs to be replaced, arc4random will still be
the name.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

From guido at python.org Wed Sep 9 19:43:56 2015
From: guido at python.org (Guido van Rossum)
Date: Wed, 9 Sep 2015 10:43:56 -0700
Subject: [Python-ideas] Should our default random number generator be secure?
Message-ID:

---------- Forwarded message ----------
From: Theo de Raadt
Date: Wed, Sep 9, 2015 at 10:42 AM
Subject: Re: getentropy, getrandom, arc4random()
To: guido at python.org

been speaking to a significant go person. confirmed. it takes data out of
that buffer, and does not zero it behind itself. obviously for performance
reasons.

same type of thing happens with MT-style engines. in practice, they can be
wound backwards. a proper stream cipher cannot be turned backwards.
however, that's just an academic observation. or maybe it indicates that
well-financed groups can get it wrong too.

by the way, chacha arc4random can create random values faster than a
memcpy -- the computation of fresh output is faster than the gross cost of
a "read" from memory (when cache dirtying is accounted for).

--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From random832 at fastmail.us Wed Sep 9 19:46:09 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 09 Sep 2015 13:46:09 -0400 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: Message-ID: <1441820769.2850642.379075929.5DABF6B4@webmail.messagingengine.com> On Wed, Sep 9, 2015, at 13:18, Serhiy Storchaka wrote: > Entropy -- limited and slowly recoverable resource (especially if there > is no network activity). If you consume it too quickly (for example in a > scientific simulation or in a game), it will not have time to recover, > that will slow down not only your program, but all consumers of entropy. > The use of random.SystemRandom by default looks dangerous. It is > unlikely that all existing programs will be rewritten to use > random.FastInsecureRandom. http://www.2uo.de/myths-about-urandom/ should be required reading. As far as I know, no-one is actually proposing the use of a method that blocks when there's "not enough entropy", nor does arc4random itself appear to do so. From random832 at fastmail.us Wed Sep 9 19:54:14 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 09 Sep 2015 13:54:14 -0400 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: Message-ID: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> On Wed, Sep 9, 2015, at 13:43, Donald Stufft wrote: > arc4random changes as the underlying implementation changes too, the name > is a > historical accident really. arc4random no longer uses arc4 it uses > chacha, and > when/if chacha needs to be replaced, arc4random will still be the name. The issue is, what should Python do, if the decision is made to not provide its own RNG [presumably would be a forked copy of OpenBSD's current arc4random] on systems that do not provide a function named arc4random? Use /dev/urandom (or CryptGenRandom) every time [more expensive, performs I/O]? 
rand48? random? rand? I don't see the issue with Python providing its own
implementation. If the state of the art changes, we can have another
discussion then.

From skrah at bytereef.org Wed Sep 9 20:00:59 2015
From: skrah at bytereef.org (Stefan Krah)
Date: Wed, 9 Sep 2015 18:00:59 +0000 (UTC)
Subject: [Python-ideas] Should our default random number generator be secure?
References:
Message-ID:

Tim Peters writes:
> > We'd use /dev/urandom, one of the syscalls that
> > do the same thing, or the CryptGen API on Windows. Python should not have
> > its own userland CSPRNG.
>
> I read Guido's message as asking whether Python should indeed do just that.

From Theo's forwarded mail I also got the impression that he wanted us to
use OpenBSD code to implement our own CSPRNG, use that for the default
functions in the random module and add new functions for reproducible
random numbers that use the MT.

My intuition is that if someone just uses a random() function without
checking if it's cryptographically secure then the application will
probably have other holes as well. I mean, for example no one is going to
use C's rand() function for crypto.

Stefan Krah

From random832 at fastmail.us Wed Sep 9 20:08:56 2015
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Wed, 09 Sep 2015 14:08:56 -0400
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To:
References:
Message-ID: <1441822136.2856767.379097753.6E5AA6EA@webmail.messagingengine.com>

On Wed, Sep 9, 2015, at 14:00, Stefan Krah wrote:
> My intuition is that if someone just uses a random() function
> without checking if it's cryptographically secure then the
> application will probably have other holes as well. I mean,
> for example no one is going to use C's rand() function for crypto.

Let's turn the question around - what's the _benefit_ of having a random
number generator available that _isn't_ cryptographically secure? One
possible argument is performance.
If that's the issue - what are our performance targets? How can they be
measured? Another argument is that some applications really do need
deterministic seeding. Is there a reason not to require them to be explicit
about it?

From tim.peters at gmail.com Wed Sep 9 20:16:29 2015
From: tim.peters at gmail.com (Tim Peters)
Date: Wed, 9 Sep 2015 13:16:29 -0500
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To:
References:
Message-ID:

[Stefan Krah ]
> From Theo's forwarded mail I also got the impression that he wanted
> us to use OpenBSD code to implement our own CSPRNG, use that for
> the default functions in the random module and add new functions
> for reproducible random numbers that use the MT.

I read it the same way on all counts.

> My intuition is that if someone just uses a random() function
> without checking if it's cryptographically secure then the
> application will probably have other holes as well. I mean,
> for example no one is going to use C's rand() function for crypto.

Yes, if they're not checking the random() docs first, they're a total
crypto moron - in which case it's insane to believe they'll do anything
else related to crypto-strength requirements right either. It's hard to
make something idiot-proof even if your target audience is bona fide
crypto experts ;-)

From skrah at bytereef.org Wed Sep 9 20:17:27 2015
From: skrah at bytereef.org (Stefan Krah)
Date: Wed, 9 Sep 2015 18:17:27 +0000 (UTC)
Subject: [Python-ideas] Should our default random number generator be secure?
References: <1441822136.2856767.379097753.6E5AA6EA@webmail.messagingengine.com>
Message-ID:

writes:
> On Wed, Sep 9, 2015, at 14:00, Stefan Krah wrote:
> > My intuition is that if someone just uses a random() function
> > without checking if it's cryptographically secure then the
> > application will probably have other holes as well. I mean,
> > for example no one is going to use C's rand() function for crypto.
> > Let's turn the question around - what's the _benefit_ of having a random > number generator available that _isn't_ cryptographically secure? One > possible argument is performance. If that's the issue - what are our > performance targets? How can they be measured? Another argument is that > some applications really do need deterministic seeding. Is there a > reason not to require them to be explicit about it? As you say, performance: http://www.pcg-random.org/rng-performance.html Random number generation is a very broad field. I'm not a specialist, so I just entered "Mersenne Twister" into an academic search engine and got many results, but none for arc4random. It's an interesting question you ask. I'd have to do a lot of reading first to get an overview. Stefan Krah From tim.peters at gmail.com Wed Sep 9 20:31:49 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 9 Sep 2015 13:31:49 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> Message-ID: [random832 at fastmail.us] > I don't see the issue with Python providing its own implementation. If > the state of the art changes, It will. Over & over again. That's why it's called "art" ;-) > we can have another discussion then. Also over & over again. 
If you volunteer to own responsibility for updating all versions of Python each time it changes (in a crypto context, an advance in the state of the art implies the prior state becomes "a bug"), and post a performance bond sufficient to pay someone else to do it if you vanish, then a major pragmatic objection would go away ;-) From skrah at bytereef.org Wed Sep 9 20:43:05 2015 From: skrah at bytereef.org (Stefan Krah) Date: Wed, 9 Sep 2015 18:43:05 +0000 (UTC) Subject: [Python-ideas] =?utf-8?q?Should_our_default_random_number_generat?= =?utf-8?q?or_be=09secure=3F?= References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> Message-ID: Tim Peters writes: > > we can have another discussion then. > > Also over & over again. If you volunteer to own responsibility for > updating all versions of Python each time it changes (in a crypto > context, an advance in the state of the art implies the prior state > becomes "a bug"), and post a performance bond sufficient to pay > someone else to do it if you vanish, then a major pragmatic objection > would go away The OpenBSD devs could also publish arc4random as a library that works everywhere (like OpenSSH). That would be a nicer solution for everyone (except for the devs perhaps :). Stefan Krah From tim.peters at gmail.com Wed Sep 9 20:47:40 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 9 Sep 2015 13:47:40 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> Message-ID: [Stefan Krah ] > ... > The OpenBSD devs could also publish arc4random as a library that > works everywhere (like OpenSSH). That would be a nicer solution > for everyone (except for the devs perhaps :). Telling Python devs "hey, it will be as easy as dealing with OpenSSH has been!" 
is indeed a good way to kill the idea at once ;-) From srkunze at mail.de Wed Sep 9 20:50:44 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 9 Sep 2015 20:50:44 +0200 Subject: [Python-ideas] One way to do format and print In-Reply-To: References: <55ED24C4.9000205@mail.de> <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> <55EF2B66.4020509@mail.de> <1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com> <6DDBD724-714E-40E1-88DF-9BC8484FF240@yahoo.com> <55F058B6.9000202@mail.de> Message-ID: <55F07F84.9040707@mail.de> Fair enough. On 09.09.2015 18:35, Stefan Krah wrote: > Sven R. Kunze writes: >> I still don't understand what's wrong with deprecating %, but okay. I >> think f-strings will push {} to wide-range adoption. > Then it will probably be hard to explain, so I'll be direct: > > 1) Many Python users are fed up with churn and don't want to do yet > another rewrite of their applications (just after migrating to > 3.x). > > 2) Despite many years of officially preferring {}-formatting in > the docs (and on Stackoverflow), people *still* use %-formatting. > This should be a clue that they actually like it. > > 3) %-formatting often has better performance and is often > easier to read. > > 4) Yes, in other cases {}-formatting is easier to read. So choose > whatever is best. > > > > Stefan Krah > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From random832 at fastmail.us Wed Sep 9 20:55:01 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 09 Sep 2015 14:55:01 -0400 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> Message-ID: <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> On Wed, Sep 9, 2015, at 14:31, Tim Peters wrote: > Also over & over again. If you volunteer to own responsibility for > updating all versions of Python each time it changes (in a crypto > context, an advance in the state of the art implies the prior state > becomes "a bug"), and post a performance bond sufficient to pay > someone else to do it if you vanish, then a major pragmatic objection > would go away ;-) I don't see how "Changing Python's RNG implementation today to arc4random as it exists now" necessarily implies "Making a commitment to guarantee the cryptographic suitability of Python's RNG for all time". Those are two separate things. From random832 at fastmail.us Wed Sep 9 20:56:13 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 09 Sep 2015 14:56:13 -0400 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441822136.2856767.379097753.6E5AA6EA@webmail.messagingengine.com> Message-ID: <1441824973.2867674.379143865.0237C5FC@webmail.messagingengine.com> On Wed, Sep 9, 2015, at 14:17, Stefan Krah wrote: > Random number generation is a very broad field. I'm not a specialist, > so I just entered "Mersenne Twister" into an academic search engine > and got many results, but none for arc4random. Try "Chacha". The "arc4random" name is a legacy of an older implementation. From tim.peters at gmail.com Wed Sep 9 21:03:33 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 9 Sep 2015 14:03:33 -0500 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com>
References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com>
 <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com>
Message-ID: 

[]
> I don't see how "Changing Python's RNG implementation today to
> arc4random as it exists now" necessarily implies "Making a commitment to
> guarantee the cryptographic suitability of Python's RNG for all time".
> Those are two separate things.

Disagree. The _only_ point to switching today is "to guarantee the
cryptographic suitability of Python's RNG" today. It misses the intent
of the switch entirely to give a "but tomorrow? eh - that's a different
issue" dodge. No, no rules of formal logic would be violated by
separating the two - it would be a violation of the only _sense_ in
making a switch at all. If you don't believe me, try asking Theo ;-)

From steve at pearwood.info  Wed Sep  9 21:07:57 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 10 Sep 2015 05:07:57 +1000
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To: <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com>
References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com>
 <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com>
Message-ID: <20150909190757.GM19373@ando.pearwood.info>

On Wed, Sep 09, 2015 at 02:55:01PM -0400, random832 at fastmail.us wrote:
> On Wed, Sep 9, 2015, at 14:31, Tim Peters wrote:
> > Also over & over again.
If you volunteer to own responsibility for > > updating all versions of Python each time it changes (in a crypto > > context, an advance in the state of the art implies the prior state > > becomes "a bug"), and post a performance bond sufficient to pay > > someone else to do it if you vanish, then a major pragmatic objection > > would go away ;-) > > I don't see how "Changing Python's RNG implementation today to > arc4random as it exists now" necessarily implies "Making a commitment to > guarantee the cryptographic suitability of Python's RNG for all time". > Those are two separate things. Not really. Look at the subject line. It doesn't say "should we change from MT to arc4random?", it asks if the default random number generator should be secure. The only reason we are considering the change from MT to arc4random is to make the PRNG cryptographically secure. "Secure" is a moving target, what is secure today will not be secure tomorrow. Yes, in principle, we could make the change once, then never again. But why bother? We don't gain anything from changing to arc4random if there is no promise to be secure into the future. Question, aimed at anyone, not necessarily random832 -- one desirable property of PRNGs is that you can repeat a sequence of values if you re-seed with a known value. Does arc4random keep that property? I think that it is important that the default RNG be deterministic when given a known seed. (I'm happy for the default seed to be unpredictable.) -- Steve From srkunze at mail.de Wed Sep 9 21:09:05 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 9 Sep 2015 21:09:05 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: Message-ID: <55F083D1.7040401@mail.de> On 09.09.2015 18:53, Donald Stufft wrote: > This would > essentially be inverting the relationship today, where it defaults to insecure > and you have to opt in to secure. Not being an expert on this but I agree with this assessment. 
You can determine easily whether your program runs fast enough. If not,
you can fix it.

You cannot determine easily whether something you made is
cryptographically secure.

The default should be as secure as possible.

Best,
Sven

From encukou at gmail.com  Wed Sep  9 21:12:18 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Wed, 9 Sep 2015 21:12:18 +0200
Subject: [Python-ideas] BUG in standard while statement
In-Reply-To: 
References: 
Message-ID: 

On Wed, Sep 9, 2015 at 7:20 PM, Stephan Sahm wrote:
> It is so true!
> thanks for pointing that out. It makes sense to do it that way
> I probably never used while in different places than ``while 1: pass`` until
> now.
>
> I admit, I am looking for something alternative to a for structure like
> ``for _ in range(10)`` -- I don't like the ``_`` ;-)
> How can I use the iterator protocol to make a nice repeat syntax?

If you don't like the ``_``, you can use ``for iteration_index in
range(10):``. You always need to store the iteration number somewhere
(the original post has it in ``self.n``). Chances are you'll want to
access it later, when you debug your code. Hiding it in a class is just
making it harder to get.
It's also making the whole thing less maintainable, because other
people, who are used to seeing ``for i in range(...)``, would now need
to understand your custom class and new idiom. (And as always with
maintainability, "other people" includes you in a few years.)
Instead, just use the current syntax. Eventually it will start looking
nice to you.

> On 9 September 2015 at 19:15, Joseph Jevnik wrote:
>>
>> This appears as intended. The body of the while condition is executed each
>> time the condition is checked. In the first case, you are creating a single
>> instance of repeat, and then calling bool on the expression with each
>> iteration of the loop. With the second case, you are constructing a _new_
>> repeat instance each time. Think about the difference between:
>>
>> while should_stop():
>> ...
>>
>> and:
>> a = should_stop()
>> while a:
>> ...
>>
>> One would expect should_stop to be called each time in the first case;
>> but, in the second case it is only called once.
>>
>> With all that said, I think you want to use the __iter__ and __next__
>> protocols to implement this in a more supported way.
>>
>> On Wed, Sep 9, 2015 at 1:10 PM, Stephan Sahm wrote:
>>>
>>> Dear all
>>>
>>> I found a BUG in the standard while statement, which appears both in
>>> python 2.7 and python 3.4 on my system.
>>>
>>> It usually won't appear because I only stumbled upon it after trying to
>>> implement a nice repeat structure. Look:
>>> ```
>>> class repeat(object):
>>>     def __init__(self, n):
>>>         self.n = n
>>>
>>>     def __bool__(self):
>>>         self.n -= 1
>>>         return self.n >= 0
>>>
>>>     __nonzero__ = __bool__
>>>
>>> a = repeat(2)
>>> ```
>>> the meaning of the above is that bool(a) returns True 2-times, and after
>>> that always False.
>>>
>>> Now executing
>>> ```
>>> while a:
>>>     print('foo')
>>> ```
>>> will in fact print 'foo' two times. HOWEVER ;-) ....
>>> ```
>>> while repeat(2):
>>>     print('foo')
>>> ```
>>> will go on and go on, printing 'foo' until I kill it.
>>>
>>> Please comment, explain or recommend this further if you also think that
>>> both while statements should behave identically.
>>>
>>> hoping for responses,
>>> best,
>>> Stephan

From skrah at bytereef.org  Wed Sep  9 21:13:29 2015
From: skrah at bytereef.org (Stefan Krah)
Date: Wed, 9 Sep 2015 19:13:29 +0000 (UTC)
Subject: [Python-ideas] =?utf-8?q?Should_our_default_random_number_generat?=
 =?utf-8?q?or_be=09secure=3F?=
References: <1441822136.2856767.379097753.6E5AA6EA@webmail.messagingengine.com>
 <1441824973.2867674.379143865.0237C5FC@webmail.messagingengine.com>
Message-ID: 

 writes:
> On Wed, Sep 9, 2015, at 14:17, Stefan Krah wrote:
> > Random number generation is a very broad field.
I'm not a specialist, > > so I just entered "Mersenne Twister" into an academic search engine > > and got many results, but none for arc4random. > > Try "Chacha". The "arc4random" name is a legacy of an older > implementation. I know chacha (and most of djb's other works). I thought we were talking about the suitability of cryptographically secure RNGs for traditional scientific applications, in particular whether there are *other* reasons apart from performance not to use them. Stefan Krah From tim.peters at gmail.com Wed Sep 9 21:20:52 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 9 Sep 2015 14:20:52 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <20150909190757.GM19373@ando.pearwood.info> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> Message-ID: [Steven D'Aprano ] > ... > Question, aimed at anyone, not necessarily random832 -- one desirable > property of PRNGs is that you can repeat a sequence of values if you > re-seed with a known value. Does arc4random keep that property? I think > that it is important that the default RNG be deterministic when given a > known seed. (I'm happy for the default seed to be unpredictable.) "arc4random" is ill-defined. From what I gathered, it's the case that "pure chacha" variants can be seeded to get a reproducible sequence "in theory", but that not all implementations support that. Specifically, the OpenBSD implementation being "sold" here does not and cannot: http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/arc4random.3 "Does not" because there is no API to either request or set a seed. "Cannot" because: The subsystem is re-seeded from the kernel random number subsystem using getentropy(2) on a regular basis Other variants skip that last part. 
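For reference, both behaviours described above already coexist in Python's stdlib: random.Random honours a seed and replays its stream, while random.SystemRandom pulls from the OS entropy pool and documents seed() as having no effect, much like the OpenBSD interface, which exposes no seeding API at all. A minimal sketch:

```python
import random

# Deterministic: Mersenne Twister replays the same stream for the same seed.
a = random.Random(42)
b = random.Random(42)
assert [a.random() for _ in range(3)] == [b.random() for _ in range(3)]

# Non-deterministic: SystemRandom reads OS entropy; its seed() method is
# documented to have no effect, so "re-seeding" changes nothing.
c = random.SystemRandom()
c.seed(42)
d = random.SystemRandom()
d.seed(42)
print(c.random() == d.random())  # almost surely False
```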
From skrah at bytereef.org Wed Sep 9 21:33:16 2015 From: skrah at bytereef.org (Stefan Krah) Date: Wed, 9 Sep 2015 19:33:16 +0000 (UTC) Subject: [Python-ideas] =?utf-8?q?Should_our_default_random_number_generat?= =?utf-8?q?or_be=09secure=3F?= References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> Message-ID: Steven D'Aprano writes: > Question, aimed at anyone, not necessarily random832 -- one desirable > property of PRNGs is that you can repeat a sequence of values if you > re-seed with a known value. Does arc4random keep that property? I think > that it is important that the default RNG be deterministic when given a > known seed. (I'm happy for the default seed to be unpredictable.) I think the removal of MT wasn't proposed (at least not by Theo). So we'd still have deterministic sequences in addition to arc4random. Stefan Krah From p.f.moore at gmail.com Wed Sep 9 22:04:32 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 9 Sep 2015 21:04:32 +0100 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> Message-ID: On 9 September 2015 at 20:33, Stefan Krah wrote: > Steven D'Aprano writes: >> Question, aimed at anyone, not necessarily random832 -- one desirable >> property of PRNGs is that you can repeat a sequence of values if you >> re-seed with a known value. Does arc4random keep that property? I think >> that it is important that the default RNG be deterministic when given a >> known seed. (I'm happy for the default seed to be unpredictable.) > > I think the removal of MT wasn't proposed (at least not by Theo). > So we'd still have deterministic sequences in addition to > arc4random. 
I use a RNG quite often. Typically for simulations (games, dierolls, card draws, that sort of thing). Sometimes for many millions of results (Monte Carlo simulations, for example). I would always just use the default RNG supplied by the stdlib - I view my use case as "normal use" and wouldn't go looking for specialist answers. I'd occasionally look for reproducibility, although it's not often a key requirement for me (I would expect it as an option from the stdlib RNG, though). Anyone doing crypto who doesn't fully appreciate that it's a specialist subject and that they should be looking for a dedicated RNG suitable for crypto, is probably going to make a lot of *other* mistakes as well. Leading them away from this one probably isn't going to be enough to make their code something I'd want to use... So as a user, I'm against making a change like this. Let the default RNG in the stdlib be something suitable for simulations, "pick a random question", and similar situations, and provide a crypto-capable RNG for those who need it, but not as the default. (I am, of course, assuming that it's not possible to have a single RNG that is the best option for both uses - nobody on this thread seems to have suggested that I'm wrong in this assumption). Paul From skrah at bytereef.org Wed Sep 9 22:07:32 2015 From: skrah at bytereef.org (Stefan Krah) Date: Wed, 9 Sep 2015 20:07:32 +0000 (UTC) Subject: [Python-ideas] =?utf-8?q?Should_our_default_random_number_generat?= =?utf-8?q?or_be=09secure=3F?= References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> Message-ID: Stefan Krah writes: > The OpenBSD devs could also publish arc4random as a library that > works everywhere (like OpenSSH). That would be a nicer solution > for everyone (except for the devs perhaps :). And naturally they're already doing that. 
I missed this in Theo's first mail: https://github.com/libressl-portable/openbsd/tree/master/src/lib/libcrypto/crypto https://github.com/libressl-portable/portable/tree/master/crypto/compat So I guess the whole thing also depends on how popular these libraries will be. Stefan Krah From random832 at fastmail.us Wed Sep 9 22:09:21 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 09 Sep 2015 16:09:21 -0400 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <20150909190757.GM19373@ando.pearwood.info> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> Message-ID: <1441829361.2883366.379212985.164412ED@webmail.messagingengine.com> On Wed, Sep 9, 2015, at 15:07, Steven D'Aprano wrote: > Not really. Look at the subject line. It doesn't say "should we change > from MT to arc4random?", it asks if the default random number generator > should be secure. The only reason we are considering the change from MT > to arc4random is to make the PRNG cryptographically secure. "Secure" is > a moving target, what is secure today will not be secure tomorrow. Right, but we are discussing making it secure today. From guido at python.org Wed Sep 9 22:17:14 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 9 Sep 2015 13:17:14 -0700 Subject: [Python-ideas] Structural type checking for PEP 484 Message-ID: Jukka wrote up a proposal for structural subtyping. It's pretty good. Please discuss. https://github.com/ambv/typehinting/issues/11#issuecomment-138133867 -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From random832 at fastmail.us Wed Sep 9 22:38:02 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 09 Sep 2015 16:38:02 -0400 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: Message-ID: <1441831082.2889011.379232937.1FB72573@webmail.messagingengine.com> The commit message changing libc's random functions to use arc4random is as follows: > Change rand(), random(), drand48(), lrand48(), mrand48(), and srand48() > to returning strong random by default, source from arc4random(3). > Parameters to the seeding functions are ignored, and the subsystems remain > in strong random mode. If you wish the standardized deterministic mode, > call srand_deterministic(), srandom_determistic(), srand48_deterministic(), > seed48_deterministic() or lcong48_deterministic() instead. > The re-entrant functions rand_r(), erand48(), nrand48(), jrand48() are > unaffected by this change and remain in deterministic mode (for now). > > Verified as a good roadmap forward by auditing 8800 pieces of software. > Roughly 60 pieces of software will need adaptation to request the > deterministic mode. > > Violates POSIX and C89, which violate best practice in this century. > ok guenther tedu millert Perhaps someone could ask them for information about that audit, and how many / what of those pieces of software were actually using these functions in ways which made them insecure, but whose security would be notably improved by a better random implementation (I suspect that the main thrust of the audit, though, was on finding which ones would be broken by taking away the default deterministic seeding). That could tell us how typical it is for people to ignorantly use default random functions for security-critical code with no other flaws. 
From encukou at gmail.com Wed Sep 9 22:37:55 2015 From: encukou at gmail.com (Petr Viktorin) Date: Wed, 9 Sep 2015 22:37:55 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> Message-ID: On Wed, Sep 9, 2015 at 9:33 PM, Stefan Krah wrote: > Steven D'Aprano writes: >> Question, aimed at anyone, not necessarily random832 -- one desirable >> property of PRNGs is that you can repeat a sequence of values if you >> re-seed with a known value. Does arc4random keep that property? I think >> that it is important that the default RNG be deterministic when given a >> known seed. (I'm happy for the default seed to be unpredictable.) The OpenBSD implementation does not allow any kind of reproducible results. Reading http://www.pcg-random.org/other-rngs.html, I see that arc4random is not built for is statistical quality and k-dimensional equidistribution, which are also properties you might not need for crypto, but do want for simulations. So there are two quite different use cases (plus a lot of grey area where any solution is okay). The current situation may be surprising to people who didn't read the docs. Switching away from MT might be a disservice to users that did read and understand them. From njs at pobox.com Wed Sep 9 23:02:19 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 9 Sep 2015 14:02:19 -0700 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> Message-ID: On Sep 9, 2015 12:21 PM, "Tim Peters" wrote: > > [Steven D'Aprano ] > > ... 
> > Question, aimed at anyone, not necessarily random832 -- one desirable > > property of PRNGs is that you can repeat a sequence of values if you > > re-seed with a known value. Does arc4random keep that property? I think > > that it is important that the default RNG be deterministic when given a > > known seed. (I'm happy for the default seed to be unpredictable.) > > "arc4random" is ill-defined. From what I gathered, it's the case that > "pure chacha" variants can be seeded to get a reproducible sequence > "in theory", but that not all implementations support that. > > Specifically, the OpenBSD implementation being "sold" here does not and cannot: > > http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/arc4random.3 > > "Does not" because there is no API to either request or set a seed. > > "Cannot" because: > > The subsystem is re-seeded from the kernel random number > subsystem using getentropy(2) on a regular basis Another reason why it is important *not* to provide a seeding api for a crypto rng is that this means you can later swap out the underlying algorithms easily as the state of the art improves. By contrast, if you have a deterministic seeded mode, then swapping out the algorithm becomes a compatibility break. (You can provide a "mix this extra entropy into the pool" api, which looks rather similar to seeding, but has fundamentally different semantics.) The only real problem that I see with switching the random module to use a crypto rng is exactly this backwards compatibility issue. For scientific users, reproducibility of output streams is really important. (Ironically, this is a variety of "important" that crypto people are very familiar with: universally acknowledged to be the right thing by everyone who's thought about it, a minority do religiously and rely on, and most people ignore out of ignorance. Education is ongoing...) 
OTOH python has never made strong guarantees of output stream reproducibility -- 3.2 broke all seeds by default (you have to add 'version=1' to your seed call to get the same results on post-3.2 pythons -- which of course gives an error on older versions). And 99% of the methods are documented to be unstable across versions -- the only method that's guaranteed to produce reproducible results across versions is random.random(). In practice the other methods usually don't change so people get away with it, but. See: https://docs.python.org/3/library/random.html#notes-on-reproducibility So in practice the stdlib random module is not super suitable for scientific work anyway. Not that this stops anyone from using it for this purpose... see above. (And to be fair even this limited determinism is still enough to be somewhat useful -- not all projects require reproducibility across years of different python versions.) Plus even a lot of people who know about the importance of seeding don't realize that the stdlib's support has these gotchas. (Numpy unsurprisingly puts higher priority on these issues -- our random module guarantees exact reproducibility of seeded outputs modulo rounding, across versions and systems, except for bugfixes necessary for correctness. This means that we carry around a bunch of old inefficient implementations of the distribution methods, but so be it...) So, all that considered: I can actually see an argument for removing the seeding methods from the the stdib entirely, and directing those who need reproducibility to a third party library like numpy (/ pygsl / ...). This would be pretty annoying for some cases where someone really does have simple needs and wants just a little determinism without a binary extension, but on net it might even be an improvement, given how easy it is to misread the current api as guaranteeing more than it actually promises. OTOH this would actually break the current promise, weak as it is. 
Keeping that promise in mind, an alternative would be to keep both generators around, use the cryptographically secure one by default, and switch to MT when someone calls seed(1234, generator="INSECURE LEGACY MT") But this would justifiably get us crucified by the security community, because the above call would flip the insecure switch for your entire program, including possibly other modules that were depending on random to provide secure bits. So if we were going to do this then I think it would have to be by switching the global RNG over unconditionally, and to fulfill the promise, provide the MT option as a separate class that the user would have to instantiate explicitly if they wanted it for backcompat. Document that you should replace import random random.seed(12345) if random.whatever(): ... with from random import MTRandom random = MTRandom(12345) if random.whatever(): ... As part of this transition I would also suggest making the seed method on non-seedable RNGs raise an error when given an explicit seed, instead of silently doing nothing like the current SystemRandom. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Wed Sep 9 23:15:39 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 09 Sep 2015 17:15:39 -0400 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> Message-ID: <1441833339.2897870.379262849.1477A353@webmail.messagingengine.com> On Wed, Sep 9, 2015, at 17:02, Nathaniel Smith wrote: > Keeping that promise in mind, an alternative would be to keep both > generators around, use the cryptographically secure one by default, and > switch to MT when someone calls > > seed(1234, generator="INSECURE LEGACY MT") > > But this would justifiably get us crucified by the security community, > because the above call would flip the insecure switch for your entire > program, including possibly other modules that were depending on random > to > provide secure bits. Ideally, neither the crypto bits nor the science bits of a big program should be using the module-level functions. A small program either hasn't got both kinds of bits, or won't be using them at the same time. And if you've got non-science bits doing stuff with your RNG then your results probably aren't going to be reproducible anyway. Which suggests a solution: How about exposing a way to switch out the Random instance used by the module-level functions? The instance itself exists now as random._inst, but the module just spews its bound methods all over its namespace. (Long-term, it might make sense to deprecate the module-level functions) From srkunze at mail.de Wed Sep 9 23:16:24 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 9 Sep 2015 23:16:24 +0200 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: Message-ID: <55F0A1A8.5010001@mail.de> Thanks for sharing, Guido. Some random thoughts: - "classes should need to be explicitly marked as protocols" If so, why are they classes in the first place? Other languages has dedicated keywords like "interface". - "recursive types"? Yes, please. 
I am very curious about how to as I am working a similar problem. It would basically require defining of the protocol first and then populating its member as they might use the protocol's name. pyfu is supposed to do exactly this. But it's not going to work 100% when metaclasses come into the game. Best, Sven On 09.09.2015 22:17, Guido van Rossum wrote: > Jukka wrote up a proposal for structural subtyping. It's pretty good. > Please discuss. > > https://github.com/ambv/typehinting/issues/11#issuecomment-138133867 > > -- > --Guido van Rossum (python.org/~guido ) > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Wed Sep 9 23:18:06 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 9 Sep 2015 14:18:06 -0700 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> Message-ID: On Sep 9, 2015, at 01:56, Paul Moore wrote: > > Apart from that issue, which is Windows only (and thus some people > find it less compelling) we have also had reported issues of people > running pip, and it installs things into the "wrong" Python > installation. This is typically because of PATH configuration issues, > where "pip" is being found via one PATH element, but "python" is found > via a different one. 
I don't have specifics to hand, so I can't > clarify *how* people have managed to construct such breakage, but I > can state that it happens, and the relevant people are usually very > confused by the results. If StackOverflow/SU/TD questions are any indication, a disproportionate number of these people are Mac users using Python 2.7, who have installed a second Python 2.7 (or, in some cases, two of them) alongside Apple's. Many teachers, blog posts, instructions for scientific packages, etc. recommend this, but often don't give enough information for a novice to get it right. Many people don't even realize they already have a Python 2.7; others are making their first foray into serious terminal usage and don't think about PATH issues; others are following old instructions written for OS X 10.5 that don't do the right thing in 10.6, much less 10.10; etc. And even experienced *nix types who aren't Mac experts may not realize the implications of LaunchServices not being launched by the shell (so anything you double-click, schedule, run as a service, etc. won't see your export PATH that you think should be solving things). Even Mac experts are thrown by the fact that Apple's pre-installed Python is in /usr but has a scripts directory in /usr/local, so if you install pip for both Apple's Python and a second one, whichever one goes second is likely to overwrite the first (but that isn't as common as just having /usr/bin ahead of /usr/local/bin on the PATH--because Apple's Python doesn't come with pip, this is enough to have your highest pip and python executables out of sync). Whenever someone has a PATH question, I always start by asking them if they're on a Mac, and using Python 2.7, and, if so, which if any Python installs they've done, and why they can't use virtual environments and/or upgrade to Python 3.x and/or use the system Python. 
The vast majority say yes, yes, [python.org|Homebrew|the one linked from this blog post|I don't remember], what's a virtual environment, my [book|teacher|friend] says 2.7 is the best version, this blog post says [Apple doesn't include Python|Apple's Python is 2.7.1 and broken|etc.]. As both Python 3 and virtual environments become more common (at least as long as Apple isn't shipping Python 3 or virtualenv for their 2.7), the problem seems to be becoming less common, but it's still depressing how many people are still writing blog posts and SO answers and so on that tell people "you need to install the latest version of Python, 2.7, because your computer doesn't come with it" and then proceed to give instructions that will lead to a screwed up PATH and make no mention of virtualenv... From donald at stufft.io Wed Sep 9 23:25:12 2015 From: donald at stufft.io (Donald Stufft) Date: Wed, 9 Sep 2015 17:25:12 -0400 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> Message-ID: On September 9, 2015 at 5:22:57 PM, Andrew Barnert via Python-ideas (python-ideas at python.org) wrote: > > Apple's Python doesn't come with pip As of the latest Yosemite release, and in El Capitan, it *does* however come with Python 2.7.10 and thus ``python -m ensurepip`` works. 
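Donald's ``python -m ensurepip`` tip generalizes: always route both ensurepip and pip through the interpreter you actually care about, rather than trusting whichever wrapper PATH finds first. A minimal sketch (it modifies the invoking environment, so try it inside a virtualenv; nothing here is a proposed API, just today's stdlib and pip):

```python
import subprocess
import sys

# Bootstrap or upgrade pip for exactly the interpreter running this
# script -- sys.executable cannot point at the "wrong" Python.
subprocess.check_call([sys.executable, "-m", "ensurepip", "--upgrade"])

# Ask that pip which installation it manages; the printed path should
# match sys.executable's installation.
subprocess.check_call([sys.executable, "-m", "pip", "--version"])
```

Since both commands go through ``sys.executable``, the PATH mismatches described above simply cannot occur.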
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From skrah at bytereef.org Wed Sep 9 23:36:06 2015 From: skrah at bytereef.org (Stefan Krah) Date: Wed, 9 Sep 2015 21:36:06 +0000 (UTC) Subject: [Python-ideas] Should our default random number generator be secure? References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> Message-ID: Petr Viktorin writes: > The OpenBSD implementation does not allow any kind of reproducible results. > Reading http://www.pcg-random.org/other-rngs.html, I see that > what arc4random is not built for is statistical quality and k-dimensional > equidistribution, which are also properties you might not need for > crypto, but do want for simulations. > So there are two quite different use cases (plus a lot of grey area > where any solution is okay). I can't find much at all when searching for "chacha20 equidistribution". Contrast that with "mersenne twister equidistribution" and it seems that chacha20 hasn't been studied very much in that respect (except for the pcg-random site). So I also think this should preclude us from replacing the current random() functions. Adding an arc4random module with the caveat that its quality will be as good as the current OpenBSD libcrypto/libressl(?) would be okay.
Stefan Krah From abarnert at yahoo.com Wed Sep 9 23:50:16 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 9 Sep 2015 14:50:16 -0700 Subject: [Python-ideas] One way to do format and print In-Reply-To: <55F058B6.9000202@mail.de> References: <55ED24C4.9000205@mail.de> <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> <55EF2B66.4020509@mail.de> <1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com> <6DDBD724-714E-40E1-88DF-9BC8484FF240@yahoo.com> <55F058B6.9000202@mail.de> Message-ID: <1DCC81C0-DE7A-460A-AD7F-E1533BB14911@yahoo.com> On Sep 9, 2015, at 09:05, Sven R. Kunze wrote: > >> On 09.09.2015 02:09, Andrew Barnert via Python-ideas wrote: >> I think it's already been established why % formatting is not going away any time soon. >> >> As for de-emphasizing it, I think that's already done pretty well in the current docs. The tutorial has a nice long introduction to str.format, a one-paragraph section on "old string formatting" with a single %5.3f example, and a one-sentence mention of Template. The stdtypes chapter in the library reference explains the difference between the two in a way that makes format sound more attractive for novices, and then has details on each one as appropriate. What else should be done? > > I had difficulties to find what you mean by tutorial. But hey, being a Python user for years and not knowing where the official tutorial resides... If you go to docs.python.org (directly, or by clicking the link to docs for Python 3 or Python 2 from the home page or the documentation menu), Tutorial is the second thing on the list, after What's New. And, as you found, it's the first hit for "Python tutorial" on Google. At any rate, if you're not concerned with the tutorial, which parts of the docs are you worried about? Sure, a lot of people learn Python from various books, websites, and classes that present % instead of (or at least in equal light with) {}, but those are all outside the control of Python itself. 
You can't write a PEP to get the author of ThinkPython, a guy who wrote 1800 random StackOverflow answers, or the instructor for Programming 101 at Steve University to change what they teach. And if not the docs, what else would it mean to "de-emphasize" %-formatting without deprecating or removing it? > Anyway, Google presented me the version 2.7 of the tutorial. That's a whole other problem. But nobody is going to retroactively change Python 2.7 just to help people who find the 2.7 docs when they should be looking for 3.5. That might seem reasonable today, when 2.7 could heartily recommend str.format because it's nearly the same in 2.7 as in 3.5, but what about next year, when f-strings are the preferred way to do it in 3.6? If 3.6 de-emphasizes str.format (as a feature used only when you need backward compat and/or dynamic formats) and its tutorial, %-formatting docs, and str.format docs all point to f-strings, having 2.7's docs point people to str.format will be misleading at best for 3.6, but having it recommend something that doesn't exist in 2.7 will be actively wrong for 2.7. The solution is to get people to the 3.5 or 3.6 docs in the first place, not to hack up the 2.7 docs. > I still don't understand what's wrong with deprecating %, but okay. Well, have you read the answers given by Nick, me, and others earlier in the thread? If so, what do you disagree with? You've only addressed one point (that % is faster than {} for simple cases--and your solution is just "make {} faster", which may not be possible given that it's inherently more hookable than % and therefore requires more function calls...). What about formatting headers for ASCII wire protocols, sharing tables of format strings between programming languages (e.g., for i18n), or any of the other reasons people have brought up? 
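For readers weighing the options debated above, a minimal side-by-side of the three styles (the f-string line assumes Python 3.6+, which at the time of this thread was still unreleased):

```python
value = 3.14159

old = "pi is %5.3f" % value           # printf-style formatting
new = "pi is {:5.3f}".format(value)   # str.format (2.6+)
fstr = f"pi is {value:5.3f}"          # f-string (3.6+)

# All three share the same format-spec mini-language after the style
# differences are stripped away:
assert old == new == fstr == "pi is 3.142"
```

The spec syntax ({:5.3f} vs %5.3f) is what carries over between styles, which is why de-emphasizing one in the docs need not invalidate what people learned from the others.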
From abarnert at yahoo.com Wed Sep 9 23:28:46 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 9 Sep 2015 14:28:46 -0700 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: References: Message-ID: <5B23496D-6DBD-49B3-91D7-E093309A84C7@yahoo.com> On Sep 9, 2015, at 06:41, Wolfgang Maier wrote: > > 3) > Finally, the alternative idea of having the new functionality handled by a new !converter, like: > > "List: {0!j:,}".format([1.2, 3.4, 5.6]) > > I considered this idea before posting the original proposal, but, in addition to requiring a change to str.format (which would need to recognize the new token), this approach would need either: > > - a new special method (e.g., __join__) to be implemented for every type that should support it, which is worse than for my original proposal or > > - the str.format method must react directly to the converter flag, which is then no different to the above solution just that it uses !j instead of *. Personally, I find the * syntax more readable, plus, the !j syntax would then suggest that this is a regular converter (calling a special method of the object) when, in fact, it is not. > Please correct me, if I misunderstood something about this alternative proposal. But the format method already _does_ react directly to the conversion flag. As the docs say, the "type coercion" (call to str, repr, or ascii) happens before formatting, and then the __format__ method is called on the result. A new !j would be a "regular converter"; it just calls a new join function (which returns something whose __format__ method then does the right thing) instead of the str, repr, or ascii functions. And random's custom converter idea would work similarly, except that presumably his !join would specify a function registered to handle the "join" conversion in some way rather than being hardcoded to a builtin. From srkunze at mail.de Thu Sep 10 00:02:43 2015 From: srkunze at mail.de (Sven R. 
Kunze) Date: Thu, 10 Sep 2015 00:02:43 +0200 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: Message-ID: <55F0AC83.3050505@mail.de> Not specifically about this proposal but about the effort put into Python typehinting in general currently: What are the supposed benefits? I read somewhere that right now tools are able to infer 60% of the types. That seems pretty good to me and a lot of effort on your side to make some additional 20-30 %. Don't get me wrong, I like the theoretical and abstract discussions around this topic but I feel this type of feature is way out of the practical realm. I don't see the effort for adding type hints AND the effort for further parsing (by human eyes) justified by partially better IDE support and a single additional test within test suites of about 10,000s of tests. Especially when considering that correct types don't prove functionality in any case. But tested functionality in some way proves correct typing. Just my two cents since I felt I had to say this and maybe I am missing something. :) Best, Sven On 09.09.2015 22:17, Guido van Rossum wrote: > Jukka wrote up a proposal for structural subtyping. It's pretty good. > Please discuss. > > https://github.com/ambv/typehinting/issues/11#issuecomment-138133867 > > -- > --Guido van Rossum (python.org/~guido ) > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed...
URL: From wolfgang.maier at biologie.uni-freiburg.de Thu Sep 10 00:03:05 2015 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Thu, 10 Sep 2015 00:03:05 +0200 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: <5B23496D-6DBD-49B3-91D7-E093309A84C7@yahoo.com> References: <5B23496D-6DBD-49B3-91D7-E093309A84C7@yahoo.com> Message-ID: <55F0AC99.8030408@biologie.uni-freiburg.de> On 09.09.2015 23:28, Andrew Barnert via Python-ideas wrote: > On Sep 9, 2015, at 06:41, Wolfgang Maier wrote: >> >> 3) >> Finally, the alternative idea of having the new functionality handled by a new !converter, like: >> >> "List: {0!j:,}".format([1.2, 3.4, 5.6]) >> >> I considered this idea before posting the original proposal, but, in addition to requiring a change to str.format (which would need to recognize the new token), this approach would need either: >> >> - a new special method (e.g., __join__) to be implemented for every type that should support it, which is worse than for my original proposal or >> >> - the str.format method must react directly to the converter flag, which is then no different to the above solution just that it uses !j instead of *. Personally, I find the * syntax more readable, plus, the !j syntax would then suggest that this is a regular converter (calling a special method of the object) when, in fact, it is not. >> Please correct me, if I misunderstood something about this alternative proposal. > > But the format method already _does_ react directly to the conversion flag. As the docs say, the "type coercion" (call to str, repr, or ascii) happens before formatting, and then the __format__ method is called on the result. A new !j would be a "regular converter"; it just calls a new join function (which returns something whose __format__ method then does the right thing) instead of the str, repr, or ascii functions. > Ah, I see! Thanks for correcting me here. 
Somehow, I had the mental picture that the format converters would call the object's __str__ and __repr__ methods directly (and so you'd need an additional __join__ method for the new converter), but that's not the case then. > >> And random's custom converter idea would work similarly, except that presumably his !join would specify a function registered to handle the "join" conversion in some way rather than being hardcoded to a builtin. > > How would such a registration work (sorry, I haven't had the time to search the list for his previous mention of this idea)? A new builtin certainly won't fly. Thanks, Wolfgang From rustompmody at gmail.com Wed Sep 9 18:16:22 2015 From: rustompmody at gmail.com (Rustom Mody) Date: Wed, 9 Sep 2015 09:16:22 -0700 (PDT) Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> Message-ID: <0a3342d3-0b60-41cd-9cbb-5fac32c371ef@googlegroups.com> On Wednesday, September 9, 2015 at 2:27:08 PM UTC+5:30, Paul Moore wrote: > > In actual fact, if it weren't for the backward compatibility issues it > would cause, I'd be tempted to argue that pip shouldn't provide any > wrapper at all, and *only* offer "python -m pip" as a means of > invoking it (precisely because it's so closely tied to the Python > interpreter used to invoke it). But that's never going to happen and I > don't intend it as a serious proposal. > > Paul > > The amount of grief pip is currently causing is IMHO a good reason to prefer incompatible changes that remove breakage over try-and-please-everyone approaches that keep breaking. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From abarnert at yahoo.com Thu Sep 10 00:08:04 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 9 Sep 2015 15:08:04 -0700 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: Message-ID: <7DC7EA44-0CD8-4F61-8462-8147B8BB8059@yahoo.com> On Sep 9, 2015, at 13:17, Guido van Rossum wrote: > > Jukka wrote up a proposal for structural subtyping. It's pretty good. Please discuss. > > https://github.com/ambv/typehinting/issues/11#issuecomment-138133867 Are we going to continue to have (both implicit and explicit) ABCs in collections.abc, numbers, etc., and also have protocols that are also ABCs and are largely parallel to them (and implicit at static checking time whether they're implicit or explicit at runtime) in typing? If so, I think we've reached the point where the two parallel hierarchies are a problem. Also, why are both the terminology and implementation so different from what we already have for ABCs? Why not just have a decorator or metaclass that can be added to ABCs that makes them implicit (rather than writing a manual __subclasshook__ for each one), which also makes them implicit at static type checking time, which means there's no need for a whole separate but similar notion? I'm not sure why it's important to also have some types that are implicit at static type checking time but not at runtime, but if there is a good reason, that just means two different decorators/metaclasses/whatever (or a flag passed to the decorator, etc.). Compare: Hashable is an implicit ABC, Sequence is an explicit ABC, Reversible is an implicit-static/explicit-runtime ABC. Hashable is an implicit ABC and also a Protocol that's an explicit ABC, Sequence is an explicit ABC and not a Protocol, Reversible is a Protocol that's an explicit ABC. The first one is clearly simpler; is there some compelling reason that makes the second one better anyway? -------------- next part -------------- An HTML attachment was scrubbed...
URL: From tim.peters at gmail.com Thu Sep 10 00:19:44 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 9 Sep 2015 17:19:44 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> Message-ID: [Stefan Krah ] > I can't find much at all when searching for "chacha20 equidistribution". > Contrast that with "mersenne twister equidistribution" and it seems that > chacha20 hasn't been studied very much in that respect (except for > the pcg-random site). > > So I also think this should preclude us from replacing the current > random() functions. Well, most arguments about random functions rely on fantasies ;-) For example, yes, the Twister is provably equidistributed to 32 bits across 623 dimensions, but ... does it make a difference to anything? That's across the Twister's _entire period_, which couldn't actually be generated across the age of the universe. What may really matter to an application is whether it will see rough equidistribution across the infinitesimally small slice (of the Twister's full period) it actually generates. And you'll find very little about _that_ (last time I searched, I found nothing). For assurances about that, people rely on test suites developed to test generators. The Twister's provably perfect equidistribution across its whole period also has its scary sides. For example, run random.random() often enough, and it's _guaranteed_ you'll eventually reach a state where the output is exactly 0.0 hundreds of times in a row. That will happen as often as it "should happen" by chance, but that's scant comfort if you happen to hit such a state. Indeed, the Twister was patched relatively early in its life to try to prevent it from _starting_ in such miserable states. 
Such states are nevertheless still reachable from every starting state. But few people know any of that, so they take "equidistribution" as meaning a necessary thing rather than as an absolute guarantee of eventual disaster ;-) What may really matter for most simulations is that the Twister never reaches a state where, in low dimensions, k-tuples fall on "just a few" regularly-spaced hyperplanes forever after. That's a systematic problem with old flavors of linear congruential generators. But that problem is _so_ old that no new generator proposed over the last few decades suffers it either. > Adding an arc4random module with the caveat that its quality will > be as good as the current OpenBSD libcrypto/libressl(?) would be okay. Anyone is welcome to supply such a module today, and distribute it via the usual channels. Python already supplies the platform spelling of `urandom`, and a very capable random.SystemRandom class based on `urandom`, for those needing crypto-strength randomness (relying on what their OS believed that means, and supplied via their `urandom`). Good enough for me. But, no, I'm not a security wonk at heart. From njs at pobox.com Thu Sep 10 00:40:02 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 9 Sep 2015 15:40:02 -0700 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> Message-ID: On Wed, Sep 9, 2015 at 1:56 AM, Paul Moore wrote: > On 9 September 2015 at 08:33, Ben Finney wrote: >> Contrariwise, I would like to see "pip"
become the canonical invocation >> used in all documentation, discussion, and advice; and if there are any >> technical barriers to that least-surprising method, to see those >> barriers addressed and removed. > > There is at least one fundamental, technical, and (so far) unsolveable > issue with using "pip" as the canonical invocation. > > pip install -U pip > > fails on Windows, because the exe wrapper cannot be replaced by a > process running that wrapper (the "pip" command runs pip.exe which > needs to replace pip.exe, but can't because the OS has it open as the > current running process). > > There have been a number of proposals for fixing this, but none have > been viable so far. We'd need someone to provide working code (not > just suggestions on things that might work, but actual working code) > before we could recommend anything other than "python -m pip install > -U pip" as the correct way of upgrading pip. And recommending one > thing when upgrading pip, but another for "normal use" is also > confusing for beginners. (And we have evidence from the pip issue > tracker people *do* find this confusing, and not just beginners...) At the very least, surely this could be "fixed" by detecting this case and exiting with a message "Sorry, Windows is annoying and this isn't going to work, to upgrade pip please type 'python -m pip ...' instead"? That seems more productive in the short run than trying to get everyone to stop typing "pip" :-). (Though I do agree that having pip as a separate command from python is a big mess -- another case where this comes up is the need for pip versus pip3.) > Apart from that issue, which is Windows only (and thus some people > find it less compelling) we have also had reported issues of people > running pip, and it installs things into the "wrong" Python > installation. This is typically because of PATH configuration issues, > where "pip" is being found via one PATH element, but "python" is found > via a different one. 
I don't have specifics to hand, so I can't > clarify *how* people have managed to construct such breakage, but I > can state that it happens, and the relevant people are usually very > confused by the results. Again, "python -m pip" avoids any confusion > here - that invocation clearly and unambiguously installs to the > Python installation you invoked. It sounds like this is another place where in the short term, it would help a lot if pip at startup took a peek at $PATH and issued some warnings or errors if it detected the most common types of misconfiguration? (E.g. the first python/python3 in $PATH does not match the one being used to run pip.) -n -- Nathaniel J. Smith -- http://vorpus.org From abarnert at yahoo.com Thu Sep 10 00:39:45 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 9 Sep 2015 15:39:45 -0700 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: <55F0AC99.8030408@biologie.uni-freiburg.de> References: <5B23496D-6DBD-49B3-91D7-E093309A84C7@yahoo.com> <55F0AC99.8030408@biologie.uni-freiburg.de> Message-ID: On Sep 9, 2015, at 15:03, Wolfgang Maier wrote: > >> On 09.09.2015 23:28, Andrew Barnert via Python-ideas wrote: >>> On Sep 9, 2015, at 06:41, Wolfgang Maier wrote: >>> >>> 3) >>> Finally, the alternative idea of having the new functionality handled by a new !converter, like: >>> >>> "List: {0!j:,}".format([1.2, 3.4, 5.6]) >>> >>> I considered this idea before posting the original proposal, but, in addition to requiring a change to str.format (which would need to recognize the new token), this approach would need either: >>> >>> - a new special method (e.g., __join__) to be implemented for every type that should support it, which is worse than for my original proposal or >>> >>> - the str.format method must react directly to the converter flag, which is then no different to the above solution just that it uses !j instead of *.
Personally, I find the * syntax more readable, plus, the !j syntax would then suggest that this is a regular converter (calling a special method of the object) when, in fact, it is not. >>> Please correct me, if I misunderstood something about this alternative proposal. >> >> But the format method already _does_ react directly to the conversion flag. As the docs say, the "type coercion" (call to str, repr, or ascii) happens before formatting, and then the __format__ method is called on the result. A new !j would be a "regular converter"; it just calls a new join function (which returns something whose __format__ method then does the right thing) instead of the str, repr, or ascii functions. > > Ah, I see! Thanks for correcting me here. Somehow, I had the mental picture that the format converters would call the object's __str__ and __repr__ methods directly (and so you'd need an additional __join__ method for the new converter), but that's not the case then. > >> And random's custom converter idea would work similarly, except that presumably his !join would specify a function registered to handle the "join" conversion in some way rather than being hardcoded to a builtin. > > How would such a registration work (sorry, I haven't had the time to search the list for his previous mention of this idea)? A new builtin certainly won't fly. I believe he posted a more detailed version of the idea on one of the other spinoff threads from the f-string thread, but I don't have a link. But there are lots of possibilities, and if you want to start bikeshedding, it doesn't matter that much what his original color was. For example, here's a complete proposal:

class MyJoiner:
    def __init__(self, value):
        self.value = value
    def __format__(self, spec):
        return spec.join(map(str, self.value))

string.register_converter('join', MyJoiner)

That last line adds it to some global table (maybe string._converters, or maybe it's not exposed at the Python level at all; whatever).
In str.format, instead of reading a single character after a !, it reads until colon or end of field; if that's more than a single character, it looks it up in the global table and calls the registered callable. So, in this case, "{spam!join:-}" would call MyJoiner(spam).__format__('-'). Any more complexity can be added to MyJoiner pretty easily, so this small extension to str.format seems sufficient for anything you might want. For example, if you want a three-part format spec that includes the join string, a format spec to pass to each element, and a format spec to apply to the whole thing:

def __format__(self, spec):
    joinstr, _, spec = spec.partition(':')
    espec, _, jspec = spec.partition(':')
    bits = (format(e, espec) for e in self.value)
    joined = joinstr.join(bits)
    return format(joined, jspec)

Or maybe it would be better to have a standard way to do multi-part format specs--maybe even passing arguments to a converter rather than cramming them in the spec--but this seems simple and flexible enough. It might also be worth having multiple converters called in a chain, but I can't think of a use case for that, so I'll ignore it.
Most converters will be classes that just store the constructor argument and use it in __format__, so it seems tedious to repeat that boilerplate for 90% of them, but that's easy to fix with a decorator:

def simple_converter(func):
    class Converter:
        def __init__(self, value):
            self.value = value
        def __format__(self, spec):
            return func(self.value, spec)
    return Converter

Meanwhile, maybe you want the register function to be a decorator:

def register_converter(name):
    def decorator(func):
        _global_converter_table[name] = func
        return func
    return decorator

So now, the original example becomes:

@string.register_converter('join')
@string.simple_converter
def my_joiner(values, joinstr):
    return joinstr.join(map(str, values))
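Since str.format's converter set cannot actually be extended from Python code today, the proposal above can be approximated right now with an explicit wrapper; a runnable sketch (Joiner and its spec handling are illustrative stand-ins, not the proposed !join API):

```python
class Joiner:
    """Wraps an iterable so the format spec becomes the join string."""
    def __init__(self, value):
        self.value = value
    def __format__(self, spec):
        # An empty spec falls back to a comma-separated rendering.
        return (spec or ", ").join(map(str, self.value))

# The spec after the colon plays the role the proposed '!j'/'!join'
# converter would have played:
assert "List: {:-}".format(Joiner([1.2, 3.4, 5.6])) == "List: 1.2-3.4-5.6"
assert format(Joiner("abc"), " + ") == "a + b + c"
```

The wrapper works because str.format passes the entire spec string, uninterpreted, to the object's __format__ -- the same hook the registered-converter idea builds on.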
Bye, Giovanni -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Thu Sep 10 00:45:11 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 10 Sep 2015 00:45:11 +0200 Subject: [Python-ideas] One way to do format and print In-Reply-To: <1DCC81C0-DE7A-460A-AD7F-E1533BB14911@yahoo.com> References: <55ED24C4.9000205@mail.de> <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> <55EF2B66.4020509@mail.de> <1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com> <6DDBD724-714E-40E1-88DF-9BC8484FF240@yahoo.com> <55F058B6.9000202@mail.de> <1DCC81C0-DE7A-460A-AD7F-E1533BB14911@yahoo.com> Message-ID: <55F0B677.3090500@mail.de> On 09.09.2015 23:50, Andrew Barnert wrote: > And if not the docs, what else would it mean to "de-emphasize" %-formatting without deprecating or removing it? The docs are most important. Sorry, if that didn't come across clearly. > >> Anyway, Google presented me the version 2.7 of the tutorial. > That's a whole other problem. But nobody is going to retroactively change Python 2.7 just to help people who find the 2.7 docs when they should be looking for 3.5. The Python docs are not Python. So, what's in the way of adding this note to Python 2.7 docs? The pride of the Python core devs? I anticipate better of you. > That might seem reasonable today, when 2.7 could heartily recommend str.format because it's nearly the same in 2.7 as in 3.5, but what about next year, when f-strings are the preferred way to do it in 3.6? If 3.6 de-emphasizes str.format (as a feature used only when you need backward compat and/or dynamic formats) and its tutorial, %-formatting docs, and str.format docs all point to f-strings, having 2.7's docs point people to str.format will be misleading at best for 3.6, but having it recommend something that doesn't exist in 2.7 will be actively wrong for 2.7. str.format teaches people how to use {}. That should be encouraged. 
Switching from str.format to f-strings is going to work like a charm. So, it's the syntax I am concerned with, not how the magic behind it is executed.

> The solution is to get people to the 3.5 or 3.6 docs in the first
> place, not to hack up the 2.7 docs.

You have absolutely no idea why people use 2.7 over 3.5, right? I promise you that transition is going to take time. And what could you do in the meantime? Call it hacking; to me it's improving.

>>> I still don't understand what's wrong with deprecating %, but okay.
>
> Well, have you read the answers given by Nick, me, and others earlier
> in the thread? If so, what do you disagree with?

All "blockers" I read so far classify as a) personal preference of % over {} or b) fixable. Neither class qualifies as a real blocker; they can be overcome.

> You've only addressed one point (that % is faster than {} for simple
> cases--and your solution is just "make {} faster", which may not be
> possible given that it's inherently more hookable than % and therefore
> requires more function calls...).

Try harder. (If {} is too slow for you.)

I've read that Python 3 is significantly slower than Python 2. So what? I can live with that when we make the transition. If we notice the performance penalty, rest assured I will come back here to seek your advice, but until then it's no reason not to switch to Python 3. The same goes for string formatting.

> What about formatting headers for ASCII wire protocols, sharing tables
> of format strings between programming languages (e.g., for i18n), or
> any of the other reasons people have brought up?

Both are fixable in some way or another; the rest falls into the classes described above.

Best,
Sven

From abarnert at yahoo.com  Thu Sep 10 00:48:39 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 9 Sep 2015 15:48:39 -0700
Subject: [Python-ideas] High time for a builtin function to manage packages (simply)?
In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> Message-ID: <039C0DAE-2F13-4A0A-B923-4812CBD4FAFC@yahoo.com> On Sep 9, 2015, at 14:25, Donald Stufft wrote: > > On September 9, 2015 at 5:22:57 PM, Andrew Barnert via Python-ideas (python-ideas at python.org) wrote: >>> Apple's Python doesn't come with pip > > As of the latest Yosemite release, and in El Capitan, it *does* however come with Python 2.7.10 and thus ``python -m ensurepip`` works. Sure, and all the way back to 10.5, you could just "easy_install pip". That never solved the problem of people who've been told to install a second Python 2.7 without any explanation of why or any consideration of what problems that might lead to; in fact, it just means they're even more likely to end up installing two colliding pips. I don't think there's much chance that anything Apple or the Python community does will get people to stop writing blog posts/class notes/installation web pages/SO answers/etc. to do this. There is a chance that proselytizing for virtualenv and/or Python 3 will make the problem irrelevant (and it already seems to be having that effect, just not as fast as would be ideal). Of course there will still be some people who really do need two Python 2.7 installations and aren't expert enough to manage it, and some people who manage to make a mess of things even with separate 2 and 3 or with venvs or with only Apple's Python. But, based on my (admittedly anecdotal) experience and my educated-guess-but-still-a-guess, I think it'll become not much more common than the equivalent problems for Fedora or Ubuntu or FreeBSD, which is a huge improvement. 
From dw+python-ideas at hmmz.org Thu Sep 10 01:01:30 2015 From: dw+python-ideas at hmmz.org (David Wilson) Date: Wed, 9 Sep 2015 23:01:30 +0000 Subject: [Python-ideas] PyPI search still broken In-Reply-To: References: <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> Message-ID: <20150909230130.GA14415@k3> Hi there, My 2.5 year old offer to retrofit the old codebase with a new search system still stands[1]. :) There is no reason for this to be a complex affair, the prototype built back then took only a few hours to complete. No doubt the long term answer is probably "Warehouse fixes this", but Warehouse seems no nearer a reality than it did in 2013. David [1] https://groups.google.com/forum/#!search/%22david$20wilson%22$20search$20pypi/pypa-dev/ZjUNkczsKos/2et8926YOQYJ On Thu, Sep 10, 2015 at 12:35:04AM +0200, Giovanni Cannata wrote: > Hi, sorry to bother you again, but the search problem on PyPI is still present > after different weeks and it's very annoying. I've just released a new version > of my ldap3 project and it doesn't show up when searching with its name. For > mine (and I suppose for other emerging project, especially related to Python 3) > it's vital to be easily found by other developers that use pip and PyPI as THE > only repository for python packages and using the number of download as a > ranking of popularity of a project. > > If search can't be fixed there should be at least a warning on the PyPI > homepage to let users know that search is broken and that using Google for > searching could help to find more packages. 
>
> Bye,
> Giovanni
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From tritium-list at sdamon.com  Thu Sep 10 01:13:19 2015
From: tritium-list at sdamon.com (Alexander Walters)
Date: Wed, 09 Sep 2015 19:13:19 -0400
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To: 
References: 
Message-ID: <55F0BD0F.10508@sdamon.com>

In a word - No. There is zero reason for people doing crypto to use the random module, therefore we should not change the random module to be cryptographically secure. Don't break things and slow my code down by default for dubious reasons, please.

On 9/9/2015 12:35, Guido van Rossum wrote:
> I've received several long emails from Theo de Raadt (OpenBSD founder)
> about Python's default random number generator. This is the random
> module, and it defaults to a Mersenne Twister (MT) seeded by 2500
> bytes of entropy taken from os.urandom().
>
> Theo's worry is that while the starting seed is fine, MT is not good
> when random numbers are used for crypto and other security purposes.
> I've countered that it's not meant for that (you should use
> random.SystemRandom() or os.urandom() for that) but he counters that
> people don't necessarily know that and are using the default
> random.random() setup for security purposes without realizing how
> wrong that is.
>
> There is already a warning in the docs for the random module that it's
> not suitable for security, but -- as the meme goes -- nobody reads the
> docs.
>
> Theo then went into technicalities that went straight over my head,
> concluding with a strongly worded recommendation of the OpenBSD
> version of arc4random() (which IIUC is based on something called
> "chacha", not on "RC4" despite that being in the name). He says it is
> very fast (but I don't know what that means).
> > I've invited Theo to join this list but he's too busy. The two core > Python experts on the random module have given me opinions suggesting > that there's not much wrong with MT, so here I am. Who is right? What > should we do? Is there anything we need to do? > > -- > --Guido van Rossum (python.org/~guido ) > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu Sep 10 01:14:17 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 9 Sep 2015 16:14:17 -0700 Subject: [Python-ideas] One way to do format and print In-Reply-To: <55F0B677.3090500@mail.de> References: <55ED24C4.9000205@mail.de> <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> <55EF2B66.4020509@mail.de> <1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com> <6DDBD724-714E-40E1-88DF-9BC8484FF240@yahoo.com> <55F058B6.9000202@mail.de> <1DCC81C0-DE7A-460A-AD7F-E1533BB14911@yahoo.com> <55F0B677.3090500@mail.de> Message-ID: <2D5621A7-0676-489D-886E-76E7D953870D@yahoo.com> On Sep 9, 2015, at 15:45, Sven R. Kunze wrote: > >> On 09.09.2015 23:50, Andrew Barnert wrote: >> And if not the docs, what else would it mean to "de-emphasize" %-formatting without deprecating or removing it? > > The docs are most important. Sorry, if that didn't come across clearly. No problem. >>> Anyway, Google presented me the version 2.7 of the tutorial. >> That's a whole other problem. But nobody is going to retroactively change Python 2.7 just to help people who find the 2.7 docs when they should be looking for 3.5. > > The Python docs are not Python. So, what's in the way of adding this note to Python 2.7 docs? The pride of the Python core devs? I anticipate better of you. First, the Python docs are part of Python. 
They're owned by the same foundation, managed by the same team, and updated with a similar process. Second, I'm not a core dev, and since the Python docs are maintained by the core devs, that stands in the way of me personally making that change. :) Of course I can easily file a docs bug, with a patch, and possibly start a discussion on the relevant list to get wider discussion. But you can do that as easily as I can, and I don't know why you should anticipate better of me than you do of yourself. (If you don't feel capable of writing the change, because you're not a native speaker or your tech writing skills aren't as good as your coding skills or whatever, I won't argue that your English seems good enough to me; just write a "draft" patch and then ask for people to improve it. There are docs changes that have been done this way in the past, and I think there are more than enough people who'd be happy to help.) >> The solution is to get people to the 3.5 or 3.6 docs in the first place, not to hack up the 2.7 docs. > > You have absolutely no idea why people use 2.7 over 3.5, right? I promise you that is going to take time. Of course I don't know why _every_ person still using 2.7 is doing so. For myself personally, off the top of my head, recent reasons have included: maintaining an existing, working app that wouldn't gain any benefit from upgrading it; writing a simple script to be deployed on servers that have 2.7 pre-installed; and writing portable libraries to share on PyPI that work on both 2.7 and 3.3+ to make them useful to as many devs as possible. I know other people do it for similarly good reasons, or different ones (e.g., depending on some library that hasn't been ported yet), and also for bad reasons (related to outdated teaching materials or FUD or depending on some library that has been ported but they checked a 6-year-old blog instead of current information). 
I know that we're still a few years away from the end of the initial transition period, so none of this surprises me much. But how is any of that, or any additional factors I don't know about, relevant to the fact that using the 2.7 docs (and especially the tutorial) when coding for 3.5 is a bad idea, and a problem to be fixed? How does any of it mean that making the 2.7 docs apply better to 3.5 but worse to 2.7 is a solution? >>> I still don't understand what's wrong with deprecating %, but okay. >> Well, have you read the answers given by Nick, me, and others earlier in the thread? If so, what do you disagree with? > > All "blockers" I read so far classify as a) personal preference of % over {} or b) fixable. Both classes do not qualify as real blockers; they can be overcome. > >> You've only addressed one point (that % is faster than {} for simple cases--and your solution is just "make {} faster", which may not be possible given that it's inherently more hookable than % and therefore requires more function calls...). > > Try harder. (If {} is too slow for you.) > > I've read Python 3 is significantly slower than Python 2. So what? I can live with that, when we will make the transition. If we recognize the performance penalty, rest assured I come back here to seek your advice but until that it's no reason not to switch to Python 3. Same goes for string formatting. > >> What about formatting headers for ASCII wire protocols, sharing tables of format strings between programming languages (e.g., for i18n), or any of the other reasons people have brought up? > Both fixable in some way or another, the rest classifies as described above. Just saying "I want % deprecated, and I declare that all of the apparent blocking problems are solvable, and therefore I demand that someone else solve them and then deprecate %" is not very useful. 
If you think that's the way Python should go, come up with solutions for all of the ones that need to be fixed (and file bugs and ideally patches), and good arguments to dismiss the ones you don't think need to be fixed. Then you can argue that the only remaining reason not to deprecate % is backward compatibility, which isn't compelling enough, and that may well convince everyone. From njs at pobox.com Thu Sep 10 01:15:31 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 9 Sep 2015 16:15:31 -0700 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> Message-ID: On Wed, Sep 9, 2015 at 3:19 PM, Tim Peters wrote: > Well, most arguments about random functions rely on fantasies ;-) For > example, yes, the Twister is provably equidistributed to 32 bits > across 623 dimensions, but ... does it make a difference to anything? > That's across the Twister's _entire period_, which couldn't actually > be generated across the age of the universe. > > What may really matter to an application is whether it will see rough > equidistribution across the infinitesimally small slice (of the > Twister's full period) it actually generates. And you'll find very > little about _that_ (last time I searched, I found nothing). For > assurances about that, people rely on test suites developed to test > generators. Yeah, equidistribution is not a guarantee of anything on its own. For example, an integer counter modulo 2**(623*32) is equidistributed to 32 bits across 623 dimensions, just like the Mersenne Twister. 
Mostly people talk about equidistribution because for a deterministic RNG, (a) being non-equidistributed would be bad, (b) equidistribution is something you can reasonably hope to prove for simple non-cryptographic generators, and mathematicians like writing proofs. OTOH equidistribution is not even well-defined for the OpenBSD "arc4random" generator, because it is genuinely non-deterministic -- it regularly mixes new entropy into its state as it goes -- and equidistribution by definition requires determinism. So it "fails" this test of "randomness" because it is too random... In practice, the chances that your Monte Carlo simulation is going to give bad results because of systematic biases in arc4random are much, much lower than the chances that it will give bad results because of subtle hardware failures that corrupt your simulation. And hey, if arc4random *does* mess up your simulation, then congratulations, your simulation is publishable as a cryptographic attack and will probably get written up in the NYTimes :-). The real reasons to prefer non-cryptographic RNGs are the auxiliary features like determinism, speed, jumpahead, multi-thread friendliness, etc. But the stdlib random module doesn't really provide any of these (except determinism in strictly limited cases), so I'm not sure it matters much. > The Twister's provably perfect equidistribution across its whole > period also has its scary sides. For example, run random.random() > often enough, and it's _guaranteed_ you'll eventually reach a state > where the output is exactly 0.0 hundreds of times in a row. That will > happen as often as it "should happen" by chance, but that's scant > comfort if you happen to hit such a state. Indeed, the Twister was > patched relatively early in its life to try to prevent it from > _starting_ in such miserable states. Such states are nevertheless > still reachable from every starting state. 
This criticism seems a bit unfair though -- even a true stream of random bits (e.g. from a true unbiased quantum source) has this property, and trying to avoid this happening would introduce bias that really could cause problems in practice. A good probabilistic program is one that has a high probability of returning some useful result, but it always has some low probability of returning something weird. So this is just saying that most people don't understand probability. Which is true, but there isn't much that the random module can do about it :-)

-n

--
Nathaniel J. Smith -- http://vorpus.org

From random832 at fastmail.us  Thu Sep 10 01:24:55 2015
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Wed, 09 Sep 2015 19:24:55 -0400
Subject: [Python-ideas] new format spec for iterable types
In-Reply-To: 
References: <5B23496D-6DBD-49B3-91D7-E093309A84C7@yahoo.com>
 <55F0AC99.8030408@biologie.uni-freiburg.de>
Message-ID: <1441841095.2587236.379354345.340FCF95@webmail.messagingengine.com>

On Wed, Sep 9, 2015, at 18:39, Andrew Barnert via Python-ideas wrote:
> I believe he posted a more detailed version of the idea on one of the
> other spinoff threads from the f-string thread, but I don't have a link.
> But there are lots of possibilities, and if you want to start
> bikeshedding, it doesn't matter that much what his original color was.
> For example, here's a complete proposal:
>
>     class MyJoiner:
>         def __init__(self, value):
>             self.value = value
>         def __format__(self, spec):
>             return spec.join(map(str, self.value))
>     string.register_converter('join', MyJoiner)

Er, I wanted it to be something more like

    def __format__(self, spec):
        sep, fmt = ...  # 'somehow' break up spec into two parts
        return sep.join(map(lambda x: x.__format__(fmt), self.value))

And I wasn't the one who actually proposed user-registered converters; I'm not sure who did.
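[Editor's sketch, not part of the original message: one purely hypothetical way the "'somehow' break up spec" step above could work is to treat the first ":" in the format spec as the boundary between the join string and the per-item format spec. The Joiner name and the "sep:itemspec" syntax are assumptions, not an agreed design.]

```python
class Joiner:
    """Hypothetical wrapper: format spec is "sep:itemspec"."""

    def __init__(self, value):
        self.value = value

    def __format__(self, spec):
        # Everything before the first ":" is the separator; the rest is
        # the format spec applied to each item. No ":" means plain str().
        sep, _, fmt = spec.partition(":")
        return sep.join(format(x, fmt) for x in self.value)


print("{:, :.2f}".format(Joiner([1, 2.5, 3])))  # -> 1.00, 2.50, 3.00
```

A spec of "-" (no colon) simply joins the str() of each item with "-", so the simple case stays simple.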
At one point in the f-string thread I suggested using a _different_ special !word for stuff like a string that can be inserted into HTML without quoting. I'm also not 100% sure how good an idea it is (since it means either using global state or moving formatting to a class instead of str). The Joiner class wouldn't have to exist as a builtin, it could be private to the format function. From rob.cliffe at btinternet.com Thu Sep 10 01:32:35 2015 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Thu, 10 Sep 2015 00:32:35 +0100 Subject: [Python-ideas] One way to do format and print In-Reply-To: <55F058B6.9000202@mail.de> References: <55ED24C4.9000205@mail.de> <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> <55EF2B66.4020509@mail.de> <1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com> <6DDBD724-714E-40E1-88DF-9BC8484FF240@yahoo.com> <55F058B6.9000202@mail.de> Message-ID: <55F0C193.6000606@btinternet.com> I use %-formatting. Not because I think it's so wonderful and solves all problems (although it's pretty good), but because it appeared to be the recommended method at the time I learned Python in earnest. If I were only learning Python now, I would probably learn str.format or whatever it is. I *could* learn to use something else *and* change all my working code, but do you really want to force me to do that? I would guess that there are quite a lot of Python users in the same position. Rob Cliffe On 09/09/2015 17:05, Sven R. Kunze wrote: > On 09.09.2015 02:09, Andrew Barnert via Python-ideas wrote: >> I think it's already been established why % formatting is not going >> away any time soon. >> >> As for de-emphasizing it, I think that's already done pretty well in >> the current docs. The tutorial has a nice long introduction to >> str.format, a one-paragraph section on "old string formatting" with a >> single %5.3f example, and a one-sentence mention of Template. 
The
>> stdtypes chapter in the library reference explains the difference
>> between the two in a way that makes format sound more attractive for
>> novices, and then has details on each one as appropriate. What else
>> should be done?
>
> I had difficulties to find what you mean by tutorial. But hey, being a
> Python user for years and not knowing where the official tutorial
> resides...
>
> Anyway, Google presented me the version 2.7 of the tutorial. Thus, the
> link to the stdtypes documentation does not exhibit the note of, say,
> 3.5:
>
> "Note: The formatting operations described here exhibit a variety of
> quirks that lead to a number of common errors (such as failing to
> display tuples and dictionaries correctly). Using the newer
> str.format() interface helps avoid these errors, and also provides a
> generally more powerful, flexible and extensible approach to
> formatting text."
>
> So, adding it to the 2.7 docs would be a start.
>
> I still don't understand what's wrong with deprecating %, but okay. I
> think f-strings will push {} to wide-range adoption.
>
> Best,
> Sven
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From donald at stufft.io  Thu Sep 10 02:01:16 2015
From: donald at stufft.io (Donald Stufft)
Date: Wed, 9 Sep 2015 20:01:16 -0400
Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux
Message-ID: 

Ok, I reached out to Theo de Raadt to talk to him about what he was suggesting without Guido having to play messenger and forward fragments of the email conversation.
I'm starting a new thread because this email is rather long, and I'm hoping to divorce it a bit from the back and forth (in the other thread) about a proposal that wasn't exactly what Theo was suggesting.

Essentially, there are three basic types of uses of random (the concept, not the module). Those are:

1. People/usecases who absolutely need deterministic output given a seed and
   for whom security properties don't matter.
2. People/usecases who absolutely need a cryptographically random output and
   for whom having a deterministic output is a downside.
3. People/usecases that fall somewhere in between, where it may or may not be
   security sensitive or it may not be known if it's security sensitive.

The people in group #1 are currently, in the Python standard library, best served using the MT random source, as it provides exactly the kind of determinism they need. The people in group #2 are currently, in the Python standard library, best served using os.urandom (either directly or via random.SystemRandom).

However, the third case is the one that Theo's suggestion is attempting to solve. In the current landscape, the security minded folks will tell these people to use os.urandom/random.SystemRandom, and the performance minded or otherwise less security minded folks will likely tell them to just use random.py, leaving these people with a random source that is not cryptographically safe.

The question then is, does it matter if group #3 is using a cryptographically safe source of randomness? The answer is obviously that we don't know, and it's possible that the user doesn't know. In such cases it's typically best if we default to the more secure option and expect people to opt in to insecurity.
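[Editor's illustration, not part of the original message: groups #1 and #2 map onto today's stdlib as follows; only group #3 has no obvious home.]

```python
import random

# Group 1: needs reproducibility -- a seeded Mersenne Twister instance.
deterministic = random.Random(42)

# Group 2: needs unpredictability -- SystemRandom draws from os.urandom()
# and silently ignores any attempt to seed it.
secure = random.SystemRandom()

# The same seed always yields the same stream:
print(deterministic.random() == random.Random(42).random())  # -> True
```

Group #3 is whoever reaches for the module-level `random.random()` without deciding which of these two properties they actually need.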
In the case of randomness, a lot of languages (Python included) don't do that; instead they opt to pick the more performant option first, often with the argument (as seen in the other thread) that if people need a cryptographically secure source of random, they'll know how to look for it, and if they don't know how to look for it, then it's likely they'll have some other security problem.

I think (and I believe Theo thinks) this sort of thinking is short sighted. Let's take the example of a web application: it's going to need session identifiers to put into a cookie, you'll want these to be random, and it's not obvious on the tin for a non-expert that you can't just use the module level functions in the random module to do this. Other examples are generating API keys or a password.

Looking on Google, the first result for "python random password" is StackOverflow, which suggests:

    ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N))

However, it was later edited to, after that, include:

    ''.join(random.SystemRandom().choice(string.ascii_uppercase + string.digits) for _ in range(N))

So it wasn't obvious to the person who answered that question that the random module's module scoped functions were not appropriate for this use. It appears that the original answer lasted for roughly 4 years before it was corrected, so who knows how many people used it in those 4 years.

The second result has someone asking if there is a better way to generate a random password in Python than:

    import os, random, string

    length = 13
    chars = string.ascii_letters + string.digits + '!@#$%^&*()'
    random.seed = (os.urandom(1024))
    print ''.join(random.choice(chars) for i in range(length))

This person obviously knew that os.urandom existed and that he should use it, but failed to correctly identify that the random module's module scoped functions were not what he wanted to use here.

The third result has this code:

    import string
    import random

    def randompassword():
        chars = string.ascii_uppercase + string.ascii_lowercase + string.digits
        size = 8
        return ''.join(random.choice(chars) for x in range(size, 12))

I'm not going to keep pasting snippets, but going through the results it is clear that in the bulk of cases, this search turns up code snippets suggesting there is likely to be a lot of code out there that is unknowingly using the random module in a very insecure way. I think this is a failing of the random.py module to provide an API that guides users to be safe, one that was papered over by adding a warning to the documentation; however, as has been said before, you can't solve a UX problem with documentation.

Then we come to why we might want not to provide a safe random by default for the folks in the #3 group. As we've seen in the other thread, this basically boils down to the fact that a lot of users don't care about the security properties and just want a fast random-esque value. This particular case is made stronger by the fact that there is a lot of code out there using Python's random module in a completely safe way that would regress in a meaningful way if the random module slowed down.

The fact that speed is the primary reason not to give people in #3 a cryptographically secure source of random by default is where we come back to the meat of Theo's suggestion. His claim is that invoking os.urandom through any of the interfaces imposes a performance penalty, because it has to round trip through the kernel crypto subsystem for every request. His suggestion is essentially that we provide an interface to a modern, good, userland cryptographically secure source of random that runs within the same process as Python itself. One such example of this is the arc4random function (which doesn't actually provide ARC4 on OpenBSD -- it provides ChaCha; it's not tied to one specific algorithm), which comes from libc on many platforms.
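[Editor's sketch, not part of the original message: what the quoted snippets were presumably reaching for, written against the stdlib that exists today -- random.SystemRandom is backed by os.urandom(). The function name, alphabet, and default length are arbitrary choices, not an established API.]

```python
import string
import random

# SystemRandom pulls from the OS CSPRNG (os.urandom), not the seedable MT.
_sysrand = random.SystemRandom()

def random_password(length=13):
    # Hypothetical helper: alphabet and default length are arbitrary.
    chars = string.ascii_letters + string.digits
    return ''.join(_sysrand.choice(chars) for _ in range(length))
```

Usage: `random_password()` returns a 13-character string; nothing about the call site looks different from the insecure versions, which is exactly the UX problem being described.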
According to Theo, modern userland CSPRNGs can create random bytes faster than memcpy, which eliminates the speed argument for why a CSPRNG shouldn't be the "default" source of randomness. Thus the proposal is essentially:

* Provide an API to access a modern userland CSPRNG.
* Provide an implementation of random.SomeKindOfRandom that utilizes this.
* Move the MT based implementation of the random module to
  random.DeterministicRandom.
* Deprecate the module scoped functions, instructing people to use the new
  random.SomeKindOfRandom unless they need deterministic random, in which
  case they should use random.DeterministicRandom.

This can of course be tweaked one way or the other, but that's the general idea translated into something actionable for Python. I'm not sure exactly how I feel about it, but I certainly do think that the current situation is confusing to end users and leaves them in an insecure state, and that at a minimum we should move MT to something like random.DeterministicRandom and deprecate the module scoped functions, because it seems obvious to me that the idea of a "default" random function that isn't safe is a footgun for users.

As an additional consideration, there are security experts who believe that userland CSPRNGs should not be used at all. One of them is Thomas Ptacek, who wrote a blog post [1] on the subject. In it, Thomas makes the case that a userland CSPRNG pretty much always depends on the cryptographic security of the system random, but may itself be broken, which means you're adding a second, single point of failure where a mistake can cause you to get non-random data out of the system.
I had asked Theo about this, and he stated that he disagreed with Thomas about never using a userland CSPRNG; in his opinion that blog post was mostly warning people away from using something like MT in userland, and away from /dev/random (which is often the reason people reach for MT, because /dev/random blocks, which makes programs even slower).

It seems to boil down to these questions: Do we want to try to protect users by default, or at least make it more obvious in the API which one they want to use (I think yes)? If so, do we think /dev/urandom is "fast enough" for most people in group #3? If not, do we agree with Theo that a modern userland CSPRNG is safe enough to use, or with Thomas that it isn't? And if we think it is safe enough, do we use arc4random, and what do we do on systems that don't have a modern userland CSPRNG in their libc?

[1] http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>> For example, here's a complete proposal: >> >> class MyJoiner: >> def __init__(self, value): >> self.value = value >> def __format__(self, spec): >> return spec.join(map(str, self.value)) >> string.register_converter('join', MyJoiner) > > Er, I wanted it to be something more like > > def __format__(self, spec): > sep, fmt = # 'somehow' break up spec into two parts I covered later in the same message how this simple version could be extended to a smarter version that does that, or even more, without requiring any further changes to str.format. I just wanted to show the simplest version first, and then show that designing for that doesn't lose any flexibility. > And I wasn't the one who actually proposed user-registered converters; > I'm not sure who did. Well, that does make it a bit harder to search for... But anyway, I think the idea is obvious enough once someone's mentioned it that it only matters if everyone decides we should do it, when we want to figure out who to give the credit to. > At one point in the f-string thread I suggested > using a _different_ special !word for stuff like a string that can be > inserted into HTML without quoting. I'm also not 100% sure how good an > idea it is (since it means either using global state or moving > formatting to a class instead of str). I don't see why global state is more of a problem here than for any other global registry (sys.modules, pickle/copy, ABCs, registries, etc.). In fact, it seems less likely that, e.g., a multithreaded app container would run into problems with this than with most of those other things, not more likely. And the same ideas for solving those problems (subinterpreters, better IPC so multithreaded app containers aren't necessary, easier switchable contexts, whatever) seem like they'd solve this one just as easily. 
And meanwhile, the alternative seems to be having something similar, but not exposing it publicly, and just baking in a handful of hardcoded converters for join, html, re-escape, etc., and I don't see why str should know about all of those things, or why extending that set when we realize that we forgot about shlex should require a patch to str and a new Python version.

> The Joiner class wouldn't have to exist as a builtin, it could be
> private to the format function.

If it's custom-registerable, it can be on PyPI, or in the middle of your app, although of course there could be some converters, maybe including your Joiner, somewhere in the stdlib, or even private to format, as well.

From abarnert at yahoo.com Thu Sep 10 02:19:20 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 9 Sep 2015 17:19:20 -0700
Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux
In-Reply-To:
References:
Message-ID: <72597E4F-4E74-412D-8ED3-442E832232EF@yahoo.com>

Deprecating the module-level functions has one problem for backward compatibility: if you're using random across multiple modules, changing them all from this:

import random
...

to this:

from random import DeterministicRandom
random = DeterministicRandom()
...

gives a separate MT for each module. You can work around that by, e.g., providing your own myrandom.py that does that and then using "from myrandom import random" everywhere, or by stashing a random_inst inside the random module or builtins or something and only creating it if it doesn't exist, etc., but all of these are things that people will rightly complain about.

One possible solution is to make DeterministicRandom a module instead of a class, and move all the module-level functions there, so people can just change their import to "from random import DeterministicRandom as random". (Or, alternatively, give it classmethods that create a singleton just like the module global.)
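The myrandom.py workaround mentioned above can be sketched in a few lines. The file name comes from the message itself; today's random.Random stands in for the proposed DeterministicRandom:

```python
# myrandom.py -- hypothetical shared-instance shim: every module in the
# app does "from myrandom import random", so they all share ONE generator
# instead of growing a separate MT per module.
from random import Random  # stand-in for the proposed DeterministicRandom

random = Random()

# What two importing modules would each see -- the same object:
mod_a_random = random
mod_b_random = random
mod_a_random.seed(42)          # module A seeds...
value = mod_b_random.random()  # ...and module B observes the shared state
```

Since `from myrandom import random` binds the same instance everywhere, seeding in one module affects all of them — exactly the single-generator behavior the module-level functions give today.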
For people who decide they want to switch to SystemRandom, I don't think it's as much of a problem, as they probably won't care that they have a separate instance in each module. (And I don't think there's any security problem with using multiple instances, but I haven't thought it through...) So, the change is probably only needed in DeterministicRandom. There are hopefully better solutions than that. But I think some solution is needed. People who have existing code (or textbooks, etc.) that do things the "wrong" way and get a DeprecationWarning should be able to easily figure out how to make their code correct. Sent from my iPhone > On Sep 9, 2015, at 17:01, Donald Stufft wrote: > > Ok, I reached out to Theo de Raadt to talk to him about what he was suggesting > without Guido having to play messenger and forward fragments of the email > conversation. I'm starting a new thread because this email is rather long, and > I'm hoping to divorce it a bit from the back and forth about a proposal that > wasn't exactly what Theo was suggesting that is being discussed in the other > thread. > > Essentially, there are three basic types of uses of random (the concept, not > the module). Those are: > > 1. People/usecases who absolutely need deterministic output given a seed and > for whom security properties don't matter. > 2. People/usecases who absolutely need a cryptographically random output and > for whom having a deterministic output is a downside. > 3. People/usecases that fall somewhere in between where it may or may not be > security sensitive or it may not be known if it's security sensitive. > > The people in group #1 are currently, in the Python standard library, best > served using the MT random source as it provides exactly the kind of determinism > they need. The people in group #2 are currently, in the Python standard > library, best served using os.urandom (either directly or via > random.SystemRandom). 
> > However, the third case is the one that Theo's suggestion is attempting to > solve. In the current landscape, the security minded folks will tell these > people to use os.urandom/random.SystemRandom and the performance or otherwise > less security minded folks will likely tell them to just use random.py. Leaving > these people with a random that is not cryptographically safe. > > The question then is, does it matter if #3 are using a cryptographically safe > source of randomness? The answer is obviously that we don't know, and it's > possible that the user doesn't know. In these cases it's typically best if we > default to the more secure option and expect people to opt in to insecurity. > > In the case of randomness, a lot of languages (Python included) don't do that > and instead they opt to pick the more performant option first, often with the > argument (as seen in the other thread) that if people need a cryptographically > secure source of random, they'll know how to look for it and if they don't > know how to look for it, then it's likely they'll have some other security > problem. I think (and I believe Theo thinks) this sort of thinking is short > sighted. Let's take an example of a web application, it's going to need session > identifiers to put into a cookie, you'll want these to be random and it's not > obvious on the tin for a non-expert that you can't just use the module level > functions in the random module to do this. Other examples are generating API > keys or passwords. 
>
> Looking on google, the first result for "python random password" is
> StackOverflow which suggests:
>
> ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N))
>
> However, it was later edited to, after that, include:
>
> ''.join(random.SystemRandom().choice(string.ascii_uppercase + string.digits) for _ in range(N))
>
> So it wasn't obvious to the person who answered that question that the random
> module's module scoped functions were not appropriate for this use. It appears
> that the original answer lasted for roughly 4 years before it was corrected,
> so who knows how many people used that in those 4 years.
>
> The second result has someone asking if there is a better way to generate a
> random password in Python than:
>
> import os, random, string
>
> length = 13
> chars = string.ascii_letters + string.digits + '!@#$%^&*()'
> random.seed = (os.urandom(1024))
>
> print ''.join(random.choice(chars) for i in range(length))
>
> This person obviously knew that os.urandom existed and that he should use it,
> but failed to correctly identify that the random module's module scoped
> functions were not what he wanted to use here.
>
> The third result has this code:
>
> import string
> import random
>
> def randompassword():
>     chars=string.ascii_uppercase + string.ascii_lowercase + string.digits
>     size=8
>     return ''.join(random.choice(chars) for x in range(size,12))
>
> I'm not going to keep pasting snippets, but going through the results it is
> clear that in the bulk of cases, this search turns up code snippets that
> suggest there is likely to be a lot of code out there that is unknowingly using
> the random module in a very insecure way. I think this is a failing of the
> random.py module to provide an API that guides users to be safe which was
> attempted to be papered over by adding a warning to the documentation, however
> like has been said before, you can't solve a UX problem with documentation.
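For contrast with the snippets above, the safe variant those answers should have used — random.SystemRandom, which draws from os.urandom — is barely longer. The function name and defaults here are arbitrary:

```python
import string
from random import SystemRandom

def secure_password(length=13):
    # SystemRandom reads from os.urandom, so this is suitable for
    # passwords/API keys, unlike the module-level random functions.
    chars = string.ascii_letters + string.digits
    rng = SystemRandom()
    return ''.join(rng.choice(chars) for _ in range(length))
```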
> > Then we come to why might we want to not provide a safe random by default for > the folks in the #3 group. As we've seen in the other thread, this basically > boils down to the fact that for a lot of users they don't care about the > security properties and they just want a fast random-esque value. This > particular case is made stronger by the fact that there is a lot of code out > there using Python's random module in a completely safe way that would regress > in a meaningful way if the random module slowed down. > > The fact that speed is the primary reason not to give people in #3 a > cryptographically secure source of random by default is where we come back to > the meat of Theo's suggestion. His claim is that invoking os.urandom through > any of the interfaces imposes a performance penalty because it has to round > trip through the kernel crypto sub system for every request. His suggestion is > essentially that we provide an interface to a modern, good, userland > cryptographically secure source of random that is running within the same > process as Python itself. One such example of this is the arc4random function > (which doesn't actually provide ARC4 on OpenBSD, it provides ChaCha, it's not > tied to one specific algorithm) which comes from libc on many platforms. > According to Theo, modern userland CSPRNGs can create random bytes faster than > memcpy which eliminates the argument of speed for why a CSPRNG shouldn't be > the "default" source of randomness. > > Thus the proposal is essentially: > > * Provide an API to access a modern userland CSPRNG. > * Provide an implementation of random.SomeKindOfRandom that utilizes this. > * Move the MT based implementation of the random module to > random.DeterministicRandom. > * Deprecate the module scoped functions, instructing people to use the new > random.SomeKindofRandom unless they need deterministic random, in which case > use random.DeterministicRandom. 
> > This can of course be tweaked one way or the other, but that's the general idea > translated into something actionable for Python. I'm not sure exactly how I > feel about it, but I certainly do think that the current situation is confusing > to end users and leaving them in an insecure state, and that at a minimum we > should move MT to something like random.DeterministicRandom and deprecate the > module scoped functions because it seems obvious to me that the idea of a > "default" random function that isn't safe is a footgun for users. > > As an additional consideration, there are security experts who believe that > userland CSPRNGs should not be used at all. One of those is Thomas Ptacek who > wrote a blog post [1] on the subject. In this, Thomas makes the case that a > userland CSPRNG pretty much always depends on the cryptographic security of > the system random, but that it itself may be broken which means you're adding > a second, single point of failure where a mistake can cause you to get > non-random data out of the system. I had asked Theo about this, and he stated > that he disagreed with Thomas about never using a userland CSPRNG and in his > opinion that blog post was mostly warning people away from using something like > MT in the userland and away from /dev/random (which is often the cause of > people reaching for MT because /dev/random blocks which makes programs even > slower). > > It seems to boil down to, do we want to try to protect users by default or at > least make it more obvious in the API which one they want to use (I think yes), > and if so do we think that /dev/urandom is "fast enough" for most people in > group #3 and if not, do we agree with Theo that a modern userland CSPRNG is > safe enough to use, or do we agree with Thomas that it's not and if we think > that it is, do we use arc4random and what do we do on systems that don't have > a modern userland CSPRNG in their libc. 
>
> [1] http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/
>
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From random832 at fastmail.com Thu Sep 10 03:25:34 2015
From: random832 at fastmail.com (Random832)
Date: Wed, 09 Sep 2015 21:25:34 -0400
Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux
References: <72597E4F-4E74-412D-8ED3-442E832232EF@yahoo.com>
Message-ID:

Andrew Barnert via Python-ideas writes:

> You can work around that by,
> e.g., providing your own myrandom.py that does that and then using
> "from myrandom import random" everywhere, or by stashing a random_inst
> inside the random module or builtins or something and only creating it
> if it doesn't exist, etc., but all of these are things that people
> will rightly complain about.

Of course, this brings to mind the fact that there's *already* an instance stashed inside the random module.

At that point, you might as well just keep the module-level functions, and rewrite them to be able to pick up on it if you replace _inst (perhaps suitably renamed as it would be a public variable) with an instance of a different class.

Proof-of-concept implementation:

class _method:
    def __init__(self, name):
        self.__name__ = name
    def __call__(self, *args, **kwargs):
        return getattr(_inst, self.__name__)(*args, **kwargs)
    def __repr__(self):
        return ""

_inst = Random()
seed = _method('seed')
random = _method('random')
...etc...

From steve at pearwood.info Thu Sep 10 03:27:07 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 10 Sep 2015 11:27:07 +1000
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To: <1441829361.2883366.379212985.164412ED@webmail.messagingengine.com> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <1441829361.2883366.379212985.164412ED@webmail.messagingengine.com> Message-ID: <20150910012707.GN19373@ando.pearwood.info> On Wed, Sep 09, 2015 at 04:09:21PM -0400, random832 at fastmail.us wrote: > On Wed, Sep 9, 2015, at 15:07, Steven D'Aprano wrote: > > Not really. Look at the subject line. It doesn't say "should we change > > from MT to arc4random?", it asks if the default random number generator > > should be secure. The only reason we are considering the change from MT > > to arc4random is to make the PRNG cryptographically secure. "Secure" is > > a moving target, what is secure today will not be secure tomorrow. > > Right, but we are discussing making it secure today. No, *you* are discussing making it secure today. The rest of us are discussing making it secure for all time. -- Steve From donald at stufft.io Thu Sep 10 03:30:16 2015 From: donald at stufft.io (Donald Stufft) Date: Wed, 9 Sep 2015 21:30:16 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On September 9, 2015 at 8:01:17 PM, Donald Stufft (donald at stufft.io) wrote: > > It seems to boil down to, do we want to try to protect users by default or at > least make it more obvious in the API which one they want to use (I think yes), > and if so do we think that /dev/urandom is "fast enough" for most people in > group #3 and if not, do we agree with Theo that a modern userland CSPRNG is > safe enough to use, or do we agree with Thomas that it's not and if we think > that it is, do we use arc4random and what do we do on systems that don't have > a modern userland CSPRNG in their libc. 
> > Ok, I've talked to an honest to god cryptographer as well as some other smart folks! Here's the general gist: Using a userland CSPRNG like arc4random is not advisable for things that you absolutely need cryptographic security for (this is group #2 from my original email). These people should use os.urandom or random.SystemRandom as they should be doing now. In addition os.urandom or random.SystemRandom is probably fast enough for most use cases of the random.py module, however it is true that using os.urandom/random.SystemRandom would be slower than MT. It is reasonable to use a userland CSPRNG as a "default" source of randomness or in cases where people care about speed but maybe not about security and don't need determinism. However, they've said that the primary benefit in using a userland CSPRNG for a faster cryptographically secure source of randomness is if we can make it the default source of randomness for a "probably safe depending on your app" safety net for people who didn't read or understand the documentation. This would make most uses of random.random and friends secure but not deterministic. If we're unwilling to change the default, but we are willing to deprecate the module scoped functions and force users to make a choice between random.SystemRandom and random.DeterministicRandom then there is unlikely to be much benefit to also adding a userland CSPRNG into the mix since there's no class of people who are using an ambiguous "random" that we don't know if they need it to be secure or deterministic/fast. So I guess my suggestion would be, let's deprecate the module scope functions and rename random.Random to random.DeterministicRandom. This absolves us of needing to change the behavior of people's existing code (besides deprecating it) and we don't need to decide if a userland CSPRNG is safe or not while still moving us to a situation that is far more likely to have users doing the right thing. 
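A sketch of what call sites would look like under that suggestion, with today's random.Random standing in for the proposed random.DeterministicRandom (the new name does not exist in the stdlib; it is the proposal's spelling):

```python
from random import Random as DeterministicRandom  # proposed spelling
from random import SystemRandom

# Reproducible stream for simulations/tests: same seed, same sequence.
sim_a = DeterministicRandom(2015)
sim_b = DeterministicRandom(2015)
assert sim_a.random() == sim_b.random()

# Security-sensitive values: OS-backed, not seedable/reproducible.
token = ''.join(SystemRandom().choice('0123456789abcdef') for _ in range(32))
```

The point of the rename is exactly this: the caller is forced to say which of the two properties — determinism or cryptographic strength — they actually want.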
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From brenbarn at brenbarn.net Thu Sep 10 03:50:42 2015 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Wed, 09 Sep 2015 18:50:42 -0700 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: Message-ID: <55F0E1F2.6040709@brenbarn.net> On 2015-09-09 13:17, Guido van Rossum wrote: > Jukka wrote up a proposal for structural subtyping. It's pretty good. > Please discuss. > > https://github.com/ambv/typehinting/issues/11#issuecomment-138133867 I'm not totally hip to all the latest typing developments, but I'm not sure I fully understand the benefit of this protocol concept. At the beginning it says that classes have to be explicitly marked to support these protocols. But why is that? Doesn't the existing __subclasshook__ already allow an ABC to use any criteria it likes to determine if a given class is considered a subclass? So couldn't ABCs like the ones we already have inspect the type annotations and decide a class "counts" as an iterable (or whatever) if it defines the right methods with the right type hints? -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." 
--author unknown From abarnert at yahoo.com Thu Sep 10 03:50:43 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 9 Sep 2015 18:50:43 -0700 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <72597E4F-4E74-412D-8ED3-442E832232EF@yahoo.com> Message-ID: On Sep 9, 2015, at 18:25, Random832 wrote: > > Andrew Barnert via Python-ideas > writes: > >> You can work around that by, >> e.g., providing your own myrandom.py that does that and then using >> "from myrandom import random" everywhere, or by stashing a random_inst >> inside the random module or builtins or something and only creating it >> if it doesn't exist, etc., but all of these are things that people >> will rightly complain about. > > Of course, this brings to mind the fact that there's *already* an > instance stashed inside the random module. > > At that point, you might as well just keep the module-level functions, > and rewrite them to be able to pick up on it if you replace _inst > (perhaps suitably renamed as it would be a public variable) with an > instance of a different class. The whole point is to make people using the top-level functions see a DeprecationWarning that leads them to make a choice between SystemRandom and DeterministicRandom. Just making inst public (and dynamically switchable) doesn't do that, so it doesn't solve anything. However, it seems like there's a way to extend it to do that: First, rename Random to DeterministicRandom. Then, add a subclass called Random that raises a DeprecationWarning whenever its methods are called. Then preinitialize inst to Random(), just as we already do. Existing code will work, but with a warning. And the text of that warning or the help it leads to or the obvious google result or whatever can just suggest "add random.inst = random.DeterministicRandom() or random.inst = random.SystemRandom() at the start of your program". 
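That scheme can be sketched directly. This is illustrative only: the _WarningRandom and inst names are invented, and only random() is wrapped here, where a real version would wrap every public method:

```python
import warnings
import random as _random

DeterministicRandom = _random.Random  # step 1: today's MT, renamed

class _WarningRandom(DeterministicRandom):
    # Step 2: the default instance's methods warn, nudging callers to
    # pick DeterministicRandom or SystemRandom explicitly.
    def random(self):
        warnings.warn(
            "set random.inst = random.DeterministicRandom() or "
            "random.SystemRandom() at the start of your program",
            DeprecationWarning, stacklevel=2)
        return super().random()

inst = _WarningRandom()  # step 3: preinitialized, so old code still runs
```

Old code keeps working and keeps producing MT output; it just sees the warning until it opts in to one class or the other.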
That has most of the benefit of deprecating the top-level functions, without the cost of the solution being non-obvious (and the most obvious solution being wrong for some use cases). Of course it adds the cost of making the module slower, and also more complex. Maybe a better solution would be to add a random.set_default_instance function that replaced all of the top-level functions with bound methods of the instance (just like what's already done at startup in random.py)? That's simple, and doesn't slow down anything, and it seems like it makes it more clear what you're doing than setting random.inst. From steve at pearwood.info Thu Sep 10 03:55:05 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 10 Sep 2015 11:55:05 +1000 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> Message-ID: <20150910015505.GO19373@ando.pearwood.info> On Wed, Sep 09, 2015 at 04:15:31PM -0700, Nathaniel Smith wrote: > The real reasons to prefer non-cryptographic RNGs are the auxiliary > features like determinism, speed, jumpahead, multi-thread > friendliness, etc. But the stdlib random module doesn't really provide > any of these (except determinism in strictly limited cases), so I'm > not sure it matters much. The default MT is certainly deterministic, and although only the output of random() itself is guaranteed to be reproducible, the other methods are *usually* stable in practice. There's a jumpahead method too, and for use with multiple threads, you can (and should) create your own instances that don't share state. I call that "multi-thread friendliness" :-) I think Paul Moore's position is a good one. 
Anyone writing crypto code without reading the docs and understanding what they are doing are surely making more mistakes than just using the wrong PRNG. There may be a good argument for adding arc4random support to the stdlib, but making it the default (with the disadvantages discussed, breaking backwards compatibility, surprising non-crypto users, etc.) won't fix the broken crypto code. It will just give people a false sense of security and encourage them to ignore the docs and write broken crypto code. -- Steve From tim.peters at gmail.com Thu Sep 10 03:55:06 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 9 Sep 2015 20:55:06 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F0BD0F.10508@sdamon.com> References: <55F0BD0F.10508@sdamon.com> Message-ID: [Alexander Walters ] > In a word - No. > > There is zero reason for people doing crypto to use the random module, > therefor we should not change the random module to be cryptographically > secure. > > Don't break things and slow my code down by default for dubious reasons, > please. Would your answer change if a crypto generator were _faster_ than MT? MT isn't speedy by modern standards, and is cache-hostile (about 2500 bytes of mutable state). Not claiming a crypto hash _would_ be faster. But it is possible. 
From brenbarn at brenbarn.net Thu Sep 10 04:07:05 2015 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Wed, 09 Sep 2015 19:07:05 -0700 Subject: [Python-ideas] One way to do format and print In-Reply-To: <1DCC81C0-DE7A-460A-AD7F-E1533BB14911@yahoo.com> References: <55ED24C4.9000205@mail.de> <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> <55EF2B66.4020509@mail.de> <1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com> <6DDBD724-714E-40E1-88DF-9BC8484FF240@yahoo.com> <55F058B6.9000202@mail.de> <1DCC81C0-DE7A-460A-AD7F-E1533BB14911@yahoo.com> Message-ID: <55F0E5C9.6030509@brenbarn.net> On 2015-09-09 14:50, Andrew Barnert via Python-ideas wrote: > Well, have you read the answers given by Nick, me, and others earlier > in the thread? If so, what do you disagree with? You've only > addressed one point (that % is faster than {} for simple cases--and > your solution is just "make {} faster", which may not be possible > given that it's inherently more hookable than % and therefore > requires more function calls...). What about formatting headers for > ASCII wire protocols, sharing tables of format strings between > programming languages (e.g., for i18n), or any of the other reasons > people have brought up? This is getting off on a tangent, but I don't see most of those as super compelling. Any programming language can use whatever formatting scheme it likes. Keeping %-substitutions around helps in sharing format strings only with other languages that use exactly the same formatting style. So it's not like % has any intrinsic gain; it just happens to interoperate with some other particular stuff. That's nice, but I don't think it makes sense to keep things in Python just so it can interoperate in specific ways with specific other languages that use less-readable syntax. To me the main advantage of {} is it's more readable. Readability is relevant in any application. 
The other things you're mentioning seem to be basically about making certain particular applications easier, and I see that as less important. In other words, if to write a wire protocol or share format strings you have to write your own functions to do stuff in a more roundabout way instead of using a (or the!) built-in formatting mechanism, I'm fine with that if it streamlines the built-in formatting mechanism(s). (The main DISadvantage of {} so far is that its readability is limited because you have to pass in all that stuff with the format call at the end. I think if one of these string-interpolation PEPs settles down and we get something like "I like {this} and {that}" --- where the names are drawn directly from the enclosing scope without having to pass them in --- that will be a huge win over both the existing {} formatting and the % formatting.) -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From steve at pearwood.info Thu Sep 10 04:11:08 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 10 Sep 2015 12:11:08 +1000 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <55F0BD0F.10508@sdamon.com> Message-ID: <20150910021108.GP19373@ando.pearwood.info> On Wed, Sep 09, 2015 at 08:55:06PM -0500, Tim Peters wrote: > [Alexander Walters ] > > In a word - No. > > > > There is zero reason for people doing crypto to use the random module, > > therefor we should not change the random module to be cryptographically > > secure. > > > > Don't break things and slow my code down by default for dubious reasons, > > please. > > Would your answer change if a crypto generator were _faster_ than MT? > MT isn't speedy by modern standards, and is cache-hostile (about 2500 > bytes of mutable state). > > Not claiming a crypto hash _would_ be faster. But it is possible. 
If the crypto PRNG were comparable in speed to what we have now (not significantly slower), or faster, *and* gave reproducible results with the same seed, *and* had no known/detectable statistical biases, and we could promise that those properties would continue to hold even when the state of the art changed and we got a new default crypto PRNG, then I'd still be -0.5 on the change due to the "false sense of security" factor. As I've already mentioned in another comment, I'm with Paul Moore -- I think anyone foolish/ignorant/lazy/malicious enough to use the default PRNG for crypto is surely making more than one mistake, and fixing that one thing for them will just give people a false sense of security. -- Steve From tim.peters at gmail.com Thu Sep 10 04:23:23 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 9 Sep 2015 21:23:23 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <20150910015505.GO19373@ando.pearwood.info> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <20150910015505.GO19373@ando.pearwood.info> Message-ID: [Nathaniel Smith] >> The real reasons to prefer non-cryptographic RNGs are the auxiliary >> features like determinism, speed, jumpahead, multi-thread >> friendliness, etc. But the stdlib random module doesn't really provide >> any of these (except determinism in strictly limited cases), so I'm >> not sure it matters much. [Steven D'Aprano] > The default MT is certainly deterministic, and although only the output > of random() itself is guaranteed to be reproducible, the other methods > are *usually* stable in practice. > > There's a jumpahead method too, Not in Python. There was for the ancient Wichmann-Hill generator, but not for MT. 
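For context on why Wichmann-Hill jumpahead was cheap: each of its three components is a pure multiplicative congruential generator, so n steps collapse into a single modular exponentiation per component, x_n = (a**n mod m) * x_0 mod m. A sketch using the classic Wichmann-Hill constants (an illustration, not the old stdlib code):

```python
# Classic Wichmann-Hill (AS 183) parameters: three multiplicative LCGs.
_WH_PARAMS = ((171, 30269), (172, 30307), (170, 30323))

def wh_step(state):
    # One ordinary step of each component: x -> a*x mod m
    return tuple((a * x) % m for x, (a, m) in zip(state, _WH_PARAMS))

def wh_jumpahead(state, n):
    # n steps at once: x -> (a**n mod m) * x mod m, via three pow() calls
    return tuple((pow(a, n, m) * x) % m for x, (a, m) in zip(state, _WH_PARAMS))
```

Since pow(a, n, m) costs O(log n) multiplications, jumping 10**12 steps is about as cheap as jumping 10 — which is exactly the property MT's more complicated recurrence does not give for free.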
A detailed sketch of ways to implement efficient jumpahead for MT is given here: A Fast Jump Ahead Algorithm for Linear Recurrences in a Polynomial Space http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/ARTICLES/jump-seta-lfsr.pdf But because MT isn't a _simple_ recurrence, they're all depressingly complex :-( For Wichmann-Hill it was just a few integer modular exponentiations. > and for use with multiple threads, you can (and should) create your > own instances that don't share state. I call that "multi-thread friendliness" :-) That's what people do, but MT's creators don't recommend it anymore (IIRC, their FAQ did recommend it some years ago). Then they switched to recommending using jumpahead with a large value (to guarantee different instances' states would never overlap). Now (well, last I saw) they recommend a parameterized scheme creating a distinct variant of MT per thread (not just different state, but a different (albeit related) algorithm): http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dgene.pdf So I'd say it's clear as mud ;-) > ... From tim.peters at gmail.com Thu Sep 10 05:23:22 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 9 Sep 2015 22:23:22 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> Message-ID: [Nathaniel Smith ] > Yeah, equidistribution is not a guarantee of anything on its own. For > example, an integer counter modulo 2**(623*32) is equidistributed to > 32 bits across 623 dimensions, just like the Mersenne Twister. The analogy is almost exact. If you view MT's state as a single 19937-bit integer (=623*32 + 1), then MT's state "simply" cycles through a specific permutation of range(1, 2**19937) (with a different orbit consisting solely of 0). That was the "hard part" to prove. 
Everything about equidistribution was more of an observation following from that. Doesn't say anything about distribution "in the small" (across small slices), alas. > ... And hey, if > arc4random *does* mess up your simulation, then congratulations, your > simulation is publishable as a cryptographic attack and will probably > get written up in the NYTimes :-). Heh. In the NYT or a security wonk's blog, maybe. But why would a reputable journal believe me? By design, the results of using the OpenBSD arc4random can't be reproduced ;-) > The real reasons to prefer non-cryptographic RNGs are the auxiliary > features like determinism, speed, jumpahead, multi-thread > friendliness, etc. But the stdlib random module doesn't really provide > any of these (except determinism in strictly limited cases), so I'm > not sure it matters much. Python's implementation of MT has never changed anything about the sequence produced from a given seed state, and indeed gives the same sequence from the same seed state as every other correct implementation of the same flavor of MT. That is "strictly limited", to perfection ;-) At a higher level, depends on the app. People are quite creative at defeating efforts to be helpful ;-) >> The Twister's provably perfect equidistribution across its whole >> period also has its scary sides. For example, run random.random() >> often enough, and it's _guaranteed_ you'll eventually reach a state >> where the output is exactly 0.0 hundreds of times in a row. That will >> happen as often as it "should happen" by chance, but that's scant >> comfort if you happen to hit such a state. Indeed, the Twister was >> patched relatively early in its life to try to prevent it from >> _starting_ in such miserable states. Such states are nevertheless >> still reachable from every starting state. > This criticism seems a bit unfair though Those are facts, not criticisms. I like the Twister very much. 
But those who have no fear of it are dreaming - while those who have significant fear of it are also dreaming. It's my job to ensure nobody is either frightened or complacent ;-) > -- even a true stream of random bits (e.g. from a true unbiased > quantum source) has this property, But good generators with astronomically smaller periods do not. In a sense, MT made it possible to get results "far more random" than any widely used deterministic generator before it. The patch I mentioned above was to fix real problems in real life, where people using simple seeding schemes systematically ended up with results so transparently ludicrous nobody could possibly accept them for any purpose. "The fix" consisted of using scrambling functions to spray the bits in the user-*supplied* seed all over the place, in a pseudo-random way, to probabilistically ensure "the real" state wouldn't end up with "too many" zero bits. "A lot of zero bits" tends to persist across MT state transitions for a long time. > and trying to avoid this happening would introduce bias that really > could cause problems in practice. For example, nobody _needs_ a generator capable of producing hundreds of 0.0 in a row. Use a variant even of MT with a much smaller period, and that problem goes away, with no bad effects on any app. Push that more, and what many Monte Carlo applications _really_ want is "low discrepancy": some randomness is essential, but becomes a waste of cycles and assaults "common sense" if it gets too far from covering the domain more-or-less uniformly. So there are many ways known of generating "quasi-random" sequences instead, melding a notion of randomness with guarantees of relatively low discrepancy (low deviation from uniformity). Nothing works best for all purposes - except Python. > A good probabilistic program is one that has a high probability > of returning some useful result, but they always have some > low probability of returning something weird. 
So this is just > saying that most people don't understand probability. Or nobody does. Probability really is insufferably subtle. Guido should ban it. > Which is true, but there isn't much that the random module can do > about it :-) Python should just remove it. It's an "attractive nuisance" to everyone ;-) From stephen at xemacs.org Thu Sep 10 05:25:29 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 10 Sep 2015 12:25:29 +0900 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> Message-ID: <87613j0xcm.fsf@uwakimon.sk.tsukuba.ac.jp> Nathaniel Smith writes: > That seems more productive in the short run than trying to > get everyone to stop typing "pip" :-). FWIW, I did as soon as I realized python_i_want_to_install -m pip worked; it's obvious that it DTRTs, and I felt like I'd just dropped the hammer I'd been whacking my head with. > (Though I do agree that having pip as a separate command from > python is a big mess -- another case where this comes up is the > need for pip versus pip3.) Ah, that's the name of my hammer, although it's come up in 3.2 vs 3.3 as well. > It sounds like this is another place where in the short term, it would > help a lot if pip at startup took a peek at $PATH and issued some > warnings or errors if it detected the most common types of > misconfiguration? (E.g. the first python/python3 in $PATH does not > match the one being used to run pip.) I don't understand the logic for trying to save the pip command by making its environment checking more complex than the app itself.
"python -m pip" suffers from no problems that pip itself doesn't suffer from, and is far more reliable, without blaming the user. Sure, people used to using a pip command shouldn't be deprived of it, but I'll never miss it, and I don't see why anybody who isn't already using it would miss it. The only problem with "python -m pip" is discoverability/memorability, and the fact that interactive use of "from pip import main" is not properly supported IIUC (not to mention clumsy). Thus the proposal for a builtin named "install" or similar. From stephen at xemacs.org Thu Sep 10 05:32:23 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 10 Sep 2015 12:32:23 +0900 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> Message-ID: <874mj30x14.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert via Python-ideas writes: > If StackOverflow/SU/TD questions are any indication, a > disproportionate number of these people are Mac users using Python > 2.7, who have installed a second Python 2.7 (or, in some cases, two > of them) alongside Apple's. Often enough it's the other way around: the distro catches up to the user as they upgrade. I didn't even realize "10.10 Yosemite" had 2.7, this box has been upgraded from "10.7 Lion" or so, and I just use MacPorts 2.7 all the time. I haven't worried about what Apple supplies as /usr/bin/python in 6 or 7 years. I don't know if this matters to the effect on pip, but I thought it should be mentioned. 
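For the "which pip goes with which python" confusion discussed here, a quick illustrative check (output is machine-specific; `python3` stands in for whichever interpreter you actually mean):

```shell
# Print the interpreter that `python3` resolves to, then ask that same
# interpreter which pip it carries.  `python3 -m pip` is always bound to
# the interpreter on the left, which a bare `pip` found on $PATH is not
# guaranteed to be.
python3 -c 'import sys; print("python3 ->", sys.executable)'
python3 -m pip --version 2>/dev/null \
    || echo "this interpreter has no pip module"
```

On a machine with an Apple-supplied Python plus a MacPorts or python.org install, the two commands make the mismatch visible immediately.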
From njs at pobox.com Thu Sep 10 05:32:45 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 9 Sep 2015 20:32:45 -0700 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <20150910015505.GO19373@ando.pearwood.info> Message-ID: [Sorry to Tim and Steven if they get multiple copies of this... Gmail recently broke their Android app's handling of from addresses, so resending, sigh] On Sep 9, 2015 7:24 PM, "Tim Peters" wrote: [...] > [Steven D'Aprano] [...] > > and for use with multiple threads, you can (and should) create your > > own instances that don't share state. I call that "multi-thread friendliness" :-) > > That's what people do, but MT's creators don't recommend it anymore > (IIRC, their FAQ did recommend it some years ago). Then they switched > to recommending using jumpahead with a large value (to guarantee > different instances' states would never overlap). Now (well, last I > saw) they recommend a parameterized scheme creating a distinct variant > of MT per thread (not just different state, but a different (albeit > related) algorithm): > > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dgene.pdf > > So I'd say it's clear as mud ;-) Yeah, the independent-seed-for-each-thread approach is an option with any RNG, but just like people feel better if they have a 100% certified guarantee that the RNG output in a single thread will pass through every combination of possible values (if you wait some cosmological time), they also feel better if there is some 100% certified guarantee that the RNG values in two threads will also be uncorrelated with each other.
With something like MT, if two threads did end up with nearby seeds, then that would be bad: each thread individually would see values that looked like high quality randomness, but if you compared across the two threads, they would be identical modulo some lag. So all the nice theoretical analysis of the single threaded stream falls apart. However, for two independently seeded threads to end up anywhere near each other in the MT state space requires that you have picked two numbers between 0 and 2**19937 and gotten values that were "close". Assuming your seeding procedure is functional at all, then this is not a thing that will ever actually happen in this universe. So AFAICT the rise of explicitly multi-threaded RNG designs is one of those fake problems that exists only so people can write papers about solving it. (Maybe this is uncharitable.) So there exist RNG designs that handle multi-threading explicitly, and it shows up on feature comparison checklists. I don't think it should really affect Python's decisions at all though. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Thu Sep 10 05:35:55 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 9 Sep 2015 22:35:55 -0500 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <20150910015505.GO19373@ando.pearwood.info> Message-ID: [Nathaniel Smith ] > Yeah, the independent-seed-for-each-thread approach works for any RNG, but > just like people feel better if they have a 100% certified guarantee that > the RNG output in a single thread will pass through every combination of > possible values (if you wait some cosmological time), they also feel better > if there is some 100% certified guarantee that the RNG values in two threads > will also be uncorrelated with each other. > > With something like MT, if two threads did end up with nearby seeds, then > that would be bad: each thread individually would see values that looked > like high quality randomness, but if you compared across the two threads, > they would be identical modulo some lag. So all the nice theoretical > analysis of the single threaded stream falls apart. > > However, for two independently seeded threads to end up anywhere near each > other in the MT state space requires that you have picked two numbers > between 0 and 2**19937 and gotten values that were "close". Assuming your > seeding procedure is functional at all, then this is not a thing that will > ever actually happen in this universe. I think it's worse than that. MT is based on a linear recurrence. Take two streams "far apart" in MT space, and their sum also satisfies the recurrence. So a possible worry about a third stream isn't _just_ about correlation or overlap with the first two streams, but, depending on the app, also about correlation/overlap with the sum of the first two streams. Move to N streams, and there are O(N**2) direct sums to worry about, and then sums of sums, and ... 
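The linearity just described can be checked on a toy scale. The generator below is a tiny 7-bit LFSR with arbitrary taps (nothing to do with MT's actual parameters): because the state update is linear over GF(2), the XOR of two output streams is itself a valid output stream of the same recurrence.

```python
# Toy LFSR over GF(2): next state = (state << 1 | parity(state & taps)),
# truncated to nbits.  Every operation is linear in the state bits, so
# streams add (XOR) componentwise -- the same property that makes sums
# of "far apart" MT streams satisfy MT's recurrence.
def lfsr_stream(state, n, taps=0b1100000, nbits=7):
    out = []
    mask = (1 << nbits) - 1
    for _ in range(n):
        feedback = bin(state & taps).count("1") & 1  # XOR of tapped bits
        state = ((state << 1) | feedback) & mask
        out.append(state & 1)  # emit the low bit
    return out

a, b = 0b1010101, 0b0110011
xor_of_streams = [x ^ y for x, y in zip(lfsr_stream(a, 64), lfsr_stream(b, 64))]
# The XOR of the two streams equals the stream seeded with a XOR b:
assert xor_of_streams == lfsr_stream(a ^ b, 64)
```

With N seeds, every XOR-combination of their streams is also a stream of the same recurrence, which is where the O(N**2)-and-beyond bookkeeping comes from.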
Still won't cause a problem in _my_ statistical life expectancy, but I only have 4 cores ;-) > So AFAICT the rise of explicitly multi-threaded RNG designs is one of > those fake problems that exists only so people can write papers about > solving it. (Maybe this is uncharitable.) Uncharitable, but fair :-) > So there exist RNG designs that handle multi-threading explicitly, and it > shows up on feature comparison checklists. I don't think it should really > affect Python's decisions at all though. There are some clean and easy approaches to this based on crypto-inspired schemes, but giving up crypto strength for speed. If you haven't read it, this paper is delightful: http://www.thesalmons.org/john/random123/papers/random123sc11.pdf From ben+python at benfinney.id.au Thu Sep 10 05:37:55 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Thu, 10 Sep 2015 13:37:55 +1000 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> Message-ID: <858u8f7xm4.fsf@benfinney.id.au> Nathaniel Smith writes: > [...] in the short term, it would help a lot if pip at startup took a > peek at $PATH and issued some warnings or errors if it detected the > most common types of misconfiguration? (E.g. the first python/python3 > in $PATH does not match the one being used to run pip.) Isn't that something that would be better in the Python executable itself? Many commands would be better with an (overridable) default behaviour as you describe. -- \ "Considering the current sad state of our computer programs, | `\ software development is clearly still a black art, and cannot | _o__) yet be called an engineering discipline."
--Bill Clinton | Ben Finney From jlehtosalo at gmail.com Thu Sep 10 05:40:47 2015 From: jlehtosalo at gmail.com (Jukka Lehtosalo) Date: Wed, 9 Sep 2015 20:40:47 -0700 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <55F0A1A8.5010001@mail.de> References: <55F0A1A8.5010001@mail.de> Message-ID: On Wed, Sep 9, 2015 at 2:16 PM, Sven R. Kunze wrote: > Thanks for sharing, Guido. Some random thoughts: > > - "classes should need to be explicitly marked as protocols" > If so, why are they classes in the first place? Other languages have > dedicated keywords like "interface". > I want to preserve compatibility with earlier Python versions (down to 3.2), and this makes it impossible to add any new syntax. Also, there is no need to add a keyword as there are other existing mechanisms which are good enough, including base classes (as in the proposal) and class decorators. I don't think that this will become a very commonly used language feature, and thus adding special syntax for this doesn't seem very important. My expectation is that structural subtyping would be primarily useful for libraries and frameworks. Jukka -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Sep 10 05:44:05 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 9 Sep 2015 20:44:05 -0700 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: <858u8f7xm4.fsf@benfinney.id.au> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> <858u8f7xm4.fsf@benfinney.id.au> Message-ID: On Wed, Sep 9, 2015 at 8:37 PM, Ben Finney wrote: > Nathaniel Smith writes: > >> [...]
in the short term, it would help a lot if pip at startup took a >> peek at $PATH and issued some warnings or errors if it detected the >> most common types of misconfiguration? (E.g. the first python/python3 >> in $PATH does not match the one being used to run pip.) > > Isn't that something that would be better in the Python executable > itself? Many commands would be better with an (overridable) default > behaviour as you describe. While that's debatable, any plan that only benefits users of python 3.6+ is a non-starter, given that the goal here is short-term harm reduction. -n -- Nathaniel J. Smith -- http://vorpus.org From steve at pearwood.info Thu Sep 10 05:46:08 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 10 Sep 2015 13:46:08 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: <20150910034608.GQ19373@ando.pearwood.info> On Wed, Sep 09, 2015 at 08:01:16PM -0400, Donald Stufft wrote: [...] > Looking on google, the first result for "python random password" is > StackOverflow which suggests: > >     ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N)) > > However, it was later edited to, after that, include: > >     ''.join(random.SystemRandom().choice(string.ascii_uppercase + string.digits) for _ in range(N)) You're worried about attacks on the random number generator that produces the characters in the password? I think I'm going to have to see an attack before I believe that this is meaningful. Excluding PRNGs that are hopelessly biased ("nine, nine, nine, nine...") or predictable, how does knowing the PRNG help in an attack? Here's a password I just generated using your "corrected" version using SystemRandom: 06XW0X0X (Honest, that's exactly what I got on my first try.) Here's one I generated using the "bad" code snippet: V6CFKCF2 How can you tell them apart, or attack one but not the other based on the PRNG?
> So it wasn't obvious to the person who answered that question that the random > module's module scoped functions were not appropriate for this use. It appears > that the original answer lasted for roughly 4 years before it was corrected, Shouldn't it be using a single instance of SystemRandom rather than a new instance for each call? [...] > According to Theo, modern userland CSPRNGs can create random bytes faster than > memcpy That is an astonishing claim, and I'd want to see evidence for it before accepting it. -- Steve From tritium-list at sdamon.com Thu Sep 10 05:51:42 2015 From: tritium-list at sdamon.com (Alexander Walters) Date: Wed, 09 Sep 2015 23:51:42 -0400 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <20150910021108.GP19373@ando.pearwood.info> References: <55F0BD0F.10508@sdamon.com> <20150910021108.GP19373@ando.pearwood.info> Message-ID: <55F0FE4E.5040802@sdamon.com> On 9/9/2015 22:11, Steven D'Aprano wrote: > If the crypto PRNG were comparable in speed to what we have now (not > significantly slower), or faster, *and* gave reproducible results with > the same seed, *and* had no known/detectable statistical biases, and we > could promise that those properties would continue to hold even when the > state of the art changed and we got a new default crypto PRNG, then I'd > still be -0.5 on the change due to the "false sense of security" factor. +1 Exactly this. If you can give me the same functionality (including seeding), make it faster *and* more secure, I have zero objections. I *still* do not think we should go out of our way to make random a good source of cryptographic data, since... Let's be frank about this, Guido is not a security expert. I am not a security expert. Tim, I suspect you are not a security expert. Let's leave actually attempting to be at the cutting edge of cryptographic randomness to modules by security experts.
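A minimal side-by-side of the two StackOverflow snippets quoted earlier in the thread, reusing one SystemRandom instance rather than constructing a new one per call (the 8-character length is arbitrary):

```python
import random
import string

ALPHABET = string.ascii_uppercase + string.digits

# Module-level functions draw from one shared Mersenne Twister:
# deterministic and seedable, which is exactly what you want for
# simulations and exactly what you don't want for secrets.
mt_password = ''.join(random.choice(ALPHABET) for _ in range(8))

# SystemRandom reads from os.urandom(); create it once and reuse it
# instead of instantiating a fresh object on every call.
sysrand = random.SystemRandom()
os_password = ''.join(sysrand.choice(ALPHABET) for _ in range(8))

assert len(mt_password) == len(os_password) == 8
assert set(mt_password) <= set(ALPHABET)
assert set(os_password) <= set(ALPHABET)
```

As the thread notes, the two outputs are indistinguishable by eye; the difference is only in whether the internal state is reconstructible from enough outputs.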
I have far too much use for randomness outside of a cryptographic context to sacrifice the API and feature set we have for, in my opinion, a myopic focus on one, already discouraged, use of the random module. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Sep 10 05:56:56 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 9 Sep 2015 20:56:56 -0700 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <20150910015505.GO19373@ando.pearwood.info> Message-ID: On Wed, Sep 9, 2015 at 8:35 PM, Tim Peters wrote: > There are some clean and easy approaches to this based on > crypto-inspired schemes, but giving up crypto strength for speed. If > you haven't read it, this paper is delightful: > > http://www.thesalmons.org/john/random123/papers/random123sc11.pdf It really is! As AES acceleration instructions become more common (they're now standard IIUC on x86, x86-64, and even recent ARM?), even just using AES in CTR mode becomes pretty compelling -- it's fast, deterministic, provably equidistributed, *and* cryptographically secure enough for many purposes. (Compared to a true state-of-the-art CPRNG the naive version fails due to lack of incremental mixing, and the use of a reversible transition function. But even these are mostly only important to protect against attackers who have access to your memory -- which is not trivial as heartbleed shows, but still, it's *waaay* ahead of something like MT on basically every important axis.) -n -- Nathaniel J. 
Smith -- http://vorpus.org From random832 at fastmail.com Thu Sep 10 05:59:22 2015 From: random832 at fastmail.com (Random832) Date: Wed, 09 Sep 2015 23:59:22 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux References: <20150910034608.GQ19373@ando.pearwood.info> Message-ID: Steven D'Aprano writes: > On Wed, Sep 09, 2015 at 08:01:16PM -0400, Donald Stufft wrote: > [...] > > You're worried about attacks on the random number generator that > produces the characters in the password? I think I'm going to have to > see an attack before I believe that this is meaningful. Isn't the only difference between generating a password and generating a key the length (and base) of the string? Where is the line? > That is an astonishing claim, and I'd want to see evidence for it before > accepting it. I assume it's comparing a CSPRNG all of whose state is in cache (or registers, if a large block of random bytes is requested from the CSPRNG in one go) with memcpy of data which must be retrieved from main memory. From jlehtosalo at gmail.com Thu Sep 10 06:12:24 2015 From: jlehtosalo at gmail.com (Jukka Lehtosalo) Date: Wed, 9 Sep 2015 21:12:24 -0700 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <55F0AC83.3050505@mail.de> References: <55F0AC83.3050505@mail.de> Message-ID: On Wed, Sep 9, 2015 at 3:02 PM, Sven R. Kunze wrote: > Not specifically about this proposal but about the effort put into Python > typehinting in general currently: > > > What are the supposed benefits? > This has been discussed almost to the death before, but here are some of the main benefits as I see them: - Code becomes more readable. This is especially true for code that doesn't have very detailed docstrings. This may go against the intuition of some people, but my experience strongly suggests this, and many others who've used optional typing have shared the sentiment.
It probably takes a couple of days before you get used to the type annotations, after which they likely won't distract you any more but will actually improve code understanding by providing important contextual information that is often difficult to infer otherwise. - Tools can automatically find most (simple) bugs of certain common kinds in statically typed code. A lot of production code has way below 100% test coverage, so this can save many manual testing iterations and help avoid breaking stuff in production due to stupid mistakes (that humans are bad at spotting). - Refactoring becomes way less scary, especially if you don't have close to 100% test coverage. A type checker can find many mistakes that are commonly introduced when refactoring code. You'll get the biggest benefits if you are working on a large code base mostly written by other people with limited test coverage and little comments or documentation. You get extra credit if your tests are slow to run and flaky, as this slows down your iteration speed, whereas type checking can be quick (with the right tools, which might not exist as of now ;-). If you have a small (say, less than 10k lines) code base you've mostly written yourself and have meticulously documented everything and have 95% test coverage and your full test suite runs in 10 seconds, you'll probably get less out of it. Context matters. > > I read somewhere that right now tools are able to infer 60% of the types. > That seems pretty good to me and a lot of effort on your side to make some > additional 20/30 %. Don't get me wrong, I like the theoretical and > abstract discussions around this topic but I feel this type of feature is way > out of the practical realm. > Such a tool can't infer 40% of the types. This probably includes most of the tricky parts of the program that I'd actually like to statically check. A type checker that uses annotations might understand 95% of the types, i.e. it would miss 5% of the types.
This seems like a reasonable figure for code that has been written with some thought about type checkability. I consider that difference pretty significant. I wouldn't want to increase the fraction of unchecked parts of my annotated code by a factor of 8, and I want to have control over which parts can be type checked. Jukka > > I don't see the effort for adding type hints AND the effort for further > parsing (by human eyes) justified by partially better IDE support and 1 > single additional test within test suites of about 10,000s of tests. > > Especially, when considering that correct types don't prove functionality > in any case. But tested functionality in some way proves correct typing. > > Just my two cents since I felt I had to say this and maybe I am missing > something. :) > > Best, > Sven > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu Sep 10 06:32:39 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 10 Sep 2015 14:32:39 +1000 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: <87613j0xcm.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> <87613j0xcm.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Thu, Sep 10, 2015 at 1:25 PM, Stephen J. Turnbull wrote: > Nathaniel Smith writes: > > > That seems more productive in the short run than trying to > > get everyone to stop typing "pip" :-). > > FWIW, I did as soon as I realized python_i_want_to_install -m pip > worked; it's obvious that it DTRTs, and I felt like I'd just dropped > the hammer I'd been whacking my head with. 
If the problem with this is the verbosity of it ("python -m pip install packagename" - five words), would there be benefit in blessing pip with some core interpreter functionality, allowing either: $ python install packagename or $ python -p packagename to do the one most common operation, installation? (And since it's new syntax, it could default to --upgrade, which would match the behaviour of other package managers like apt-get.) Since the base command is "python", it automatically uses the same interpreter and environment as you otherwise would. It's less verbose than bouncing through -m. It gives Python the feeling of having an integrated package manager, which IMO wouldn't be a bad thing. Of course, that wouldn't help with the 2.7 people, but it might allow the deprecation of the 'pip' wrapper. Would it actually help? ChrisA From jlehtosalo at gmail.com Thu Sep 10 06:34:47 2015 From: jlehtosalo at gmail.com (Jukka Lehtosalo) Date: Wed, 9 Sep 2015 21:34:47 -0700 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <7DC7EA44-0CD8-4F61-8462-8147B8BB8059@yahoo.com> References: <7DC7EA44-0CD8-4F61-8462-8147B8BB8059@yahoo.com> Message-ID: On Wed, Sep 9, 2015 at 3:08 PM, Andrew Barnert via Python-ideas < python-ideas at python.org> wrote: > On Sep 9, 2015, at 13:17, Guido van Rossum wrote: > > Jukka wrote up a proposal for structural subtyping. It's pretty good. > Please discuss. > > https://github.com/ambv/typehinting/issues/11#issuecomment-138133867 > > > Are we going to continue to have (both implicit and explicit) ABCs in > collections.abc, numbers, etc., and also have protocols that are also ABCs > and are largely parallel to them (and implicit at static checking time > whether they're implicit or explicit at runtime) In typing? If so, I think > we've reached the point where the two parallel hierarchies are a problem. > I'm not proposing creating protocols for numbers or most collection types. 
I'd change some of the existing ABCs (mentioned in the proposal, including things like Sized) in typing into equivalent protocols, but they'd still support isinstance as before and would be functionally almost identical to the existing ABCs. I clarified the latter fact in the github issue. > > Also, why are both the terminology and implementation so different from > what we already have for ABCs? Why not just have a decorator or metaclass > that can be added to ABCs that makes them implicit (rather than writing a > manual __subclasshook__ for each one), which also makes them implicit at > static type checking time, which means there's no need for a whole separate > but similar notion? > Protocol would use a metaclass that is derived from the ABC metaclass, and it would be similar to the Generic class that we already have. The reason why the proposal doesn't use an explicit metaclass or a class decorator is consistency. It's possible to define generic protocols by having Protocol[t, ...] as a base class, which is consistent with how Generic[...] works. The latter is already part of typing, and introducing a similar concept with a different syntax seems inelegant to me. Consider a generic class: class Bucket(Generic[T]): ... Now we can have a generic protocol using a very similar syntax: class BucketProtocol(Protocol[T]): ... I wonder how we'd use a metaclass or a class decorator to represent generic protocols. Maybe something like this: @protocol[T] class BucketProtocol: ... However, this looks quite different from the Generic[...] case and thus I'd rather not use it. I guess if we'd have picked this syntax for generic classes it would make more sense: @generic[T] class Bucket: ... > I'm not sure why it's important to also have some types that are implicit > at static type checking time but not at runtime, but if there is a good > reason, that just means two different decorators/metaclasses/whatever (or a > flag passed to the decorator, etc.).
Compare: > > Hashable is an implicit ABC, Sequence is an explicit ABC, Reversible is an > implicit-static/explicit-runtime ABC. > > Hashable is an implicit ABC and also a Protocol that's an explicit ABC, > Sequence is an explicit ABC and not a Protocol, Reversible is a Protocol > that's an explicit ABC. > > The first one is clearly simpler; is there some compelling reason that > makes the second one better anyway? > I'm not sure if I fully understand what you mean by implicit vs. explicit ABCs (and the static/runtime distinction). Could you define these terms and maybe give some examples of each? Note that in my proposal a protocol is just a kind of ABC, as GenericMeta is a subclass of ABCMeta and protocol would have a similar metaclass (or maybe even the same one), even though I'm not sure if I explicitly mentioned that. Every protocol is also an ABC. Jukka -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Thu Sep 10 01:23:13 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 10 Sep 2015 11:23:13 +1200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <20150909190757.GM19373@ando.pearwood.info> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> Message-ID: <55F0BF61.6050205@canterbury.ac.nz> Steven D'Aprano wrote: > one desirable > property of PRNGs is that you can repeat a sequence of values if you > re-seed with a known value. Does arc4random keep that property? Another property that's important for some applications is to be able to efficiently "jump ahead" some number of steps in the sequence, to produce multiple independent streams of numbers. It would be good to know if that is possible with arc4random. 
-- Greg From tim.peters at gmail.com Thu Sep 10 06:47:53 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 9 Sep 2015 23:47:53 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <20150910015505.GO19373@ando.pearwood.info> Message-ID: [Tim, on parallel PRNGs] >> There are some clean and easy approaches to this based on >> crypto-inspired schemes, but giving up crypto strength for speed. If >> you haven't read it, this paper is delightful: >> >> http://www.thesalmons.org/john/random123/papers/random123sc11.pdf [Nathaniel Smith] > It really is! As AES acceleration instructions become more common > (they're now standard IIUC on x86, x86-64, and even recent ARM?), even > just using AES in CTR mode becomes pretty compelling -- it's fast, > deterministic, provably equidistributed, *and* cryptographically > secure enough for many purposes. Excellent - we're going to have a hard time finding something real to disagree about :-) > (Compared to a true state-of-the-art CPRNG the naive version fails due > to lack of incremental mixing, and the use of a reversible transition > function. But even these are mostly only important to protect against > attackers who have access to your memory -- which is not trivial as > heartbleed shows, but still, it's *waaay* ahead of something like MT > on basically every important axis.) Except for wide adoption. Most people I bump into never even heard of this kind of approach. Nobody ever got fired for buying IBM, and nobody ever got fired for recommending MT - it's darned near a checklist item when shopping for a language. 
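The counter-based approach the random123 paper describes can be illustrated in a few lines. This is a toy sketch only -- SHA-256 stands in for AES, and nothing here is cryptographically vetted -- but it shows why such designs keep reproducible seeding and O(1) jump-ahead:

```python
import hashlib

class CounterPRNG:
    """Toy counter-mode generator: block i is H(key || i)."""

    def __init__(self, key: bytes, counter: int = 0):
        self.key = key
        self.counter = counter

    def next_block(self) -> bytes:
        # Each output block depends only on (key, counter), so the
        # stream is fully reproducible from the seed.
        block = hashlib.sha256(
            self.key + self.counter.to_bytes(8, "big")).digest()
        self.counter += 1
        return block

    def jump_ahead(self, n: int) -> None:
        # Skipping n blocks is a single addition, unlike advancing
        # an RC4-style internal state n times.
        self.counter += n

# Same seed, same stream:
assert CounterPRNG(b"seed").next_block() == CounterPRNG(b"seed").next_block()

# Jumping ahead 1000 blocks agrees with generating them one by one:
slow, fast = CounterPRNG(b"seed"), CounterPRNG(b"seed")
for _ in range(1000):
    slow.next_block()
fast.jump_ahead(1000)
assert slow.next_block() == fast.next_block()
```

This also answers, for counter-based designs, both questions raised upthread: re-seeding with a known key repeats the sequence, and independent streams fall out of disjoint counter ranges.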
I may have to sneak the code in while you distract Guido with a provocative rant about the inherent perfidy of the Dutch character ;-) From rosuav at gmail.com Thu Sep 10 06:57:29 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 10 Sep 2015 14:57:29 +1000 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F0BF61.6050205@canterbury.ac.nz> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> Message-ID: On Thu, Sep 10, 2015 at 9:23 AM, Greg Ewing wrote: > Steven D'Aprano wrote: >> >> one desirable property of PRNGs is that you can repeat a sequence of >> values if you re-seed with a known value. Does arc4random keep that >> property? > > > Another property that's important for some applications is > to be able to efficiently "jump ahead" some number of steps > in the sequence, to produce multiple independent streams of > numbers. It would be good to know if that is possible with > arc4random. If arc4random reseeds with entropy periodically, then jumping ahead past such a reseed is simply a matter of performing a reseed, isn't it? ChrisA From tim.peters at gmail.com Thu Sep 10 06:58:33 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 9 Sep 2015 23:58:33 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F0BF61.6050205@canterbury.ac.nz> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> Message-ID: [Steven D'Aprano] >> one desirable property of PRNGs is that you can repeat a sequence of >> values if you re-seed with a known value. Does arc4random keep that >> property? 
[Greg Ewing] > Another property that's important for some applications is > to be able to efficiently "jump ahead" some number of steps > in the sequence, to produce multiple independent streams of > numbers. It would be good to know if that is possible with > arc4random. No for "arc4random" based on RC4, yes for "arc4random" based on ChaCha20, "mostly yes" for "arc4random" in the OpenBSD implementation, wholly unknown for whatever functions that may be _called_ "arc4random" in the future. The fly in the ointment for the OpenBSD version is that it periodically fiddles its internal state with "entropy" obtained from the kernel. It's completely unreproducible for that reason. However, you can still jump ahead in the state. It's just impossible to say that it's the same state you would have arrived at had you invoked the function that many times instead (the kernel could change the state in unpredictable ways any number of times while you were doing that). From njs at pobox.com Thu Sep 10 06:59:08 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 9 Sep 2015 21:59:08 -0700 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F0BF61.6050205@canterbury.ac.nz> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> Message-ID: On Wed, Sep 9, 2015 at 4:23 PM, Greg Ewing wrote: > Steven D'Aprano wrote: >> >> one desirable property of PRNGs is that you can repeat a sequence of >> values if you re-seed with a known value. Does arc4random keep that >> property? > > Another property that's important for some applications is > to be able to efficiently "jump ahead" some number of steps > in the sequence, to produce multiple independent streams of > numbers. It would be good to know if that is possible with > arc4random.
The answer to both of these questions is no. For modern cryptographic PRNGs, full determinism is considered a flaw, and determinism is a necessary precondition to supporting jumpahead. The reason is that even if an attacker learns your secret RNG state at time t, then you want this to have a limited impact -- they'll obviously be able to predict your RNG output for a while, but you don't want them to be able to predict it from now until the end of time. So determinism is considered bad, and high-quality CPRNGs automatically reseed themselves with new entropy according to some carefully designed schedule. And OpenBSD's "arc4random" generator is a high-quality CPRNG in this sense. -- Nathaniel J. Smith -- http://vorpus.org From tim.peters at gmail.com Thu Sep 10 07:00:45 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 10 Sep 2015 00:00:45 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> Message-ID: [Chris Angelico] > If arc4random reseeds with entropy periodically, then jumping ahead > past such a reseed is simply a matter of performing a reseed, isn't > it? The OpenBSD version supplies no functionality related to seeds (you can't set one, and you can't ask for one). From tim.peters at gmail.com Thu Sep 10 07:10:10 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 10 Sep 2015 00:10:10 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> Message-ID: [Tim] > ... 
> The fly in the ointment for the OpenBSD version is that it > periodically fiddles its internal state with "entropy" obtained from > the kernel. It's completely unreproducible for that reason. However, > you can still jump ahead in the state. I should add: but only if they supplied a jumpahead function. Which they don't. From abarnert at yahoo.com Thu Sep 10 07:15:16 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 9 Sep 2015 22:15:16 -0700 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: <874mj30x14.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> <874mj30x14.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <7A4752B7-90B8-49CE-9EF6-FF182CE0411D@yahoo.com> On Sep 9, 2015, at 20:32, Stephen J. Turnbull wrote: > > Andrew Barnert via Python-ideas writes: > >> If StackOverflow/SU/TD questions are any indication, a >> disproportionate number of these people are Mac users using Python >> 2.7, who have installed a second Python 2.7 (or, in some cases, two >> of them) alongside Apple's. > > Often enough it's the other way around: the distro catches up to the > user as they upgrade. I didn't even realize "10.10 Yosemite" had 2.7, > this box has been upgraded from "10.7 Lion" or so, No, that's not the problem. Lion came with 2.7.1, so you already had it before upgrading it, and it's hard to imagine Apple upgrading your system 2.7.1 to 2.7.6 or 2.7.10 broke anything. More likely, Apple screwed up your PATH, or broke your MacPorts so you had to reinstall or repair it? > and I just use > MacPorts 2.7 all the time. I haven't worried about what Apple > supplies as /usr/bin/python in 6 or 7 years. 
I'd assume most people on this list know what they're doing with their PATH. If you don't, then you just got lucky for a few years. Well, not just lucky--MacPorts does go out of its way to make things easier for you in various ways (hammering home keeping /opt/local/bin at the start of your PATH, trying to adjust the PATH system-wide and for LaunchServices as well as shells, providing many packages as ports so you don't need pip, offering a python-select tool that autodetects Apple and PSF Pythons and tries to make them play nice with MacPorts Pythons, etc.). So MacPorts users don't see such problems nearly as often as Homebrew (or Fink or Gentoo Prefix, if anyone still uses those), PSF installers, and third-party extra-batteries installers. But they still can come up--and if you didn't even realize you were running multiple Python 2.7 versions in parallel, that just means you never tried anything that MacPorts didn't anticipate. And, of course, most people with two Python 2.7s on Mac are not using MacPorts anyway. From njs at pobox.com Thu Sep 10 07:55:39 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 9 Sep 2015 22:55:39 -0700 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <20150910015505.GO19373@ando.pearwood.info> Message-ID: On Wed, Sep 9, 2015 at 9:47 PM, Tim Peters wrote: > [Tim, on parallel PRNGs] >>> There are some clean and easy approaches to this based on >>> crypto-inspired schemes, but giving up crypto strength for speed. If >>> you haven't read it, this paper is delightful: >>> >>> http://www.thesalmons.org/john/random123/papers/random123sc11.pdf > > [Nathaniel Smith] >> It really is!
As AES acceleration instructions become more common >> (they're now standard IIUC on x86, x86-64, and even recent ARM?), even >> just using AES in CTR mode becomes pretty compelling -- it's fast, >> deterministic, provably equidistributed, *and* cryptographically >> secure enough for many purposes. > > Excellent - we're going to have a hard time finding something real to > disagree about :-) > > >> (Compared to a true state-of-the-art CPRNG the naive version fails due >> to lack of incremental mixing, and the use of a reversible transition >> function. But even these are mostly only important to protect against >> attackers who have access to your memory -- which is not trivial as >> heartbleed shows, but still, it's *waaay* ahead of something like MT >> on basically every important axis.) > > Except for wide adoption. Most people I bump into never even heard of > this kind of approach. Nobody ever got fired for buying IBM, and > nobody ever got fired for recommending MT - it's darned near a > checklist item when shopping for a language. I may have to sneak the > code in while you distract Guido with a provocative rant about the > inherent perfidy of the Dutch character ;-) :-) Srsly though, we've talked about switching to some kind of CTR-mode RNG as the default in NumPy (where speed differences are pretty visible, b/c we support generating big blocks of random numbers at once), and would probably accept a patch. (Just in case Guido is undisturbed by scurrilous allegations.) -- Nathaniel J. Smith -- http://vorpus.org From rosuav at gmail.com Thu Sep 10 08:08:09 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 10 Sep 2015 16:08:09 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <72597E4F-4E74-412D-8ED3-442E832232EF@yahoo.com> Message-ID: On Thu, Sep 10, 2015 at 11:50 AM, Andrew Barnert via Python-ideas wrote: > Of course it adds the cost of making the module slower, and also more complex. 
Maybe a better solution would be to add a random.set_default_instance function that replaced all of the top-level functions with bound methods of the instance (just like what's already done at startup in random.py)? That's simple, and doesn't slow down anything, and it seems like it makes it more clear what you're doing than setting random.inst. +1. A single function call that replaces all the methods adds a minuscule constant to code size, run time, etc, and it's no less readable than assignment to a module attribute. (If anything, it makes it more clearly a supported operation - I've seen novices not realize that "module.xyz = foo" is valid, but nobody would misunderstand the validity of a function call.) ChrisA From tjreedy at udel.edu Thu Sep 10 09:07:11 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 10 Sep 2015 03:07:11 -0400 Subject: [Python-ideas] BUG in standard while statement In-Reply-To: References: Message-ID: On 9/9/2015 1:10 PM, Stephan Sahm wrote: > I found a BUG in the standard while statement, which appears both in > python 2.7 and python 3.4 on my system. No you did not, but aside from that: python-ideas is for ideas about future versions of python, not for bug reports, valid or otherwise. You should have sent this to python-list, which is a place to report possible bugs. -- Terry Jan Reedy From encukou at gmail.com Thu Sep 10 09:35:11 2015 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 10 Sep 2015 09:35:11 +0200 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On Thu, Sep 10, 2015 at 3:30 AM, Donald Stufft wrote: [...] > > So I guess my suggestion would be, let's deprecate the module scope functions > and rename random.Random to random.DeterministicRandom. 
This absolves us of > needing to change the behavior of people's existing code (besides deprecating > it) and we don't need to decide if a userland CSPRNG is safe or not while still > moving us to a situation that is far more likely to have users doing the right > thing. There is one use case that would be hit by that: the kid writing their first rock-paper-scissors game. A beginner who just learned the `if` statement isn't ready for a discussion of cryptography vs. reproducible results, and random.SystemRandom.random() would just become a magic incantation to learn. It would feel like requiring sys.stdout.write() instead of print(). Functions like paretovariate(), getstate(), or seed(), which require some understanding of (pseudo)randomness, can be moved to a specific class, but I don't think deprecating random(), randint(), randrange(), choice(), and shuffle() would be a good idea. Switching them to a cryptographically safe RNG is OK from this perspective, though. From cory at lukasa.co.uk Thu Sep 10 09:51:58 2015 From: cory at lukasa.co.uk (Cory Benfield) Date: Thu, 10 Sep 2015 08:51:58 +0100 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: Message-ID: On 9 September 2015 at 21:17, Guido van Rossum wrote: > Jukka wrote up a proposal for structural subtyping. It's pretty good. Please > discuss. Some good feedback has been provided in this thread already, but I want to provide an enthusiastic +1 for this change. I'm one of the people who has been extremely lukewarm towards the Python type hints proposal, but I believe this addresses one of my major areas of concern. Overall the proposal seems like a graceful solution to many of the duck typing problems. It does not address all of them, particularly around classes that may dynamically (but deterministically) modify themselves to satisfy the constraints of the Protocol (e.g.
by generating methods for themselves at instantiation-time), but that's a pretty hairy use-case and there's not much that a static type checker could do about it anyway. Altogether this looks great (modulo a couple of small concerns raised by others), and it's enough for me to consider using static type hints on basically all my projects with the ongoing exception of Requests (which has duck typing problems that this cannot solve, I think). Great work Jukka! From p.f.moore at gmail.com Thu Sep 10 10:01:58 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 10 Sep 2015 09:01:58 +0100 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> Message-ID: On 9 September 2015 at 23:40, Nathaniel Smith wrote: > At the very least, surely this could be "fixed" by detecting this case > and exiting with a message "Sorry, Windows is annoying and this isn't > going to work, to upgrade pip please type 'python -m pip ...' > instead"? That seems more productive in the short run than trying to > get everyone to stop typing "pip" :-). (Though I do agree that having > pip as a separate command from python is a big mess -- another case > where this comes up is the need for pip versus pip3.) That's already done (without the unnecessary passive-aggressive sniping at Windows) and we still get users raising bugs because they didn't read the message, or because they misinterpreted something. As I said, we've tried lots of solutions. 
What we haven't had yet is anyone come up with an actual working PR that fixes the issue (in the sense of addressing the bug reports we get) better than the current code (if we had, we'd have applied the PR). Paul From p.f.moore at gmail.com Thu Sep 10 10:06:30 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 10 Sep 2015 09:06:30 +0100 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> Message-ID: On 9 September 2015 at 23:40, Nathaniel Smith wrote: > It sounds like this is another place where in the short term, it would > help a lot of pip at startup took a peek at $PATH and issued some > warnings or errors if it detected the most common types of > misconfiguration? (E.g. the first python/python3 in $PATH does not > match the one being used to run pip.) People (including the pip devs) have talked about this type of thing before. To my knowledge no-one has actually implemented it. Care to provide a PR for this? Paul From abarnert at yahoo.com Thu Sep 10 10:17:11 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 10 Sep 2015 01:17:11 -0700 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <72597E4F-4E74-412D-8ED3-442E832232EF@yahoo.com> Message-ID: <8A294D36-C40F-405F-BB2E-94CD379B8165@yahoo.com> On Sep 9, 2015, at 23:08, Chris Angelico wrote: > > On Thu, Sep 10, 2015 at 11:50 AM, Andrew Barnert via Python-ideas > wrote: >> Of course it adds the cost of making the module slower, and also more complex. 
Maybe a better solution would be to add a random.set_default_instance function that replaced all of the top-level functions with bound methods of the instance (just like what's already done at startup in random.py)? That's simple, and doesn't slow down anything, and it seems like it makes it more clear what you're doing than setting random.inst. > > +1. A single function call that replaces all the methods adds a > minuscule constant to code size, run time, etc, and it's no less > readable than assignment to a module attribute. (If anything, it makes > it more clearly a supported operation - I've seen novices not realize > that "module.xyz = foo" is valid, but nobody would misunderstand the > validity of a function call.) I was only half-serious about this, but now I think I like it: it provides exactly the fix people are hoping to achieve by deprecating the top-level functions, but with less risk, less user code churn, a smaller patch, and a much easier fix for novice users. (And it's much better than my earlier suggestion, too.) See https://gist.github.com/abarnert/e0fced7569e7d77f7464 for the patch, and a patched copy of random.py. The source comments in the patch should be enough to understand everything that's changed. A couple things: I'm not sure the normal deprecation path makes sense here. For a couple versions, everything continues to work (because most novices, the people we're trying to help, don't see DeprecationWarnings), and then suddenly their code breaks. Maybe making it a UserWarning makes more sense here? I made Random a synonym for UnsafeRandom (the class that warns and then passes through to DeterministicRandom). But is that really necessary? Someone who's explicitly using an instance of class Random rather than the top-level functions probably isn't someone who needs this warning, right? Also, if this is the way we'd want to go, the docs change would be a lot more substantial than the code change.
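The core of the rebinding trick being discussed is tiny. A sketch of the idea only -- the names are illustrative, and this is not the code from the linked gist:

```python
import random

# The module-level convenience API to rebind; random.py itself does
# the equivalent of this loop at import time with a hidden instance.
_EXPORTED = ("random", "uniform", "randint", "randrange", "choice",
             "shuffle", "sample", "getrandbits")

def set_default_instance(inst):
    """Point the module-level functions at bound methods of *inst*."""
    for name in _EXPORTED:
        setattr(random, name, getattr(inst, name))

# One line switches the whole convenience API to the OS generator:
set_default_instance(random.SystemRandom())
assert isinstance(random.random.__self__, random.SystemRandom)
assert 0.0 <= random.random() < 1.0  # now served by os.urandom()
```

Calling it with a seeded random.Random(n) instead would restore deterministic behavior for the same module-level names.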
I think the docs should be organized around choosing a random generator and using its methods, and only then mention set_default_instance as being useful for porting old code (and for making it easy for multiple modules to share a single generator, but that shouldn't be a common need for novices). From abarnert at yahoo.com Thu Sep 10 10:20:15 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 10 Sep 2015 01:20:15 -0700 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: <9DC28BE8-4444-4D92-A72A-7AC945C90005@yahoo.com> On Sep 10, 2015, at 00:35, Petr Viktorin wrote: > >> On Thu, Sep 10, 2015 at 3:30 AM, Donald Stufft wrote: >> [...] >> >> So I guess my suggestion would be, let's deprecate the module scope functions >> and rename random.Random to random.DeterministicRandom. This absolves us of >> needing to change the behavior of people's existing code (besides deprecating >> it) and we don't need to decide if a userland CSPRNG is safe or not while still >> moving us to a situation that is far more likely to have users doing the right >> thing. > > There is one use case that would be hit by that: the kid writing their > first rock-paper-scissors game. > A beginner who just learned the `if` statement isn't ready for a > discussion of cryptography vs. reproducible results, and > random.SystemRandom.random() would just become a magic incantation to > learn. It would feel like requiring sys.stdout.write() instead of > print(). > > Functions like paretovariate(), getstate(), or seed(), which require > some understanding of (pseudo)randomness, can be moved to a specific > class, but I don't think deprecating random(), randint(), randrange(), > choice(), and shuffle() would not be a good idea. Switching them to a > cryptographically safe RNG is OK from this perspective, though. Silently switching them could break a lot of code. 
I don't think there's any way around making them warn the user that they need to do something. I think the patch I just sent is a good way of doing that: the minimum thing they need to do is a one-liner, which is explained in the warning, and it also gives them enough information to check the docs or google the message and get some understanding of the choice if they're at all inclined to do so. (And if they aren't, well, either one works for the use case you're talking about, so let them flip a coin, or call random.choice.;)) From mal at egenix.com Thu Sep 10 10:26:23 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 10 Sep 2015 10:26:23 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> Message-ID: <55F13EAF.5040500@egenix.com> Reading this thread is fun, but it doesn't seem to be getting anywhere - perhaps that's part of the fun ;-) Realistically, I see two options: 1. Someone goes and implements the OpenBSD random function in C and put a package up on PyPI, updating it whenever OpenBSD thinks that a new algorithm is needed or a security issue has to be fixed (from my experience with other crypto software like OpenSSL, this should be on the order of every 2-6 months ;-)) 2. Ditto, but we put the module in the stdlib and then run around issuing patch level security releases every 2-6 months. Replacing our deterministic default PRNG with a non-deterministic one doesn't really fly, since we'd break an important feature of random.random(). You may remember that we already ran a similar stunt with the string hash function, with very mixed results. Calling the result of such a switch-over "secure" is even worse, since it's a promise we cannot keep (probably not even fully define). 
Better leave the promise at "insecure" - that's something we can promise forever and don't have to define :-) Regardless of what we end up with, I think Python land can do better than name it "arc4random". We're great at bike shedding, so how about we start the fun with "randomYMMV" :-) Overall, I think having more options for good PRNGs is great. Whether this "arc4random" is any good remains to be seen, but given that OpenBSD developed it, chances are higher than usual. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 10 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2015-09-18: PyCon UK 2015 ... 8 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From abarnert at yahoo.com Thu Sep 10 10:26:20 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 10 Sep 2015 01:26:20 -0700 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> <87613j0xcm.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sep 9, 2015, at 21:32, Chris Angelico wrote: > >> On Thu, Sep 10, 2015 at 1:25 PM, Stephen J. 
Turnbull wrote: >> Nathaniel Smith writes: >> >>> That seems more productive in the short run than trying to >>> get everyone to stop typing "pip" :-). >> >> FWIW, I did as soon as I realized python_i_want_to_install -m pip >> worked; it's obvious that it DTRTs, and I felt like I'd just dropped >> the hammer I'd been whacking my head with. > > If the problem with this is the verbosity of it ("python -m pip > install packagename" - five words), would there be benefit in blessing > pip with some core interpreter functionality, allowing either: > > $ python install packagename > > or > > $ python -p packagename > > to do the one most common operation, installation? (And since it's new > syntax, it could default to --upgrade, which would match the behaviour > of other package managers like apt-get.) > > Since the base command is "python", it automatically uses the same > interpreter and environment as you otherwise would. It's less verbose > than bouncing through -m. It gives Python the feeling of having an > integrated package manager, which IMO wouldn't be a bad thing. > > Of course, that wouldn't help with the 2.7 people, but it might allow > the deprecation of the 'pip' wrapper. Would it actually help? > What about leaving the pip wrapper, but having it display a banner telling people to use python -m pip (and maybe suggesting they add an alias to their profile, if not Windows) and then do its thing as it currently does. (Maybe with some way to suppress the message if people want to say "I know what I'm doing; if my PATH is screwy I'll fix it".) If we also add the python -p, it can instead suggest that if version >= (3, 6). That seems like an easier way to get the message out there than trying to convince everyone to spread the word everywhere they teach anyone, or deprecating it and leaving people wondering what they're supposed to do instead. 
From storchaka at gmail.com Thu Sep 10 10:32:21 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 10 Sep 2015 11:32:21 +0300 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: <8A294D36-C40F-405F-BB2E-94CD379B8165@yahoo.com> References: <72597E4F-4E74-412D-8ED3-442E832232EF@yahoo.com> <8A294D36-C40F-405F-BB2E-94CD379B8165@yahoo.com> Message-ID: On 10.09.15 11:17, Andrew Barnert via Python-ideas wrote: > On Sep 9, 2015, at 23:08, Chris Angelico wrote: >> On Thu, Sep 10, 2015 at 11:50 AM, Andrew Barnert via Python-ideas >> wrote: >>> Of course it adds the cost of making the module slower, and also more complex. Maybe a better solution would be to add a random.set_default_instance function that replaced all of the top-level functions with bound methods of the instance (just like what's already done at startup in random.py)? That's simple, and doesn't slow down anything, and it seems like it makes it more clear what you're doing than setting random.inst. >> >> +1. A single function call that replaces all the methods adds a >> minuscule constant to code size, run time, etc, and it's no less >> readable than assignment to a module attribute. (If anything, it makes >> it more clearly a supported operation - I've seen novices not realize >> that "module.xyz = foo" is valid, but nobody would misunderstand the >> validity of a function call.) > > I was only half-serious about this, but now I think I like it: it provides exactly the fix people are hoping to fix by deprecating the top-level functions, but with less risk, less user code churn, a smaller patch, and a much easier fix for novice users. (And it's much better than my earlier suggestion, too.) > > See https://gist.github.com/abarnert/e0fced7569e7d77f7464 for the patch, and a patched copy of random.py. The source comments in the patch should be enough to understand everything that's changed. This doesn't work with the idiom "from random import random". 
From p.f.moore at gmail.com Thu Sep 10 10:41:54 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 10 Sep 2015 09:41:54 +0100 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 10 September 2015 at 01:01, Donald Stufft wrote: > Essentially, there are three basic types of uses of random (the concept, not > the module). Those are: > > 1. People/usecases who absolutely need deterministic output given a seed and > for whom security properties don't matter. > 2. People/usecases who absolutely need a cryptographically random output and > for whom having a deterministic output is a downside. > 3. People/usecases that fall somewhere in between where it may or may not be > security sensitive or it may not be known if it's security sensitive. Wrong. There is a fourth basic type. People (like me!) whose code absolutely doesn't have any security issues, but want a simple, convenient, fast RNG. Determinism is not an absolute requirement, but is very useful (for writing tests, maybe, or for offering a deterministic rerun option to the program). Simulation-style games often provide a way to find the "map seed", which allows users to share interesting maps - this is non-essential but a big quality-of-life benefit in such games. IMO, the current module perfectly serves this fourth group. While I accept your point that far too many people are using insecure RNGs in "generate a random password" scripts, they are *not* the core target audience of the default module-level functions in the random module (did you find any examples of insecure use that *weren't* password generators?). We should educate people that this is bad practice, not change the module. 
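The "map seed" pattern Paul describes takes only a few lines with the module as it stands today; the map generator here is a made-up toy, not code from any real game:

```python
import random

def generate_map(seed=None):
    # Pick a seed the player can write down and share, then derive the
    # whole map from a local Random instance so it is reproducible.
    if seed is None:
        seed = random.randrange(2**32)
    rng = random.Random(seed)
    heights = [rng.randint(0, 9) for _ in range(8)]
    return seed, heights

seed, first = generate_map()
_, again = generate_map(seed)
assert first == again  # same seed, same map
```

Using a local random.Random instance rather than the module-level functions also keeps the map generation isolated from any other consumers of randomness in the program.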
Also, while it may be imperfect, it's still better than what many people *actually* do, which is to use "password" as a password on sensitive systems :-( Maybe what Python *actually* needs is a good-quality "random password generator" module in the stdlib? (Semi-serious suggestion...) Paul From wolfgang.maier at biologie.uni-freiburg.de Thu Sep 10 10:58:04 2015 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Thu, 10 Sep 2015 10:58:04 +0200 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: <461D4C7C-6C32-480D-B065-295A623E11D7@yahoo.com> References: <5B23496D-6DBD-49B3-91D7-E093309A84C7@yahoo.com> <55F0AC99.8030408@biologie.uni-freiburg.de> <1441841095.2587236.379354345.340FCF95@webmail.messagingengine.com> <461D4C7C-6C32-480D-B065-295A623E11D7@yahoo.com> Message-ID: <55F1461C.70607@biologie.uni-freiburg.de> On 10.09.2015 02:03, Andrew Barnert via Python-ideas wrote: > On Sep 9, 2015, at 16:24, random832 at fastmail.us wrote: >> >>> On Wed, Sep 9, 2015, at 18:39, Andrew Barnert via Python-ideas wrote: >>> I believe he posted a more detailed version of the idea on one of the >>> other spinoff threads from the f-string thread, but I don't have a link. >>> But there are lots of possibilities, and if you want to start >>> bikeshedding, it doesn't matter that much what his original color was. >>> For example, here's a complete proposal: >>> >>> class MyJoiner: >>> def __init__(self, value): >>> self.value = value >>> def __format__(self, spec): >>> return spec.join(map(str, self.value)) >>> string.register_converter('join', MyJoiner) >> >> Er, I wanted it to be something more like >> >> def __format__(self, spec): >> sep, fmt = # 'somehow' break up spec into two parts > > I covered later in the same message how this simple version could be extended to a smarter version that does that, or even more, without requiring any further changes to str.format. 
I just wanted to show the simplest version first, and then show that designing for that doesn't lose any flexibility. > Ok, I think I got the idea. One question though: how would you prevent this from getting completely out of hand? > And meanwhile, the alternative seems to be having something similar, but not exposing it publicly, and just baking in a handful of hardcoded converters for join, html, re-escape, etc., and I don't see why str should know about all of those things, or why extending that set when we realize that we forgot about shlex should require a patch to str and a new Python version. > >> The Joiner class wouldn't have to exist as a builtin, it could be >> private to the format function. > > If it's custom-registerable, it can be on PyPI, or in the middle of your app, although of course there could be some converters, maybe including your Joiner, somewhere in the stdlib, or even private to format, as well. > The strength of this idea - flexibility - could also be called its biggest weakness and that is scaring me. Essentially, such converters would be completely free to do anything they want: change their input at will, return something completely unrelated, have side-effects. All of that hidden behind a simple !token in a replacement field. While the idea is really cool and certainly powerful if used responsibly, it could also create completely unreadable code. Just adding one single hardcoded converter for joining iterables looks like a much more reasonable and realistic idea and now that I understand the concept I have to say I really like it. Just paraphrasing once more to see if I understood things correctly this time: The !j converter converts the iterable to an instance of a Joiner class just like !s, !r and !a convert to a str instance.
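Wolfgang's paraphrase can be sketched in a few lines. The spec syntax here (separator, then ':', then the per-item format) is one possible choice, not anything settled in the thread, and a real !j converter would wrap the iterable implicitly rather than requiring an explicit Joiner() call:

```python
class Joiner:
    def __init__(self, value):
        self.value = value

    def __format__(self, spec):
        # Split "sep:itemspec" on the first colon; with no colon the whole
        # spec is the separator and items are formatted with plain str().
        sep, _, fmt = spec.partition(":")
        return sep.join(format(item, fmt) for item in self.value)

print("{:, :.2f}".format(Joiner([1, 2.5, 3])))  # 1.00, 2.50, 3.00
```

This works because str.format passes the format spec through to the object's __format__ untouched, so the wrapper is free to invent its own mini-language for it.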
After that conversion the __format__ method of the new object gets called with the format_spec string (which specifies the separator and the inner format spec) as argument and that method produces the joined string. So everything follows the existing logic of a converter and no really new replacement field syntax is required. Great and +1! From tritium-list at sdamon.com Thu Sep 10 11:20:48 2015 From: tritium-list at sdamon.com (Alexander Walters) Date: Thu, 10 Sep 2015 05:20:48 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: <55F14B70.2080901@sdamon.com> Can I just ask what is the actual problem we are trying to solve here? Python has third party cryptography modules, that bring their own sources of randomness (or cryptography libraries that do the same). Python has a good random library for everything other than cryptography. Why in the heck are we trying to make the random module do something that it is already documented as being a poor choice for, when there are already third party modules that do just this? Who needs cryptographic randomness in the standard library anyways (even though one line of code gives you access to it)? Have we identified even ONE person who does cryptography in python who is kicking themselves that they can't use the random module as implemented? Is this just indulging a paranoid developer? From stephen at xemacs.org Thu Sep 10 11:41:28 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 10 Sep 2015 18:41:28 +0900 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)?
In-Reply-To: <7A4752B7-90B8-49CE-9EF6-FF182CE0411D@yahoo.com> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> <874mj30x14.fsf@uwakimon.sk.tsukuba.ac.jp> <7A4752B7-90B8-49CE-9EF6-FF182CE0411D@yahoo.com> Message-ID: <871te61uif.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > No, that's not the problem. Lion came with 2.7.1, so you already > had it before upgrading it, and it's hard to imagine Apple > upgrading your system 2.7.1 to 2.7.6 or 2.7.10 broke anything. More > likely, Apple screwed up your PATH, or broke your MacPorts so you > had to reinstall or repair it? I've had no problems with PATH, personally. I'm just saying that learning that pip was actually version-specific, and then getting the right pip for the current Python of interest, has been an annoyance for me over the years, and I was very happy to switch to "python -m pip" because it Just Works. As far as the question of order of installation, I just wanted to point out that system upgrades do sometimes catch up to the user, resulting in duplicate installations, rather than the user following some blog to the letter and installing a version they don't need. > I'd assume most people on this list know what they're doing with > their PATH. If you don't, then you just got lucky for a few years. For me, PATH is easy. -m pip is easy. is hard. :-/ > and if you didn't even realize you were running multiple Python 2.7 > versions in parallel, that just means you never tried anything that > MacPorts didn't anticipate. No, it just means that since forever my personal PATH has been set up to give precedence to /usr/local/bin and /opt/local/bin, and since the days of Python 2 I avoid the system Python at all costs.
Specifically, I never invoke Python without a full 2-digit version number (except in venvs), and my shebangs specify it too (ditto). (It works out to the same semantics as "not surprising MacPorts", of course.) From luciano at ramalho.org Thu Sep 10 12:01:35 2015 From: luciano at ramalho.org (Luciano Ramalho) Date: Thu, 10 Sep 2015 07:01:35 -0300 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: <55F0AC83.3050505@mail.de> Message-ID: Jukka, thank you very much for working on such a hard topic and being patient enough to respond to issues that I am sure were exhaustively discussed before (but I was not following the discussions then since I was in the final sprint for my book, Fluent Python, at the time). I have two questions which were probably already asked before, so feel free to point me to relevant past messages: 1) Why is a whole new hierarchy of types being created in the typing module, instead of continuing the hierarchy in the collections module while enhancing the ABCs already there? For example, why weren't the List and Dict type created under the existing MutableSequence and MutableMapping types in collections.abc? 2) Similarly, I note that PEP-484 shuns existing ABCs like those in the numbers module, and the ByteString ABC. The reasons given are pragmatic, so that users don't need to import the numbers module, and would not "have to write typing.ByteString everywhere." as the PEP says... 
I do not understand these arguments because: a) as you just wrote in another message, the users will be primarily the authors of libraries and frameworks, who will always be forced to import typing anyhow, so it does not seem such a burden to have them import other modules to get the benefits of type hinting; b) alternatively, there could be aliases of the relevant ABCs in the typing module for convenience So the second question is: what's wrong with points (a) and (b), and why did PEP-484 keep such a distance from existing ABCs in general? I understand pragmatic choices, but as a teacher and writer I know such choices are often obstacles to learning because they seem arbitrary to anyone who is not privy to the reasons behind them. So I'd like to better understand the reasoning, and I think PEP-484 is not very persuasive when it comes to the issues I mentioned. Thanks! Best, Luciano -- Luciano Ramalho | Author of Fluent Python (O'Reilly, 2015) | http://shop.oreilly.com/product/0636920032519.do | Professor em: http://python.pro.br | Twitter: @ramalhoorg From abarnert at yahoo.com Thu Sep 10 12:27:23 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 10 Sep 2015 03:27:23 -0700 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: <55F1461C.70607@biologie.uni-freiburg.de> References: <5B23496D-6DBD-49B3-91D7-E093309A84C7@yahoo.com> <55F0AC99.8030408@biologie.uni-freiburg.de> <1441841095.2587236.379354345.340FCF95@webmail.messagingengine.com> <461D4C7C-6C32-480D-B065-295A623E11D7@yahoo.com> <55F1461C.70607@biologie.uni-freiburg.de> Message-ID: <79AE4A57-B698-45A3-84F5-DC65E03C25CA@yahoo.com> On Sep 10, 2015, at 01:58, Wolfgang Maier wrote: > >> On 10.09.2015 02:03, Andrew Barnert via Python-ideas wrote: >>> On Sep 9, 2015, at 16:24, random832 at fastmail.us wrote: >>> >>>> On Wed, Sep 9, 2015, at 18:39, Andrew Barnert via Python-ideas wrote: >>>> I believe he posted a more detailed version of the idea on one of the >>>> other spinoff threads
from the f-string thread, but I don't have a link. >>>> But there are lots of possibilities, and if you want to start >>>> bikeshedding, it doesn't matter that much what his original color was. >>>> For example, here's a complete proposal: >>>> >>>> class MyJoiner: >>>> def __init__(self, value): >>>> self.value = value >>>> def __format__(self, spec): >>>> return spec.join(map(str, self.value)) >>>> string.register_converter('join', MyJoiner) >>> >>> Er, I wanted it to be something more like >>> >>> def __format__(self, spec): >>> sep, fmt = # 'somehow' break up spec into two parts >> >> I covered later in the same message how this simple version could be extended to a smarter version that does that, or even more, without requiring any further changes to str.format. I just wanted to show the simplest version first, and then show that designing for that doesn't lose any flexibility. > > Ok, I think I got the idea. > One question though: how would you prevent this from getting competely out of hand? Same way we keep types with weird __format__ methods, nested or multi-clause comprehensions, import hooks, operator overloads like using __ror__ to partial functions, metaclasses, subclass hooks, multiple inheritance, dynamic method lookup, descriptors, etc. from getting completely out of hand: trust users to have some taste, and don't write bad documentation that would convince them to abuse it. :) >> And meanwhile, the alternative seems to be having something similar, but not exposing it publicly, and just baking in a handful of hardcoded converters for join, html, re-escape, etc., and I don't see why str should know about all of those things, or why extending that set when we realize that we forgot about shlex should require a patch to str and a new Python version. >> >>> The Joiner class wouldn't have to exist as a builtin, it could be >>> private to the format function. 
>> >> If it's custom-registerable, it can be on PyPI, or in the middle of your app, although of course there could be some converters, maybe including your Joiner, somewhere in the stdlib, or even private to format, as well. > The strength of this idea - flexibility - could also be called its biggest weakness and that is scaring me. Essentially, such converters would be completely free to do anything they want: change their input at will, return something completely unrelated, have side-effects. All of that hidden behind a simple !token in a replacement field. > While the idea is really cool and certainly powerful if used responsibly, it could also create completely unreadable code. There aren't any obvious reasons for anyone to write such unreadable code, so I don't see it being a real attractive nuisance. > Just adding one single hardcoded converter for joining iterables looks like a much more reasonable and realistic idea and now that I understand the concept I have to say I really like it. > > Just paraphrasing once more to see if I understood things correctly this time: > The !j converter converts the iterable to an instance of a Joiner class just like !s, !r and !a convert to a str instance. After that conversion the __format__ method of the new object gets called with the format_spec string (which specifies the separator and the inner format spec) as argument and that method produces the joined string. > > So everything follows the existing logic of a converter and no really new replacement field syntax is required. Great and +1! Yep, and I'm +1 on it as well. But I'm also at least +0.5 on the custom converter idea, because joining is the fourth idea people have come up with for converters in the past few weeks, and I'd bet there are another few widely-usable ideas, plus some good uses for specific applications (different web frameworks, scientific computing, etc.).
When I get a chance, I'll hack something up to play with it and see if it's as useful as I'm expecting. From abarnert at yahoo.com Thu Sep 10 12:33:08 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 10 Sep 2015 03:33:08 -0700 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <72597E4F-4E74-412D-8ED3-442E832232EF@yahoo.com> <8A294D36-C40F-405F-BB2E-94CD379B8165@yahoo.com> Message-ID: <178619C8-5587-4069-AC6F-D7AC8A65C6CD@yahoo.com> On Sep 10, 2015, at 01:32, Serhiy Storchaka wrote: > >> On 10.09.15 11:17, Andrew Barnert via Python-ideas wrote: >>> On Sep 9, 2015, at 23:08, Chris Angelico wrote: >>> On Thu, Sep 10, 2015 at 11:50 AM, Andrew Barnert via Python-ideas >>> wrote: >>>> Of course it adds the cost of making the module slower, and also more complex. Maybe a better solution would be to add a random.set_default_instance function that replaced all of the top-level functions with bound methods of the instance (just like what's already done at startup in random.py)? That's simple, and doesn't slow down anything, and it seems like it makes it more clear what you're doing than setting random.inst. >>> >>> +1. A single function call that replaces all the methods adds a >>> minuscule constant to code size, run time, etc, and it's no less >>> readable than assignment to a module attribute. (If anything, it makes >>> it more clearly a supported operation - I've seen novices not realize >>> that "module.xyz = foo" is valid, but nobody would misunderstand the >>> validity of a function call.) >> >> I was only half-serious about this, but now I think I like it: it provides exactly the fix people are hoping to fix by deprecating the top-level functions, but with less risk, less user code churn, a smaller patch, and a much easier fix for novice users. (And it's much better than my earlier suggestion, too.) 
>> >> See https://gist.github.com/abarnert/e0fced7569e7d77f7464 for the patch, and a patched copy of random.py. The source comments in the patch should be enough to understand everything that's changed. > This doesn't work with the idiom "from random import random". Well, the goal of the deprecation idea was to eventually get people to explicitly use instances, so the fact that it doesn't work out of the box is a good thing, not a problem. But for people just trying to retrofit existing code, all they have to do is call random.set_default_instance at the top of the main module, and all their other modules can just import what they need this way. Which is why it's better than straightforward deprecation. From donald at stufft.io Thu Sep 10 13:26:56 2015 From: donald at stufft.io (Donald Stufft) Date: Thu, 10 Sep 2015 07:26:56 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On September 10, 2015 at 4:41:56 AM, Paul Moore (p.f.moore at gmail.com) wrote: > On 10 September 2015 at 01:01, Donald Stufft wrote: > > Essentially, there are three basic types of uses of random (the concept, not > > the module). Those are: > > > > 1. People/usecases who absolutely need deterministic output given a seed and > > for whom security properties don't matter. > > 2. People/usecases who absolutely need a cryptographically random output and > > for whom having a deterministic output is a downside. > > 3. People/usecases that fall somewhere in between where it may or may not be > > security sensitive or it may not be known if it's security sensitive. > > Wrong. > > There is a fourth basic type. People (like me!) whose code absolutely > doesn't have any security issues, but want a simple, convenient, fast > RNG. Determinism is not an absolute requirement, but is very useful > (for writing tests, maybe, or for offering a deterministic rerun > option to the program).
Simulation-style games often provide a way to > find the "map seed", which allows users to share interesting maps - > this is non-essential but a big quality-of-life benefit in such games. This group is the same as #3 except for the map seed thing which is group #1. In particular, it wouldn't hurt you if the random you were using was cryptographically secure as long as it was fast and if you needed determinism, it would hurt you to say so. Which is the point that Theo was making. > > IMO, the current module perfectly serves this fourth group. Making the user pick between Deterministic and Secure random would serve this purpose too, especially in a language where "In the face of ambiguity, refuse the temptation to guess" is one of the core tenets of the language. The largest downside would be typing a few extra characters, which Python is not a language that attempts to do things in the fewest number of characters. > > While I accept your point that far too many people are using insecure > RNGs in "generate a random password" scripts, they are *not* the core > target audience of the default module-level functions in the random > module (did you find any examples of insecure use that *weren't* > password generators?). We should educate people that this is bad > practice, not change the module. Also, while it may be imperfect, it's > still better than what many people *actually* do, which is to use > "password" as a password on sensitive systems :-( You cannot document your way out of a UX problem. The problem isn't people doing this once on the command line to generate a password, the problem is people doing it in applications where they generate an API key, a session identifier, a random password which they then give to their users. If you give a way to get the output of the MT base random enough times, it can be used to determine what every random it generated was and will be.
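For concreteness, both of the explicit choices being argued over already exist in today's module, just not as the default:

```python
import random

seeded = random.Random(42)       # deterministic: same seed, same sequence
secure = random.SystemRandom()   # OS entropy: not seedable, not reproducible

# The seeded instance is reproducible by construction.
assert seeded.choice("abc") == random.Random(42).choice("abc")

# The secure instance is what the API-key/session-token cases should use.
token = "".join(secure.choice("0123456789abcdef") for _ in range(16))
print(token)
```

The dispute is only about which of these two the bare module-level functions should stand for, not about whether either capability is available.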
Here's a game a friend of mine created where the purpose of the game is to essentially unrandomize some random data, which is only possible because it's (purposely) using MT to make it possible https://github.com/reaperhulk/dsa-ctf. This is not an ivory tower paranoia case, it's a real concern that will absolutely fix some insecure software out there instead of telling them "welp typing a little bit extra once an import is too much of a burden for me and really it's your own fault anyways". > > Maybe what Python *actually* needs is a good-quality "random password > generator" module in the stdlib? (Semi-serious suggestion...) > > Paul > ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From donald at stufft.io Thu Sep 10 13:40:41 2015 From: donald at stufft.io (Donald Stufft) Date: Thu, 10 Sep 2015 07:40:41 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: <55F14B70.2080901@sdamon.com> References: <55F14B70.2080901@sdamon.com> Message-ID: On September 10, 2015 at 5:21:29 AM, Alexander Walters (tritium-list at sdamon.com) wrote: > > Why in the heck are we trying to make the random module do something > that it is already documented as being a poor choice, where there > are > already third party modules that do just this? > > Who needs cryptographic randomness in the standard library > anyways (even > though one line of code gives you access to it)? Have we identified > even > ONE person who does cryptography in python who is kicking themselves > that they can't use the random module as implemented? Because there are situations where you need securely generated randomness where you are *NOT* "doing cryptography". Blaming people for the fact that the random module has a bad UX that naturally leads them to use it when it isn't appropriate is a shitty thing to do.
What harm is there in making people explicitly choose between deterministic randomness and secure randomness? Is your use case so much better than theirs that you think you deserve to type a few characters less to the detriment of people who don't know any better? ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From p.f.moore at gmail.com Thu Sep 10 14:29:13 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 10 Sep 2015 13:29:13 +0100 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 10 September 2015 at 12:26, Donald Stufft wrote: >> There is a fourth basic type. People (like me!) whose code absolutely >> doesn't have any security issues, but want a simple, convenient, fast >> RNG. Determinism is not an absolute requirement, but is very useful >> (for writing tests, maybe, or for offering a deterministic rerun >> option to the program). Simulation-style games often provide a way to >> find the "map seed", which allows users to share interesting maps - >> this is non-essential but a big quality-of-life benefit in such games. > > This group is the same as #3 except for the map seed thing which is > group #1. In particular, it wouldn't hurt you if the random you were > using was cryptographically secure as long as it was fast and if you > needed determinism, it would hurt you to say so. Which is the point > that Theo was making. I don't understand the phrase "if you needed determinism, it would hurt you to say so". Could you clarify?
The > largest downside would be typing a few extra characters, which Python is not > a language that attempts to do things in the fewest number of characters. And yet I know that I would routinely, and (this is the problem) without thinking, choose Deterministic, because I know that my use cases all get a (small) benefit from being able to capture the seed, but I also know I'm not doing security-related stuff. No amount of making me choose is going to help me spot security implications that I've missed. And also, calling the non-crypto choice "Deterministic" is unhelpful, because I *don't* want something deterministic, I want something random (I understand PRNGs aren't truly random, but "good enough for my purposes" is what I want, and "deterministic" reads to me as saying it's *not* good enough...) >> While I accept your point that far too many people are using insecure >> RNGs in "generate a random password" scripts, they are *not* the core >> target audience of the default module-level functions in the random >> module (did you find any examples of insecure use that *weren't* >> password generators?). We should educate people that this is bad >> practice, not change the module. Also, while it may be imperfect, it's >> still better than what many people *actually* do, which is to use >> "password" as a password on sensitive systems :-( > > You cannot document your way out of a UX problem. What I'm trying to say is that this is an education problem more than a UX problem. Personally, I think I know enough about security for my (not a security specialist) purposes. To that extent, if I'm working on something with security implications, I'm looking for things that say "Crypto" in the name. The rest of the time, I just use non-specialist stuff. It's a similar situation to that of the "statistics" module. If I'm doing "proper" maths, I'd go for numpy/scipy. 
If I just want some averages and I'm not bothered about numerical stability, rounding behaviour, etc, I'd go for the stdlib statistics package. > The problem isn't people doing this once on the command line to generate > a password, the problem is people doing it in applications where they > generate an API key, a session identifier, a random password which they > then give to their users. If you give a way to get the output of the MT > base random enough times, it can be used to determine what every random > it generated was and will be. To me, that's crypto and I'd look to the cryptography module, or to something in the stdlib that explicitly said it was suitable for crypto. Saying people write bad code isn't enough - how does the current module *encourage* them to write bad code? How much API change must we allow to cater for people who won't read the statement in the docs (in a big red box) "Warning: The pseudo-random generators of this module should not be used for security purposes." (Specifically people writing security related code who won't read the docs). > Here's a game a friend of mine created where the purpose of the game is > to essentially unrandomize some random data, which is only possible > because it's (purposely) using MT to make it possible > https://github.com/reaperhulk/dsa-ctf. This is not an ivory tower paranoia > case, it's a real concern that will absolutely fix some insecure software > out there instead of telling them "welp typing a little bit extra once > an import is too much of a burden for me and really it's your own fault > anyways". I don't understand how that game (which is an interesting way of showing people how attacks on crypto work, sure, but that's just education, which you dismissed above) relates to the issue here. And I hope you don't really think that your quote is even remotely what I'm trying to say (I'm not that selfish) - my point is that not everything is security related.
Not every application people write, and not every API in the stdlib. You're claiming that the random module is security related. I'm claiming it's not, it's documented as not being, and that's clear to the people who use it for its intended purpose. Telling those people that you want to make a module designed for their use harder to use because people for whom it's not intended can't read the documentation which explicitly states that it's not suitable for them, is doing a disservice to those people who are already using the module correctly for its stated purpose. By the same argument, we should remove the statistics module because it can be used by people with numerically unstable problems. (I doubt you'll find StackOverflow questions along these lines yet, but that's only because (a) the module's pretty new, and (b) it actually works pretty hard to handle the hard corner cases, but I bet they'll start turning up in due course, if only from the people who don't understand floating point...) Paul From contact at ionelmc.ro Thu Sep 10 14:07:14 2015 From: contact at ionelmc.ro (=?UTF-8?Q?Ionel_Cristian_M=C4=83rie=C8=99?=) Date: Thu, 10 Sep 2015 15:07:14 +0300 Subject: [Python-ideas] PyPI search still broken In-Reply-To: <20150909230130.GA14415@k3> References: <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> <20150909230130.GA14415@k3> Message-ID: Wouldn't it be better if you'd just build an external search service? Getting a list of packages and descriptions should be possible no? (just asking, not 100% sure) I doubt the maintainers are just going to come out and say "ok, this guy has waited long enough, let's take his contribution in". If they didn't care about the search 2.5 years ago why would they care now. Sorry for being snide here but my impression is that Warehouse could have been shipped a while ago instead of getting rewritten "several times". I'm not saying that's bad, it's just that there's a mismatch in goals here. Thanks, -- Ionel Cristian Mărieș On Thu, Sep 10, 2015 at 2:01 AM, David Wilson wrote: > Hi there, > > My 2.5 year old offer to retrofit the old codebase with a new search > system still stands[1]. :) There is no reason for this to be a complex > affair, the prototype built back then took only a few hours to complete. > > No doubt the long term answer is probably "Warehouse fixes this", but > Warehouse seems no nearer a reality than it did in 2013. > > > David > > [1] > https://groups.google.com/forum/#!search/%22david$20wilson%22$20search$20pypi/pypa-dev/ZjUNkczsKos/2et8926YOQYJ > > On Thu, Sep 10, 2015 at 12:35:04AM +0200, Giovanni Cannata wrote: > > Hi, sorry to bother you again, but the search problem on PyPI is still present > > after several weeks and it's very annoying. I've just released a new version > > of my ldap3 project and it doesn't show up when searching with its name. For > > mine (and I suppose for other emerging projects, especially related to Python 3) > > it's vital to be easily found by other developers that use pip and PyPI as THE > > only repository for python packages and using the number of downloads as a > > ranking of popularity of a project. > > > > If search can't be fixed there should be at least a warning on the PyPI > > homepage to let users know that search is broken and that using Google for > > searching could help to find more packages.
> > > > Bye, > > Giovanni > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Thu Sep 10 15:02:43 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Thu, 10 Sep 2015 09:02:43 -0400 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F0BF61.6050205@canterbury.ac.nz> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> Message-ID: <1441890163.3120507.379846857.49842A96@webmail.messagingengine.com> On Wed, Sep 9, 2015, at 19:23, Greg Ewing wrote: > Another property that's important for some applications is > to be able to efficiently "jump ahead" some number of steps > in the sequence, to produce multiple independent streams of > numbers. It would be good to know if that is possible with > arc4random. Being able to produce multiple independent streams of numbers is the important feature. Doing it by "jumping ahead" seems less so. And the need for doing it "efficiently" isn't as clear either - how many streams do you need? 
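(For concreteness, the pattern under discussion — multiple independent streams without jump-ahead — can be sketched with the stdlib API that already exists: give each stream its own seeded `random.Random` instance. Note this is an illustrative sketch only; using distinct seeds is a practical convention, not a statistical-independence guarantee for MT, which is part of what's being debated.)

```python
import random

# Each stream gets its own generator object with its own seed, so the
# streams never share state and each one is individually replayable.
stream_a = random.Random(1234)
stream_b = random.Random(5678)

draws_a = [stream_a.randint(0, 99) for _ in range(5)]
draws_b = [stream_b.randint(0, 99) for _ in range(5)]

# Re-seeding a fresh instance replays a stream exactly.
replay_a = random.Random(1234)
assert draws_a == [replay_a.randint(0, 99) for _ in range(5)]
```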
From donald at stufft.io  Thu Sep 10 15:10:09 2015
From: donald at stufft.io (Donald Stufft)
Date: Thu, 10 Sep 2015 09:10:09 -0400
Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux
In-Reply-To:
References:
Message-ID:

On September 10, 2015 at 8:29:16 AM, Paul Moore (p.f.moore at gmail.com) wrote:
> On 10 September 2015 at 12:26, Donald Stufft wrote:
> >> There is a fourth basic type. People (like me!) whose code absolutely
> >> doesn't have any security issues, but want a simple, convenient, fast
> >> RNG. Determinism is not an absolute requirement, but is very useful
> >> (for writing tests, maybe, or for offering a deterministic rerun
> >> option to the program). Simulation-style games often provide a way to
> >> find the "map seed", which allows users to share interesting maps -
> >> this is non-essential but a big quality-of-life benefit in such games.
> >
> > This group is the same as #3 except for the map seed thing which is
> > group #1. In particular, it wouldn't hurt you if the random you were
> > using was cryptographically secure as long as it was fast and if you
> > needed determinism, it would hurt you to say so. Which is the point
> > that Theo was making.
>
> I don't understand the phrase "if you needed determinism, it would
> hurt you to say so". Could you clarify?

I transposed some words, fixed: "If you needed determinism, would it hurt you to say so?"

Essentially, other than typing a little bit more, why is:

    import random
    print(random.choice(['a', 'b', 'c']))

better than:

    import random
    print(random.DeterministicRandom().choice(['a', 'b', 'c']))

As far as I can tell, you've made your code and what properties it has much clearer to someone reading it at the cost of 22 characters. If you're going to reuse the DeterministicRandom class you can assign it to a variable and actually end up saving characters if the variable you save it to can be accessed at less than 6 characters.
> > >> IMO, the current module perfectly serves this fourth group.
> >
> > Making the user pick between Deterministic and Secure random would serve
> > this purpose too, especially in a language where "In the face of ambiguity,
> > refuse the temptation to guess" is one of the core tenets of the language. The
> > largest downside would be typing a few extra characters, which Python is not
> > a language that attempts to do things in the fewest number of characters.
>
> And yet I know that I would routinely, and (this is the problem)
> without thinking, choose Deterministic, because I know that my use
> cases all get a (small) benefit from being able to capture the seed,
> but I also know I'm not doing security-related stuff.
>
> No amount of making me choose is going to help me spot security
> implications that I've missed.

You're allowed to pick DeterministicRandom, you're even allowed to do it without thinking. This isn't about making it impossible to ever insecurely use random numbers — that's obviously a boil-the-ocean level of problem — this is about trying to make it more likely that someone won't be hit by a fairly easy to hit footgun if it does matter for them, even if they don't know it. It's also about making code that is easier to understand on the surface. For example, without using the prior knowledge that it's using MT, tell me how you'd know if this was safe or not:

    import random
    import string

    password = "".join(random.choice(string.ascii_letters) for _ in range(9))
    print("Your random password is", password)

> And also, calling the non-crypto choice "Deterministic" is unhelpful,
> because I *don't* want something deterministic, I want something
> random (I understand PRNGs aren't truly random, but "good enough for
> my purposes" is what I want, and "deterministic" reads to me as saying
> it's *not* good enough...)
But you *DO* want something deterministic: the *ONLY* way you can get this small benefit of capturing the seed is if you can put that seed back into the system and get a deterministic result. If the seed didn't exactly determine the output of the randomness then you wouldn't be able to do that. If you don't need to be able to capture the seed and essentially "replay" the PRNG in a deterministic way then there are exactly zero downsides to using a CSPRNG other than speed, which is why Theo suggested using a very fast, modern CSPRNG to solve the speed issues.

Can you point out one use case where cryptographically safe random numbers, assuming we could generate them as quickly as you asked for them, would hurt you unless you needed/wanted to be able to save the seed and thus require or want deterministic results?

> >> While I accept your point that far too many people are using insecure
> >> RNGs in "generate a random password" scripts, they are *not* the core
> >> target audience of the default module-level functions in the random
> >> module (did you find any examples of insecure use that *weren't*
> >> password generators?). We should educate people that this is bad
> >> practice, not change the module. Also, while it may be imperfect, it's
> >> still better than what many people *actually* do, which is to use
> >> "password" as a password on sensitive systems :-(
> >
> > You cannot document your way out of a UX problem.
>
> What I'm trying to say is that this is an education problem more than
> a UX problem.
>
> Personally, I think I know enough about security for my (not a
> security specialist) purposes. To that extent, if I'm working on
> something with security implications, I'm looking for things that say
> "Crypto" in the name. The rest of the time, I just use non-specialist
> stuff. It's a similar situation to that of the "statistics" module. If
> I'm doing "proper" maths, I'd go for numpy/scipy.
If I just want some
> averages and I'm not bothered about numerical stability, rounding
> behaviour, etc, I'd go for the stdlib statistics package.
>
> > The problem isn't people doing this once on the command line to generate
> > a password, the problem is people doing it in applications where they
> > generate an API key, a session identifier, a random password which they
> > then give to their users. If you give a way to get the output of the MT
> > base random enough times, it can be used to determine what every random
> > it generated was and will be.
>
> To me, that's crypto and I'd look to the cryptography module, or to
> something in the stdlib that explicitly said it was suitable for
> crypto.
>
> Saying people write bad code isn't enough - how does the current
> module *encourage* them to write bad code? How much API change must we
> allow to cater for people who won't read the statement in the docs (in
> a big red box) "Warning: The pseudo-random generators of this module
> should not be used for security purposes." (Specifically people
> writing security related code who won't read the docs).

Reminder that this warning does not show up (in any color, much less red) if you're using ``help(random)`` or ``dir(random)`` to explore the random module. It also does not show up in code review when you see someone doing random.random.

It encourages you to write bad code, because it has a baked-in assumption that there is a sane default for a random number generator, and it expects people to understand a fairly difficult concept, which is that not all "random" is equal. For instance, you've already made the mistake of saying you wanted "random" not deterministic, but the two are not mutually exclusive: deterministic is a property that a source of random can have, and one that you need for one of the features you say you like.
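(Coming back to the password snippet above: the spelling that is already safe today is `random.SystemRandom`, which the docs describe as drawing from `os.urandom()` — and the point stands that nothing on the surface of the two versions tells a reviewer which is which. A sketch of the safe version:)

```python
import random
import string

# Same nine-character password, but drawn from the OS entropy source via
# random.SystemRandom rather than the MT-backed module-level functions,
# so earlier outputs do not reveal later ones.
sysrand = random.SystemRandom()
password = "".join(sysrand.choice(string.ascii_letters) for _ in range(9))
print("Your random password is", password)
```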
> > Here's a game a friend of mine created where the purpose of the game is
> > to essentially unrandomize some random data, which is only possible
> > because it's (purposely) using MT to make it possible
> > https://github.com/reaperhulk/dsa-ctf. This is not an ivory tower paranoia
> > case, it's a real concern that will absolutely fix some insecure software
> > out there instead of telling them "welp typing a little bit extra once
> > an import is too much of a burden for me and really it's your own fault
> > anyways".
>
> I don't understand how that game (which is an interesting way of
> showing people how attacks on crypto work, sure, but that's just
> education, which you dismissed above) relates to the issue here.
>
> And I hope you don't really think that your quote is even remotely
> what I'm trying to say (I'm not that selfish) - my point is that not
> everything is security related. Not every application people write,
> and not every API in the stdlib. You're claiming that the random
> module is security related. I'm claiming it's not, it's documented as
> not being, and that's clear to the people who use it for its intended
> purpose. Telling those people that you want to make a module designed
> for their use harder to use because people for whom it's not intended
> can't read the documentation which explicitly states that it's not
> suitable for them, is doing a disservice to those people who are
> already using the module correctly for its stated purpose.

I'm claiming that the term random is ambiguously both security related and not security related, and we should either get rid of the default and expect people to pick whether or not their use case is security related, or we should assume that it is unless otherwise instructed. I don't particularly care what the exact spelling of this looks like; random.(System|Secure)Random and random.DeterministicRandom is just one option.
Another option is to look at something closer to what Go did and deprecate the "random" module and move the MT based thing to ``math.random`` and the CSPRNG can be moved to something like crypto.random. > > By the same argument, we should remove the statistics module because > it can be used by people with numerically unstable problems. (I doubt > you'll find StackOverflow questions along these lines yet, but that's > only because (a) the module's pretty new, and (b) it actually works > pretty hard to handle the hard corner cases, but I bet they'll start > turning up in due course, if only from the people who don't understand > floating point...) > No, by this argument we shouldn't have a function called statistics in the statistics module because there is no globally "right" answer for what the default should be. Should it be mean? mode? median? Why is *your* use case the "right" use case for the default option, particularly in a situation where picking the wrong option can be disastrous. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From random832 at fastmail.us Thu Sep 10 15:13:39 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Thu, 10 Sep 2015 09:13:39 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: <1441890819.3122699.379856193.3A628B5B@webmail.messagingengine.com> On Thu, Sep 10, 2015, at 08:29, Paul Moore wrote: > And also, calling the non-crypto choice "Deterministic" is unhelpful, > because I *don't* want something deterministic, I want something > random (I understand PRNGs aren't truly random, but "good enough for > my purposes" is what I want, and "deterministic" reads to me as saying > it's *not* good enough...) I don't understand why. 
What other word would you use to describe a generator that can be given a specific set of inputs to generate the same exact sequence of numbers every single time?

If you want that feature, then you're not going to think "deterministic" means "not good enough". And if you don't want it, you, well, don't want it, so there's really no harm in the fact that you don't choose it.

Personally, though, I don't see why we're not talking about calling it MersenneTwister.

From skrah at bytereef.org  Thu Sep 10 15:39:31 2015
From: skrah at bytereef.org (Stefan Krah)
Date: Thu, 10 Sep 2015 13:39:31 +0000 (UTC)
Subject: [Python-ideas] Should our default random number generator be secure?
References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com>
Message-ID:

M.-A. Lemburg writes:
> Reading this thread is fun, but it doesn't seem to be getting
> anywhere - perhaps that's part of the fun ;-)
>
> Realistically, I see two options:
>
> 1. Someone goes and implements the OpenBSD random function in C
> and put a package up on PyPI, updating it whenever OpenBSD
> thinks that a new algorithm is needed or a security issue
> has to be fixed (from my experience with other crypto software
> like OpenSSL, this should be on the order of every 2-6 months ;-))

The sane option would be to use the OpenBSD libcrypto, which seems to be part of their OpenSSL fork (libressl), just like libcrypto is part of OpenSSL. Then the crypto maintenance would be delegated to the distributions.

I would even be interested in writing such a package, but it would be external and non-redistributable for well-known reasons.
:) Stefan Krah From p.f.moore at gmail.com Thu Sep 10 15:44:11 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 10 Sep 2015 14:44:11 +0100 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 10 September 2015 at 14:10, Donald Stufft wrote: >> I don't understand the phrase "if you needed determinism, it would >> hurt you to say so". Could you clarify? > > I transposed some words, fixed: > > "If you needed determinism, would it hurt you to say so?"" Thanks. In one sense, no it wouldn't. Nor would it matter to me if "the default random number generator" was fast and cryptographically secure. What matters is just that I get a load of random (enough) numbers. What hurts somewhat (not enormously, I'll admit) is up front having to think about whether I need to be able to capture a seed and replay it. That's nearly always something I'd think of way down the line, as a "wouldn't it be nice if I could get the user to send me a reproducible test case" or something like that. And of course it's just a matter of switching the underlying RNG at that point. None of this is hard. But once again, I'm currently using the module correctly, as documented. I've omitted most of the rest of your response largely because we're probably just going to have to agree to differ. 
I'm probably too worn out being annoyed at the way that everything ends up needing to be security related, and the needs of people who won't read the docs determines API design, to respond clearly and rationally :-( Paul From graffatcolmingov at gmail.com Thu Sep 10 15:44:26 2015 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Thu, 10 Sep 2015 08:44:26 -0500 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: <1441890819.3122699.379856193.3A628B5B@webmail.messagingengine.com> References: <1441890819.3122699.379856193.3A628B5B@webmail.messagingengine.com> Message-ID: On Thu, Sep 10, 2015 at 8:13 AM, wrote: > On Thu, Sep 10, 2015, at 08:29, Paul Moore wrote: >> And also, calling the non-crypto choice "Deterministic" is unhelpful, >> because I *don't* want something deterministic, I want something >> random (I understand PRNGs aren't truly random, but "good enough for >> my purposes" is what I want, and "deterministic" reads to me as saying >> it's *not* good enough...) > > I don't understand why. What other word would you use to describe a > generator that can be given a specific set of inputs to generate the > same exact sequence of numbers every single time? > > If you want that feature, then you're not going to think "deterministic" > means "not good enough". And if you don't want it, you, well, don't want > it, so there's really no harm in the fact that you don't choose it. > > Personally, though, I don't see why we're not talking about calling it > MersenneTwister. Because while we want to reduce foot guns, we don't want to reduce usability. DeterministicRandom is fairly easy for anyone to understand. I would venture a guess that most people looking for that wouldn't know (or care) what the backing algorithm is. Further, if we stop using mersenne twister in the future, we would have to remove that class name. 
DeterministicRandom can be agnostic of the underlying algorithm and is friendlier to people who don't need to know or care about what algorithm is generating the numbers, they only need to understand the properties of that generator. From random832 at fastmail.us Thu Sep 10 15:55:03 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Thu, 10 Sep 2015 09:55:03 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <1441890819.3122699.379856193.3A628B5B@webmail.messagingengine.com> Message-ID: <1441893303.3132414.379896217.7C80B332@webmail.messagingengine.com> On Thu, Sep 10, 2015, at 09:44, Ian Cordasco wrote: > Because while we want to reduce foot guns, we don't want to reduce > usability. DeterministicRandom is fairly easy for anyone to > understand. I would venture a guess that most people looking for that > wouldn't know (or care) what the backing algorithm is. Further, if we > stop using mersenne twister in the future, we would have to remove > that class name. If we're serious about being deterministic, then we should keep that class under that name and provide a new class for the new algorithm. What's the point of having a deterministic algorithm if you can't reproduce your results in the new version because the algorithm was deleted? From graffatcolmingov at gmail.com Thu Sep 10 15:56:05 2015 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Thu, 10 Sep 2015 08:56:05 -0500 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On Thu, Sep 10, 2015 at 8:44 AM, Paul Moore wrote: > On 10 September 2015 at 14:10, Donald Stufft wrote: >>> I don't understand the phrase "if you needed determinism, it would >>> hurt you to say so". Could you clarify? >> >> I transposed some words, fixed: >> >> "If you needed determinism, would it hurt you to say so?"" > > Thanks. > > In one sense, no it wouldn't. 
Nor would it matter to me if "the > default random number generator" was fast and cryptographically > secure. What matters is just that I get a load of random (enough) > numbers. > > What hurts somewhat (not enormously, I'll admit) is up front having to > think about whether I need to be able to capture a seed and replay it. > That's nearly always something I'd think of way down the line, as a > "wouldn't it be nice if I could get the user to send me a reproducible > test case" or something like that. And of course it's just a matter of > switching the underlying RNG at that point. > > None of this is hard. But once again, I'm currently using the module > correctly, as documented. No one in this thread is accusing everyone of using the module incorrectly. The fact that you do use it correctly is a testament to the fact that you read the docs carefully and have some level of experience with the module to know that you're using it correctly. > I've omitted most of the rest of your response largely because we're > probably just going to have to agree to differ. I'm probably too worn > out being annoyed at the way that everything ends up needing to be > security related, and the needs of people who won't read the docs > determines API design, to respond clearly and rationally :-( I think the people Theo, Donald, and others (including myself) are worried about are the people who have used some book or online tutorial to write games in Python and have seen random.random() or random.choice() used. Later on they start working on something else (including but not limited to the examples of what Donald has otherwise pointed out). They also have enough experience with the random module to know it produced randomness (what kind, they don't know... in fact they probably don't know there are different kinds yet) and they use what they know because Python has batteries included and they're awesome and easy to use. The reality is that past experiences bias current decisions. 
If that person went and read the docs, they probably won't know if what they're doing warrants using a CSPRNG instead of the default Python one. If they're not willing to learn, or read enough (and I stress enough) (or just really don't have the time because this is a side project) about the topic before making a decision, they'll say "Well the module level functions seemed random enough to me, so I'll just use those". That could end up being rather awful for them. The reality is that your past experiences (and other people's past experiences, especially those who refuse to do some research and are demanding others prove that these are insecure with examples) are biasing this discussion because they fail to empathize with new users whose past experiences are coloring their decisions. People choose Python for a variety of reasons, and one of those reasons is that in their past experience it was "fast enough" to be an acceptable choice. This is how most people behave. Being angry at people for reading a two sentence long warning in the middle of the docs isn't helping anyone or arguing the validity of this discussion. From graffatcolmingov at gmail.com Thu Sep 10 15:57:40 2015 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Thu, 10 Sep 2015 08:57:40 -0500 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: <1441893303.3132414.379896217.7C80B332@webmail.messagingengine.com> References: <1441890819.3122699.379856193.3A628B5B@webmail.messagingengine.com> <1441893303.3132414.379896217.7C80B332@webmail.messagingengine.com> Message-ID: On Thu, Sep 10, 2015 at 8:55 AM, wrote: > On Thu, Sep 10, 2015, at 09:44, Ian Cordasco wrote: >> Because while we want to reduce foot guns, we don't want to reduce >> usability. DeterministicRandom is fairly easy for anyone to >> understand. I would venture a guess that most people looking for that >> wouldn't know (or care) what the backing algorithm is. 
Further, if we
>> stop using mersenne twister in the future, we would have to remove
>> that class name.
>
> If we're serious about being deterministic, then we should keep that
> class under that name and provide a new class for the new algorithm.
> What's the point of having a deterministic algorithm if you can't
> reproduce your results in the new version because the algorithm was
> deleted?

This is totally off topic. That said, as a counter-point: What's the point of carrying around code you don't want people to use if they're just going to use it anyway?

From donald at stufft.io  Thu Sep 10 16:21:11 2015
From: donald at stufft.io (Donald Stufft)
Date: Thu, 10 Sep 2015 10:21:11 -0400
Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux
In-Reply-To:
References:
Message-ID:

On September 10, 2015 at 9:44:13 AM, Paul Moore (p.f.moore at gmail.com) wrote:
> On 10 September 2015 at 14:10, Donald Stufft wrote:
> >> I don't understand the phrase "if you needed determinism, it would
> >> hurt you to say so". Could you clarify?
> >
> > I transposed some words, fixed:
> >
> > "If you needed determinism, would it hurt you to say so?"
>
> Thanks.
>
> In one sense, no it wouldn't. Nor would it matter to me if "the
> default random number generator" was fast and cryptographically
> secure. What matters is just that I get a load of random (enough)
> numbers.
>
> What hurts somewhat (not enormously, I'll admit) is up front having to
> think about whether I need to be able to capture a seed and replay it.
> That's nearly always something I'd think of way down the line, as a
> "wouldn't it be nice if I could get the user to send me a reproducible
> test case" or something like that. And of course it's just a matter of
> switching the underlying RNG at that point.
>
> None of this is hard. But once again, I'm currently using the module
> correctly, as documented.
This is actually exactly why Theo suggested using a modern, userland CSPRNG: it can generate random numbers faster than /dev/urandom can and, unless you need deterministic results, there's little downside to doing so.

There are really two possible ideas here, depending on what sort of balance we'd want to strike. We can make a default "I don't want to think about it" implementation of random that is both *generally* secure and fast, however it won't be deterministic and you won't be able to explicitly seed it. This would be a backwards compatible change [1] for people who are simply calling these functions [2]:

    random.getrandbits
    random.randrange
    random.randint
    random.choice
    random.shuffle
    random.sample
    random.random
    random.uniform
    random.triangular
    random.betavariate
    random.expovariate
    random.gammavariate
    random.gauss
    random.lognormvariate
    random.normalvariate
    random.vonmisesvariate
    random.paretovariate
    random.weibullvariate

If this were all that the top level functions in random.py provided, we could simply replace the default and people wouldn't notice; they'd just automatically get safer randomness whether that's actually useful for their use case or not. However, random.py also has these functions:

    random.seed
    random.getstate
    random.setstate
    random.jumpahead

and these functions are where the problem comes in. These functions only really make sense for deterministic sources of random, which are not "safe" for use in security sensitive applications. So pretending for a moment that we've already decided to do "something" about this, the question boils down to what we do about these 4 functions.
Either we can change the default to a secure CSPRNG and break these functions (and the people using them), which is however easily fixed by changing ``import random`` to ``import random; random = random.DeterministicRandom()``, or we can deprecate the top level functions and try to guide people to choose up front what kind of random they need.

Either of these solutions will end up with people being safer and, if we pretend we've agreed to do "something", it comes down to whether we'd prefer breaking compatibility for some people while keeping a default random generator that is probably good enough for most people, or if we'd prefer to not break compatibility and try to push people to always decide what kind of random they want.

Of course, we still haven't decided that we should do "something". I think that we should, because I think that secure by default (or at least, not insecure by default) is a good situation to be in. Over the history of computing it's been shown time and time again that trying to document or educate users is error prone and doesn't scale, but you can design APIs to make the "right" thing obvious and opt-out, and require opting in to specialist [3] cases which require some particular property.

[1] Assuming Theo's claim of the speed of the ChaCha based arc4random function
    is accurate, which I haven't tested but I assume he's smart enough to know
    what he's talking about WRT to speed of it.
[2] I believe anyways, I don't think that any of these rely on the properties
    of MT or a deterministic source of random, just a source of random.
[3] In this case, there are two specialist use cases: those that require
    deterministic results and those that require specific security properties
    that are not satisfied by a userland CSPRNG, because a userland CSPRNG is
    not as secure as /dev/urandom but is able to be much faster.
> > I've omitted most of the rest of your response largely because we're > probably just going to have to agree to differ. I'm probably too worn > out being annoyed at the way that everything ends up needing to be > security related, and the needs of people who won't read the docs > determines API design, to respond clearly and rationally :-( > > Paul > ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From p.f.moore at gmail.com Thu Sep 10 17:02:00 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 10 Sep 2015 16:02:00 +0100 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 10 September 2015 at 15:21, Donald Stufft wrote: > which is however > easily fixed by changing ``import random`` to > ``import random; random = random.DeterministicRandom()`` or we can deprecate Switching (somewhat hypocritically :-)) from an "I'm a naive user" stance, to talking about deeper issues as if I knew what I was talking about, this change results in each module getting a separate instance of the generator. That has implications on the risks of correlated results. It's unlikely to cause issues in real life, conceded. 
Paul

From ncoghlan at gmail.com  Thu Sep 10 17:36:39 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 11 Sep 2015 01:36:39 +1000
Subject: [Python-ideas] One way to do format and print
In-Reply-To: <55F0E5C9.6030509@brenbarn.net>
References: <55ED24C4.9000205@mail.de> <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> <55EF2B66.4020509@mail.de> <1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com> <6DDBD724-714E-40E1-88DF-9BC8484FF240@yahoo.com> <55F058B6.9000202@mail.de> <1DCC81C0-DE7A-460A-AD7F-E1533BB14911@yahoo.com> <55F0E5C9.6030509@brenbarn.net>
Message-ID:

On 10 September 2015 at 12:07, Brendan Barnwell wrote:
> On 2015-09-09 14:50, Andrew Barnert via Python-ideas wrote:
>>
>> Well, have you read the answers given by Nick, me, and others earlier
>> in the thread? If so, what do you disagree with? You've only
>> addressed one point (that % is faster than {} for simple cases--and
>> your solution is just "make {} faster", which may not be possible
>> given that it's inherently more hookable than % and therefore
>> requires more function calls...). What about formatting headers for
>> ASCII wire protocols, sharing tables of format strings between
>> programming languages (e.g., for i18n), or any of the other reasons
>> people have brought up?
>
> This is getting off on a tangent, but I don't see most of those as
> super compelling. Any programming language can use whatever formatting
> scheme it likes. Keeping %-substitutions around helps in sharing format
> strings only with other languages that use exactly the same formatting
> style. So it's not like % has any intrinsic gain; it just happens to
> interoperate with some other particular stuff. That's nice, but I don't
> think it makes sense to keep things in Python just so it can interoperate in
> specific ways with specific other languages that use less-readable syntax.
This perspective doesn't grant enough credit to the significance of C in general, and the C ABI in particular, in the overall computing landscape. While a lot of folks have put a lot of work into making it possible to write software without needing to learn the details of what's happening at the machine level, it's still the case that the *one* language binding interface that *every* language runtime ends up including is being able to load and run C libraries. It's also the case that for any new CPU architecture, one of the first things people will do is bootstrap a C compiler for it, as that then lets them bootstrap a whole host of other things (including Python). For anyone that wants to make the transition from high level programming to low level programming, or vice-versa, C is also the common language understood by both software developers and computer systems engineers. There *are* some worthy contenders out there that may eventually topple C's permissive low level memory access model from its position of dominance (I personally have high hopes for Rust), but that's not going to be a quick process. Regards, Nick. P.S. It's also worth remembering that many Pythonistas, including members of the core development team, happily switch between programming languages according to the task at hand. Python can still be our *preferred* language without becoming the *only* language we use :) -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From brett at python.org Thu Sep 10 17:46:28 2015 From: brett at python.org (Brett Cannon) Date: Thu, 10 Sep 2015 15:46:28 +0000 Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To: <55F13EAF.5040500@egenix.com> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> Message-ID: On Thu, 10 Sep 2015 at 01:26 M.-A. Lemburg wrote: > Reading this thread is fun, but it doesn't seem to be getting > anywhere - perhaps that's part of the fun ;-) > > Realistically, I see two options: > > 1. Someone goes and implements the OpenBSD random function in C > and put a package up on PyPI, updating it whenever OpenBSD > thinks that a new algorithm is needed or a security issue > has to be fixed (from my experience with other crypto software > like OpenSSL, this should be on the order of every 2-6 months ;-)) > > 2. Ditto, but we put the module in the stdlib and then run around > issuing patch level security releases every 2-6 months. > I see a third: rename random.random() to be something that gets the point across that it is not crypto secure and then stop at that. I don't think the stdlib should get into the game of trying to provide a RNG that we claim is cryptographically secure as that will change suddenly when a weakness is discovered (this is one of the key reasons we chose not to consider adding requests to the stdlib, for instance). Theo's key issue is misuse of random.random(), not the lack of a crypto-appropriate RNG in the stdlib (that just happens to be his solution because he has an RNG that he is partially in charge of). So that means either we take a "consenting adults" approach and say we can't prevent people from using code without reading the docs or we try to rename the function. But then again that won't help with all of the other functions in the random module that implicitly use random.random() (if that even matters; not sure if the helper functions in the module have any crypto use that would lead to their misuse).
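For what it's worth, the stdlib already draws exactly this line between the seedable Mersenne Twister (``random.Random``) and the os.urandom-backed ``random.SystemRandom``, which deliberately refuses to expose reproducible state:

```python
import random

# Seedable MT: same seed, same stream -- reproducible by design.
mt = random.Random(42)
replay = random.Random(42)
assert mt.random() == replay.random()

# OS-entropy generator: no meaningful seed, no state to capture.
sr = random.SystemRandom()
sr.seed(42)  # accepted but deliberately ignored (a stub method)
try:
    sr.getstate()
except NotImplementedError:
    print("SystemRandom exposes no reproducible state")
```

Since ``SystemRandom`` subclasses ``Random``, the helper methods (``choice``, ``shuffle``, etc.) work on both; only the underlying source differs.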
Oh, and there is always the nuclear 4th option and we just deprecate the random module. ;) -Brett > > Replacing our deterministic default PRNG with a non-deterministic > one doesn't really fly, since we'd break an important feature > of random.random(). You may remember that we already ran a similar > stunt with the string hash function, with very mixed results. > > Calling the result of such a switch-over "secure" is even > worse, since it's a promise we cannot keep (probably not even > fully define). Better leave the promise at "insecure" - that's > something we can promise forever and don't have to define :-) > > Regardless of what we end up with, I think Python land can do > better than name it "arc4random". We're great at bike shedding, > so how about we start the fun with "randomYMMV" :-) > > Overall, I think having more options for good PRNGs is great. > Whether this "arc4random" is any good remains to be seen, but > given that OpenBSD developed it, chances are higher than > usual. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Source (#1, Sep 10 2015) > >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ > >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ > >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ > ________________________________________________________________________ > 2015-09-18: PyCon UK 2015 ... 8 days to go > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Sep 10 17:55:07 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 11 Sep 2015 01:55:07 +1000 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: <85h9n482sa.fsf@benfinney.id.au> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> Message-ID: On 9 September 2015 at 17:33, Ben Finney wrote: > Paul Moore writes: > >> On 5 September 2015 at 09:30, Nick Coghlan wrote: >> > Unfortunately, I've yet to convince the rest of PyPA (let alone the >> > community at large) that telling people to call "pip" directly is *bad >> > advice* (as it breaks in too many cases that beginners are going to >> > encounter), so it would be helpful if folks helping beginners on >> > python-list and python-tutor could provide feedback supporting that >> > perspective by filing an issue against >> > https://github.com/pypa/python-packaging-user-guide >> >> I would love to see "python -m pip" (or where the launcher is >> appropriate, the shorter "py -m pip") be the canonical invocation used >> in all documentation, discussion and advice on running pip. > > Contrariwise, I would like to see "pip"
become the canonical invocation > used in all documentation, discussion, and advice; and if there are any > technical barriers to that least-surprising method, to see those > barriers addressed and removed. We're doing that too, but it's a "teaching people to use the command line for the first time is hard" problem and a "managing multiple copies of a language runtime and ensuring independently named commands are working against the right target environment" issue, moreso than a language level one. A potentially more fruitful path is likely to be making it so that folks don't need to use the system shell at all, and can just work entirely from the Python REPL. The two main things folks can't do from the REPL at the moment are: * install packages * manage virtual environments The idea of an "install()" command injected into the builtins from site.py would cover the first. The second couldn't be handled the way virtualenv does things, but it *could* be handled through a tool like vex which creates new subshells and runs commands in those rather than altering the current shell: $ python3 Python 3.4.2 (default, Jul 9 2015, 17:24:30) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import subprocess >>> subprocess.call(["vex", "nikola", "python"]) Python 2.7.10 (default, Jul 5 2015, 14:15:43) [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> print("Hello virtual environment!") Hello virtual environment! >>> The "vex nikola python" call there: 1. Starts a new bash subshell 2. Activates my "nikola" virtual environment in that subshell 3. Launches Python within that venv (hence the jump over to a Python 2.7 process, since I keep forgetting to recreate it as Python 3). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mal at egenix.com Thu Sep 10 17:59:10 2015 From: mal at egenix.com (M.-A. 
Lemburg) Date: Thu, 10 Sep 2015 17:59:10 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> Message-ID: <55F1A8CE.3000900@egenix.com> On 10.09.2015 15:39, Stefan Krah wrote: > M.-A. Lemburg writes: >> Reading this thread is fun, but it doesn't seem to be getting >> anywhere - perhaps that's part of the fun >> >> Realistically, I see two options: >> >> 1. Someone goes and implements the OpenBSD random function in C >> and put a package up on PyPI, updating it whenever OpenBSD >> thinks that a new algorithm is needed or a security issue >> has to be fixed (from my experience with other crypto software >> like OpenSSL, this should be on the order of every 2-6 months ) > > The sane option would be to use the OpenBSD libcrypto, which seems to > be part of their OpenSSL fork (libressl), just like libcrypto is part > of OpenSSL. Well, we already link to OpenSSL for SSL and hashes. I guess exposing the OpenSSL RAND interface in a module would be the easiest way to go about this. pyOpenSSL already does this: http://www.egenix.com/products/python/pyOpenSSL/doc/pyopenssl.html/#document-api/rand More pointers: https://wiki.openssl.org/index.php/Random_Numbers https://www.openssl.org/docs/manmaster/crypto/rand.html What's nice about the API is that you can add entropy as you find it. > Then the crypto maintenance would be delegated to the distributions. > > I would even be interested in writing such a package, but it would > be external and non-redistributable for well-known reasons. :) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 10 2015) >>> Python Projects, Coaching and Consulting ... 
http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2015-09-18: PyCon UK 2015 ... 8 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From brett at python.org Thu Sep 10 18:05:56 2015 From: brett at python.org (Brett Cannon) Date: Thu, 10 Sep 2015 16:05:56 +0000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On Thu, 10 Sep 2015 at 07:22 Donald Stufft wrote: > On September 10, 2015 at 9:44:13 AM, Paul Moore (p.f.moore at gmail.com) > wrote: > > On 10 September 2015 at 14:10, Donald Stufft wrote: > > >> I don't understand the phrase "if you needed determinism, it would > > >> hurt you to say so". Could you clarify? > > > > > > I transposed some words, fixed: > > > > > > "If you needed determinism, would it hurt you to say so?"" > > > > Thanks. > > > > In one sense, no it wouldn't. Nor would it matter to me if "the > > default random number generator" was fast and cryptographically > > secure. What matters is just that I get a load of random (enough) > > numbers. > > > > What hurts somewhat (not enormously, I'll admit) is up front having to > > think about whether I need to be able to capture a seed and replay it. > > That's nearly always something I'd think of way down the line, as a > > "wouldn't it be nice if I could get the user to send me a reproducible > > test case" or something like that. And of course it's just a matter of > > switching the underlying RNG at that point. > > > > None of this is hard. 
But once again, I'm currently using the module > > correctly, as documented. > > This is actually exactly why Theo suggested using a modern, userland CSPRNG > because it can generate random numbers faster than /dev/urandom can and, > unless > you need deterministic results, there's little downside to doing so. > > There's really two possible ideas here that depends on what sort of balance > we'd want to strike. We can make a default "I don't want to think about it" > implementation of random that is both *generally* secure and fast, however > it > won't be deterministic and you won't be able to explicitly seed it. This > would > be a backwards compatible change [1] for people who are simply calling > these > functions [2]: > > random.getrandbits > random.randrange > random.randint > random.choice > random.shuffle > random.sample > random.random > random.uniform > random.triangular > random.betavariate > random.expovariate > random.gammavariate > random.gauss > random.lognormvariate > random.normalvariate > random.vonmisesvariate > random.paretovariate > random.weibullvariate > > If this were all that the top level functions in random.py provided we > could > simply replace the default and people wouldn't notice, they'd just > automatically get safer randomness whether that's actually useful for their > use case or not. > > However, random.py also has these functions: > > random.seed > random.getstate > random.setstate > random.jumpahead > > and these functions are where the problem comes. These functions only > really > make sense for deterministic sources of random which are not "safe" for use > in security sensitive applications. So pretending for a moment that we've > already decided to do "something" about this, the question boils down to > what > do we do about these 4 functions. 
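The dependency of those four functions on determinism is easy to demonstrate with the current module -- ``getstate()``/``setstate()`` are literally a rewind button (``jumpahead`` is Python 2 only; the other three are still present in Python 3):

```python
import random

# seed/getstate/setstate presuppose a deterministic engine:
# capture the state, draw some numbers, rewind, and the exact
# same numbers come out again.
rng = random.Random()
rng.seed(99)
checkpoint = rng.getstate()

first_run = [rng.random() for _ in range(3)]
rng.setstate(checkpoint)          # rewind to the checkpoint
second_run = [rng.random() for _ in range(3)]

print(first_run == second_run)    # replay succeeds -- impossible to offer for a true CSPRNG
```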
Either we can change the default to a > secure > CSPRNG and break these functions (and the people using them) which is > however > easily fixed by changing ``import random`` to > ``import random; random = random.DeterministicRandom()`` or we can > deprecate > the top level functions and try to guide people to choose up front what > kind > of random they need. Either of these solutions will end up with people > being > safer and, if we pretend we've agreed to do "something", it comes down to > whether we'd prefer breaking compatibility for some people while keeping a > default random generator that is probably good enough for most people, or > if > we'd prefer to not break compatibility and try to push people to always > deciding what kind of random they want. > +1 for deprecating module-level functions and putting everything into classes to force a choice +0 for deprecating the seed-related functions and saying "the stdlib uses what it uses as a RNG and you have to live with it if you don't make your own choice" and switching to a crypto-secure RNG. -0 leaving it as-is -Brett > > Of course, we still haven't decided that we should do "something", I think > that > we should because I think that secure by default (or at least, not > insecure by > default) is a good situation to be in. Over the history of computing it's > been > shown that time and time again that trying to document or educate users is > error prone and doesn't scale, but if you can design APIs to make the > "right" > thing obvious and opt-out and require opting in to specialist [3] cases > which > require some particular property. > > > [1] Assuming Theo's claim of the speed of the ChaCha based arc4random > function > is accurate, which I haven't tested but I assume he's smart enough to > know > what he's talking about WRT to speed of it. > > [2] I believe anyways, I don't think that any of these rely on the > properties > of MT or a deterministic source of random, just a source of random.
> > [3] In this case, there are two specialist use cases, those that require > deterministic results and those that require specific security > properties > that are not satisfied by a userland CSPRNG because a userland CSPRNG > is > not as secure as /dev/urandom but is able to be much faster. > > > > > I've omitted most of the rest of your response largely because we're > > probably just going to have to agree to differ. I'm probably too worn > > out being annoyed at the way that everything ends up needing to be > > security related, and the needs of people who won't read the docs > > determines API design, to respond clearly and rationally :-( > > > > Paul > > > > ----------------- > Donald Stufft > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 > DCFA > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Thu Sep 10 18:10:26 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 10 Sep 2015 11:10:26 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> Message-ID: [Brett Cannon ] > ... > I see a third: rename random.random() to be something that gets the point > across that it is not crypto secure and then stop at that, > ... > Theo's key issue is misuse of random.random(), ... > ...
> But then again that won't help with all of the other functions in > the random module that implicitly use random.random() (if that even matters; > not sure if the helper functions in the module have any crypto use that > would lead to their misuse). The most likely "misuses" in idiomatic Python (not mindlessly translated low-level C) involve some spelling of getting or using random integers, like .choice(), .randrange(), .randint(), or even .sample() and .shuffle(). At least in Python 3, those don't normally ever invoke .random() (neither directly nor indirectly) - they normally use the (didn't always exist) "primitive" .getrandbits() instead (indirectly via the private ._randbelow()). So if something here does need to change, it's all or nothing. > Oh, and there is always the nuclear 4th option and we just deprecate the > random module. ;) I already removed it from the repository. Deprecating it would be a security risk, since it would give hackers information about our future actions ;-) From xavier.combelle at gmail.com Thu Sep 10 18:18:37 2015 From: xavier.combelle at gmail.com (Xavier Combelle) Date: Thu, 10 Sep 2015 18:18:37 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> Message-ID: My belief is that doing the safe thing by default is a major plus of Python. So from this point of view, using a cryptographically secure PRNG for random.random() should be done if possible. That would not change much about the way people create insecure software through lack of knowledge (myself included), but it could help a little. I see a third: rename random.random() to be something that gets the > point across that it is not crypto secure and then stop at that.
I don't think > the stdlib should get into the game of trying to provide a RNG that we > claim is cryptographically secure as that will change suddenly when a > weakness is discovered (this is one of the key reasons we chose not to > consider adding requests to the stdlib, for instance). > > This, in my opinion, would not be a good idea. Having safe defaults is a major plus of Python; it does not follow that we should offer no default just because one thinks it could eventually become insecure. And comparing a cryptographically secure PRNG with OpenSSL on expected security release cadence is not fair, because the complexity of the two pieces of software is clearly different. -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Thu Sep 10 18:20:35 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 10 Sep 2015 18:20:35 +0200 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: <55F0A1A8.5010001@mail.de> Message-ID: <55F1ADD3.9060903@mail.de> Got it. Thanks. On 10.09.2015 05:40, Jukka Lehtosalo wrote: > On Wed, Sep 9, 2015 at 2:16 PM, Sven R. Kunze > wrote: > > Thanks for sharing, Guido. Some random thoughts: > > - "classes should need to be explicitly marked as protocols" > If so, why are they classes in the first place? Other languages > have dedicated keywords like "interface". > > > I want to preserve compatibility with earlier Python versions (down to > 3.2), and this makes it impossible to add any new syntax. Also, there > is no need to add a keyword as there are other existing mechanisms > which are good enough, including base classes (as in the proposal) and > class decorators. I don't think that this will become a very commonly > used language feature, and thus adding special syntax for this doesn't > seem very important. My expectation is that structural subtyping would > be primarily useful for libraries and frameworks.
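The "explicitly marked protocol class" under discussion can be sketched in the form this proposal eventually took (``typing.Protocol``, PEP 544, Python 3.8+); the class and method names here are illustrative only:

```python
from typing import Protocol, runtime_checkable

# A class explicitly marked as a protocol via a base class -- no new
# syntax or keyword needed, which is the compatibility point made above.
@runtime_checkable
class SupportsClose(Protocol):
    def close(self) -> None: ...

class Resource:          # note: no inheritance from SupportsClose
    def close(self) -> None:
        print("closed")

# Structural check: Resource matches because of its shape, not its bases.
print(isinstance(Resource(), SupportsClose))
```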
> > Jukka -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu Sep 10 18:30:22 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 11 Sep 2015 02:30:22 +1000 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> Message-ID: On Fri, Sep 11, 2015 at 1:55 AM, Nick Coghlan wrote: > The second couldn't be handled the way virtualenv does things, but it > *could* be handled through a tool like vex which creates new subshells > and runs commands in those rather than altering the current shell: > > $ python3 > Python 3.4.2 (default, Jul 9 2015, 17:24:30) > [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux > Type "help", "copyright", "credits" or "license" for more information. >>>> import subprocess >>>> subprocess.call(["vex", "nikola", "python"]) > Python 2.7.10 (default, Jul 5 2015, 14:15:43) > [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> print("Hello virtual environment!") > Hello virtual environment! >>>> Hmm. This looks like something that could confuse people no end. I already see a lot of people use Ctrl-Z to get out of a program (often because they've come from Windows, I think), and this would be yet another way to get lost as to which of various Python environments you're in. Is it safe to have Python exec to another process? That way, there's no "outer" Python to be left behind, and it'd feel like a transition rather than a nesting. 
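Chris's "transition rather than a nesting" idea maps onto the ``os.exec*`` family: the current interpreter is replaced in place, so there is no outer Python left to suspend. A sketch, run in a throwaway child process so the demonstrating script survives to observe it (a real tool would exec the venv's interpreter path instead):

```python
import subprocess
import sys

# os.execv replaces the current process image, so after the call there
# is no "outer" Python to fall back into -- a transition, not a nesting.
# Run it in a child process so this script can watch it happen:
child = (
    "import os, sys; "
    "os.execv(sys.executable, [sys.executable, '-c', 'print(\"replaced\")'])"
)
out = subprocess.check_output([sys.executable, "-c", child])
print(out.decode().strip())  # output produced by the exec'd interpreter
```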
("Please note: Selecting a virtual environment restarts Python.") (Incidentally, what _would_ happen if you pressed Ctrl-Z while in that 'inner' Python? Would both Pythons get suspended?) ChrisA From skrah at bytereef.org Thu Sep 10 18:32:13 2015 From: skrah at bytereef.org (Stefan Krah) Date: Thu, 10 Sep 2015 16:32:13 +0000 (UTC) Subject: [Python-ideas] Should our default random number generator be secure? References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1A8CE.3000900@egenix.com> Message-ID: M.-A. Lemburg writes: > On 10.09.2015 15:39, Stefan Krah wrote: > > M.-A. Lemburg ...> writes: > >> 1. Someone goes and implements the OpenBSD random function in C > >> and put a package up on PyPI, updating it whenever OpenBSD > >> thinks that a new algorithm is needed or a security issue > >> has to be fixed (from my experience with other crypto software > >> like OpenSSL, this should be on the order of every 2-6 months ) > > > > The sane option would be to use the OpenBSD libcrypto, which seems to > > be part of their OpenSSL fork (libressl), just like libcrypto is part > > of OpenSSL. > > Well, we already link to OpenSSL for SSL and hashes. I guess exposing > the OpenSSL RAND interface in a module would be the easiest way > to go about this. Yes, my suggestion was based on the premise that OpenBSD's libcrypto (which should include the portable arc4(chacha20)random) is more secure, faster, etc. That's a big 'if', their PRNG had a couple of bugs on Linux last year, but OpenSSL also regularly has issues. Stefan Krah From mal at egenix.com Thu Sep 10 18:38:49 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 10 Sep 2015 18:38:49 +0200 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> Message-ID: <55F1B219.1000502@egenix.com> On 10.09.2015 17:46, Brett Cannon wrote: > On Thu, 10 Sep 2015 at 01:26 M.-A. Lemburg wrote: > >> Reading this thread is fun, but it doesn't seem to be getting >> anywhere - perhaps that's part of the fun ;-) >> >> Realistically, I see two options: >> >> 1. Someone goes and implements the OpenBSD random function in C >> and put a package up on PyPI, updating it whenever OpenBSD >> thinks that a new algorithm is needed or a security issue >> has to be fixed (from my experience with other crypto software >> like OpenSSL, this should be on the order of every 2-6 months ;-)) >> >> 2. Ditto, but we put the module in the stdlib and then run around >> issuing patch level security releases every 2-6 months. >> > > I see a third: rename random.random() to be be something that gets the > point across it is not crypto secure and then stop at that. I think this is the major misunderstanding here: The random module never suggested that it generates pseudo-random data of crypto quality. I'm pretty sure people doing crypto will know and most others simply don't care :-) Evidence: We used a Wichmann-Hill PRNG as default in random for a decade and people still got their work done. Mersenne was added in Python 2.3 and bumped the period from 6,953,607,871,644 (13 digits) to 2**19937-1 (6002 digits). > I don't think > the stdlib should get into the game of trying to provide a RNG that we > claim is cryptographically secure as that will change suddenly when a > weakness is discovered (this is one of the key reasons we chose not to > consider adding requests to the stdlib, for instance). 
> > Theo's key issue is misuse of random.random(), not the lack of a > crypto-appropriate RNG in the stdlib (that just happens to be his solution > because he has an RNG that he is partially in charge of). So that means > either we take a "consenting adults" approach and say we can't prevent > people from using code without reading the docs or we try to rename the > function. But then again that won't help with all of the other functions in > the random module that implicitly use random.random() (if that even > matters; not sure if the helper functions in the module have any crypto use > that would lead to their misuse). > > Oh, and there is always the nuclear 4th option and we just deprecate the > random module. ;) Why not add ssl.random() et al. (as interface to the OpenSSL rand APIs) ? By putting the crypto random stuff into the ssl module, even people who don't know about the difference, will recognize that the ssl version must be doing something more related to crypto than the regular random module one, which never promised this. Some background on why I think deterministic RNGs are more useful to have as default than non-deterministic ones: A common use case for me is to write test data generators for large database systems. For such generators, I don't keep the many GBs data around, but instead make the generator take a few parameters which then seed the RNGs, the time module and a few other modules via monkey-patching. This allows me to create reproducible test sets in a very efficient way. The feature to be able to reproduce a set is typically only needed when tracking down a bug in the system, but the whole setup avoids having to keep the whole test sets around on disk. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 10 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ 2015-09-18: PyCon UK 2015 ... 8 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From srkunze at mail.de Thu Sep 10 18:42:46 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 10 Sep 2015 18:42:46 +0200 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: <55F0AC83.3050505@mail.de> Message-ID: <55F1B306.5070705@mail.de> On 10.09.2015 06:12, Jukka Lehtosalo wrote: > This has been discussed almost to the death before, I am sorry. :) > but here are some of the main benefits as I see them: > - Code becomes more readable. This is especially true for code that > doesn't have very detailed docstrings. If I have code without docstrings, I better write docstrings then. ;) I mean when I am really going to touch that file to improve documentation (which annotations are a piece of), I am going to add more information for the reader of my API and that mostly will be describing the behavior of the API. If my variables have crappy names, so I need to add type hints to them, well, then, I rather fix them first. > This may go against the intuition of some people, but my experience > strongly suggests this, and many others who've used optional typing > have shared the sentiment. It probably takes a couple of days before > you get used to the type annotations, after which they likely won't > distract you any more but will actually improve code understanding by > providing important contextual information that is often difficult to > infer otherwise. > - Tools can automatically find most (simple) bugs of certain common > kinds in statically typed code.
A lot of production code has way below > 100% test coverage, so this can save many manual testing iterations > and help avoid breaking stuff in production due to stupid mistakes > (that humans are bad at spotting). > - Refactoring becomes way less scary, especially if you don't have > close to 100% test coverage. A type checker can find many mistakes > that are commonly introduced when refactoring code. > > You'll get the biggest benefits if you are working on a large code > base mostly written by other people with limited test coverage and > little comments or documentation. If I had large untested and undocumented code base (well I actually have), then static type checking would be ONE tool to find out issues. Once found out, I write tests as hell. Tests, tests, tests. I would not add type annotations. I need tested functionality not proper typing. > You get extra credit if your tests are slow to run and flaky, We are problem solvers. So, I would tell my team: "make them faster and more reliable". > I consider that difference pretty significant. I wouldn't want to > increase the fraction of unchecked parts of my annotated code by a > factor of 8, and I want to have control over which parts can be type > checked. Granted. But you still don't know if your code runs correctly. You are better off with tests. And I agree type checking is 1 test to perform (out of 10K). But: > > I don't see the effort for adding type hints AND the effort for > further parsing (by human eyes) justified by partially better IDE > support and 1 single additional test within test suites of about > 10,000s of tests. > > Especially, when considering that correct types don't prove > functionality in any case. But tested functionality in some way > proves correct typing. > I didn't see you respond to that. But you probably know that. :) Thanks for responding anyway. It is helpful to see your intentions, though I don't agree with it 100%. Moreover, I think it is about time to talk about this. 
If it were not you, somebody else would finally have added type hints to Python. Keep up the good work. +1 Best, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Thu Sep 10 18:53:59 2015 From: brett at python.org (Brett Cannon) Date: Thu, 10 Sep 2015 16:53:59 +0000 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: Message-ID: I like the idea enough that I'm +1 on moving forward with a PEP. On Wed, 9 Sep 2015 at 13:19 Guido van Rossum wrote: > Jukka wrote up a proposal for structural subtyping. It's pretty good. > Please discuss. > > https://github.com/ambv/typehinting/issues/11#issuecomment-138133867 > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Sep 10 19:00:22 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 11 Sep 2015 03:00:22 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 11 September 2015 at 02:05, Brett Cannon wrote: > +1 for deprecating module-level functions and putting everything into > classes to force a choice -1000, as this would be a *huge* regression in Python's usability for educational use cases. (Think 7-8 year olds that are still learning to read, not teenagers or adults with more fully developed vocabularies) A reasonable "Hello world!" equivalent for introducing randomness to students is rolling a 6-sided die, as that relates to a real world object they'll often be familiar with. 
At the moment that reads as follows: >>> from random import randint >>> randint(1, 6) 6 >>> randint(1, 6) 3 >>> randint(1, 6) 1 >>> randint(1, 6) 4 Another popular educational exercise is the "Guess a number" game, where the program chooses a random number from 1-100, and the person playing the game has to guess what it is. Again, randint() works fine here. Shuffling decks of cards, flipping coins, these are all things used to introduce learners to modelling random events in the real world in software, and we absolutely do *not* want to invalidate the extensive body of educational material that assumes the current module level API for the random module. > +0 for deprecating the seed-related functions and saying "the stdlib uses > was it uses as a RNG and you have to live with it if you don't make your own > choice" and switching to a crypto-secure RNG. However, this I'm +1 on. People *do* use the module level APIs inappropriately, and we can get them to a much safer place, while nudging folks that genuinely need deterministic randomness towards an alternative API. The key for me is that folks that actually *need* deterministic randomness *will* be calling the stateful module level APIs. This means we can put the deprecation warnings on *those* methods, and leave them out for the others. 
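That methods-only split could be sketched as a thin module-level shim (purely illustrative; the names and mechanics here are hypothetical, not the actual random module internals):

```python
import warnings
from random import Random

_inst = Random()

def seed(a=None):
    # Only the explicitly-deterministic entry point warns; the educational
    # APIs below never trigger it. (Hypothetical shim, for illustration only.)
    warnings.warn(
        "module-level seeding implies the deterministic generator; "
        "use the seeded class explicitly",
        DeprecationWarning, stacklevel=2)
    _inst.seed(a)

# The dice-rolling style APIs keep working with no ceremony:
randint = _inst.randint
choice = _inst.choice
```

The 6-sided die example stays exactly as written above, while scripts that genuinely need reproducibility trip the warning and get pointed at the explicit class.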
In terms of practical suggestions, rather than DeterministicRandom and NonDeterministicRandom, I'd actually go with the simpler terms SeededRandom and SeedlessRandom (there's a case to be made that those are misnomers, but I'll go into that more below): SeededRandom: Mersenne Twister SeedlessRandom: new CSPRNG SystemRandom: os.urandom() Phase one of transition: * add SeedlessRandom * rename Random to SeededRandom * Random becomes a subclass of SeededRandom that deprecates all methods not shared with SeedlessRandom * this will also effectively deprecate the corresponding module level functions * any SystemRandom methods that are no-ops (like seed()) are deprecated Phase two of transition: * Random becomes an alias for SeedlessRandom * deprecated methods are removed from SystemRandom * deprecated module level functions are removed As far as the proposed Seeded/Seedless naming goes, that deliberately glosses over the fact that "seed" gets used to refer to two different things - seeding a PRNG with entropy, and seeding a deterministic PRNG with a particular seed value. The key is that "SeedlessRandom" won't have a "seed()" *method*, and that's the single most salient fact about it from a user experience perspective: you can't get the same output by providing the same seed value, because we wouldn't let you provide a seed value at all. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From srkunze at mail.de Thu Sep 10 19:01:34 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 10 Sep 2015 19:01:34 +0200 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <55F0E1F2.6040709@brenbarn.net> References: <55F0E1F2.6040709@brenbarn.net> Message-ID: <55F1B76E.2030602@mail.de> On 10.09.2015 03:50, Brendan Barnwell wrote: > On 2015-09-09 13:17, Guido van Rossum wrote: >> Jukka wrote up a proposal for structural subtyping. It's pretty good. >> Please discuss. 
>> >> https://github.com/ambv/typehinting/issues/11#issuecomment-138133867 > > I'm not totally hip to all the latest typing developments, You bet what I am. > but I'm not sure I fully understand the benefit of this protocol > concept. At the beginning it says that classes have to be explicitly > marked to support these protocols. But why is that? Doesn't the > existing __subclasshook__ already allow an ABC to use any criteria it > likes to determine if a given class is considered a subclass? So > couldn't ABCs like the ones we already have inspect the type > annotations and decide a class "counts" as an iterable (or whatever) > if it defines the right methods with the right type hints? > The benefit from what I understand is actually really, really nice. It's basically adding the ability to shorten the following 'capability' check: if hasattr(obj, 'important') and hasattr(obj, 'relevant') and hasattr(obj, 'necessary'): # do to if implements(obj, protocol): # do As usual with type hints, functionality is not guaranteed. But it simplifies sanity checks OR decision making: if implements(obj, protocol1): # do this elif implements(obj, (protocol2, protocol3)): # do that The ability to extract all protocols of a type would provide a more flexible way of decision making and processing such as: if my_protocol in obj.__protocols__: # iterate over the protocols and do something @Jukka I haven't found the abilities described above. Would it make sense to add it (except it's already there)? 
Best, Sven From donald at stufft.io Thu Sep 10 19:02:12 2015 From: donald at stufft.io (Donald Stufft) Date: Thu, 10 Sep 2015 13:02:12 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On September 10, 2015 at 10:21:11 AM, Donald Stufft (donald at stufft.io) wrote: > > Assuming Theo's claim of the speed of the ChaCha based arc4random > function > is accurate, which I haven't tested but I assume he's smart enough > to know > what he's talking about WRT to speed of it. I wanted to try and test this. These are not super scientific since I just ran them on a single computer once (but 10 million iterations each) but I think it can probably give us an indication of the differences? I put the code up at https://github.com/dstufft/randtest but it's a pretty simple module. I'm not sure if (double)arc4random() / UINT_MAX is a reasonable way to get a double out of arc4random (which returns a uint) that is between 0.0 and 1.0, but I assume it's fine at least for this test. Here's the results from running the test on my personal computer which is running the OSX El Capitan public Beta: ? ? $ python test.py ? ? Number of Calls: ?10000000 ? ? +---------------+--------------------+ ? ? | method ? ? ? ?| usecs per call ? ? | ? ? +---------------+--------------------+ ? ? | deterministic | 0.0586802460020408 | ? ? | system ? ? ? ?| 1.6681434757076203 | ? ? | userland ? ? ?| 0.1534261149005033 | ? ? +---------------+--------------------+ I'll try it against OpenBSD later to see if their implementation of arc4random is faster than OSX. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From ncoghlan at gmail.com Thu Sep 10 19:03:18 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 11 Sep 2015 03:03:18 +1000 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? 
In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> Message-ID: On 11 September 2015 at 02:30, Chris Angelico wrote: > On Fri, Sep 11, 2015 at 1:55 AM, Nick Coghlan wrote: >> The second couldn't be handled the way virtualenv does things, but it >> *could* be handled through a tool like vex which creates new subshells >> and runs commands in those rather than altering the current shell: >> >> $ python3 >> Python 3.4.2 (default, Jul 9 2015, 17:24:30) >> [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux >> Type "help", "copyright", "credits" or "license" for more information. >>>>> import subprocess >>>>> subprocess.call(["vex", "nikola", "python"]) >> Python 2.7.10 (default, Jul 5 2015, 14:15:43) >> [GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >>>>> print("Hello virtual environment!") >> Hello virtual environment! >>>>> > > Hmm. This looks like something that could confuse people no end. I > already see a lot of people use Ctrl-Z to get out of a program (often > because they've come from Windows, I think), and this would be yet > another way to get lost as to which of various Python environments > you're in. Is it safe to have Python exec to another process? That > way, there's no "outer" Python to be left behind, and it'd feel like a > transition rather than a nesting. ("Please note: Selecting a virtual > environment restarts Python.") Using subprocess.call() to invoke vex was something I could do without writing a single line of code outside the REPL. An actual PEP would presumably propose something with a much nicer UX :) Cheers, Nick. 
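Chris's "transition rather than nesting" alternative maps onto os.exec*; a rough sketch (the vex invocation and environment name are just the ones from Nick's example, and this is untested illustration, not a PEP-quality UX):

```python
import os
import shutil

def restart_in_env(env_name):
    """Replace the current interpreter with a Python running under vex.

    On success os.execvp never returns: the process image is swapped out,
    so there is no "outer" Python left to Ctrl-Z back into.
    """
    argv = ["vex", env_name, "python"]
    if shutil.which(argv[0]) is None:
        raise RuntimeError("vex not found on PATH")
    os.execvp(argv[0], argv)

# The command Nick's subprocess example would exec instead of nesting:
cmd = ["vex", "nikola", "python"]
```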
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From srkunze at mail.de Thu Sep 10 19:07:43 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 10 Sep 2015 19:07:43 +0200 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: Message-ID: <55F1B8DF.5060001@mail.de> On 09.09.2015 22:17, Guido van Rossum wrote: > Jukka wrote up a proposal for structural subtyping. It's pretty good. > Please discuss. > > https://github.com/ambv/typehinting/issues/11#issuecomment-138133867 15) How would Protocol be implemented? "Implement metaclass functionality to detect whether a class is a protocol or not. Maybe add a class attribute such as __protocol__ = True if that's the case" If you consider the __protocols__ attribute I mentioned in an earlier post, I would like to see __protocol__ renamed to __is_protocol__. I think that would make it more readable in the long run. Best, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu Sep 10 19:11:19 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 11 Sep 2015 03:11:19 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On Fri, Sep 11, 2015 at 3:00 AM, Nick Coghlan wrote: > As far as the proposed Seeded/Seedless naming goes, that deliberately > glosses over the fact that "seed" gets used to refer to two different > things - seeding a PRNG with entropy, and seeding a deterministic PRNG > with a particular seed value. The key is that "SeedlessRandom" won't > have a "seed()" *method*, and that's the single most salient fact > about it from a user experience perspective: you can't get the same > output by providing the same seed value, because we wouldn't let you > provide a seed value at all. Aside from sounding like varieties of grapes in a grocery, those names seem just fine.
From the POV of someone with a bit of comprehension of crypto (as in, "use /dev/urandom rather than a PRNG", but not enough knowledge to actually build or verify these things), the distinction is precise: with SeededRandom, I can give it a seed and get back a predictable sequence of numbers, but with SeedlessRandom, I can't. I'm not sure what the difference is between "seeding a PRNG with entropy" and "seeding a deterministic PRNG with a particular seed value", though; aside from the fact that one of them uses a known value and the other doesn't, of course. Back in my BASIC programming days, we used to use "RANDOMIZE TIMER" to seed the RNG with time-of-day, or "RANDOMIZE 12345" (or other value) to seed with a particular value; they're the same operation, but one's considered random and the other's considered predictable. (Of course, bytes from /dev/urandom will be a lot more entropic than "number of centiseconds since midnight", but for a single-player game that wants to provide a different starting layout every time you play, the latter is sufficient.) ChrisA From rosuav at gmail.com Thu Sep 10 19:17:32 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 11 Sep 2015 03:17:32 +1000 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> Message-ID: On Fri, Sep 11, 2015 at 3:03 AM, Nick Coghlan wrote: >> Hmm. This looks like something that could confuse people no end. 
I >> already see a lot of people use Ctrl-Z to get out of a program (often >> because they've come from Windows, I think), and this would be yet >> another way to get lost as to which of various Python environments >> you're in. Is it safe to have Python exec to another process? That >> way, there's no "outer" Python to be left behind, and it'd feel like a >> transition rather than a nesting. ("Please note: Selecting a virtual >> environment restarts Python.") > > Using subprocess.call() to invoke vex was something I could do without > writing a single line of code outside the REPL. An actual PEP would > presumably propose something with a much nicer UX :) Heh, fair enough! Mainly, though, I'm wondering whether there'd be any risks to using os.exec* from the REPL - anything that would make it a bad idea to even consider that approach. ChrisA From tim.peters at gmail.com Thu Sep 10 19:23:48 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 10 Sep 2015 12:23:48 -0500 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: [Donald Stufft , on arc4random speed] > I wanted to try and test this. These are not super scientific since I just ran > them on a single computer once (but 10 million iterations each) but I think it > can probably give us an indication of the differences? > > I put the code up at https://github.com/dstufft/randtest but it's a pretty > simple module. I'm not sure if (double)arc4random() / UINT_MAX is a reasonable > way to get a double out of arc4random (which returns a uint) that is between > 0.0 and 1.0, but I assume it's fine at least for this test. arc4random() specifically returns uint32_t, which is 21 bits shy of what's needed to generate a reasonable random double. 
Our MT wrapping internally generates two 32-bit uint32_t thingies, and pastes them together like so (Python's C code here): """ /* random_random is the function named genrand_res53 in the original code; * generates a random number on [0,1) with 53-bit resolution; note that * 9007199254740992 == 2**53; I assume they're spelling "/2**53" as * multiply-by-reciprocal in the (likely vain) hope that the compiler will * optimize the division away at compile-time. 67108864 is 2**26. In * effect, a contains 27 random bits shifted left 26, and b fills in the * lower 26 bits of the 53-bit numerator. * The orginal code credited Isaku Wada for this algorithm, 2002/01/09. */ static PyObject * random_random(RandomObject *self) { PY_UINT32_T a=genrand_int32(self)>>5, b=genrand_int32(self)>>6; return PyFloat_FromDouble((a*67108864.0+b)*(1.0/9007199254740992.0)); } """ So now you know how to make it more directly comparable. The high-order bit is that it requires 2 calls to the 32-bit uint integer primitive to get a double, and that can indeed be significant. > Here's the results from running the test on my personal computer which is > running the OSX El Capitan public Beta: > > $ python test.py > Number of Calls: 10000000 > +---------------+--------------------+ > | method | usecs per call | > +---------------+--------------------+ > | deterministic | 0.0586802460020408 | > | system | 1.6681434757076203 | > | userland | 0.1534261149005033 | > +---------------+--------------------+ > > > I'll try it against OpenBSD later to see if their implementation of arc4random > is faster than OSX. Just noting that most people timing the OpenBSD version seem to comment out the "get stuff from the kernel periodically" part first, in order to time the algorithm instead of the kernel ;-) In real life, though, they both count, so I like what you're doing better. 
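Tim's quoted C can be transcribed to Python for illustration (the real work happens in C, of course):

```python
def res53(u32_a, u32_b):
    # Keep the top 27 bits of the first 32-bit word and the top 26 of the
    # second; together they form a uniform 53-bit numerator over 2**53,
    # matching the genrand_res53 recipe quoted above.
    a = u32_a >> 5          # 27 bits, shifted left 26 by the multiply
    b = u32_b >> 6          # low 26 bits of the numerator
    return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0)
```

So every random() costs two draws from the 32-bit primitive, which is worth remembering when comparing a double-returning benchmark against (double)arc4random() / UINT_MAX, which spends only one 32-bit draw (and gets only 32 bits of resolution).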
From ncoghlan at gmail.com Thu Sep 10 19:27:50 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 11 Sep 2015 03:27:50 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 11 September 2015 at 03:11, Chris Angelico wrote: > On Fri, Sep 11, 2015 at 3:00 AM, Nick Coghlan wrote: >> As far as the proposed Seeded/Seedless naming goes, that deliberately >> glosses over the fact that "seed" gets used to refer to two different >> things - seeding a PRNG with entropy, and seeding a deterministic PRNG >> with a particular seed value. The key is that "SeedlessRandom" won't >> have a "seed()" *method*, and that's the single most salient fact >> about it from a user experience perspective: you can't get the same >> output by providing the same seed value, because we wouldn't let you >> provide a seed value at all. > > Aside from sounding like varieties of grapes in a grocery, those names > seem just fine. From the POV of someone with a bit of comprehension of > crypto (as in, "use /dev/urandom rather than a PRNG", but not enough > knowledge to actually build or verify these things), the distinction > is precise: with SeededRandom, I can give it a seed and get back a > predictable sequence of numbers, but with SeedlessRandom, I can't. I'm > not sure what the difference is between "seeding a PRNG with entropy" > and "seeding a deterministic PRNG with a particular seed value", > though; aside from the fact that one of them uses a known value and > the other doesn't, of course. Actually, that was just a mistake on my part - they're really the same thing, and the only distinction is the one you mention: setting the seed to a known value. 
Thus the main seed-related difference between something like arc4random and other random APIs is the same one I'm proposing to make here: it's seedless at the API level because it takes care of collecting its own initial entropy from the operating system's random number API. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From robert.kern at gmail.com Thu Sep 10 19:29:22 2015 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 10 Sep 2015 18:29:22 +0100 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> Message-ID: On 2015-09-10 00:15, Nathaniel Smith wrote: > On Wed, Sep 9, 2015 at 3:19 PM, Tim Peters wrote: >> The Twister's provably perfect equidistribution across its whole >> period also has its scary sides. For example, run random.random() >> often enough, and it's _guaranteed_ you'll eventually reach a state >> where the output is exactly 0.0 hundreds of times in a row. That will >> happen as often as it "should happen" by chance, but that's scant >> comfort if you happen to hit such a state. Indeed, the Twister was >> patched relatively early in its life to try to prevent it from >> _starting_ in such miserable states. Such states are nevertheless >> still reachable from every starting state. > > This criticism seems a bit unfair though -- even a true stream of > random bits (e.g. from a true unbiased quantum source) has this > property, and trying to avoid this happening would introduce bias that > really could cause problems in practice. A good probabilistic program > is one that has a high probability of returning some useful result, > but they always have some low probability of returning something > weird. So this is just saying that most people don't understand > probability. 
Which is true, but there isn't much that the random > module can do about it :-) The MT actually does have a problem unique to it (or at least to its family of Generalized Feedback Shift Registers) where a state with a high proportion of 0 bits will get stuck in a region of successive states with high proportions of 0 bits. Other 623-dimensional equidistributed PRNGs will indeed come across the same states with high 0-bit sequences with the frequency that you expect from probability, but they will be surrounded by states with dissimilar 0-bit proportions. This problem isn't *due* to equidistribution per se, but I think Tim's point is that you are inevitably due to hit one such patch if you sample long enough. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From srkunze at mail.de Thu Sep 10 19:35:56 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 10 Sep 2015 19:35:56 +0200 Subject: [Python-ideas] One way to do format and print In-Reply-To: References: <55ED24C4.9000205@mail.de> <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> <55EF2B66.4020509@mail.de> <1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com> <6DDBD724-714E-40E1-88DF-9BC8484FF240@yahoo.com> <55F058B6.9000202@mail.de> <1DCC81C0-DE7A-460A-AD7F-E1533BB14911@yahoo.com> <55F0E5C9.6030509@brenbarn.net> Message-ID: <55F1BF7C.9060205@mail.de> On 10.09.2015 17:36, Nick Coghlan wrote: > This perspective doesn't grant enough credit to the significance of C > in general, and the C ABI in particular, in the overall computing > landscape. 
While a lot of folks have put a lot of work into making it > possible to write software without needing to learn the details of > what's happening at the machine level, it's still the case that the > *one* language binding interface that *every* language runtime ends up > including is being able to load and run C libraries. Ah, now I understand. We need to add {} to C. That'll make it, right? ;) Seriously, there are also other significant influences that fit better here: template engines. I know a couple of them using {} in some sense or another. C format strings are just one of them, so I wouldn't stress the significance of C that hard *in that particular instance*. There are other areas where C has its strengths. > P.S. It's also worth remembering than many Pythonistas, including > members of the core development team, happily switch between > programming languages according to the task at hand. Python can still > be our *preferred* language without becoming the *only* language we > use :) I hope everybody on this list knows that.:) Best, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Thu Sep 10 19:39:16 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 10 Sep 2015 19:39:16 +0200 Subject: [Python-ideas] One way to do format and print In-Reply-To: <2D5621A7-0676-489D-886E-76E7D953870D@yahoo.com> References: <55ED24C4.9000205@mail.de> <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> <55EF2B66.4020509@mail.de> <1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com> <6DDBD724-714E-40E1-88DF-9BC8484FF240@yahoo.com> <55F058B6.9000202@mail.de> <1DCC81C0-DE7A-460A-AD7F-E1533BB14911@yahoo.com> <55F0B677.3090500@mail.de> <2D5621A7-0676-489D-886E-76E7D953870D@yahoo.com> Message-ID: <55F1C044.3040904@mail.de> On 10.09.2015 01:14, Andrew Barnert wrote: > Of course I can easily file a docs bug, with a patch, and possibly start a discussion on the relevant list to get wider discussion. 
But you can do that as easily as I can, and I don't know why you should anticipate better of me than you do of yourself. (If you don't feel capable of writing the change, because you're not a native speaker or your tech writing skills aren't as good as your coding skills or whatever, I won't argue that your English seems good enough to me; just write a "draft" patch and then ask for people to improve it. There are docs changes that have been done this way in the past, and I think there are more than enough people who'd be happy to help.) I didn't know that. The Python development and discussion process is still somewhat opaque to me. Btw. you asked for what could be improved and I responded. :) Best, Sven From robert.kern at gmail.com Thu Sep 10 19:41:39 2015 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 10 Sep 2015 18:41:39 +0100 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <20150910015505.GO19373@ando.pearwood.info> Message-ID: On 2015-09-10 04:56, Nathaniel Smith wrote: > On Wed, Sep 9, 2015 at 8:35 PM, Tim Peters wrote: >> There are some clean and easy approaches to this based on >> crypto-inspired schemes, but giving up crypto strength for speed. If >> you haven't read it, this paper is delightful: >> >> http://www.thesalmons.org/john/random123/papers/random123sc11.pdf > > It really is! As AES acceleration instructions become more common > (they're now standard IIUC on x86, x86-64, and even recent ARM?), even > just using AES in CTR mode becomes pretty compelling -- it's fast, > deterministic, provably equidistributed, *and* cryptographically > secure enough for many purposes. I'll also recommend the PCG paper (and algorithm) as the author's cross-PRNGs comparisons have been bandied about in this thread already. 
The paper lays out a lot of the relevant issues and balances the various qualities that are important: statistical quality, speed, and security (of various flavors). http://www.pcg-random.org/paper.html I'm actually not that impressed with Random123. The core idea is nice and clean, but the implementation is hideously complex. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From srkunze at mail.de Thu Sep 10 19:43:40 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 10 Sep 2015 19:43:40 +0200 Subject: [Python-ideas] One way to do format and print In-Reply-To: <55F0C193.6000606@btinternet.com> References: <55ED24C4.9000205@mail.de> <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> <55EF2B66.4020509@mail.de> <1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com> <6DDBD724-714E-40E1-88DF-9BC8484FF240@yahoo.com> <55F058B6.9000202@mail.de> <55F0C193.6000606@btinternet.com> Message-ID: <55F1C14C.9060608@mail.de> Of course I would not want to force you, Rob. I believe in progress and progress is achieved through change. So, the best method for change I know of are deprecations: not all changes come bundled but singly and with time to prepare. To me, it's just a minor deficiency in Python's own vision. Best, Sven On 10.09.2015 01:32, Rob Cliffe wrote: > I use %-formatting. > Not because I think it's so wonderful and solves all problems > (although it's pretty good), but because it appeared to be the > recommended method at the time I learned Python in earnest. If I were > only learning Python now, I would probably learn str.format or > whatever it is. > I *could* learn to use something else *and* change all my working > code, but do you really want to force me to do that? > I would guess that there are quite a lot of Python users in the same > position. > Rob Cliffe > > On 09/09/2015 17:05, Sven R. 
Kunze wrote: >> On 09.09.2015 02:09, Andrew Barnert via Python-ideas wrote: >>> I think it's already been established why % formatting is not going >>> away any time soon. >>> >>> As for de-emphasizing it, I think that's already done pretty well in >>> the current docs. The tutorial has a nice long introduction to >>> str.format, a one-paragraph section on "old string formatting" with >>> a single %5.3f example, and a one-sentence mention of Template. The >>> stdtypes chapter in the library reference explains the difference >>> between the two in a way that makes format sound more attractive for >>> novices, and then has details on each one as appropriate. What else >>> should be done? >> >> I had difficulties to find what you mean by tutorial. But hey, being >> a Python user for years and not knowing where the official tutorial >> resides... >> >> Anyway, Google presented me the version 2.7 of the tutorial. Thus, >> the link to the stdtypes documentation does not exhibit the note of, >> say, 3.5: >> >> "Note: The formatting operations described here exhibit a variety of >> quirks that lead to a number of common errors (such as failing to >> display tuples and dictionaries correctly). Using the newer >> str.format() interface helps avoid these errors, and also provides a >> generally more powerful, flexible and extensible approach to >> formatting text." >> >> So, adding it to the 2.7 docs would be a start. >> >> >> I still don't understand what's wrong with deprecating %, but okay. I >> think f-strings will push {} to wide-range adoption. >> >> >> Best, >> Sven >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> ----- >> No virus found in this message. 
>> Checked by AVG - www.avg.com >> Version: 2014.0.4830 / Virus Database: 4365/10609 - Release Date: >> 09/09/15 >> >> > From srkunze at mail.de Thu Sep 10 19:46:11 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 10 Sep 2015 19:46:11 +0200 Subject: [Python-ideas] Wheels For ... In-Reply-To: References: <55EC78E9.1050300@mail.de> <20150907012645.GX19373@ando.pearwood.info> Message-ID: <55F1C1E3.90202@mail.de> Another example for the sake of documentation: https://github.com/tornadoweb/tornado/issues/1383#issuecomment-84098055 On 07.09.2015 07:39, Nathaniel Smith wrote: > > On Sep 6, 2015 10:28 PM, "Andrew Barnert via Python-ideas" > > wrote: > > > > On Sep 6, 2015, at 21:20, Donald Stufft > wrote: > > > > > > Let's take lxml for > > > example which binds against libxml2. It needs built on Windows, it > needs built > > > on OSX, it needs built on various Linux distributions in order to > cover the > > > spread of just the common cases. > > > > IIRC, Apple included ancient versions (even at the time) of libxml2 > up to around 10.7, and at one point they even included one of the > broken 2.7.x versions. So a build farm building for 10.6+ (which I > think is what python.org builds still target?) is > going to build against an ancient libxml2, meaning some features of > lxml2 will be disabled, and others may even be broken. Even if I'm > remembering wrong about Apple, I'm sure there are linux distros with > similar issues. > > > > Fortunately, lxml has a built-in option (triggered by an env > variable) for dealing with this, by downloading the source, building a > local copy of the libs, and statically linking them into lxml, but > that means you need some way for a package to specify env variables to > be set on the build server. And can you expect most libraries with > similar issues to do the same? > > Yes, you can! 
:-) > > I mean, not everyone will necessarily use it, but adding code like > > if "PYPI_BUILD_SERVER" in os.environ: > do_static_link = True > > to your setup.py is *wayyyy* easier than buying an OS X machine and > maintaining it and doing manual builds at every release. Or finding a > volunteer who has an OS X box and nagging them at every release and > dealing with trust hassles. > > And there are a lot of packages out there that just have some cython > files in them for speedups with no external dependencies, or whatever. > A build farm wouldn't have to be perfect to be extremely useful. > > -n > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Thu Sep 10 19:48:12 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 10 Sep 2015 12:48:12 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> Message-ID: [Robert Kern ] > The MT actually does have a problem unique to it (or at least to its family > of Generalized Feedback Shift Registers) where a state with a high > proportion of 0 bits will get stuck in a region of successive states with > high proportions of 0 bits. Other 623-dimensional equidistributed PRNGs will > indeed come across the same states with high 0-bit sequences with the > frequency that you expect from probability, but they will be surrounded by > states with dissimilar 0-bit proportions. 
This problem isn't *due* to > equidistribution per se, but I think Tim's point is that you are inevitably > due to hit one such patch if you sample long enough. Thank you for explaining it better than I did. I implied MT's "stuck in zero-land" problems were _due_ to perfect equidistribution, but of course they're not. It's just that MT's specific permutation of the equidistributed-regardless-of-order range(1, 2**19937) systematically puts integers with "lots of 0 bits" next to each other. And there are many such patches. But 2**19937 is so large you need to contrive the state "by hand" to get there at once. For example,

>>> random.setstate((2, (0,)*600 + (1,)*24 + (624,), None))
>>> random.random()
0.0
>>> random.random()
0.0
>>> random.random()
0.0
>>> random.random()
0.0
>>> random.random()
0.0
>>> random.random()
0.0
>>> random.random()
0.0
>>> random.random()
0.0
>>> random.random()
0.0
>>> random.random()
0.0
>>> random.random()
0.0
>>> random.random()
0.0

That's "impossible" ;-) (1 chance in 2**(53*12) of seeing 0.0 twelve times in a row) From 4kir4.1i at gmail.com Thu Sep 10 20:06:28 2015 From: 4kir4.1i at gmail.com (Akira Li) Date: Thu, 10 Sep 2015 21:06:28 +0300 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux References: Message-ID: <87y4ge6tej.fsf@gmail.com> Donald Stufft writes: ... > Essentially, there are three basic types of uses of random (the concept, not > the module). Those are: > > 1. People/usecases who absolutely need deterministic output given a seed and >    for whom security properties don't matter. > 2. People/usecases who absolutely need a cryptographically random output and >    for whom having a deterministic output is a downside. > 3. People/usecases that fall somewhere in between where it may or may not be >    security sensitive or it may not be known if it's security sensitive.
> > The people in group #1 are currently, in the Python standard library, best > served using the MT random source as it provides exactly the kind of determinism > they need. The people in group #2 are currently, in the Python standard > library, best served using os.urandom (either directly or via > random.SystemRandom). > > However, the third case is the one that Theo's suggestion is attempting to > solve. In the current landscape, the security minded folks will tell these > people to use os.urandom/random.SystemRandom and the performance or otherwise > less security minded folks will likely tell them to just use random.py. Leaving > these people with a random that is not cryptographically safe. ... "security minded folks" [1] recommend "always use os.urandom()" and advise against *random* module [2,3] despite being aware of random.SystemRandom() [4] i.e., if they are right then *random* module probably only need to care about group #1 and avoid creating the false sense of security in group #3.
[1] https://github.com/pyca/cryptography/blob/92d8bd12609586bfa53cf8c7a691e37474aeccd1/AUTHORS.rst [2] https://cryptography.io/en/latest/random-numbers/ [3] https://github.com/pyca/cryptography/blob/92d8bd12609586bfa53cf8c7a691e37474aeccd1/docs/random-numbers.rst [4] https://github.com/pyca/cryptography/issues/2278 From donald at stufft.io Thu Sep 10 20:19:20 2015 From: donald at stufft.io (Donald Stufft) Date: Thu, 10 Sep 2015 14:19:20 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: <87y4ge6tej.fsf@gmail.com> References: <87y4ge6tej.fsf@gmail.com> Message-ID: On September 10, 2015 at 2:08:46 PM, Akira Li (4kir4.1i at gmail.com) wrote: > > "security minded folks" [1] recommend "always use os.urandom()" and > advise against *random* module [2,3] despite being aware of > random.SystemRandom() [4] > > i.e., if they are right then *random* module probably only need to care > about group #1 and avoid creating the false sense of security in group #3. > Maybe you didn't notice you're talking to the third name in the list of authors that you linked to, but that documentation is there primarily because the random module's API is problematic and it's easier to recommend people to not use it than to try and explain how to use it safely. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From brenbarn at brenbarn.net Thu Sep 10 20:24:19 2015 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Thu, 10 Sep 2015 11:24:19 -0700 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <55F1B76E.2030602@mail.de> References: <55F0E1F2.6040709@brenbarn.net> <55F1B76E.2030602@mail.de> Message-ID: <55F1CAD3.7050602@brenbarn.net> On 2015-09-10 10:01, Sven R. Kunze wrote: > > > On 10.09.2015 03:50, Brendan Barnwell wrote: >> On 2015-09-09 13:17, Guido van Rossum wrote: >>> Jukka wrote up a proposal for structural subtyping. It's pretty good.
>>> Please discuss. >>> >>> https://github.com/ambv/typehinting/issues/11#issuecomment-138133867 >> >> I'm not totally hip to all the latest typing developments, > > You bet what I am. > >> but I'm not sure I fully understand the benefit of this protocol >> concept. At the beginning it says that classes have to be explicitly >> marked to support these protocols. But why is that? Doesn't the >> existing __subclasshook__ already allow an ABC to use any criteria it >> likes to determine if a given class is considered a subclass? So >> couldn't ABCs like the ones we already have inspect the type >> annotations and decide a class "counts" as an iterable (or whatever) >> if it defines the right methods with the right type hints? >> > > The benefit from what I understand is actually really, really nice. It's > basically adding the ability to shorten the following 'capability' check: > > if hasattr(obj, 'important') and hasattr(obj, 'relevant') and > hasattr(obj, 'necessary'): > # do > > to > > if implements(obj, protocol): > # do Right, but can't you already do that with ABCs, as in the example in the docs (https://docs.python.org/2/library/abc.html)? You can write an ABC whose __subclasshook__ does whatever hasattr checks you want (and, if you want, checks the type annotations too), and then you can use isinstance/issubclass to check if a given instance/class "provides the protocol" described by that ABC. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From ericsnowcurrently at gmail.com Thu Sep 10 20:32:11 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 10 Sep 2015 12:32:11 -0600 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> Message-ID: On Thu, Sep 10, 2015 at 9:46 AM, Brett Cannon wrote: > Oh, and there is always the nuclear 4th option and we just deprecate the > random module. ;) Or move it under the math module (a la Go). -eric From 4kir4.1i at gmail.com Thu Sep 10 20:40:32 2015 From: 4kir4.1i at gmail.com (Akira Li) Date: Thu, 10 Sep 2015 21:40:32 +0300 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <87y4ge6tej.fsf@gmail.com> Message-ID: On Thu, Sep 10, 2015 at 9:19 PM, Donald Stufft wrote: > On September 10, 2015 at 2:08:46 PM, Akira Li (4kir4.1i at gmail.com) wrote: > > > > "security minded folks" [1] recommend "always use os.urandom()" and > > advise against *random* module [2,3] despite being aware of > > random.SystemRandom() [4] > > > > i.e., if they are right then *random* module probably only need to care > > about group #1 and avoid creating the false sense of security in group > #3. > > > > Maybe you didn't notice you?re talking to the third name in the list of > authors > that you linked too, Obviously, I've noticed it but I didn't want to call you out. but that documentation is there primarily because the > random module's API is problematic and it's easier to recommend people to > not > use it than to try and explain how to use it safely. > > "it's easier to recommend people to not use it than to try and explain how to use it safely." that is exactly the point if random.SystemRandom() is not safe to use while being based on "secure" os.urandom() then providing the same API based on (possibly less secure) arc4random() won't be any safer. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From donald at stufft.io Thu Sep 10 20:50:35 2015 From: donald at stufft.io (Donald Stufft) Date: Thu, 10 Sep 2015 14:50:35 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On September 10, 2015 at 1:24:05 PM, Tim Peters (tim.peters at gmail.com) wrote: > > So now you know how to make it more directly comparable. The > high-order bit is that it requires 2 calls to the 32-bit uint integer > primitive to get a double, and that can indeed be significant. It didn't change the results really though: My OSX El Capitan machine: Number of Calls: 10000000

+---------------+---------------------+
| method        | usecs per call      |
+---------------+---------------------+
| deterministic | 0.05792283279588446 |
| system        | 1.7192466521984897  |
| userland      | 0.17901834140066059 |
+---------------+---------------------+

An OpenBSD 5.7 VM: Number of Calls: 10000000

+---------------+---------------------+
| method        | usecs per call      |
+---------------+---------------------+
| deterministic | 0.06555143180000868 |
| system        | 0.8929547749999983  |
| userland      | 0.16291017429998647 |
+---------------+---------------------+

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From kramm at google.com Thu Sep 10 20:57:21 2015 From: kramm at google.com (Matthias Kramm) Date: Thu, 10 Sep 2015 11:57:21 -0700 (PDT) Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: Message-ID: <9683c40d-b662-4b77-947e-62c418be8468@googlegroups.com> On Wednesday, September 9, 2015 at 1:19:12 PM UTC-7, Guido van Rossum wrote: > > Jukka wrote up a proposal for structural subtyping. It's pretty good.
> > https://github.com/ambv/typehinting/issues/11#issuecomment-138133867 > I like this proposal; given Python's flat nominal type hierarchy, it will be useful to have a parallel subtyping mechanism to give things finer granularity without having to resort to ABCs. Are the return types of methods invariant or variant under this proposal? I.e. if I have class A(Protocol): def f() -> int: ... does class B: def f() -> bool: return True implicitly implement the protocol A? Also, marking Protocols using subclassing seems confusing and error-prone. In your examples above, one would think that you could define a new protocol using class SizedAndClosable(Sized): pass instead of class SizedAndClosable(Sized, Protocol): pass because Sized is already a protocol. Maybe the below would be a more intuitive syntax: @protocol class SizedAndClosable(Sized): pass Furthermore, I strongly agree with #7. Typed, but optional, attributes are a bad idea. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kramm at google.com Thu Sep 10 20:57:21 2015 From: kramm at google.com (Matthias Kramm) Date: Thu, 10 Sep 2015 11:57:21 -0700 (PDT) Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: Message-ID: <9683c40d-b662-4b77-947e-62c418be8468@googlegroups.com> On Wednesday, September 9, 2015 at 1:19:12 PM UTC-7, Guido van Rossum wrote: > > Jukka wrote up a proposal for structural subtyping. It's pretty good. > Please discuss. > > https://github.com/ambv/typehinting/issues/11#issuecomment-138133867 > I like this proposal; given Python's flat nominal type hierarchy, it will be useful to have a parallel subtyping mechanism to give things finer granularity without having to resort to ABCs. Are the return types of methods invariant or variant under this proposal? I.e. if I have class A(Protocol): def f() -> int: ... does class B: def f() -> bool: return True implicitly implement the protocol A? 
Also, marking Protocols using subclassing seems confusing and error-prone. In your examples above, one would think that you could define a new protocol using class SizedAndClosable(Sized): pass instead of class SizedAndClosable(Sized, Protocol): pass because Sized is already a protocol. Maybe the below would be a more intuitive syntax: @protocol class SizedAndClosable(Sized): pass Furthermore, I strongly agree with #7. Typed, but optional, attributes are a bad idea. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu Sep 10 21:04:23 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 10 Sep 2015 12:04:23 -0700 Subject: [Python-ideas] BUG in standard while statement In-Reply-To: References: Message-ID: however, he did bring up a python-idea worthy general topic: Sometimes you want to iterate without doing anything with the results of the iteration. So the obvious is his example -- iterate N times: for i in range(N): do_something but you may need to look into the code (probably more than one line) to see if i is used for anything. I know there was talk way back about making integers iterable, so you could do: for i in 32: do something. which would be slightly cleaner, but still has an extra i in there, and this was soundly rejected anyway (for good reason). In fact, Python's "for" is not really about iterating N times, it's about iteration over a sequence of objects. And I for one find: for _ in range(N): To be just fine -- really very little noise or performance overhead or anything else. However, I've found myself wanting a "make nothing comprehension". For some reason, I find myself frequently following a pattern where I want to call the same method on all the objects in a sequence: for obj in a_sequence: obj.a_method() but I like the compactness of comprehensions, so I do: [obj.a_method() for obj in a_sequence] but then this creates a list of the result from that method call.
Which I don't want, so I don't assign the results to anything, and it just goes away. But somehow it bugs me that I'm creating this (maybe big) ol' list full of junk, just to have it deleted. Anyone else think this is a use-case worth supporting better? Or should I jstu get over it -- it's really not that expensive to create a list, after all. -Chris On Thu, Sep 10, 2015 at 12:07 AM, Terry Reedy wrote: > On 9/9/2015 1:10 PM, Stephan Sahm wrote: > > I found a BUG in the standard while statement, which appears both in >> python 2.7 and python 3.4 on my system. >> > > No you did not, but aside from that: python-ideas is for ideas about > future versions of python, not for bug reports, valid or otherwise. You > should have sent this to python-list, which is a place to report possible > bugs. > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From donald at stufft.io Thu Sep 10 21:24:21 2015 From: donald at stufft.io (Donald Stufft) Date: Thu, 10 Sep 2015 15:24:21 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <87y4ge6tej.fsf@gmail.com> Message-ID: On September 10, 2015 at 2:40:54 PM, Akira Li (4kir4.1i at gmail.com) wrote: > "it's easier to recommend people to not use it than to try and explain how > to use it safely." 
that is exactly the point > if random.SystemRandom() is not safe to use while being based on "secure" > os.urandom() then providing the same API based on (possibly less secure) > arc4random() won't be any safer. > "If the mountain won't come to Muhammad then Muhammad must go to the mountain." In other words, we can write all the documentation in the world we want, and it doesn't change the simple fact that by choosing a default, there are going to be some people who will use it when it's inappropriate due to the fact that it is the default. The practical effect of changing the default will be that some cases are broken, but in a way that is obvious and trivial to fix, some cases won't have any practical effect at all, and finally, for some people it's going to take code that was previously completely insecure and make it either secure or harder to exploit for people who are incorrectly using the API. I wouldn't expect the documentation in pyca/cryptography to change, it'd still recommend people to use os.urandom directly and we'd still recommend that people should use SystemRandom/os.urandom in the random.py docs for things that need to be cryptographically secure, this is just a safety net for people who don't know or didn't listen. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From storchaka at gmail.com Thu Sep 10 22:01:20 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 10 Sep 2015 23:01:20 +0300 Subject: [Python-ideas] Round division Message-ID: In Python there is an operation for floor division: a // b. Ceil division can easily be expressed via floor division: -((-a) // b). But round division is more complicated. This operation is needed in Fraction.__round__ and in a number of methods in the datetime module (see _divide_and_round). Due to the complexity of the correct Python implementation, it is slower than just division. I propose to add a special function to the math module.
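[For concreteness, a pure-Python sketch of such a helper; the name div_round is hypothetical (the stdlib's private equivalent is datetime's _divide_and_round), and it uses the same round-half-to-even rule as round():

```python
def div_round(a: int, b: int) -> int:
    """Divide a by b, rounding the quotient to the nearest integer;
    ties go to the even quotient (banker's rounding)."""
    q, r = divmod(a, b)
    # Compare r against b/2 without floating point: r > b/2 iff 2*r > b.
    # On an exact tie (2*r == b), bump q only when it is odd.
    if 2 * r > b or (2 * r == b and q % 2 == 1):
        q += 1
    return q

print(div_round(7, 2))   # 4  (3.5 ties to even)
print(div_round(5, 2))   # 2  (2.5 ties to even)
print(div_round(-7, 2))  # -4
```

Because divmod returns a remainder with the same sign as b, the same comparison handles negative numerators as well.]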
This not only will speed up Python implementation of the datetime module and the fractions module, but will encourage users to use correct algorithm instead of obvious but incorrect round(a/b). From njs at pobox.com Thu Sep 10 22:33:05 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 10 Sep 2015 13:33:05 -0700 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On Sep 10, 2015 5:29 AM, "Paul Moore" wrote: [...] > You're claiming that the random > module is security related. I'm claiming it's not, it's documented as > not being, and that's clear to the people who use it for its intended > purpose. Telling those people that you want to make a module designed > for their use harder to use because people for whom it's not intended > can't read the documentation which explicitly states that it's not > suitable for them, is doing a disservice to those people who are > already using the module correctly for its stated purpose. Regarding the "harder to use" point (which is obviously just one of many considerations in this while debate): I trained myself a few years ago to stop using the global random functions and instead always pass around an explicit RNG object, and my experience is that once I got into the habit it gave me a strict improvement in code quality. Suddenly many more of my functions are deterministic ... well ... functions ... of their inputs, and suddenly it's clearly marked in the source which ones have randomness in their semantics, and suddenly it's much easier to do things like refactor the code while preserving the output for a given seed. (This is tricky because just changing the order in which you do things can break your code. I wince in sympathy at people who have to maintain code like your map-generation-from-a-seed example and *aren't* using RNG objects explicitly.) 
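[The pattern described above, as a minimal sketch; make_map stands in for the hypothetical map-generation example:

```python
import random

def make_map(rng: random.Random, size: int) -> list:
    # A plain function of its inputs: the same seeded Random instance
    # always produces the same "map", independent of any global state.
    return [rng.randrange(4) for _ in range(size)]

a = make_map(random.Random(42), 8)
b = make_map(random.Random(42), 8)
print(a == b)  # True: reproducible from the seed alone
```

Refactoring stays safe too: other code drawing from its own Random instance cannot perturb the sequence this function sees.]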
The implicit global RNG is a piece of global state, like global variables, and causes similar unpleasantness. Now that I don't use it, I look back and it's like "huh, why did I always used to hit myself in the face like that? That wasn't very pleasant." So this is what I teach my collaborators and students now. Most of them just use the global state by default because they don't even know about the OO option. YMMV but that's my experience FWIW. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From marky1991 at gmail.com Thu Sep 10 23:13:18 2015 From: marky1991 at gmail.com (Mark Young) Date: Thu, 10 Sep 2015 17:13:18 -0400 Subject: [Python-ideas] Round division In-Reply-To: References: Message-ID: Pardon my ignorance, but what is the definition of round division? (if it isn't "round(a/b)") -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Sep 10 23:48:42 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 10 Sep 2015 22:48:42 +0100 Subject: [Python-ideas] Round division In-Reply-To: References: Message-ID: On 10 September 2015 at 22:13, Mark Young wrote: > Pardon my ignorance, but what is the definition of round division? (if it > isn't "round(a/b)") I assumed it would be "what round(a/b) would give if it weren't subject to weird floating point rounding issues". To put it another way, if a / b is d remainder r, then I'd assume "round division" would be d if r < b/2, d+1 if r > b/2, and (which of d, d+1?) if r == b/2. (a, b, d and r are all integers). If not, then I also would like to know what it means... Either way, if it is introduced then it should be documented (particularly as regards what happens when one or both of a, b are negative) clearly, as it's not 100% obvious. Also, is the math module the right place? All of the operations in the math module (apart from factorial, for some reason...) are floating point. 
Paul From python at mrabarnett.plus.com Fri Sep 11 00:12:39 2015 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 10 Sep 2015 23:12:39 +0100 Subject: [Python-ideas] BUG in standard while statement In-Reply-To: References: Message-ID: <55F20057.2070103@mrabarnett.plus.com> On 2015-09-10 20:04, Chris Barker wrote: > however, he did bring up a python-idea worthy general topic: > > Sometimes you want to iterate without doing anything with the results of > the iteration. > > So the obvious is his example -- iterate N times: > > for i in range(N): > do_something > > but you may need to look into the code (probably more than one line) to > see if i is used for anything. > > I know there was talk way back about making integers iterable, so you > could do: > > for i in 32: > do something. > > which would be slightly cleaner, but still has an extra i in there, and > this was soundly rejected anyway (for good reason). IN fact, Python's > "for" is not really about iterating N times, it's about iteraton over a > sequence of objects. Ans I for one find: > > for _ in range(N): > > To be just fine -- really very little noise or performance overhead or > anything else. > > However, I've found myself wanting a "make nothing comprehension". For > some reason, I find myself frequently following a pattern where I want > to call the same method on all the objects in a sequence: > > for obj in a_sequence: > obj.a_method() > > but I like the compactness of comprehensions, so I do: > > [obj.a_method() for obj in a_sequence] > > but then this creates a list of the result from that method call. Which > I don't want, so I don't assign the results to anything, and it just > goes away. > > But somehow it bugs me that I'm creating this (maybe big) ol' list full > of junk, just to have it deleted. > > Anyone else think this is a use-case worth supporting better? Or should > I jstu get over it -- it's really not that expensive to create a list, > after all. 
> You could use a generator expression with a function that discards the results: def every(iterable): for _ in iterable: pass every(obj.a_method() for obj in a_sequence) > > On Thu, Sep 10, 2015 at 12:07 AM, Terry Reedy > wrote: > > On 9/9/2015 1:10 PM, Stephan Sahm wrote: > > I found a BUG in the standard while statement, which appears both in > python 2.7 and python 3.4 on my system. > > > No you did not, but aside from that: python-ideas is for ideas about > future versions of python, not for bug reports, valid or otherwise. > You should have sent this to python-list, which is a place to report > possible bugs. > From abarnert at yahoo.com Fri Sep 11 00:22:41 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 10 Sep 2015 15:22:41 -0700 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <55F1B306.5070705@mail.de> References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de> Message-ID: <05084F79-C505-4A27-9F08-DA98D4B19963@yahoo.com> On Sep 10, 2015, at 09:42, Sven R. Kunze wrote: > > I mean when I am really going to touch that file to improve documentation (which annotations are a piece of), I am going to add more information for the reader of my API and that mostly will be describing the behavior of the API. As a bit of useless anecdotal evidence: After starting to play with MyPy when Guido first announced the idea, I haven't actually started using static type checking seriously, but I have started writing annotations for some of my functions. It feels like a concise and natural way to say "this function wants two integers", and it reads as well as it writes. Of course there's no reason I couldn't have been doing this since 3.0, but I wasn't, and now I am. Try playing around with it and see if you get the same feeling. Since everyone is thinking about the random module right now, and it makes a great example of what I'm talking about, specify which functions take/return int vs. float, which need a real int vs. 
anything Integral, etc., and how much more easily you absorb the information than if it's in the middle of a sentence in the docstring. Anyway, I don't actually annotate every function (or every function except the ones that are so simple that any checker or reader that couldn't infer the types is useless, the way I would in Haskell), just the ones where the types seem like an important part of the semantics. So I haven't missed the more complex features the way I expected to. But I've still got no problem with them being added as we go along, of course. :) From abarnert at yahoo.com Fri Sep 11 00:27:05 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 10 Sep 2015 15:27:05 -0700 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <9683c40d-b662-4b77-947e-62c418be8468@googlegroups.com> References: <9683c40d-b662-4b77-947e-62c418be8468@googlegroups.com> Message-ID: <959BCAB1-9A4E-4147-80C8-BD113E4A5319@yahoo.com> On Sep 10, 2015, at 11:57, Matthias Kramm via Python-ideas wrote: > >> On Wednesday, September 9, 2015 at 1:19:12 PM UTC-7, Guido van Rossum wrote: >> Jukka wrote up a proposal for structural subtyping. It's pretty good. Please discuss. >> >> https://github.com/ambv/typehinting/issues/11#issuecomment-138133867 > > I like this proposal; given Python's flat nominal type hierarchy, it will be useful to have a parallel subtyping mechanism to give things finer granularity without having to resort to ABCs. I don't understand this, given that resorting to protocols is basically the same thing as resorting to ABCs. Clearly there's some perceiving difficulty or complexity of ABCs within the Python community that makes people not realize how simple and useful they are. But I don't see how adding something that's nearly equivalent but different and maintaining the two in parallel is a good solution to that problem. 
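[The ABC machinery being contrasted here is the __subclasshook__ pattern from the abc module docs; a sketch of the structural check under discussion:

```python
from abc import ABCMeta

class SizedAndClosable(metaclass=ABCMeta):
    @classmethod
    def __subclasshook__(cls, C):
        # Structural check: any class whose MRO defines both __len__
        # and close counts as a virtual subclass.
        if cls is SizedAndClosable:
            required = ("__len__", "close")
            if all(any(name in B.__dict__ for B in C.__mro__)
                   for name in required):
                return True
        return NotImplemented

class MyResource:  # no inheritance, no registration
    def __len__(self):
        return 0
    def close(self):
        pass

print(issubclass(MyResource, SizedAndClosable))  # True
```

This gives duck-typed isinstance/issubclass checks today, though unlike the protocol proposal it says nothing about method signatures.]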
There are some cases where the fact that ABCs rely on a metaclass makes them problematic where Protocols aren't (basically, where you need another metaclass), but I doubt that's the case you're worried about. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Sep 11 00:34:29 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 10 Sep 2015 15:34:29 -0700 Subject: [Python-ideas] BUG in standard while statement In-Reply-To: References: Message-ID: On Sep 10, 2015, at 12:04, Chris Barker wrote: > > However, I've found myself wanting a "make nothing comprehension". For some reason, I find myself frequently following a pattern where I want to call the same method on all the objects in a sequence: > > for obj in a_sequence: > obj.a_method() > > but I like the compactness of comprehensions, so I do: > > [obj.a_method() for obj in a_sequence] I think this is an anti-pattern. The point of a comprehension is that it's an expression, which gathers up results. You're trying to hide side effects inside an expression, which is a bad thing to do, and lamenting the fact that you get a useless value back, which of course you do because expressions have values, so that should be a sign that you don't actually want an expression here. Also, compare the actual brevity here: [obj.a_method() for obj in a_sequence] for obj in a_sequence: obj.a_method() You've replaced a colon with a pair of brackets, so it's actually less concise. If you really want to do this anyway, you can use the consume recipe from the itertools docs or the more-itertools library or write your own one-liner: consume = partial(deque, maxlen=0) consume(obj.a_method() for obj in a_sequence) At least this makes it explicit that you're creating and ignoring a bunch of values. But I still think it's much clearer to just use a for statement. 
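[For reference, the consume recipe mentioned above, spelled out and exercised; Obj, a_method, and a_sequence are stand-in names:

```python
from collections import deque
from functools import partial

# A zero-length deque drains an iterator to exhaustion while storing nothing.
consume = partial(deque, maxlen=0)

class Obj:
    calls = 0
    def a_method(self):
        Obj.calls += 1

a_sequence = [Obj(), Obj(), Obj()]
consume(obj.a_method() for obj in a_sequence)
print(Obj.calls)  # 3: every call ran, but no result list was built
```

The generator expression defers the calls, and the maxlen=0 deque discards each return value as it arrives, so memory use stays constant regardless of the sequence length.]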
From storchaka at gmail.com Fri Sep 11 00:39:59 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 11 Sep 2015 01:39:59 +0300 Subject: [Python-ideas] Round division In-Reply-To: References: Message-ID: On 11.09.15 00:48, Paul Moore wrote: > On 10 September 2015 at 22:13, Mark Young wrote: >> Pardon my ignorance, but what is the definition of round division? (if it >> isn't "round(a/b)") > > I assumed it would be "what round(a/b) would give if it weren't > subject to weird floating point rounding issues". To put it another > way, if a / b is d remainder r, then I'd assume "round division" would > be d if r < b/2, d+1 if r > b/2, and (which of d, d+1?) if r == b/2. > (a, b, d and r are all integers). > > If not, then I also would like to know what it means... Yes, it is what you have described. If r == b/2, the result is even (i.e. (d+1)//2*2). > Either way, if it is introduced then it should be documented > (particularly as regards what happens when one or both of a, b are > negative) clearly, as it's not 100% obvious. > > Also, is the math module the right place? All of the operations in the > math module (apart from factorial, for some reason...) are floating > point. It is the best place in the stdlib. Apart from floating point functions, the math module contains integer functions (factorial and gcd) and general number functions (floor, ceil, trunc and isclose). gcd and isclose are new in 3.5. From tritium-list at sdamon.com Fri Sep 11 00:47:56 2015 From: tritium-list at sdamon.com (Alexander Walters) Date: Thu, 10 Sep 2015 18:47:56 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <55F14B70.2080901@sdamon.com> Message-ID: <55F2089C.4020909@sdamon.com> On 9/10/2015 07:40, Donald Stufft wrote: > What harm is there in making people explicitly choose between deterministic > randomness and secure randomness? 
Is your use case so much better than theirs > that you think you deserve to type a few characters less to the detriment of > people who don't know any better? > > API Breakage. This is not worth the break in backwards compatibility. My use case is using the API that has been available for... 20 years? And for what benefit? None, and it can be argued that it would do the opposite of what is intended (false sense of security and all). From abarnert at yahoo.com Fri Sep 11 00:46:36 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 10 Sep 2015 15:46:36 -0700 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On Sep 10, 2015, at 07:21, Donald Stufft wrote: > > Either we can change the default to a secure > CSPRNG and break these functions (and the people using them) which is however > easily fixed by changing ``import random`` to > ``import random; random = random.DeterministicRandom()`` But that isn't a fix, unless all your code is in a single module. If I call random.seed in game.py and then call random.choice in aiplayer.py, I'll get different results after your fix than I did before. What I'd need to do instead is create a separate myrandom.py that does this and then exports all of the bound methods of random as top-level functions, and then make game.py, aiplayer.py, etc. all import myrandom as random. Which is, while not exactly hard, certainly harder, and much less obvious, than the incorrect fix that you've suggested, and it may not be immediately obvious that it's wrong until someone files a bug three versions later claiming that when he reloads a game the AI cheats and you have to track down the problem. That's why I suggested the set_default_instance function, which makes this problem trivial to solve in a correct way instead of in an incorrect way.
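The myrandom.py workaround described above might look like the following sketch (the module name is hypothetical, and `random.Random` stands in for the proposed `DeterministicRandom`):

```python
# myrandom.py -- one shared seedable instance for the whole program.
# game.py, aiplayer.py, etc. would all do "import myrandom as random".
import random as _random

_inst = _random.Random()  # stands in for the proposed DeterministicRandom()

# Re-export the bound methods as module-level functions, mirroring the
# top-level API of the random module itself.
seed = _inst.seed
random = _inst.random
randint = _inst.randint
choice = _inst.choice
shuffle = _inst.shuffle
```

With this in place, a seed() call in one module affects choice() calls in every other module, preserving the old shared-state behaviour.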
From abarnert at yahoo.com Fri Sep 11 00:54:36 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 10 Sep 2015 15:54:36 -0700 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> On Sep 10, 2015, at 15:46, Andrew Barnert via Python-ideas wrote: > >> On Sep 10, 2015, at 07:21, Donald Stufft wrote: >> >> Either we can change the default to a secure >> CSPRNG and break these functions (and the people using them) which is however >> easily fixed by changing ``import random`` to >> ``import random; random = random.DeterministicRandom()`` > > But that isn't a fix, unless all your code is in a single module. If I call random.seed in game.py and then call random.choice in aiplayer.py, I'll get different results after your fix than I did before. > > What I'd need to do instead is create a separate myrandom.py that does this and then exports all of the bound methods of random as top-level functions, and then make game.py, aiplayer.py, etc. all import myrandom as random. Which is, while not exactly hard, certainly harder, and much less obvious, than the incorrect fix that you've suggested, and it may not be immediately obvious that it's wrong until someone files a bug three versions later claiming that when he reloads a game the AI cheats and you have to track through the problem. > > That's why I suggested the set_default_instance function, which makes this problem trivial to solve in a correct way instead of in an incorrect way. Actually, I just thought of an even simpler solution: Add a deterministic_singleton member to random (which is just initialized to DeterministicRandom() at startup). Now, the user fix is just to change "import random" to "from random import deterministic_singleton as random". 
From chris.barker at noaa.gov Fri Sep 11 01:05:43 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 10 Sep 2015 16:05:43 -0700 Subject: [Python-ideas] BUG in standard while statement In-Reply-To: <55F20057.2070103@mrabarnett.plus.com> References: <55F20057.2070103@mrabarnett.plus.com> Message-ID: On Thu, Sep 10, 2015 at 3:12 PM, MRAB wrote: > You could use a generator expression with a function that discards the > results: > > def every(iterable): > for _ in iterable: > pass > > every(obj.a_method() for obj in a_sequence) > sure -- though this adds a new function that people reading my code need to grok. Andrew Barnert wrote: > > [obj.a_method() for obj in a_sequence] I think this is an anti-pattern. The point of a comprehension is that it's > an expression, which gathers up results. You're trying to hide side effects > inside an expression, which is a bad thing to do, and lamenting the fact > that you get a useless value back, which of course you do because > expressions have values, so that should be a sign that you don't actually > want an expression here. Exactly -- I don't want a comprehension, I don't want an expression, I want a concise way to spell: do this thing to all of these things.... Also, compare the actual brevity here: > [obj.a_method() for obj in a_sequence] > for obj in a_sequence: obj.a_method() > You've replaced a colon with a pair of brackets, so it's actually less > concise. Fair enough -- removing a newline does make that pretty simple looking! I guess I got all comprehension-happy there -- back when it was the shiny new toy, and then I got stuck on it. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed...
URL: From chris.barker at noaa.gov Fri Sep 11 01:08:59 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 10 Sep 2015 16:08:59 -0700 Subject: [Python-ideas] Round division In-Reply-To: References: Message-ID: On Thu, Sep 10, 2015 at 3:39 PM, Serhiy Storchaka wrote: > On 11.09.15 00:48, Paul Moore wrote: > >> On 10 September 2015 at 22:13, Mark Young wrote: >> >>> Pardon my ignorance, but what is the definition of round division? (if it >>> isn't "round(a/b)") >>> >> >> I assumed it would be "what round(a/b) would give if it weren't >> subject to weird floating point rounding issues". To put it another >> way, if a / b is d remainder r, then I'd assume "round division" would >> be d if r < b/2, d+1 if r > b/2, and (which of d, d+1?) if r == b/2. >> (a, b, d and r are all integers). >> >> If not, then I also would like to know what it means... >> > > Yes, it is what you have described. If r == b/2, the result is even (i.e. > (d+1)//2*2). > > Either way, if it is introduced then it should be documented >> (particularly as regards what happens when one or both of a, b are >> negative) clearly, as it's not 100% obvious. >> >> Also, is the math module the right place? All of the operations in the >> math module (apart from factorial, for some reason...) are floating >> point. >> > > It is the best place in the stdlib. Apart from floating point functions, > the math module contains integer functions (factorial and gcd) and general > number functions (floor, ceil, trunc and isclose). gcd and isclose are new > in 3.5. well, floor, ceil, and isclose are all about floats... Nevertheless, yes the math module is the place for it. -CHB > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Fri Sep 11 04:07:22 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 11 Sep 2015 11:07:22 +0900 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F1B219.1000502@egenix.com> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> Message-ID: <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> Executive summary: The question is, "what value is there in changing the default to be crypto strong to protect future security-sensitive applications from naive implementers vs. the costs to current users who need to rewrite their applications to explicitly invoke the current default?" M.-A. Lemburg writes: > I'm pretty sure people doing crypto will know and most others > simply don't care :-) Which is why botnets have millions of nodes. People who do web security evidently believe that inappropriate RNGs have something to do with widespread security issues. (That doesn't mean they're right, but it gives me pause for thought -- evidently, Guido thought so too!) > Evidence: We used a Wichmann-Hill PRNG as default in random > for a decade and people still got their work done. The question is not whether people get their work done. People work (unless they're seriously dysfunctional), that's what people do. Especially programmers (cf. GNU Manifesto). The question is whether the work of the *crackers* is made significantly easier by security holes that are opened by inappropriate use of random.random. 
I tend to agree with Steven d'A. (and others) that the answer is no: it doesn't matter if the kind of person who leaves a key under the third flowerpot from the left also habitually leaves the door unlocked (especially if "I'm only gonna be gone for 5 minutes"), and I think that's likely. IOW, installing crypto strong RNGs as default is *not* analogous to the changes to SSL support that were so important that they were backported to 2.7 in a late patch release. OTOH, why default to crypto weak if crypto strong is easily available? You might save a few million Debian users from having to regenerate all their SSH keys.[1] But the people who are "just getting work done" in new programs *won't notice*. I don't think that they care what's under the hood of random.random, as long as (1) the API stays the same, and (2) the documentation clearly indicates where to find PRNGs that support determinism, jumpahead, replicability, and all those other good things, for the needs they doesn't have now but know they probably will have some day. The rub is, as usual, existing applications that would have to be changed for no reason that is relevant to them. Note that arc4random is much simpler to use than random.random. No knobs to tweak or seeds to store for future reference. Seems perfectly suited to "just getting work" done to me. OTOH, if you have an application where you need replicability, jumpahead, etc, you're going to need to read the docs enough to find the APIs for seeding and so on. At design time, I don't see why it would hurt to select an RNG algorithm explicitly as well. > Why not add ssl.random() et al. (as interface to the OpenSSL > rand APIs) ? I like that naming proposal. I'm sure changing the nature of random.random would annoy the heck out of *many* users. An alternative would be to add random.crypto. 
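For reference, the stdlib already ships an os.urandom-backed generator with the full random API -- essentially what an `ssl.random()` or `random.crypto` spelling would expose under another name:

```python
import random

# SystemRandom draws from os.urandom (the OS CSPRNG).
# It deliberately has no reproducible state: calling seed() on it is a no-op.
secure = random.SystemRandom()

token = secure.getrandbits(128)   # e.g. material for a session token
roll = secure.randint(1, 6)       # same API as the default generator
print(format(token, "032x"), roll)
```

So the naming debate is largely about discoverability and defaults, not about missing functionality.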
> Some background on why I think deterministic RNGs are more > useful to have as default than non-deterministic ones: > > A common use case for me is to write test data generators > for large database systems. For such generators, I don't keep > the many GBs data around, but instead make the generator take a > few parameters which then seed the RNGs, the time module and > a few other modules via monkey-patching. If you've gone to that much effort, you evidently have read the docs and it wouldn't have been a huge amount of trouble to use a non-default module with a specified PRNG -- if you were doing it now. But you have every right to be very peeved if you have a bunch of old test runs you want to replicate with a new version of Python, and we've changed the random.random RNG on you. Footnotes: [1] I hasten to add that a programmer who isn't as smart as he thinks he is who "improves" a crypto algorithm is far more likely than that the implementer of a crypto suite would choose an RNG that is inappropriate by design. Still, it's a theoretical possibility, and security is about eliminating every theoretical possibility you can think of. From steve at pearwood.info Fri Sep 11 04:39:23 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 11 Sep 2015 12:39:23 +1000 Subject: [Python-ideas] BUG in standard while statement In-Reply-To: <55F20057.2070103@mrabarnett.plus.com> References: <55F20057.2070103@mrabarnett.plus.com> Message-ID: <20150911023923.GS19373@ando.pearwood.info> On Thu, Sep 10, 2015 at 11:12:39PM +0100, MRAB wrote: > On 2015-09-10 20:04, Chris Barker wrote: > >however, he did bring up a python-idea worthy general topic: > > > >Sometimes you want to iterate without doing anything with the results of > >the iteration. > > > >So the obvious is his example -- iterate N times: > > > >for i in range(N): > > do_something > > > >but you may need to look into the code (probably more than one line) to > >see if i is used for anything. 
Solution is obvious: for throw_away_variable_not_used_for_anything in range(N): ... *wink* Just use one of the usual conventions for throw-away variables: call it _ or whocares. But, why do you care so much about whether i is being used for something? Today, you have: for whocares in range(10): print(message) Next week, you decide you need to number them, now you do care about the loop variable: for whocares in range(10): print(whocares, message) Having a loop variable that may remain unused is not exactly a big deal. [Chris] > >However, I've found myself wanting a "make nothing comprehension". For > >some reason, I find myself frequently following a pattern where I want > >to call the same method on all the objects in a sequence: > > > >for obj in a_sequence: > > obj.a_method() > > > >but I like the compactness of comprehensions, so I do: > > > >[obj.a_method() for obj in a_sequence] Ew, ew, ew, ew. You're calling the method for its side-effects, not its return result (which is probably None, but might not be). Turning it into a list comp is abuse of comprehensions: you're collecting the return results, potentially creating an enormous list, which you don't actually want and immediately throw away. Just write it as a one-liner for-loop, which is *more* compact (by exactly one character) than the list comp: [obj.a_method() for obj in a_sequence] for obj in a_sequence: obj.a_method() [MRAB] > You could use a generator expression with a function that discards the > results: > > def every(iterable): > for _ in iterable: > pass > > every(obj.a_method() for obj in a_sequence) If you're going to do such a horrid thing, at least name it accurately. "every" sounds like a synonym for the built-in "all". A more accurate name would be "consume", as in consuming the iterator, and I seem to recall that there's a recipe in the itertools docs to do that as fast as possible.
But, whether you call it "every" or "consume", the code still looks misleading: every(obj.a_method() for obj in a_sequence) looks like you care about the return results of a_method, but you don't. List comps and generator expressions are for cases where you care about the expression's result. -- Steve From ncoghlan at gmail.com Fri Sep 11 04:38:19 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 11 Sep 2015 12:38:19 +1000 Subject: [Python-ideas] One way to do format and print In-Reply-To: <55F1BF7C.9060205@mail.de> References: <55ED24C4.9000205@mail.de> <87pp1t1unb.fsf@uwakimon.sk.tsukuba.ac.jp> <55EF2B66.4020509@mail.de> <1441741195.1614886.378114729.37307E0E@webmail.messagingengine.com> <6DDBD724-714E-40E1-88DF-9BC8484FF240@yahoo.com> <55F058B6.9000202@mail.de> <1DCC81C0-DE7A-460A-AD7F-E1533BB14911@yahoo.com> <55F0E5C9.6030509@brenbarn.net> <55F1BF7C.9060205@mail.de> Message-ID: On 11 September 2015 at 03:35, Sven R. Kunze wrote: > On 10.09.2015 17:36, Nick Coghlan wrote: > > This perspective doesn't grant enough credit to the significance of C > in general, and the C ABI in particular, in the overall computing > landscape. While a lot of folks have put a lot of work into making it > possible to write software without needing to learn the details of > what's happening at the machine level, it's still the case that the > *one* language binding interface that *every* language runtime ends up > including is being able to load and run C libraries. > > > Ah, now I understand. We need to add {} to C. That'll make it, right? ;) > > Seriously, there are also other significant influences that fit better here: > template engines. I know a couple of them using {} in some sense or another. > C format strings are just one of them, so I wouldn't stress the significance > of C that hard in that particular instance. There are other areas where C > has its strengths. You're tilting at windmills Sven. 
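For reference, the substitution styles at issue in this thread, side by side (f-strings arrived via PEP 498 in Python 3.6):

```python
name, count = "spam", 3

print("%s x %d" % (name, count))      # printf-style; also works on bytes per PEP 461
print("{} x {}".format(name, count))  # str.format; text only
print(f"{name} x {count}")            # PEP 498 f-string; text only, 3.6+
```

All three print "spam x 3".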
Python has 3 substitution variable syntaxes (two with builtin support), and we no longer have any plans for getting rid of any of them. We *did* aim to deprecate percent-substitution as part of the Python 3 migration, and after trying for ~5 years *decided that was a bad idea*, and reversed the original decision to classify it as deprecated. We subsequently switched the relevant section of the docs from describing percent-formatting as "old string formatting" to "printf-style string formatting" in a larger revamp of the builtin sequence type documentation a few years back: https://hg.python.org/cpython/rev/463f52d20314 PEP 461 has now further entrenched the notion that "percent-formatting is recommended for binary data, brace-formatting is recommended for text data" by bringing back the former for bytes and bytearray in 3.5, while leaving str.format as text only: https://www.python.org/dev/peps/pep-0461/ PEP 498 then blesses brace-formatting as the "one obvious way" for text formatting by elevating it from "builtin method" to "syntax" in 3.6. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Sep 11 04:48:07 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 11 Sep 2015 12:48:07 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> References: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> Message-ID: On 11 September 2015 at 08:54, Andrew Barnert via Python-ideas wrote: > Actually, I just thought of an even simpler solution: > > Add a deterministic_singleton member to random (which is just initialized to DeterministicRandom() at startup). Now, the user fix is just to change "import random" to "from random import deterministic_singleton as random". Change the spelling to "import random.seeded_random as random" and the user fix is even shorter. 
I do agree with the idea of continuing to provide a process global instance of the current PRNG for ease of migration - changing a single import is a good way to be able to address a deprecation, and looking for the use of seeded_random in a security sensitive context would still be fairly straightforward. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Fri Sep 11 05:13:04 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 11 Sep 2015 13:13:04 +1000 Subject: [Python-ideas] Round division In-Reply-To: References: Message-ID: <20150911031304.GT19373@ando.pearwood.info> On Fri, Sep 11, 2015 at 01:39:59AM +0300, Serhiy Storchaka wrote: > On 11.09.15 00:48, Paul Moore wrote: > >On 10 September 2015 at 22:13, Mark Young wrote: > >>Pardon my ignorance, but what is the definition of round division? (if it > >>isn't "round(a/b)") > > > >I assumed it would be "what round(a/b) would give if it weren't > >subject to weird floating point rounding issues". To put it another > >way, if a / b is d remainder r, then I'd assume "round division" would > >be d if r < b/2, d+1 if r > b/2, and (which of d, d+1?) if r == b/2. > >(a, b, d and r are all integers). > > > >If not, then I also would like to know what it means... > > Yes, it is what you have described. If r == b/2, the result is even > (i.e. (d+1)//2*2). How does this differ from round(a/b)? round() also rounds to even. Perhaps a more general solution would be a round-to-direction, or divide-and-round-to-direction. Now that we have Enums, we could define enums for round-to-zero, round-to-nearest, round-to-infinity, round-to-even, and have a function divide(a, b, dir=ROUNDEVEN), say. 
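The divide-and-round-to-direction idea might look like this sketch (the enum and function names are hypothetical; it assumes b > 0, and compares 2*r against b so the halfway test stays exact in integer arithmetic):

```python
from enum import Enum

class Rounding(Enum):
    FLOOR = 1
    CEILING = 2
    HALF_EVEN = 3

def divide(a, b, mode=Rounding.HALF_EVEN):
    """Integer division of a by b (b > 0) with a selectable rounding direction."""
    d, r = divmod(a, b)
    if mode is Rounding.FLOOR:
        return d
    if mode is Rounding.CEILING:
        return d + 1 if r else d
    # Round half to even, using only exact integer comparisons:
    # no float is ever formed, so no float rounding error can creep in.
    if 2 * r > b or (2 * r == b and d % 2):
        return d + 1
    return d

print(divide(7, 2), divide(5, 2), divide(8, 3))  # 4 2 3
```

Because divmod with a positive divisor always yields 0 <= r < b, the half-even branch agrees with round() on exact halves (7/2 -> 4, 5/2 -> 2) without ever touching floating point.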
-- Steve From abarnert at yahoo.com Fri Sep 11 05:18:45 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 10 Sep 2015 20:18:45 -0700 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> Message-ID: On Sep 10, 2015, at 19:48, Nick Coghlan wrote: > > On 11 September 2015 at 08:54, Andrew Barnert via Python-ideas > wrote: >> Actually, I just thought of an even simpler solution: >> >> Add a deterministic_singleton member to random (which is just initialized to DeterministicRandom() at startup). Now, the user fix is just to change "import random" to "from random import deterministic_singleton as random". > > Change the spelling to "import random.seeded_random as random" and the > user fix is even shorter. OK, sure; I don't care much about the spelling. I think neither name will be unduly confusing to novices, and anyone who actually wants to understand what the choice means will use help or the docs or a Google search and find out in a few seconds. > I do agree with the idea of continuing to provide a process global > instance of the current PRNG for ease of migration - changing a single > import is a good way to be able to address a deprecation, and looking > for the use of seeded_random in a security sensitive context would > still be fairly straightforward. Personally, I think we're done with that change. Deprecation of the names random.Random, random.random(), etc. is sufficient to prevent people from making mistakes without realizing it. Having a good workaround to prevent code churn for the thousands of affected apps means the cost doesn't outweigh the benefits. So, the problem Theo raised is solved.[1] Which means the more radical solution he offered is unnecessary. 
Unless we're seriously worried that some people who aren't sure if they need Seeded or System may incorrectly choose Seeded just because of performance, there's no need to add a Chacha choice alongside them. Put it on PyPI, maybe with a link from the SystemRandom docs, and see how things go from there. [1] Well, it's not quite solved, because someone has to figure out how to organize things in the docs, which obviously need to change. Do we tell people how to choose between creating a SeededRandom or SystemRandom instance, then describe their interface, and then include a brief note "... but for porting old code, or when you explicitly need a globally shared Seeded instance, use seeded_random"? Or do we present all three as equally valid choices, and try to explain why you might want the singleton seeded_random vs. constructing and managing an instance or instances? From abarnert at yahoo.com Fri Sep 11 05:25:52 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 10 Sep 2015 20:25:52 -0700 Subject: [Python-ideas] Round division In-Reply-To: <20150911031304.GT19373@ando.pearwood.info> References: <20150911031304.GT19373@ando.pearwood.info> Message-ID: <999FEFC7-47CF-4651-9613-8A6B94C24A8C@yahoo.com> On Sep 10, 2015, at 20:13, Steven D'Aprano wrote: > >> On Fri, Sep 11, 2015 at 01:39:59AM +0300, Serhiy Storchaka wrote: >>> On 11.09.15 00:48, Paul Moore wrote: >>>> On 10 September 2015 at 22:13, Mark Young wrote: >>>> Pardon my ignorance, but what is the definition of round division? (if it >>>> isn't "round(a/b)") >>> >>> I assumed it would be "what round(a/b) would give if it weren't >>> subject to weird floating point rounding issues". To put it another >>> way, if a / b is d remainder r, then I'd assume "round division" would >>> be d if r < b/2, d+1 if r > b/2, and (which of d, d+1?) if r == b/2. >>> (a, b, d and r are all integers). >>> >>> If not, then I also would like to know what it means... >> >> Yes, it is what you have described. 
If r == b/2, the result is even >> (i.e. (d+1)//2*2). > > How does this differ from round(a/b)? round() also rounds to even. His rounds based on the exact integer remainder; yours rounds based on the inexact float fractional part. So, if b is large enough, using round division is guaranteed to do the right thing,[1] but rounding float division may have rounding, overflow, or underflow errors. [1] Except I'm pretty sure he wants to compare r*2 to b, not r to b/2. Otherwise he's reintroduced the problem he's trying to solve. From ncoghlan at gmail.com Fri Sep 11 05:33:59 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 11 Sep 2015 13:33:59 +1000 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 11 September 2015 at 12:07, Stephen J. Turnbull wrote: > Executive summary: > > The question is, "what value is there in changing the default to be > crypto strong to protect future security-sensitive applications from > naive implementers vs. the costs to current users who need to rewrite > their applications to explicitly invoke the current default?" > > M.-A. Lemburg writes: > > > I'm pretty sure people doing crypto will know and most others > > simply don't care :-) > > Which is why botnets have millions of nodes. People who do web > security evidently believe that inappropriate RNGs have something to > do with widespread security issues. (That doesn't mean they're right, > but it gives me pause for thought -- evidently, Guido thought so too!) They're right. 
I used to be sanguine about this kind of thing because I spent a long time working in the defence sector, and assumed everyone else was as professionally paranoid as we were. I've been out of that world long enough now to realise that that assumption was deeply, and problematically, wrong*. In that world, you work on the following assumptions: 1) you're an interesting target; 2) the attackers' compute capacity is nigh infinite; 3) any weakness will be found; 4) any weakness will be exploited; 5) "other weaknesses exist" isn't a reason to avoid addressing the weaknesses you know about. As useful background, there's a recent Ars Technica article on the technical details of cracking the passwords in the Ashley Madison data dump, where security researchers found a NINE order of magnitude speedup due to a vulnerability in another part of the system which let them drastically reduce the search space for passwords: http://arstechnica.com/security/2015/09/once-seen-as-bulletproof-11-million-ashley-madison-passwords-already-cracked/ That kind of reduction in search requirements means that searches that *should* have taken almost 3000 years (in the absence of the vulnerability) can instead be completed within a day. Weak random number generators have a similar effect of reducing the search space for attackers - if you know a weakly random source was used, rather than a cryptographically secure one, then you can use what you know about the random number generator to favour inputs it is *likely* to have produced, rather than having to assume equal probability for the entire search space. And if the target was using a deterministic RNG and you're able to figure out the seed that was used? You no longer need to search at all - you can recreate the exact series of numbers the target was using. 
Moving the default random source to a CSPRNG, and allowing folks to move a faster deterministic PRNG for known non-security related use cases, or to the system random number generator for known security-related ones is likely to prove a good way to provide safer defaults without reducing flexibility or raising barriers to entry too much. Regards, Nick. P.S. * As a case in point, it was only a couple of years ago that I realised most developers *haven't* read docs like the NIST crypto usage guidelines or the IEEE 802.11i WPA2 spec, and don't make a habit of even casually following the progress of block cipher and secure hash function design competitions. It's been an interesting exercise for me in learning the true meaning of "expertise is relative" :) -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Sep 11 05:38:06 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 11 Sep 2015 13:38:06 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> Message-ID: On 11 September 2015 at 13:18, Andrew Barnert wrote: > Personally, I think we're done with that change. Deprecation of the names random.Random, random.random(), etc. is sufficient to prevent people from making mistakes without realizing it. Implementing dice rolling or number guessing for a game as "from random import randint" is *not* a mistake, and I'm adamantly opposed to any proposal that makes it one - the cost imposed on educational use cases would be far too high. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Fri Sep 11 06:44:30 2015 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Fri, 11 Sep 2015 13:44:30 +0900 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> Message-ID: <87si6lzhsh.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > Implementing dice rolling or number guessing for a game as "from > random import randint" is *not* a mistake, Turning the number guessing game into a text CAPTCHA might be one, though. That randint may as well be crypto strong, modulo the problem that people who use an explicit seed get punished for knowing what they're doing. I suppose it would be too magic to have the seed method substitute the traditional PRNG for the default, while an implicitly seeded RNG defaults to a crypto strong algorithm? Steve From kramm at google.com Thu Sep 10 20:20:38 2015 From: kramm at google.com (Matthias Kramm) Date: Thu, 10 Sep 2015 11:20:38 -0700 (PDT) Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: Message-ID: I like this proposal; given Python's flat nominal type hierarchy, it will be useful to have a parallel subtyping mechanism to give things finer granularity without having to resort to ABCs. Are the return types of methods invariant or variant under this proposal? I.e. if I have

    class A(Protocol):
        def f() -> int: ...

does

    class B:
        def f() -> bool:
            return True

implicitly implement the protocol? Also, marking Protocols using subclassing seems confusing and error-prone. In your examples above, one would think that you could define a new protocol using

    class SizedAndClosable(Sized):
        pass

instead of

    class SizedAndClosable(Sized, Protocol):
        pass

because Sized is already a protocol. Maybe the below would be a more intuitive syntax:

    @protocol
    class SizedAndClosable(Sized):
        pass

Furthermore, I strongly agree with #7. Typed, but optional, attributes are a bad idea. > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From rosuav at gmail.com Fri Sep 11 06:54:30 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 11 Sep 2015 14:54:30 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: <87si6lzhsh.fsf@uwakimon.sk.tsukuba.ac.jp> References: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> <87si6lzhsh.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, Sep 11, 2015 at 2:44 PM, Stephen J. Turnbull wrote: > I suppose it would be too magic to have the seed method substitute the > traditional PRNG for the default, while an implicitly seeded RNG > defaults to a crypto strong algorithm? Ooh. Actually, I rather like that idea. If you don't seed the RNG, its output will be unpredictable; it doesn't matter whether it's a PRNG seeded by an unknown number, a PRNG seeded by /dev/urandom, a CSRNG, or just reading from /dev/urandom every time. Until you explicitly request determinism, you don't have it. If Python changes its RNG algorithm and you haven't been seeding it, would you even know? Could it ever matter to you? It would require a bit of an internals change; is it possible that code depends on random.seed and random.randint being bound methods of the same object? To implement what you describe, they'd probably have to not be. ChrisA From jlehtosalo at gmail.com Fri Sep 11 07:01:38 2015 From: jlehtosalo at gmail.com (Jukka Lehtosalo) Date: Thu, 10 Sep 2015 22:01:38 -0700 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: <7DC7EA44-0CD8-4F61-8462-8147B8BB8059@yahoo.com> Message-ID: On Wed, Sep 9, 2015 at 10:48 PM, Andrew Barnert wrote: > On Sep 9, 2015, at 21:34, Jukka Lehtosalo wrote: > > I'm not sure if I fully understand what you mean by implicit vs. explicit > ABCs (and the static/runtime distinction). Could you define these terms and > maybe give some examples of each? > > > I just gave examples just one paragraph above.
> > A (runtime) implicit ABC is something that uses a __subclasshook__ > (usually implementing a structural check). So, for instance, any type that > implements __iter__ is-a Iterable, e.g., according to isinstance or > issubclass or @singledispatch, because that's what > Iterable.__subclasshook__ checks for. > > A (runtime) explicit ABC is something that isn't implicit, like Sequence: > no hook, so nothing is-a Sequence unless it either inherits the ABC or > registers with it. > > You're proposing a parallel but separate distinction at static typing > time. Any ABC that's a Protocol is checked based on a structural check; > otherwise, it's checked based on inheritance. > In my proposal I actually suggest that protocols shouldn't support isinstance or issubclass (these operations should raise an exception) by default. A protocol is free to override the default exception-raising __subclasshook__ to implement a structural check, and a static type checker would allow isinstance and issubclass for protocols that do this. I'll need to explain this idea in more detail, as clearly the current explanation is too easy to misunderstand. Here's a concrete example:

    class X(Protocol):
        def f(self): ...

    class A:
        def f(self):
            print('f')

    if isinstance(A(), X):  # Raise an exception, because no __subclasshook__ override in X
        ...

Previously I toyed with the idea of having a default implementation of __subclasshook__ that actually does a structural check, but I'm no longer sure if that would be desirable, as it's difficult to come up with an implementation that does the right thing in all reasonable cases. For example, consider a structural type like this that people might want to use to work around the current limitations of Callable (it doesn't support keyword arguments, for example):

    class MyCallable(Protocol):
        def __call__(self, x, y): ...

(This example has some other potential issues that I'm hand-waving away for now.) Now how would the default isinstance work?
Preferably it should only accept callables that are compatible with the signature, but doing that check is pretty difficult for arbitrary functions and should probably be out of scope for the typing module. Just checking whether __call__ exists would be too general, as the programmer probably expects that he's able to call the method with the specific arguments the type suggests. Also, sometimes checking the argument names would be a good thing to do, but sometimes any names (as long as the number of arguments is compatible) would be fine. > This means it's now possible to create supertypes that are implicit at > runtime but explicit at static typing time (which might occasionally be > useful), or vice-versa (which I can't imagine why you'd ever want). > As I showed above, you wouldn't get the latter unless you really try very hard (consenting adults and all). > > Besides the obvious negatives in having two not-quite-compatible and > very-different-looking ways of expressing the same concept, this is going > to lead to people wanting to know why their type checker is complaining > about perfectly good code ("I tested that constant with isinstance, and it > really is-a Spammable, and the type checker is inferring its type properly, > and yet I get an error passing it to a function that wants a Spammable") or > allowing blatantly invalid code ("I annotated my function to only take > Spammable arguments, but someone is passing something that calls the > fallback implementation of my singledispatch function instead of the > Spammable overload"). > I agree that having the default nominal/explicit isinstance semantics for a protocol type would be a very bad idea.
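For reference, the kind of structural __subclasshook__ under discussion follows the pattern collections.abc uses for Sized and Iterable; a minimal sketch, with illustrative class names:

```python
from abc import ABCMeta

class SupportsLen(metaclass=ABCMeta):
    @classmethod
    def __subclasshook__(cls, C):
        # Structural check: any class whose MRO defines __len__ qualifies,
        # with no inheritance or register() call needed.
        if cls is SupportsLen:
            if any("__len__" in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented

class Box:
    def __len__(self):
        return 1

assert isinstance(Box(), SupportsLen)      # structurally accepted
assert not issubclass(int, SupportsLen)    # no __len__: falls back to False
```

As the thread notes, this checks only that a method *exists*; it says nothing about its signature, which is exactly the difficulty with a generic default hook.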
> > Maybe the solution is to expand your proposal a little: make Protocol > automatically create a __subclasshook__ (which you listed as an optional > idea in the proposal), and also change all of the existing stdlib implicit > ABCs to Protocols and scrap their manual hooks, and also update the > relevant documentation (e.g., the abc module and the data model section on > __subclasshook__) to recommend using Protocol instead of implementing a > manual hook if the only thing you want is structural subtyping. Of course > the backward compatibility isn't perfect (unless you want to manually munge > up collections.abc when typing is imported), and people using legacy > third-party code might need to add stubs (although that seems necessary > anyway). But for most people, everything should just work as people expect. > A type is either structurally typed or explicitly (via inheritance or > registration) types, both at static typing time and a runtime, and that's > always expressed by the name Protocol. (But for the rare cases when you > really need a type check that's looser at runtime, you can still write a > manual hook to handle that.) > > Yeah, this would be nice, but as I argued above, implementing a generic __subclasshook__ is actually quite tricky. Jukka -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Fri Sep 11 07:24:23 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 11 Sep 2015 00:24:23 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: [M.-A. 
Lemburg] >> I'm pretty sure people doing crypto will know and most others >> simply don't care :-) [Stephen J. Turnbull ] > Which is why botnets have millions of nodes. I'm not a security wonk, but I'll bet a life's salary ;-) we'd have botnets just as pervasive if every non-crypto RNG in the world were banned - or had never existed. To start a botnet, the key skill is social engineering: tricking ordinary users into installing malicious software. So long as end users are allowed to run programs, that problem will never go away. Hell, I get offers to install malware each day on Facebook alone, although they're *spelled* like "Install Flash update to see this shocking video!". Those never end for the same reason I still routinely get Nigerian 419 spam: there are plenty of people gullible enough to fall for them outright. Technical wizardry isn't needed to get in the door on millions of machines. So if RNGs have something to do with security, it's not with botnets; let's not oversell this. > People who do web security evidently believe that inappropriate RNGs > have something to do with widespread security issues. Do they really? I casually follow news of the latest exploits, and I really don't recall any of them pinned on an RNG (as opposed to highly predictable default RNG _seeding_ from several years back). Mostly out-of-bounds crap in C, or exploiting holes in security models, or bugs in the implementations of those models (whether Microsoft's, Java's, Adobe Flash's ...). > (That doesn't mean they're right, but it gives me pause for thought -- evidently, > Guido thought so too!) Or it's that Theo can be very insistent, and Guido is only brusque with the non-Dutch ;-) Not saying switching is bad. Am saying I've seen no compelling justification for causing users (& book & course authors & ....) such pain. If this were Python 0.9.1 at issue, sure - but random.py's basic API really hasn't changed since then. 
From greg.ewing at canterbury.ac.nz Fri Sep 11 07:42:34 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Sep 2015 17:42:34 +1200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <1441890163.3120507.379846857.49842A96@webmail.messagingengine.com> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <1441890163.3120507.379846857.49842A96@webmail.messagingengine.com> Message-ID: <55F269CA.9050708@canterbury.ac.nz> random832 at fastmail.us wrote: > Being able to produce multiple independent streams of numbers is the > important feature. Doing it by "jumping ahead" seems less so. Doing it by jumping ahead isn't strictly necessary; the important thing is to have some way of generating *provably* non-overlapping and independent sequences. Jumping ahead is one obvious way to achieve that. Simply setting the seed of each generator randomly and hoping for the best is not really good enough. > And the > need for doing it "efficiently" isn't as clear either I say that because you can obviously jump ahead N steps in any generator just by running it for N cycles, but that's likely to be unacceptably slow. A more direct way of getting there is desirable. 
-- Greg From jlehtosalo at gmail.com Fri Sep 11 08:00:30 2015 From: jlehtosalo at gmail.com (Jukka Lehtosalo) Date: Thu, 10 Sep 2015 23:00:30 -0700 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: <55F0AC83.3050505@mail.de> Message-ID: On Thu, Sep 10, 2015 at 3:01 AM, Luciano Ramalho wrote: > Jukka, thank you very much for working on such a hard topic and being > patient enough to respond to issues that I am sure were exhaustively > discussed before (but I was not following the discussions then since I > was in the final sprint for my book, Fluent Python, at the time). > > I have two questions which were probably already asked before, so feel > free to point me to relevant past messages: > > 1) Why is a whole new hierarchy of types being created in the typing > module, instead of continuing the hierarchy in the collections module > while enhancing the ABCs already there? For example, why weren't the > List and Dict type created under the existing MutableSequence and > MutableMapping types in collections.abc? > There are two main reasons. First, we wanted typing to be backward compatible down to Python 3.2, and so all the new features had to work without any changes to other standard library modules. Second, the module is provisional and it would be awkward to have non-provisional standard library modules depend on or closely interact with a provisional module. Also, List and Dict are actually type aliases for regular classes (list and dict, respectively) and so they actually represent subclasses of MutableSequence and MutableMapping as defined in collections.abc. They aren't proper classes so they don't directly play a role at runtime outside annotations. > > 2) Similarly, I note that PEP-484 shuns existing ABCs like those in > the numbers module, and the ByteString ABC. The reasons given are > pragmatic, so that users don't need to import the numbers module, and > would not "have to write typing.ByteString everywhere." 
as the PEP > says... I do not understand these arguments because: > > a) as you just wrote in another message, the users will be primarily > the authors of libraries and frameworks, who will always be forced to > import typing anyhow, so it does not seem such a burden to have them > import other modules to get the benefits of type hinting; > I meant that protocols will likely be often *defined* in libraries or frameworks (or their stubs). Almost any code can *use* protocols in annotations, but user code might be less likely to define additional protocols. That's just a guess and I could be easily proven wrong, though. > b) alternatively, there could be aliases of the relevant ABCs in the > typing module for convenience > There are other reasons for not using ABCs for things like numbers. For example, a lot of standard library functions expect concrete numeric types and won't accept arbitrary subclasses of the ABCs. For example, you couldn't pass a value with the numbers.Integral type to math.sin, because it expects an int or a float. Using ABCs instead of int, float or str wouldn't really work well (or at all) for type checking. > > So the second question is: what's wrong with points (a) and (b), and > why did PEP-484 keep such a distance from existing ABCs in general? > See above. There are more reasons but those that I mentioned are some of the more important ones. If you are still unconvinced, ask for more details and maybe I'll dig through the archives. :-) > > I understand pragmatic choices, but as a teacher and writer I know > such choices are often obstacles to learning because they seem > arbitrary to anyone who is not privy to the reasons behind them. So > I'd like to better understand the reasoning, and I think PEP-484 is > not very persuasive when it comes to the issues I mentioned. > Yeah, PEP 484 doesn't go through the rationale and subtleties in much detail.
Maybe there should be a separate rationale PEP and we could just link to it when we get asked some of these (quite reasonable, mind you!) questions again. ;-) Jukka -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Fri Sep 11 08:19:13 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 11 Sep 2015 18:19:13 +1200 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: <55F27261.1060901@canterbury.ac.nz> Chris Angelico wrote: > I'm > not sure what the difference is between "seeding a PRNG with entropy" > and "seeding a deterministic PRNG with a particular seed value", > though; aside from the fact that one of them uses a known value and > the other doesn't, of course. Back in my BASIC programming days, we > used to use "RANDOMIZE TIMER" to seed the RNG with time-of-day, or > "RANDOMIZE 12345" (or other value) to seed with a particular value; I think the only other difference is that the Linux kernel is continually re-seeding its generator whenever more unpredictable bits become available. It's not something you need to explicitly do yourself, as in your BASIC example. -- Greg From jlehtosalo at gmail.com Fri Sep 11 08:24:36 2015 From: jlehtosalo at gmail.com (Jukka Lehtosalo) Date: Thu, 10 Sep 2015 23:24:36 -0700 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <55F1B306.5070705@mail.de> References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de> Message-ID: On Thu, Sep 10, 2015 at 9:42 AM, Sven R. Kunze wrote: > On 10.09.2015 06:12, Jukka Lehtosalo wrote: > > but here are some of the main benefits as I see them: > > - Code becomes more readable. This is especially true for code that > doesn't have very detailed docstrings. > > If I have code without docstrings, I better write docstrings then.
;) > > I mean when I am really going to touch that file to improve documentation > (which annotations are a piece of), I am going to add more information for > the reader of my API and that mostly will be describing the behavior of the > API. > > If my variables have crappy names, so I need to add type hints to them, > well, then, I rather fix them first. > Even good variable names can leave the type ambiguous. And besides, if you assume that all code is perfect or can be made perfect I think that you've already lost the discussion. Reality disagrees with you. ;-) You can't just wave a magic wand and get every programmer to document their code and write unit tests. However, we know quite well that programmers are perfectly capable of writing type annotations, and tools can even enforce that they are present (witness all the Java code in existence). Tools can't verify that you have good variable names or useful docstrings, and people are too inconsistent or lazy to be relied on. > > You'll get the biggest benefits if you are working on a large code base > mostly written by other people with limited test coverage and little > comments or documentation. > > > If I had large untested and undocumented code base (well I actually have), > then static type checking would be ONE tool to find out issues. > Sure, it doesn't solve everything. > > Once found out, I write tests as hell. Tests, tests, tests. I would not > add type annotations. I need tested functionality not proper typing. > Most programmers only have limited time for improving existing code. Adding type annotations is usually easier than writing tests. In a cost/benefit analysis it may be optimal to spend half the available time on annotating parts of the code base to get some (but necessarily limited) static checking coverage and spend the remaining half on writing tests for selected parts of the code base, for example. It's not all or nothing.
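As an illustration of the kind of defect a checker catches without a single test: the functions below are invented for the example, but a tool such as mypy would flag an unguarded len(name) statically, while at runtime the crash only appears for the rare missing key:

```python
from typing import Dict, Optional

def find_name(users: Dict[str, str], uid: str) -> Optional[str]:
    return users.get(uid)

def name_length(users: Dict[str, str], uid: str) -> int:
    name = find_name(users, uid)
    # A checker rejects a bare len(name) here until the None case is
    # handled; without annotations this bug hides until production.
    return len(name) if name is not None else 0

assert name_length({"a": "Ada"}, "a") == 3
assert name_length({}, "missing") == 0
```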
> > > You get extra credit if your tests are slow to run and flaky, > > > We are problem solvers. So, I would tell my team: "make them faster and > more reliable". > But you'd probably also ask them to implement new features (or *your* manager might be unhappy), and they have to find the right balance, as they only have 40 hours a week (or maybe 80 hours if you work at an early-stage startup :-). Having more tools gives you more options for spending your time efficiently. > > > I consider that difference pretty significant. I wouldn't want to increase > the fraction of unchecked parts of my annotated code by a factor of 8, and > I want to have control over which parts can be type checked. > > > Granted. But you still don't know if your code runs correctly. You are > better off with tests. And I agree type checking is 1 test to perform (out > of 10K). > Actually a type checker can verify multiple properties of a typical line of code. So for 10k lines of code, complete type checking coverage would give you the equivalent of maybe 30,000 (simple) tests. :-P And I'm sure it would take much less time to annotate your code than to manually write the 30,000 test cases. > > But: > > >> I don't see the effort for adding type hints AND the effort for further >> parsing (by human eyes) justified by partially better IDE support and 1 >> single additional test within test suites of about 10,000s of tests. >> >> Especially, when considering that correct types don't prove functionality >> in any case. But tested functionality in some way proves correct typing. >> > > I didn't see you respond to that. But you probably know that. :) > This is a variation of an old argument, which goes along the lines of "if you have tests and comments (and everybody should, of course!) type checking doesn't buy you anything". But if the premise can't be met, the argument doesn't actually say anything about the usefulness of type checking.
:-) It's often not cost effective to have good test coverage (and even 100% line coverage doesn't give you full coverage of all interactions). Testing can't prove that your code doesn't have defects -- it just proves that for a tiny subset of possible inputs your code works as expected. A type checker may be able to prove that for *all* possible inputs your code doesn't do certain bad things, but it can't prove that it does the good things. Neither subsumes the other, and both of these approaches are useful and complementary (but incomplete). I think that there was a good talk basically about this at PyCon this year, by the way, but I can't remember the title. Jukka -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Fri Sep 11 08:27:15 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 11 Sep 2015 09:27:15 +0300 Subject: [Python-ideas] Round division In-Reply-To: <20150911031304.GT19373@ando.pearwood.info> References: <20150911031304.GT19373@ando.pearwood.info> Message-ID: On 11.09.15 06:13, Steven D'Aprano wrote: > How does this differ from round(a/b)? round() also rounds to even.

    >>> round(5000000000000000/9999999999999999)
    0
    >>> round(14999999999999999/10000000000000000)
    2

But fractions 5000000000000000/9999999999999999 > 1/2 and 14999999999999999/10000000000000000 < 3/2. From xavier.combelle at gmail.com Fri Sep 11 08:34:43 2015 From: xavier.combelle at gmail.com (Xavier Combelle) Date: Fri, 11 Sep 2015 08:34:43 +0200 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> <87si6lzhsh.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: 2015-09-11 6:54 GMT+02:00 Chris Angelico : > On Fri, Sep 11, 2015 at 2:44 PM, Stephen J.
Turnbull > wrote: > > I suppose it would be too magic to have the seed method substitute the > > traditional PRNG for the default, while an implicitly seeded RNG > > defaults to a crypto strong algorithm? > > Ooh. Actually, I rather like that idea. If you don't seed the RNG, its > output will be unpredictable; it doesn't matter whether it's a PRNG > seeded by an unknown number, a PRNG seeded by /dev/urandom, a CSRNG, > or just reading from /dev/urandom every time. Until you explicitly > request determinism, you don't have it. If Python changes its RNG > algorithm and you haven't been seeding it, would you even know? Could > it ever matter to you? > > It would require a bit of an internals change; is it possible that > code depends on random.seed and random.randint are bound methods of > the same object? To implement what you describe, they'd probably have > to not be. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > I have thought of this idea and was quite seduced by it. However in this case on a non seeded generator, getstate/setstate would be meaningless. I also wonder what pickling generators does. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jlehtosalo at gmail.com Fri Sep 11 08:38:24 2015 From: jlehtosalo at gmail.com (Jukka Lehtosalo) Date: Thu, 10 Sep 2015 23:38:24 -0700 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <9683c40d-b662-4b77-947e-62c418be8468@googlegroups.com> References: <9683c40d-b662-4b77-947e-62c418be8468@googlegroups.com> Message-ID: On Thu, Sep 10, 2015 at 11:57 AM, Matthias Kramm via Python-ideas < python-ideas at python.org> wrote: > On Wednesday, September 9, 2015 at 1:19:12 PM UTC-7, Guido van Rossum > wrote: >> >> Jukka wrote up a proposal for structural subtyping. It's pretty good. 
>> Please discuss. >> >> https://github.com/ambv/typehinting/issues/11#issuecomment-138133867 >> > > I like this proposal; given Python's flat nominal type hierarchy, it will > be useful to have a parallel subtyping mechanism to give things finer > granularity without having to resort to ABCs. > > Are the return types of methods invariant or variant under this proposal? > > I.e. if I have > > class A(Protocol): > def f() -> int: ... > > does > > class B: > def f() -> bool: > return True > > implicitly implement the protocol A? > The proposal doesn't spell out the rules for subtyping, but we should follow the ordinary rules for subtyping for functions, and return types would behave covariantly. So the answer is yes. > Also, marking Protocols using subclassing seems confusing and error-prone. > In your examples above, one would think that you could define a new > protocol using > > class SizedAndClosable(Sized): > pass > > instead of > > class SizedAndClosable(Sized, Protocol): > pass > > because Sized is already a protocol. > The proposal also lets you define the protocols implemented by your class explicitly, and without having the explicit Protocol base class or some other marker these would be impossible to distinguish in general. Example:

    class MyList(Sized):
        # I want this to be a normal class, not a protocol.
        def __len__(self) -> int:
            return self.num_items

    class DerivedProtocol(Sized):
        # This should actually be a protocol.
        def foo(self) -> int: ...

> Maybe the below would be a more intuitive syntax: > > @protocol > class SizedAndClosable(Sized): > pass > > We could use that. The tradeoff is that then we'd have some inconsistency depending on whether a protocol is generic or not:

    @protocol
    class A(metaclass=ProtocolMeta):
        # Non-generic protocol
        ...

    @protocol
    class B(Generic[T]):
        # Generic protocol. But this has a different metaclass than the above?
        ...

I'm not sure if we can use ABCMeta for protocols as protocols may need some additional metaclass functionality.
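For comparison, the design that eventually shipped as PEP 544's typing.Protocol (Python 3.8 and later) avoids both the decorator and the metaclass inconsistency: Protocol itself carries the metaclass, and generic protocols just parameterize it. A sketch with illustrative names:

```python
from typing import Protocol, TypeVar, runtime_checkable

T = TypeVar("T")

@runtime_checkable
class SupportsF(Protocol):          # basic protocol
    def f(self) -> int: ...

class Emitter(Protocol[T]):         # generic protocol: same base, no extra metaclass
    def emit(self) -> T: ...

class B:
    def f(self) -> bool:            # bool is a subtype of int: return types are covariant
        return True

# The runtime check looks only at method presence, not signatures:
assert isinstance(B(), SupportsF)
```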
Anyway, any proposal should consider all these possible ways of defining protocols:

1. Basic protocol, no protocol inheritance
2. Generic protocol, no protocol inheritance
3. Basic protocol that inherits one or more protocols
4. Generic protocol that inherits one or more protocols

My approach seems to deal with all of these reasonably well in my opinion (but I haven't implemented it yet!), but the tradeoff is that the Protocol base class needs to be present for all protocols. Jukka -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Fri Sep 11 08:39:11 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 11 Sep 2015 15:39:11 +0900 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F269CA.9050708@canterbury.ac.nz> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <1441890163.3120507.379846857.49842A96@webmail.messagingengine.com> <55F269CA.9050708@canterbury.ac.nz> Message-ID: <87r3m5zchc.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > random832 at fastmail.us wrote: > > Being able to produce multiple independent streams of numbers is the > > important feature. Doing it by "jumping ahead" seems less so. > > Doing it by jumping ahead isn't strictly necessary; the > important thing is to have some way of generating > *provably* non-overlapping and independent sequences. By definition you don't have (stochastic) independence if you're using a PRNG and deterministically jumping ahead. Proving non-overlapping is easy, but I don't even have a definition of "independence" of fixed sequences: equidistribution of pairs?
That might make sense if you have a sequence long enough to contain all pairs, but even then you really just have a single sequence with larger support, and I don't see how you can prove that it's a "good" sequence for using in a simulation. > Jumping ahead is one obvious way to achieve that. > Simply setting the seed of each generator randomly > and hoping for the best is not really good enough. It is not at all obvious to me that jumping ahead is better than randomly seeding separate generators. The latter actually gives stochastic independence (at least if you randomize over all possible seeds). From p.f.moore at gmail.com Fri Sep 11 10:02:47 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 11 Sep 2015 09:02:47 +0100 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: <87si6lzhsh.fsf@uwakimon.sk.tsukuba.ac.jp> References: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> <87si6lzhsh.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 11 September 2015 at 05:44, Stephen J. Turnbull wrote: > I suppose it would be too magic to have the seed method substitute the > traditional PRNG for the default, while an implicitly seeded RNG > defaults to a crypto strong algorithm? One issue with that - often, programs simply use a RNG for their own purposes, but offer a means of getting the seed after the fact for reproducibility reasons (the "map seed" case, for example). Pseudo-code:

    if <...>:
        state = random.setstate(state)
    else:
        state = random.getstate()
    ... do the program's main job, never calling seed/setstate
    if <...>:
        print state

So getstate (and setstate) would also need to switch to a PRNG. There's actually very few cases I can think of where I'd need seed() (as opposed to setstate()). Maybe if I let the user *choose* a seed. Some games do this.
Paul From encukou at gmail.com Fri Sep 11 10:08:38 2015 From: encukou at gmail.com (Petr Viktorin) Date: Fri, 11 Sep 2015 10:08:38 +0200 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> <87si6lzhsh.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, Sep 11, 2015 at 6:54 AM, Chris Angelico wrote: > On Fri, Sep 11, 2015 at 2:44 PM, Stephen J. Turnbull wrote: >> I suppose it would be too magic to have the seed method substitute the >> traditional PRNG for the default, while an implicitly seeded RNG >> defaults to a crypto strong algorithm? > > Ooh. Actually, I rather like that idea. If you don't seed the RNG, its > output will be unpredictable; it doesn't matter whether it's a PRNG > seeded by an unknown number, a PRNG seeded by /dev/urandom, a CSRNG, > or just reading from /dev/urandom every time. Until you explicitly > request determinism, you don't have it. If Python changes its RNG > algorithm and you haven't been seeding it, would you even know? Could > it ever matter to you? > > It would require a bit of an internals change; is it possible that > code depends on random.seed and random.randint are bound methods of > the same object? To implement what you describe, they'd probably have > to not be. I've also thought about this idea. The problem with it is that seed() and friends affect a global instance of Random. If, after this change, there was a library that used random.random() for crypto, calling seed() in the main program (or any other library) would make it insecure. So we'd still be in a situation where nobody should use random() for crypto. 
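A small sketch of the failure mode Petr describes (library_token is a hypothetical helper, not a real library):

```python
import random

def library_token():
    # A hypothetical, badly designed library helper that draws its
    # "secret" from the module-level (global) generator.
    return "".join(random.choice("0123456789abcdef") for _ in range(16))

random.seed(42)       # the main program innocently seeds the global instance
t1 = library_token()

random.seed(42)       # anyone who can guess the seed gets the same value
t2 = library_token()
assert t1 == t2       # the library's token is now fully predictable
```

The library never called seed() itself, but it shares the global instance with everyone else, so any caller can make its output deterministic.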
From p.f.moore at gmail.com Fri Sep 11 10:11:37 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 11 Sep 2015 09:11:37 +0100 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 10 September 2015 at 23:46, Andrew Barnert wrote: > On Sep 10, 2015, at 07:21, Donald Stufft wrote: >> >> Either we can change the default to a secure >> CSPRNG and break these functions (and the people using them) which is however >> easily fixed by changing ``import random`` to >> ``import random; random = random.DeterministicRandom()`` > > But that isn't a fix, unless all your code is in a single module. If I call random.seed in game.py and then call random.choice in aiplayer.py, I'll get different results after your fix than I did before. Note that this is another case of wanting "correct by default". Requiring the user to pass around a RNG object makes it easy to do the wrong thing - because (as above) people can too easily create multiple independent RNGs by mistake, which means your numbers don't necessarily satisfy the randomness criteria any more. "Secure by default" isn't (and shouldn't be) the only example of "correct by default" that matters here. Whether "secure" is more important than "gives the right results" is a matter of opinion, and application dependent. Password generators have more need to be secure than to be mathematically random, Monte Carlo simulations (and to a lesser extent games) the other way around. Many things care about neither. If we can't manage "correct and secure by default", someone (and it won't be me) has to decide which end of the scale gets preference. Paul. 
From rosuav at gmail.com Fri Sep 11 10:57:32 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 11 Sep 2015 18:57:32 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> <87si6lzhsh.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, Sep 11, 2015 at 6:08 PM, Petr Viktorin wrote: > I've also thought about this idea. The problem with it is that seed() > and friends affect a global instance of Random. > If, after this change, there was a library that used random.random() > for crypto, calling seed() in the main program (or any other library) > would make it insecure. So we'd still be in a situation where nobody > should use random() for crypto. So library functions shouldn't use random.random() for anything they know needs security. If you write a function generate_password(), the responsibility is yours to ensure that it's entropic rather than deterministic. That's no different from the current situation (seeding the RNG makes it deterministic) except that the unseeded RNG is not just harder to predict, it's actually entropic. In some cases, having the 99% by default is a barrier to people who need the 100%. (Conflating UCS-2 with Unicode deceives people into thinking their program works just fine, and then it fails on astral characters.) But in this case, there's no perfect-by-default solution, so IMO the best two solutions are: Be great, but vulnerable to an external seed(), until someone chooses; or have no random number generation until someone chooses. We know that the latter is a terrible option for learning, so vulnerability to someone else calling random.seed() is a small price to pay. 
ChrisA From njs at pobox.com Fri Sep 11 11:52:41 2015 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 11 Sep 2015 02:52:41 -0700 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> <87si6lzhsh.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, Sep 11, 2015 at 1:02 AM, Paul Moore wrote: > On 11 September 2015 at 05:44, Stephen J. Turnbull wrote: >> I suppose it would be too magic to have the seed method substitute the >> traditional PRNG for the default, while an implicitly seeded RNG >> defaults to a crypto strong algorithm? > > One issue with that - often, programs simply use a RNG for their own > purposes, but offer a means of getting the seed after the fact for > reproducibility reasons (the "map seed" case, for example). > > Pseudo-code:
>
> if <user supplied a state>:
>     state = <the user-supplied state>
>     random.setstate(state)
> else:
>     state = random.getstate()
> ... do the program's main job, never calling seed/setstate
> if <user asked for the state>:
>     print state
>
> So getstate (and setstate) would also need to switch to a PRNG. > > There's actually very few cases I can think of where I'd need seed() > (as opposed to setstate()). Maybe if I let the user *choose* a seed > Some games do this. You don't really want to use the full 4992 byte state for a "map seed" application anyway (type 'random.getstate()' in a REPL and watch your terminal scroll down multiple pages...). No game actually uses map seeds that look anything like that. I'm 99% sure that real applications in this category are actually using logic like:

if <user specified a seed>:
    seed = user_seed()
else:
    # use some RNG that was seeded with real entropy
    seed = random_short_printable_string()
r = random.Random(seed)
# now use 'r' to generate the map

-n -- Nathaniel J.
Smith -- http://vorpus.org From p.f.moore at gmail.com Fri Sep 11 12:03:42 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 11 Sep 2015 11:03:42 +0100 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> <87si6lzhsh.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 11 September 2015 at 10:52, Nathaniel Smith wrote: > You don't really want to use the full 4992 byte state for a "map seed" > application anyway (type 'random.getstate()' in a REPL and watch your > terminal scroll down multiple pages...). No game actually uses map > seeds that look anything like that. I'm 99% sure that real > applications in this category are actually using logic like: > > if : > seed = user_seed() > else: > # use some RNG that was seeded with real entropy > seed = random_short_printable_string() > r = random.Random(seed) > # now use 'r' to generate the map Yeah, good point. As I say, I don't actually *use* this in the example program I'm thinking of, I just know it's a feature I need to add in due course. So when I do, I'll have to look into how to best implement it. (And I'll probably nick the approach you show above, thanks ;-)) Paul From abarnert at yahoo.com Fri Sep 11 12:07:27 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 11 Sep 2015 03:07:27 -0700 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> <87si6lzhsh.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4C44F738-05E4-4557-8C24-B1B9B7A38E0D@yahoo.com> On Sep 11, 2015, at 02:52, Nathaniel Smith wrote: > >> On Fri, Sep 11, 2015 at 1:02 AM, Paul Moore wrote: >>> On 11 September 2015 at 05:44, Stephen J. Turnbull wrote: >>> I suppose it would be too magic to have the seed method substitute the >>> traditional PRNG for the default, while an implicitly seeded RNG >>> defaults to a crypto strong algorithm? 
>> >> One issue with that - often, programs simply use a RNG for their own >> purposes, but offer a means of getting the seed after the fact for >> reproducibility reasons (the "map seed" case, for example). >> >> Pseudo-code:
>>
>> if <user supplied a state>:
>>     state = <the user-supplied state>
>>     random.setstate(state)
>> else:
>>     state = random.getstate()
>> ... do the program's main job, never calling seed/setstate
>> if <user asked for the state>:
>>     print state
>>
>> So getstate (and setstate) would also need to switch to a PRNG. >> >> There's actually very few cases I can think of where I'd need seed() >> (as opposed to setstate()). Maybe if I let the user *choose* a seed >> Some games do this. > > You don't really want to use the full 4992 byte state for a "map seed" > application anyway (type 'random.getstate()' in a REPL and watch your > terminal scroll down multiple pages...). No game actually uses map > seeds that look anything like that. But games do store the entire map state with saved games if they want repeatable saves (e.g., to prevent players from defeating the RNG by save scumming). From p.f.moore at gmail.com Fri Sep 11 12:10:56 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 11 Sep 2015 11:10:56 +0100 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: <4C44F738-05E4-4557-8C24-B1B9B7A38E0D@yahoo.com> References: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> <87si6lzhsh.fsf@uwakimon.sk.tsukuba.ac.jp> <4C44F738-05E4-4557-8C24-B1B9B7A38E0D@yahoo.com> Message-ID: On 11 September 2015 at 11:07, Andrew Barnert wrote: > But games do store the entire map state with saved games if they want repeatable saves (e.g., to prevent players from defeating the RNG by save scumming). So far off-topic it's not true, but a number of games I know of (e.g., Factorio, Minecraft) include a means to get a map seed (a simple text string) which you can publish, that allows other users to (in effect) play on the same map as you. That's different from saves.
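The short-printable-map-seed logic Nathaniel sketches can be made runnable as a toy (names like make_map_seed and generate_map are illustrative):

```python
import random
import string

def make_map_seed():
    # A short printable string drawn from an entropy-backed RNG
    # (the random_short_printable_string() of Nathaniel's sketch).
    sysrand = random.SystemRandom()
    return "".join(sysrand.choice(string.ascii_uppercase) for _ in range(8))

def generate_map(user_seed=None):
    seed = user_seed if user_seed is not None else make_map_seed()
    rng = random.Random(seed)            # private, deterministic generator
    terrain = [rng.choice("~.^#") for _ in range(16)]  # toy "map"
    return seed, terrain

seed, world = generate_map()
_, same_world = generate_map(user_seed=seed)  # a published seed replays the map
assert same_world == world
```

The published seed is tiny and human-shareable, while the reproducibility lives entirely in the private `random.Random(seed)` instance.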
Paul From njs at pobox.com Fri Sep 11 12:26:07 2015 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 11 Sep 2015 03:26:07 -0700 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On Fri, Sep 11, 2015 at 1:11 AM, Paul Moore wrote: > On 10 September 2015 at 23:46, Andrew Barnert wrote: >> On Sep 10, 2015, at 07:21, Donald Stufft wrote: >>> >>> Either we can change the default to a secure >>> CSPRNG and break these functions (and the people using them) which is however >>> easily fixed by changing ``import random`` to >>> ``import random; random = random.DeterministicRandom()`` >> >> But that isn't a fix, unless all your code is in a single module. If I call random.seed in game.py and then call random.choice in aiplayer.py, I'll get different results after your fix than I did before. > > Note that this is another case of wanting "correct by default". > Requiring the user to pass around a RNG object makes it easy to do the > wrong thing - because (as above) people can too easily create multiple > independent RNGs by mistake, which means your numbers don't > necessarily satisfy the randomness criteria any more. Accidentally creating multiple independent RNGs is not going to cause any problems with respect to randomness. It only creates a problem with respect to determinism/reproducibility. Beyond that I just find your message a bit baffling. I guess I believe you that you find passing around RNG objects to make it easy to do the wrong thing, but it's exactly the opposite of my experience: when writing code that cares about determinism/reproducibility, then for me, passing around RNG objects makes it way *easier* to get things right. It makes it much more obvious what kinds of refactoring will break reproducibility, and it enables all kinds of useful tricks. 
E.g., keeping to the example of games and "aiplayer.py", a common thing game designers want to do is to record playthroughs so they can be replayed again as demos or whatever. And a common way to do that is to (1) record the player's inputs, (2) make sure that the way the game state evolves through time is deterministic given the players inputs. (This isn't necessarily the *best* strategy, but it is a common one.) Now suppose we're writing a game like this, and we have a bunch of "enemies", each of whose behavior is partially random. So on each "tick" we have to iterate through each enemy and update its state. If we are using a single global RNG, then for correctness it becomes crucial that we always iterate over all enemies in exactly the same order. Which is a mess. A better strategy is, keep one global RNG for the level, but then when each new enemy is spawned, assign it its own RNG that will be used to determine its actions, and seed this RNG using a value sampled from the global RNG (!). Now the overall pattern of the game will be just as random, still be deterministic, and -- crucially -- it no longer matters what order we iterate over the enemies in. I particularly would not want to use the global RNG in any program that was complicated enough to involve multiple modules. Passing state between inter-module calls using a global variable is pretty much always a bad plan, and that's exactly what you're talking about here. Non-deterministic global RNGs are fine, b/c they're semantically stateless; it's exactly the cases where you care about the determinism of the RNG state that you want to *stop* using the global RNG. -n -- Nathaniel J. 
Smith -- http://vorpus.org From rosuav at gmail.com Fri Sep 11 12:30:40 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 11 Sep 2015 20:30:40 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On Fri, Sep 11, 2015 at 8:26 PM, Nathaniel Smith wrote: > A better strategy is, keep one global RNG for the level, but then when > each new enemy is spawned, assign it its own RNG that will be used to > determine its actions, and seed this RNG using a value sampled from > the global RNG (!). Now the overall pattern of the game will be just > as random, still be deterministic, and -- crucially -- it no longer > matters what order we iterate over the enemies in. As long as the order you seed their RNGs is deterministic. And if you can do that, can't you iterate over them in a deterministic order too? ChrisA From skrah at bytereef.org Fri Sep 11 13:07:38 2015 From: skrah at bytereef.org (Stefan Krah) Date: Fri, 11 Sep 2015 11:07:38 +0000 (UTC) Subject: [Python-ideas] =?utf-8?q?Should_our_default_random_number_generat?= =?utf-8?q?or_be=09secure=3F?= References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Tim Peters writes: > Not saying switching is bad. Am saying I've seen no compelling > justification for causing users (& book & course authors & ....) such > pain. If this were Python 0.9.1 at issue, sure - but random.py's > basic API really hasn't changed since then. Agreed, and just recording my -1 for changing the API. Also, I'm noting that in *this* thread most people were at least moderately against the change. 
Stefan Krah From random832 at fastmail.us Fri Sep 11 14:42:55 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 11 Sep 2015 08:42:55 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> <87si6lzhsh.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1441975375.3458375.380808489.2E341B77@webmail.messagingengine.com> On Fri, Sep 11, 2015, at 00:54, Chris Angelico wrote: > It would require a bit of an internals change; is it possible that > code depends on random.seed and random.randint are bound methods of > the same object? That's a ridiculous thing to depend on. > To implement what you describe, they'd probably have > to not be. You could implement one class that calls either a SystemRandom instance or an instance of another class depending on which mode it is in. From mal at egenix.com Fri Sep 11 14:56:11 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 11 Sep 2015 14:56:11 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> Message-ID: <55F2CF6B.40301@egenix.com> On 10.09.2015 19:04, Xavier Combelle wrote: >> I think this is the major misunderstanding here: >> >> The random module never suggested that it generates pseudo-random data >> of crypto quality. >> >> I'm pretty sure people doing crypto will know and most others >> simply don't care :-) >> >> Evidence: We used a Wichmann-Hill PRNG as default in random >> for a decade and people still got their work done. Mersenne >> was added in Python 2.3 and bumped the period from >> 6,953,607,871,644 (13 digits) to 2**19937-1 (6002 digits). 
> It is not evidence; I have evidence of the opposite: > some people can and do use random.random() for generating session keys or > csrf tokens, and it's an insecure default. It all depends on what you consider "secure" or "secure enough" and points directly to another misunderstanding: that "secure" is a well-defined term :-) The random module seeds its global Random instance using urandom (if available on the system), so while the generator itself is deterministic, the seed used to kick off the pseudo-random series is not. For many purposes, this is secure enough. It's also easy to make the output of the random instance more secure by passing it through a crypto hash function. But back to the original question: What is "secure"? In crypto terms, "secure" usually refers to "computationally infeasible to calculate before the sun goes dark" (to take one variant). More realistically, it can be defined as: Based on the public knowledge known today, it's impossible to run a program which allows converting the output of a crypto function back to its inputs within a reasonable time span. And this property will - based on today's knowledge - hold for at least the next 5-10 years. You may notice the many parameters in these definition attempts. It all depends on who you ask. With the advent of new technologies like quantum computers, it's not at all clear that any of those definitions will still hold in a couple of years. It's quite possible that only quantum computers will be able to implement the necessary programs and it'll take a while for mobile phones to catch up and come with chips implementing those ;-) Now, leaving aside this bright future, what's reasonable today? If you look at tools like untwister: https://github.com/bishopfox/untwister you can get a feeling for how long it takes to deduce the seed from an output sequence. Bear in mind that in order to be reasonably sure that the seed is correct, the available output sequence has to be long enough.
That's a known-plaintext attack, so you need access to lots of session keys to begin with. The tool is still running on an example set of 1000 32-bit numbers and it says it'll be done in 1.5 hours, i.e. before the sun goes down in my timezone. I'll leave it running to see whether it can find my secret key. Untwister is only slightly smarter than brute force. Given that MT has a seed size of 32 bits, it's not surprising that a tool can find the seed within a day. Perhaps it's time to switch to a better version of MT, e.g. a 64-bit version (with 64-bit internal state): http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt64.html or an even faster SIMD variant with better properties and 128 bit internal state: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html Esp. the latter will help make brute force attacks practically impossible. Tim ? BTW: Looking at the sources of the _random module, I found that the seed function uses the hash of non-integers such as e.g. strings passed to it as seeds. Given the hash randomization for strings this will create non-deterministic results, so it's probably wise to only use 32-bit integers as seed values for portability, if you need to rely on seeding the global Python RNG. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 11 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2015-09-18: PyCon UK 2015 ... 7 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From random832 at fastmail.us Fri Sep 11 14:58:30 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 11 Sep 2015 08:58:30 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> <87si6lzhsh.fsf@uwakimon.sk.tsukuba.ac.jp> <4C44F738-05E4-4557-8C24-B1B9B7A38E0D@yahoo.com> Message-ID: <1441976310.3463176.380840969.265C8003@webmail.messagingengine.com> On Fri, Sep 11, 2015, at 06:10, Paul Moore wrote: > On 11 September 2015 at 11:07, Andrew Barnert wrote: > > But games do store the entire map state with saved games if they want repeatable saves (e.g., to prevent players from defeating the RNG by save scumming). > > So far off-topic it's not true, but a number of games I know of (e.g., > Factorio, Minecraft) include a means to get a map seed (a simple text > string) which you can publish, that allows other users to (in effect) > play on the same map as you. That's different from saves. Of course, Minecraft doesn't actually use the seed in such a simple way as seeding a single-sequence random number generator. If it did, the map would depend on what order you visited regions in. (This is less of an issue for games with finite worlds) From steve at pearwood.info Fri Sep 11 15:36:13 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 11 Sep 2015 23:36:13 +1000 Subject: [Python-ideas] DRAFT Re: Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: <20150911133613.GW19373@ando.pearwood.info> On Thu, Sep 10, 2015 at 09:10:09AM -0400, Donald Stufft wrote: > Essentially, other than typing a little bit more, why is: > >     import random >     print(random.choice(['a', 'b', 'c'])) > > better than > >     import random; >     
print(random.DetereministicRandom().choice(['a', 'b', 'C'])) Ironically, the spelling mistake in your example is a good example of how this is worse. Another reason why it's worse is that if you create a new instance every single time you need a random number, as you do above, performance is definitely going to suffer. By my timings, creating a new SystemRandom instance each time is around two times slower; creating a new DeterministicRandom (i.e. the current MT default) instance each time is over 100 times slower. Hypothetically, it may even hurt your randomness: it may be that some future (or current) (C)PRNG's quality will be "less random" (biased, predictable, or correlated) because you keep using a fresh instance rather than the same one. TL;DR: Yes, calling `random.choice` is *significantly better* than calling `random.SomethingRandom().choice`. It's better for beginners, it's even better for expert users whose random needs are small, and those whose needs are greater shouldn't be using the latter anyway. > You're allowed to pick DeterministicRandom, you're even allowed to do it > without thinking. This isn't about making it impossible to ever insecurely use > random numbers, that's obviously a boil the ocean level of problem, this is > about trying to make it more likely that someone won't be hit by a fairly easy > to hit footgun if it does matter for them, even if they don't know it. It's > also about making code that is easier to understand on the surface, for example > without using the prior knowledge that it's using MT, tell me how you'd know > if this was safe or not: > >     import random >     import string >     password = "".join(random.choice(string.ascii_letters) for _ in range(9)) >     print("Your random password is",) Is this a trick question? In the absence of a keylogger and screen reader monitoring my system while I run that code snippet, of course it is safe.
In the absence of any credible attack on the password based on how it was generated, of course it is safe. > Can you point out one use case where cryptographically safe random numbers, > assuming we could generate them as quickly as you asked for them, would hurt > you unless you needed/wanted to be able to save the seed and thus require or > want deterministic results? Nobody is saying that. To put that question another way: "If you exclude the case where crypto would > Reminder that this warning does not show up (in any color, much less red) > if you're using ``help(random)`` or ``dir(random)`` to explore the random > module. It also does not show up in code review when you see someone doing > random.random. > > It encourages you to write bad code, because it has a baked in assumption that > there is a sane default for a random number generator and expects people to > understand a fairly difficult concept, which is that not all "random" is equal. > > For instance, you've already made the mistake of saying you wanted "random" not > deterministic, but the two are not mutually exclusive and deterministic is a > property that a source of random can have, and one that you need for one of the > features you say you like. > > > > > > Here's a game a friend of mine created where the purpose of the game is > > > to essentially unrandomize some random data, which is only possible > > > because it's (purposely) using MT to make it possible > > > https://github.com/reaperhulk/dsa-ctf. This is not an ivory tower paranoia > > > case, it's a real concern that will absolutely fix some insecure software > > > out there instead of telling them "welp typing a little bit extra once > > > an import is too much of a burden for me and really it's your own fault > > > anyways". > > > > I don't understand how that game (which is an interesting way of > > showing people how attacks on crypto work, sure, but that's just > > education, which you dismissed above) relates to the issue here.
> > > > And I hope you don't really think that your quote is even remotely > > what I'm trying to say (I'm not that selfish) - my point is that not > > everything is security related. Not every application people write, > > and not every API in the stdlib. You're claiming that the random > > module is security related. I'm claiming it's not, it's documented as > > not being, and that's clear to the people who use it for its intended > > purpose. Telling those people that you want to make a module designed > > for their use harder to use because people for whom it's not intended > > can't read the documentation which explicitly states that it's not > > suitable for them, is doing a disservice to those people who are > > already using the module correctly for its stated purpose. > > I'm claiming that the term random is ambiguously both security related and > people to pick whether or not their use case is security related, or we should > assume that it is unless otherwise instructed. I don't particularly care what > the exact spelling of this looks like, random.(System|Secure)Random and > random.DeterministicRandom is just one option. > Another option is to look at > something closer to what Go did and deprecate the "random" module and move the > MT based thing to ``math.random`` and the CSPRNG can be moved to something like > crypto.random. This might be acceptable, although I wouldn't necessarily deprecate the random module. > > > > > By the same argument, we should remove the statistics module because > > it can be used by people with numerically unstable problems. (I doubt > > you'll find StackOverflow questions along these lines yet, but that's > > only because (a) the module's pretty new, and (b) it actually works > > pretty hard to handle the hard corner cases, but I bet they'll start > > turning up in due course, if only from the people who don't understand > > floating point...) 
> > > > No, by this argument we shouldn't have a function called statistics in the > statistics module because there is no globally "right" answer for what the > default should be. Should it be mean? mode? median? Why is *your* use case the > "right" use case for the default option, particularly in a situation where > picking the wrong option can be disastrous. > > ----------------- > Donald Stufft > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From steve at pearwood.info Fri Sep 11 15:49:48 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 11 Sep 2015 23:49:48 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <72597E4F-4E74-412D-8ED3-442E832232EF@yahoo.com> Message-ID: <20150911134948.GX19373@ando.pearwood.info> On Thu, Sep 10, 2015 at 04:08:09PM +1000, Chris Angelico wrote: > On Thu, Sep 10, 2015 at 11:50 AM, Andrew Barnert via Python-ideas > wrote: > > Of course it adds the cost of making the module slower, and also > > more complex. Maybe a better solution would be to add a > > random.set_default_instance function that replaced all of the > > top-level functions with bound methods of the instance (just like > > what's already done at startup in random.py)? That's simple, and > > doesn't slow down anything, and it seems like it makes it more clear > > what you're doing than setting random.inst. > > +1. A single function call that replaces all the methods adds a > minuscule constant to code size, run time, etc, and it's no less > readable than assignment to a module attribute. Making monkey-patching the official, recommended way to choose a PRNG is a risky solution, to put it mildly. 
That means that at any time, some other module that is directly or indirectly imported might change the random number generators you are using without your knowledge. You want a crypto PRNG, but some module replaces it with MT. Or vice versa. Technically, it is true that (this being Python) they can do this now, just by assigning to the random module: random.random = lambda: 9 but that is clearly abusive, and if you write code to do that, you're asking for whatever trouble you get. There's no official API to screw over other callers of the random module behind their back. You're suggesting that we add one. > (If anything, it makes > it more clearly a supported operation Which is exactly why this is a terrible idea. You're making monkey-patching not only officially supported, but encouraged. That will not end well. -- Steve From graffatcolmingov at gmail.com Fri Sep 11 15:50:01 2015 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Fri, 11 Sep 2015 08:50:01 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <20150909190757.GM19373@ando.pearwood.info> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> Message-ID: On Wed, Sep 9, 2015 at 2:07 PM, Steven D'Aprano wrote: > On Wed, Sep 09, 2015 at 02:55:01PM -0400, random832 at fastmail.us wrote: >> On Wed, Sep 9, 2015, at 14:31, Tim Peters wrote: >> > Also over & over again.
If you volunteer to own responsibility for >> > updating all versions of Python each time it changes (in a crypto >> > context, an advance in the state of the art implies the prior state >> > becomes "a bug"), and post a performance bond sufficient to pay >> > someone else to do it if you vanish, then a major pragmatic objection >> > would go away ;-) >> >> I don't see how "Changing Python's RNG implementation today to >> arc4random as it exists now" necessarily implies "Making a commitment to >> guarantee the cryptographic suitability of Python's RNG for all time". >> Those are two separate things. > > Not really. Look at the subject line. It doesn't say "should we change > from MT to arc4random?", it asks if the default random number generator > should be secure. The only reason we are considering the change from MT > to arc4random is to make the PRNG cryptographically secure. "Secure" is > a moving target, what is secure today will not be secure tomorrow. > > Yes, in principle, we could make the change once, then never again. But > why bother? We don't gain anything from changing to arc4random if there > is no promise to be secure into the future. This is a good point. Let's remove the ssl library from Python too. Until recently, the most widely used versions of Python were all woefully behind the times and anyone wanting anything relatively up-to-date had to use a third party library. Even so, if you count the distributions of RHEL and other "Long Term Support" operating systems that are running Python 2.7 (pre 2.7.9) and below, most people are operating with barely secure versions of OpenSSL on versions of Python that don't have constants for modern secure communications standards. Clearly, deciding to add the ssl module was a huge mistake because it wasn't forwards compatible with future security standards. 
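As an aside to the back-and-forth above: the password/session-token generation scenario the thread keeps returning to can already be done safely today with the CSPRNG the stdlib ships. A minimal sketch using random.SystemRandom (a thin wrapper over os.urandom); the make_token name is just for illustration:

```python
import string
from random import SystemRandom  # OS-entropy-backed CSPRNG

_sysrand = SystemRandom()
ALPHABET = string.ascii_letters + string.digits

def make_token(length=32):
    # Every draw comes straight from the OS CSPRNG, so observed
    # outputs place no useful constraint on future ones -- unlike
    # outputs taken from the module-level Mersenne Twister.
    return ''.join(_sysrand.choice(ALPHABET) for _ in range(length))
```

(The secrets module, added later in Python 3.6, packages exactly this pattern as secrets.choice and secrets.token_urlsafe.)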
From random832 at fastmail.us Fri Sep 11 16:23:52 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 11 Sep 2015 10:23:52 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux Message-ID: <1441981432.3482349.380920585.5A4249C0@webmail.messagingengine.com> On Fri, Sep 11, 2015, at 09:36, Steven D'Aprano wrote: > Yes, calling `random.choice` is *significantly better* than calling > `random.SomethingRandom().choice`. It's better for beginners, it's even > better for expert users whose random needs are small, and those whose > needs are greater shouldn't be using the latter anyway. Why is it that people who need deterministic/seed-based random aren't considered to be "those whose needs are greater"? From cory at lukasa.co.uk Fri Sep 11 16:28:35 2015 From: cory at lukasa.co.uk (Cory Benfield) Date: Fri, 11 Sep 2015 15:28:35 +0100 Subject: [Python-ideas] DRAFT Re: Python's Source of Randomness and the random.py module Redux In-Reply-To: <20150911133613.GW19373@ando.pearwood.info> References: <20150911133613.GW19373@ando.pearwood.info> Message-ID: On 11 September 2015 at 14:36, Steven D'Aprano wrote: > Is this a trick question? > > In the absence of any credible attack on the password based on how it > was generated, of course it is safe. I feel like I must have misunderstood you Steven. Didn't you just exclude the attack vector that we're discussing here? What we are saying is that a deterministic PRNG definitionally allows attacks on the password based on how it was generated. The very nature of a deterministic PRNG is that it is possible to predict subsequent outputs based on previous ones, or at least to dramatically constrain the search space. This is not a hypothetical attack, and it's not even a very complicated one. Now, it's possible that the way the system is constructed precludes this attack, but let me tell you that vastly more engineers think that about their systems than are actually right about it.
Generally, if the word 'password' appears anywhere near something, you want to keep a Mersenne Twister as far away from it as possible. The concern being highlighted in this thread is that users who don't know what I just said (the vast majority) are at risk of writing deeply insecure code. We think the default should be changed. From rosuav at gmail.com Fri Sep 11 16:33:29 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 12 Sep 2015 00:33:29 +1000 Subject: [Python-ideas] DRAFT Re: Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <20150911133613.GW19373@ando.pearwood.info> Message-ID: On Sat, Sep 12, 2015 at 12:28 AM, Cory Benfield wrote: > On 11 September 2015 at 14:36, Steven D'Aprano wrote: >> Is this a trick question? >> >> In the absence of any credible attack on the password based on how it >> was generated, of course it is safe. > > I feel like I must have misunderstood you Steven. Didn't you just > exclude the attack vector that we're discussing here? > > What we are saying is that a deterministic PRNG definitionally allows > attacks on the password based on how it was generated. Only if an attacker can access many passwords generated from the same MT stream, right? If the entire program is as was posted (importing random and using random.choice(), then terminating), then an attack would have to be based on the seeding of the RNG, not on the RNG itself. There simply isn't enough content being generated for you to be able to learn the internal state, and even if you did, the next run of the program will be freshly seeded anyway. ChrisA From cory at lukasa.co.uk Fri Sep 11 16:34:55 2015 From: cory at lukasa.co.uk (Cory Benfield) Date: Fri, 11 Sep 2015 15:34:55 +0100 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: <55F2CF6B.40301@egenix.com> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <55F2CF6B.40301@egenix.com> Message-ID: On 11 September 2015 at 13:56, M.-A. Lemburg wrote: > The random module seeds its global Random instance using urandom > (if available on the system), so while the generator itself is > deterministic, the seed used to kick off the pseudo-random series > is not. For many purposes, this is secure enough. Secure enough for what purposes? Certainly not generating a password, or anything that is 'password equivalent' (e.g. session cookies). As you acknowledge in the latter portion of your email, one can predict the future output of a Mersenne Twister by observing lots of previous values. If I get to see the output of your RNG, I can dramatically constrain the search space of other things it generated. It is not hard to see how you can mount a pretty trivial attack against web software using this, > It's also easy to make the output of the random instance more > secure by passing it through a crypto hash function. Or...just use a CSPRNG and save yourself the computation overhead of the hash? Besides, anyone who knows enough to hash their random numbers surely knows enough to use a CSPRNG, so who does this help? > Perhaps it's time to switch to a better version of MT, e.g. > a 64-bit version (with 64-bit internal state): > > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt64.html > > or an even faster SIMD variant with better properties and > 128 bit internal state: > > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html > > Esp. the latter will help make brute force attacks practically > impossible. Or, we can move to a CSPRNG and stop trying to move the goalposts on MT? Or, do both? 
Using a better Mersenne Twister does not mean we shouldn't switch the default. From cory at lukasa.co.uk Fri Sep 11 16:38:12 2015 From: cory at lukasa.co.uk (Cory Benfield) Date: Fri, 11 Sep 2015 15:38:12 +0100 Subject: [Python-ideas] DRAFT Re: Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <20150911133613.GW19373@ando.pearwood.info> Message-ID: On 11 September 2015 at 15:33, Chris Angelico wrote: > Only if an attacker can access many passwords generated from the same > MT stream, right? If the entire program is as was posted (importing > random and using random.choice(), then terminating), then an attack > would have to be based on the seeding of the RNG, not on the RNG > itself. There simply isn't enough content being generated for you to > be able to learn the internal state, and even if you did, the next run > of the program will be freshly seeded anyway. Sure, if the entire program is as posted, but we should probably assume it isn't. Some programs definitely are, but I'm not worried about them: I'm worried about the ones that aren't. From donald at stufft.io Fri Sep 11 16:38:24 2015 From: donald at stufft.io (Donald Stufft) Date: Fri, 11 Sep 2015 10:38:24 -0400 Subject: [Python-ideas] DRAFT Re: Python's Source of Randomness and the random.py module Redux In-Reply-To: References: <20150911133613.GW19373@ando.pearwood.info> Message-ID: On September 11, 2015 at 10:33:55 AM, Chris Angelico (rosuav at gmail.com) wrote: > On Sat, Sep 12, 2015 at 12:28 AM, Cory Benfield wrote: > > On 11 September 2015 at 14:36, Steven D'Aprano wrote: > >> Is this a trick question? > >> > >> In the absence of any credible attack on the password based on how it > >> was generated, of course it is safe. > > > > I feel like I must have misunderstood you Steven. Didn't you just > > exclude the attack vector that we're discussing here? 
> > > What we are saying is that a deterministic PRNG definitionally allows > > attacks on the password based on how it was generated. > > Only if an attacker can access many passwords generated from the same > MT stream, right? If the entire program is as was posted (importing > random and using random.choice(), then terminating), then an attack > would have to be based on the seeding of the RNG, not on the RNG > itself. There simply isn't enough content being generated for you to > be able to learn the internal state, and even if you did, the next run > of the program will be freshly seeded anyway. This is silly: take that code, stick it in a web application and have it generating API keys or session identifiers instead of passwords, or hell, even passwords or random tokens to reset passwords or any other such thing. Suddenly you have a case where you have a persistent process, so there isn't a new seed, and the attacker can more or less request an unlimited number of outputs. This isn't some mind-bogglingly uncommon case. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From steve at pearwood.info Fri Sep 11 16:44:47 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 12 Sep 2015 00:44:47 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: <87si6lzhsh.fsf@uwakimon.sk.tsukuba.ac.jp> References: <5DBE2F72-DAB1-43D3-97F5-318D480E91FE@yahoo.com> <87si6lzhsh.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20150911144447.GZ19373@ando.pearwood.info> On Fri, Sep 11, 2015 at 01:44:30PM +0900, Stephen J. Turnbull wrote: > I suppose it would be too magic to have the seed method substitute the > traditional PRNG for the default, while an implicitly seeded RNG > defaults to a crypto strong algorithm? Yes, too much magic.
-- Steve From steve at pearwood.info Fri Sep 11 16:53:26 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 12 Sep 2015 00:53:26 +1000 Subject: [Python-ideas] DRAFT Re: Python's Source of Randomness and the random.py module Redux In-Reply-To: <20150911133613.GW19373@ando.pearwood.info> References: <20150911133613.GW19373@ando.pearwood.info> Message-ID: <20150911145326.GA19373@ando.pearwood.info> Ah crap. Sorry folks, this post was *not supposed to go to the list* in this state. I'm having some trouble with my mail client (mutt) not saving drafts, so I intended to email it to myself for later editing, and didn't notice that the list was CCed. On Fri, Sep 11, 2015 at 11:36:13PM +1000, Steven D'Aprano wrote: [...] -- Steve From steve at pearwood.info Fri Sep 11 17:41:25 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 12 Sep 2015 01:41:25 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: <1441981432.3482349.380920585.5A4249C0@webmail.messagingengine.com> References: <1441981432.3482349.380920585.5A4249C0@webmail.messagingengine.com> Message-ID: <20150911154125.GC19373@ando.pearwood.info> Random832, You appear to have edited the subject line to remove the word "DRAFT". As I explained in an earlier post, that message was a draft and not intended to go to the list. Nevertheless, I will respond to your question below. On Fri, Sep 11, 2015 at 10:23:52AM -0400, random832 at fastmail.us wrote: > On Fri, Sep 11, 2015, at 09:36, Steven D'Aprano wrote: > > Yes, calling `random.choice` is *significantly better* than calling > > `random.SomethingRandom().choice`. It's better for beginners, it's even > > better for expert users whose random needs are small, and those whose > > needs are greater shouldn't be using the latter anyway. > > Why is it that people who need deterministic/seed based random aren't > considered to be "those whose needs are greater"? I didn't say that.
Read again: I give three groups of people: - Beginners, who are best served by calling `random.choice` rather than `random.SomethingRandom().choice`. - Those who are experts *and also* have "small" needs. I didn't define "small needs" because (1) I thought it was obvious in context and (2) the post was a draft and still in progress. What I mean by small needs is that they don't care about reproducibility, security, or having multiple independent PRNGs. - Those who *do* have "greater" needs, whether expert or not. Again, I thought in context it would be clear that greater needs includes such things as reproducibility, security or multiple independent PRNGs. In no case that I know of is it a good thing to be creating a brand-new instance for each and every call to the PRNG. At best, it is harmless, and only a little inefficient. At worst, it is a lot inefficient, and potentially may affect the reproducibility, security or statistical properties of the random numbers you generate. -- Steve From random832 at fastmail.us Fri Sep 11 17:55:38 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 11 Sep 2015 11:55:38 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: <20150911154125.GC19373@ando.pearwood.info> References: <1441981432.3482349.380920585.5A4249C0@webmail.messagingengine.com> <20150911154125.GC19373@ando.pearwood.info> Message-ID: <1441986938.3505841.381040833.33561D29@webmail.messagingengine.com> On Fri, Sep 11, 2015, at 11:41, Steven D'Aprano wrote: > Random832, > > You appear to have edited the subject line to remove the word "DRAFT". > As I explained in an earlier post, that message was a draft and not > intended to go to the list. Sorry about that... 
I didn't see it until I went to send it, and I'd had some issues on my client causing me to have to fish my reply out of my own drafts folder; I assumed the presence of the word "DRAFT" was related to that and didn't realize it was on your original message. From wes.turner at gmail.com Fri Sep 11 18:05:03 2015 From: wes.turner at gmail.com (Wes Turner) Date: Fri, 11 Sep 2015 11:05:03 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Thu, Sep 10, 2015 at 9:07 PM, Stephen J. Turnbull wrote: > Executive summary: > > The question is, "what value is there in changing the default to be > crypto strong to protect future security-sensitive applications from > naive implementers vs. the costs to current users who need to rewrite > their applications to explicitly invoke the current default?" > * [ ] DOC: note regarding the 'pseudo' part of pseudorandom and MT * https://docs.python.org/2/library/random.html * https://docs.python.org/3/library/random.html * [ ] DOC: upgrade cryptography docs in re: random numbers * https://cryptography.io/en/latest/random-numbers/ * [ ] ENH: random_(algo=) (~IRandomSource) * [ ] ENH: Add arc4random * [ ] ENH: Add chacha * [ ] ENH: Add OpenBSD's * [ ] BUG,SEC: new secure default named random.random * must also be stateful / **reproducible** (must support .seed) * justified as BUG,SEC because: [secure by default is the answer] * https://en.wikipedia.org/wiki/Session_fixation * https://cwe.mitre.org/data/definitions/384.html * The docs did not say "you should know better." 
* see also: hash collisions: https://bugs.python.org/issue13703 * [ ] REF: random.random -> random.random_old > > M.-A. Lemburg writes: > > > I'm pretty sure people doing crypto will know and most others > > simply don't care :-) > > Which is why botnets have millions of nodes. People who do web > security evidently believe that inappropriate RNGs have something to > do with widespread security issues. (That doesn't mean they're right, > but it gives me pause for thought -- evidently, Guido thought so too!) > > > Evidence: We used a Wichmann-Hill PRNG as default in random > > for a decade and people still got their work done. > > The question is not whether people get their work done. People work > (unless they're seriously dysfunctional), that's what people do. > Especially programmers (cf. GNU Manifesto). The question is whether > the work of the *crackers* is made significantly easier by security > holes that are opened by inappropriate use of random.random. > > I tend to agree with Steven d'A. (and others) that the answer is no: > it doesn't matter if the kind of person who leaves a key under the > third flowerpot from the left also habitually leaves the door unlocked > (especially if "I'm only gonna be gone for 5 minutes"), and I think > that's likely. IOW, installing crypto strong RNGs as default is *not* > analogous to the changes to SSL support that were so important that > they were backported to 2.7 in a late patch release. > > OTOH, why default to crypto weak if crypto strong is easily available? > You might save a few million Debian users from having to regenerate > all their SSH keys.[1] > > But the people who are "just getting work done" in new programs *won't > notice*. 
I don't think that they care what's under the hood of > random.random, as long as (1) the API stays the same, and (2) the > documentation clearly indicates where to find PRNGs that support > determinism, jumpahead, replicability, and all those other good > things, for the needs they don't have now but know they probably > will have some day. The rub is, as usual, existing applications that > would have to be changed for no reason that is relevant to them. > > Note that arc4random is much simpler to use than random.random. No > knobs to tweak or seeds to store for future reference. Seems > perfectly suited to "just getting work done" to me. OTOH, if you have > an application where you need replicability, jumpahead, etc, you're > going to need to read the docs enough to find the APIs for seeding and > so on. At design time, I don't see why it would hurt to select an > RNG algorithm explicitly as well. > > > Why not add ssl.random() et al. (as interface to the OpenSSL > > rand APIs) ? > > I like that naming proposal. I'm sure changing the nature of > random.random would annoy the heck out of *many* users. > > An alternative would be to add random.crypto. > > > Some background on why I think deterministic RNGs are more > > useful to have as default than non-deterministic ones: > > > > A common use case for me is to write test data generators > > for large database systems. For such generators, I don't keep > > the many GBs data around, but instead make the generator take a > > few parameters which then seed the RNGs, the time module and > > a few other modules via monkey-patching. > > If you've gone to that much effort, you evidently have read the docs > and it wouldn't have been a huge amount of trouble to use a > non-default module with a specified PRNG -- if you were doing it now. > But you have every right to be very peeved if you have a bunch of old > test runs you want to replicate with a new version of Python, and > we've changed the random.random RNG on you.
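The reproducible-test-data pattern described in the quoted text is insulated from any change to the module default once it is written against an explicit Random instance; a sketch (make_test_rows is a made-up name):

```python
import random

def make_test_rows(seed, n=5):
    # A private, explicitly seeded MT instance: replaying an old test
    # run only requires keeping the seed, not the generated gigabytes.
    rng = random.Random(seed)
    return [(rng.randrange(10**6), rng.choice('ABC')) for _ in range(n)]

run1 = make_test_rows(20150911)
run2 = make_test_rows(20150911)
assert run1 == run2  # same seed -> same data, whatever random.random becomes
```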
> > > > Footnotes: > [1] I hasten to add > that a programmer who isn't as smart as he thinks > he is who "improves" a crypto algorithm is far more likely than that > the implementer of a crypto suite would choose an RNG that is > inappropriate by design. Still, it's a theoretical possibility, and > security is about eliminating every theoretical possibility you can > think of. From steve at pearwood.info Fri Sep 11 18:08:09 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 12 Sep 2015 02:08:09 +1000 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <20150910015505.GO19373@ando.pearwood.info> Message-ID: <20150911160809.GD19373@ando.pearwood.info> On Wed, Sep 09, 2015 at 09:23:23PM -0500, Tim Peters wrote: > [Steven D'Aprano] > > The default MT is certainly deterministic, and although only the output > > of random() itself is guaranteed to be reproducible, the other methods > > are *usually* stable in practice. > > > > There's a jumpahead method too, > > Not in Python. It is there, up to Python 2.7. I hadn't noticed it was gone in Python 3. -- Steve From mal at egenix.com Fri Sep 11 18:14:39 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 11 Sep 2015 18:14:39 +0200 Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <55F2CF6B.40301@egenix.com> Message-ID: <55F2FDEF.9010000@egenix.com> On 11.09.2015 16:34, Cory Benfield wrote: > On 11 September 2015 at 13:56, M.-A. Lemburg wrote: >> The random module seeds its global Random instance using urandom >> (if available on the system), so while the generator itself is >> deterministic, the seed used to kick off the pseudo-random series >> is not. For many purposes, this is secure enough. > > Secure enough for what purposes? Certainly not generating a password, > or anything that is 'password equivalent' (e.g. session cookies). > > As you acknowledge in the latter portion of your email, one can > predict the future output of a Mersenne Twister by observing lots of > previous values. If I get to see the output of your RNG, I can > dramatically constrain the search space of other things it generated. > It is not hard to see how you can mount a pretty trivial attack > against web software using this, In theory, yes, in practice it's not all that easy. I suggest giving untwister a try... it started with telling me it needs about 1.5 hours, then flipped to more than a year, now it's back to 6 hours. I'll leave it running for a while to see whether it finishes today :-) >> It's also easy to make the output of the random instance more >> secure by passing it through a crypto hash function. > > Or...just use a CSPRNG and save yourself the computation overhead of > the hash? Besides, anyone who knows enough to hash their random > numbers surely knows enough to use a CSPRNG, so who does this help? There's a difference between taking a pseudo random number generator and applying a hash to it vs.
using a CPRNG: A CPRNG will add entropy to its state at regular intervals, so there's no such thing as a seeded sequence. A RNG + hash still has the nice property of allowing to reproduce the sequence given the seed, but makes it much harder to determine the seed (brute force can be made arbitrarily hard via the hash function). The entropy in the output of the second variant is constant (only defined by the initial seed and the hash parameters), while it constantly increases in the CPRNG. Some more background on this: https://en.wikipedia.org/wiki/Entropy_%28information_theory%29 >> Perhaps it's time to switch to a better version of MT, e.g. >> a 64-bit version (with 64-bit internal state): >> >> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt64.html >> >> or an even faster SIMD variant with better properties and >> 128 bit internal state: >> >> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html >> >> Esp. the latter will help make brute force attacks practically >> impossible. > > Or, we can move to a CSPRNG and stop trying to move the goalposts on > MT? Or, do both? Using a better Mersenne Twister does not mean we > shouldn't switch the default. I think it's worthwhile exposing the CPRNG from OpenSSL via the ssl module (see one of my earlier posts in this thread). People who need something as secure as their SSL implementation, can then get secure random numbers, while kids implementing coin flipping games can continue to use the well established API of the random module. Switching to a CPRNG in random would break the API, since some of the functions in the API would no longer be available (e.g. random.seed(), random.getstate(), random.setstate()). PS: Apart from the API issue, the default RNG in random would also have to be equidistributed and uniform, otherwise, the derivatives available in the module would no longer satisfy their expected distribution qualities. 
This is not needed when all you're interested in is to get some non-predictable random number for use in a session key :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 11 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2015-09-18: PyCon UK 2015 ... 7 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From kramm at google.com Fri Sep 11 18:18:32 2015 From: kramm at google.com (Matthias Kramm) Date: Fri, 11 Sep 2015 09:18:32 -0700 (PDT) Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: <9683c40d-b662-4b77-947e-62c418be8468@googlegroups.com> Message-ID: On Thursday, September 10, 2015 at 11:38:48 PM UTC-7, Jukka Lehtosalo wrote: > > The proposal doesn't spell out the rules for subtyping, but we should > follow the ordinary rules for subtyping for functions, and return types > would behave covariantly. So the answer is yes. > Ok. Note that this introduces some weird corner cases when trying to decide whether a class implements a protocol. Consider

    class P(Protocol):
        def f(self) -> P: ...

    class A:
        def f(self) -> A: ...

It would be as valid to say that A *does not* implement P (because the return value of f is incompatible with P) as it would be to say that A *does* implement it (because once it does, the return value of f becomes compatible with P).
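For what it's worth, the first example above can be written out against the runtime Protocol support that later landed in typing (PEP 544, well after this thread); note that a runtime_checkable isinstance() check only tests method *presence*, not return types, so it sides with the "A does implement P" reading:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class P(Protocol):
    def f(self) -> 'P': ...

class A:
    def f(self) -> 'A':
        return self

# isinstance() only verifies that a callable named f exists; the
# recursive return-type question is left to static checkers.
assert isinstance(A(), P)
```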
For a more quirky example, consider

    class A(Protocol):
        def f(self) -> B
        def g(self) -> str

    class B(Protocol):
        def f(self) -> A
        def g(self) -> float

    class C:
        def f(self) -> D: return self.x
        def g(self): return self.y

    class D:
        def f(self) -> C: return self.x
        def g(self): return self.y

Short of introducing intersection types, the protocols A and B are incompatible (because the return types of g() are mutually exclusive). Hence, C and D can, respectively, conform to either A or B, but not both. So the possible assignments are:

    C -> A, D -> B

*or*

    C -> B, D -> A

It seems undecidable which of the two is the right one. (The structural type converter in pytype solves this by dropping the "mutually exclusive" constraint to the floor and making A and B both a C *and* a D, which you can do if all you want is a name for an anonymous structural type. But here you're using your structural types in type declarations, so that solution doesn't apply.) Matthias From emile at fenx.com Fri Sep 11 18:18:52 2015 From: emile at fenx.com (Emile van Sebille) Date: Fri, 11 Sep 2015 09:18:52 -0700 Subject: [Python-ideas] Round division In-Reply-To: References: <20150911031304.GT19373@ando.pearwood.info> Message-ID: On 9/10/2015 11:27 PM, Serhiy Storchaka wrote: > On 11.09.15 06:13, Steven D'Aprano wrote: >> How does this differ from round(a/b)? round() also rounds to even. > > >>> round(5000000000000000/9999999999999999) > 0 > >>> round(14999999999999999/10000000000000000) > 2 > > But fractions 5000000000000000/9999999999999999 > 1/2 and > 14999999999999999/10000000000000000 < 3/2. > Wow -- I'm glad I work predominately in business environments and keep amounts in pennies. The only time I need to round anything is to the nearest cent.
Emile From guido at python.org Fri Sep 11 18:30:35 2015 From: guido at python.org (Guido van Rossum) Date: Fri, 11 Sep 2015 09:30:35 -0700 Subject: [Python-ideas] Round division In-Reply-To: References: <20150911031304.GT19373@ando.pearwood.info> Message-ID: On Fri, Sep 11, 2015 at 9:18 AM, Emile van Sebille wrote: > On 9/10/2015 11:27 PM, Serhiy Storchaka wrote: > >> On 11.09.15 06:13, Steven D'Aprano wrote: >> >>> How does this differ from round(a/b)? round() also rounds to even. >>> >> >> >>> round(5000000000000000/9999999999999999) >> 0 >> >>> round(14999999999999999/10000000000000000) >> 2 >> >> But fractions 5000000000000000/9999999999999999 > 1/2 and >> 14999999999999999/10000000000000000 < 3/2. >> >> > Wow -- I'm glad I work predominately in business environments and keep > amounts in pennies. The only time I need to round anything is to the > nearest cent. > I thought any programmer worth their salt would round down (i.e. trunc()) and transfer the fractional penny to their own account? :-) -- --Guido van Rossum (python.org/~guido) From tim.peters at gmail.com Fri Sep 11 18:36:55 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 11 Sep 2015 11:36:55 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F2CF6B.40301@egenix.com> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <55F2CF6B.40301@egenix.com> Message-ID: [M.-A. Lemburg ] > ... > Now, leaving aside this bright future, what's reasonable today ? > > If you look at tools like untwister: > > https://github.com/bishopfox/untwister > > you can get a feeling for how long it takes to deduce the > seed from an output sequence.
Bare in mind, that in order > to be reasonably sure that the seed is correct, the available > output sequence has to be long enough. > > That's a known plain text attack, so you need access to lots > of session keys to begin with. > > The tools is still running on an example set of 1000 32-bit > numbers and it says it'll be done in 1.5 hours, i.e. before > the sun goes down in my timezone. I'll leave it running to > see whether it can find my secret key. I'm only going to talk about current Python 3, because _any_ backward incompatible change is off limits for a bugfix release. So: 1. untwister appears _mostly_ to be probing for poor seeding schemes. Python 3's default "by magic" seeding is unimpeachable ;-) It's computationally infeasible to attack it. 2. If they knew they were targeting MT, and had 624 consecutive 32-bit outputs, they could compute MT's full internal state essentially instantly. #2 is hard to get, though. These "pick a password" examples are only using a relative handful of bits from each 32-bit MT output. Attacks with such spotty info are "exponentially harder". > Untwister is only slightly smarter than bruteforce. Given > that MT has a seed size of 32 bits, it's not surprising that > a tool can find the seed within a day. No no no. MT's state is 19937 bits, and current .seed() implementations use every bit you pass to .seed(). By default, current Python seeds the state with 2500 bytes (20000 bits) from the system .urandom() (if available). That's why it's computationally infeasible for "poor seeding" searches to attack the default seeding: they have a space of 2**19937-1 (the all-0 state can't occur) to search through, each of which is equally likely (assuming the system .urandom() is doing _its_ job). Of course the user can screw that up by using their _own_ seed. But, by default, current Pythons already do the best possible seeding job. > Perhaps it's time to switch to a better version of MT, e.g.
>> a 64-bit version (with 64-bit internal state): >> >> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt64.html >> >> or an even faster SIMD variant with better properties and >> 128 bit internal state: >> >> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html >> >> Esp. the latter will help make brute force attacks practically >> impossible. >> >> Tim ? We already have a 19937-bit internal state, and current seeding schemes don't hide that. I would like to move to a different generator entirely someday, but not before some specific better-than-MT alternative gains significant traction outside the Python world ("better a follower than a leader" in this area). > BTW: Looking at the sources of the _random module, I found that > the seed function uses the hash of non-integers such as e.g. > strings passed to it as seeds. Given the hash randomization > for strings this will create non-deterministic results, so it's > probably wise to only use 32-bit integers as seed values for > portability, if you need to rely on seeding the global Python > RNG. None of that applies to Python 3. `seed()` string inputs go through this path now:

    if isinstance(a, (str, bytes, bytearray)):
        if isinstance(a, str):
            a = a.encode()
        a += _sha512(a).digest()
        a = int.from_bytes(a, 'big')
    super().seed(a)

IOW, a crypto hash is _appended_ to the string, but no info from the original string is lost (but, if you ask me, this particular step is useless - it adds no "new entropy"). The whole mess is converted to a giant integer, again with no loss of input information. And every bit of the giant integer affects what `super().seed(a)` does. From wes.turner at gmail.com Fri Sep 11 18:53:46 2015 From: wes.turner at gmail.com (Wes Turner) Date: Fri, 11 Sep 2015 11:53:46 -0500 Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: devguide/documenting.html#security-considerations-and-other-concerns https://docs.python.org/devguide/documenting.html#security-considerations-and-other-concerns On Fri, Sep 11, 2015 at 11:05 AM, Wes Turner wrote: > > > On Thu, Sep 10, 2015 at 9:07 PM, Stephen J. Turnbull > wrote: > >> Executive summary: >> >> The question is, "what value is there in changing the default to be >> crypto strong to protect future security-sensitive applications from >> naive implementers vs. the costs to current users who need to rewrite >> their applications to explicitly invoke the current default?" >> > > * [ ] DOC: note regarding the 'pseudo' part of pseudorandom and MT > * https://docs.python.org/2/library/random.html > * https://docs.python.org/3/library/random.html > > * [ ] DOC: upgrade cryptography docs in re: random numbers > * https://cryptography.io/en/latest/random-numbers/ > > > * [ ] ENH: random_(algo=) (~IRandomSource) > * [ ] ENH: Add arc4random > * [ ] ENH: Add chacha > * [ ] ENH: Add OpenBSD's > > * [ ] BUG,SEC: new secure default named random.random > * must also be stateful / **reproducible** (must support .seed) > * justified as BUG,SEC because: [secure by default is the answer] > * https://en.wikipedia.org/wiki/Session_fixation > * https://cwe.mitre.org/data/definitions/384.html > * The docs did not say "you should know better." > * see also: hash collisions: https://bugs.python.org/issue13703 > > * [ ] REF: random.random -> random.random_old > > > > >> >> M.-A. 
Lemburg writes: >> >> > I'm pretty sure people doing crypto will know and most others >> > simply don't care :-) >> >> Which is why botnets have millions of nodes. People who do web >> security evidently believe that inappropriate RNGs have something to >> do with widespread security issues. (That doesn't mean they're right, >> but it gives me pause for thought -- evidently, Guido thought so too!) >> >> > Evidence: We used a Wichmann-Hill PRNG as default in random >> > for a decade and people still got their work done. >> >> The question is not whether people get their work done. People work >> (unless they're seriously dysfunctional), that's what people do. >> Especially programmers (cf. GNU Manifesto). The question is whether >> the work of the *crackers* is made significantly easier by security >> holes that are opened by inappropriate use of random.random. >> >> I tend to agree with Steven d'A. (and others) that the answer is no: >> it doesn't matter if the kind of person who leaves a key under the >> third flowerpot from the left also habitually leaves the door unlocked >> (especially if "I'm only gonna be gone for 5 minutes"), and I think >> that's likely. IOW, installing crypto strong RNGs as default is *not* >> analogous to the changes to SSL support that were so important that >> they were backported to 2.7 in a late patch release. >> >> OTOH, why default to crypto weak if crypto strong is easily available? >> You might save a few million Debian users from having to regenerate >> all their SSH keys.[1] >> >> But the people who are "just getting work done" in new programs *won't >> notice*. I don't think that they care what's under the hood of >> random.random, as long as (1) the API stays the same, and (2) the >> documentation clearly indicates where to find PRNGs that support >> determinism, jumpahead, replicability, and all those other good >> things, for the needs they doesn't have now but know they probably >> will have some day. 
The rub is, as usual, existing applications that >> would have to be changed for no reason that is relevant to them. >> >> Note that arc4random is much simpler to use than random.random. No >> knobs to tweak or seeds to store for future reference. Seems >> perfectly suited to "just getting work" done to me. OTOH, if you have >> an application where you need replicability, jumpahead, etc, you're >> going to need to read the docs enough to find the APIs for seeding and >> so on. At design time, I don't see why it would hurt to select an >> RNG algorithm explicitly as well. >> >> > Why not add ssl.random() et al. (as interface to the OpenSSL >> > rand APIs) ? >> >> I like that naming proposal. I'm sure changing the nature of >> random.random would annoy the heck out of *many* users. >> >> An alternative would be to add random.crypto. >> >> > Some background on why I think deterministic RNGs are more >> > useful to have as default than non-deterministic ones: >> > >> > A common use case for me is to write test data generators >> > for large database systems. For such generators, I don't keep >> > the many GBs data around, but instead make the generator take a >> > few parameters which then seed the RNGs, the time module and >> > a few other modules via monkey-patching. >> >> If you've gone to that much effort, you evidently have read the docs >> and it wouldn't have been a huge amount of trouble to use a >> non-default module with a specified PRNG -- if you were doing it now. >> But you have every right to be very peeved if you have a bunch of old >> test runs you want to replicate with a new version of Python, and >> we've changed the random.random RNG on you. >> >> >> >> Footnotes: >> [1] I hasten to add that a programmer who isn't as smart as he thinks >> he is who "improves" a crypto algorithm is far more likely than that >> the implementer of a crypto suite would choose an RNG that is >> inappropriate by design. 
Still, it's a theoretical possibility, and >> security is about eliminating every theoretical possibility you can >> think of. >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kramm at google.com Fri Sep 11 18:18:32 2015 From: kramm at google.com (Matthias Kramm) Date: Fri, 11 Sep 2015 09:18:32 -0700 (PDT) Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: <9683c40d-b662-4b77-947e-62c418be8468@googlegroups.com> Message-ID: On Thursday, September 10, 2015 at 11:38:48 PM UTC-7, Jukka Lehtosalo wrote: > > The proposal doesn't spell out the rules for subtyping, but we should > follow the ordinary rules for subtyping for functions, and return types > would behave covariantly. So the answer is yes. > Ok. Note that this introduces some weird corner cases when trying to decide whether a class implements a protocol. Consider

    class P(Protocol):
        def f() -> P

    class A:
        def f() -> A

It would be both valid to say that A *does not* implement P (because the return value of f is incompatible with P) as it would be to say that A *does* implement it (because once it does, the return value of f becomes compatible with P). For a more quirky example, consider

    class A(Protocol):
        def f(self) -> B
        def g(self) -> str

    class B(Protocol):
        def f(self) -> A
        def g(self) -> float

    class C:
        def f(self) -> D: return self.x
        def g(self): return self.y

    class D:
        def f(self) -> C: return self.x
        def g(self): return self.y

Short of introducing intersection types, the protocols A and B are incompatible (because the return types of g() are mutually exclusive). Hence, C and D can, respectively, conform to either A or B, but not both. So the possible assignments are:

    C -> A
    D -> B

*or*

    C -> B
    D -> A
It seems undecidable which of the two is the right one. (The structural type converter in pytype solves this by dropping the "mutually exclusive" constraint to the floor and making A and B both a C *and* a D, which you can do if all you want is a name for an anonymous structural type, But here you're using your structural types in type declarations, so that solution doesn't apply) Matthias -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Fri Sep 11 19:16:12 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 11 Sep 2015 12:16:12 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <20150911160809.GD19373@ando.pearwood.info> References: <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <20150910015505.GO19373@ando.pearwood.info> <20150911160809.GD19373@ando.pearwood.info> Message-ID: [Steven D'Aprano] >>> The default MT is certainly deterministic, and although only the output >>> of random() itself is guaranteed to be reproducible, the other methods >>> are *usually* stable in practice. >>> >>> There's a jumpahead method too, [Tim] >> Not in Python. [Steve] > It is there, up to Python 2.7. I hadn't noticed it was gone in Python 3. Yes, there's something _called_ `jumpahead()`, for backward compatibility with the old Wichmann-Hill generator. But what it does for MT is "eh - no idea what to do, so let's just make stuff up":

    def jumpahead(self, n):
        """Change the internal state to one that is likely far away
        from the current state.  This method will not be in Py3.x,
        so it is better to simply reseed.
        """
        # The super.jumpahead() method uses shuffling to change state,
        # so it needs a large and "interesting" n to work with.  Here,
        # we use hashing to create a large n for the shuffle.
        s = repr(n) + repr(self.getstate())
        n = int(_hashlib.new('sha512', s).hexdigest(), 16)
        super(Random, self).jumpahead(n)

I doubt there's anything that can be proved about the result of doing that - except that it's almost certain it won't bear any relationship to what calling the generator `n` times instead would have done ;-) From mal at egenix.com Fri Sep 11 19:19:00 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 11 Sep 2015 19:19:00 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <55F2CF6B.40301@egenix.com> Message-ID: <55F30D04.4010001@egenix.com> On 11.09.2015 18:36, Tim Peters wrote: > [M.-A. Lemburg ] >> ... >> Now, leaving aside this bright future, what's reasonable today ? >> >> If you look at tools like untwister: >> >> https://github.com/bishopfox/untwister >> >> you can get a feeling for how long it takes to deduce the >> seed from an output sequence. Bare in mind, that in order >> to be reasonably sure that the seed is correct, the available >> output sequence has to be long enough. >> >> That's a known plain text attack, so you need access to lots >> of session keys to begin with. >> >> The tools is still running on an example set of 1000 32-bit >> numbers and it says it'll be done in 1.5 hours, i.e. before >> the sun goes down in my timezone. I'll leave it running to >> see whether it can find my secret key. > > I'm only going to talk about current Python 3, because _any_ backward > incompatible change is off limits for a bugfix release. > > So: > > 1. untwister appears _mostly_ to be probing for poor seeding schemes.
> Python 3's default "by magic" seeding is unimpeachable ;-) It's > computationally infeasible to attack it. > > 2. If they knew they were targeting MT, and had 624 consecutive 32-bit > outputs, they could compute MT's full internal state essentially > instantly. How would they do that ? MT's period is too large for things like rainbow tables. > #2 is hard to get, though. These "pick a passward" examples are only > using a relative handful of bits from each 32-bit MT output. Attacks > with such spotty info are "exponentially harder". > >> Untwister is only slightly smarter than bruteforce. Given >> that MT has a seed size of 32 bits, it's not surprising that >> a tool can find the seed within a day. > > No no no. MT's state is 19937 bits, and current .seed() > implementations use every bit you pass to .seed(). Ah, right. I was looking at init_genrand() in the C implementation which only takes a single 32-bit unsigned int as value. The init_by_array() function does take seeds which use all available bits. I guess untwister indeed only tries the 32-bit unsigned int seeding approach, as it keeps listing things like: Progress: 50.99% [2190032137 / 4294967295] ~128296.63/sec 4 hours 33 minutes 30 [-] > By default, current Python seeds the state with 2500 bytes (20000 > bits) from the system .urandom() (if available). That's why it's > computationally infeasible for "poor seeding" searches to attack the > default seeding: they have a space of 2**19937-1 (the all-0 state > can't occur) to search through, each of which is equally likely > (assuming the system .urandom() is doing _its_ job). > > Of course the user can screw that up by using their _own_ seed. But, > by default, current Pythons already do the best possible seeding job. > > >> Perhaps it's time to switch to a better version of MT, e.g. 
>> a 64-bit version (with 64-bit internal state): >> >> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt64.html >> >> or an even faster SIMD variant with better properties and >> 128 bit internal state: >> >> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html >> >> Esp. the latter will help make brute force attacks practically >> impossible. >> >> Tim ? > > We already have a 19937-bit internal state, and current seeding > schemes don't hide that. Ouch. I confused internal state with the output size. Sorry. So we're more than fine already and it's only the cracking tools that are apparently broken :-) > I would like to move to a different generator entirely someday, but > not before some specific better-than-MT alternative gains significant > traction outside the Python world ("better a follower than a leader" > in this area). Another candidate is the new WELL family: http://www.iro.umontreal.ca/~panneton/WELLRNG.html This has some nicer properties w/r to booting out of zeroland (as they call it: too many 0 bits in the seed). >> BTW: Looking at the sources of the _random module, I found that >> the seed function uses the hash of non-integers such as e.g. >> strings passed to it as seeds. Given the hash randomization >> for strings this will create non-deterministic results, so it's >> probably wise to only use 32-bit integers as seed values for >> portability, if you need to rely on seeding the global Python >> RNG. > > None of that applies to Python 3.
Well, it still does for the .seed() C implementation in _random.c, but since that's overridden in Python 3's Random class, you can't access it anymore :-) > `seed()` string inputs go through > this path now: > > if isinstance(a, (str, bytes, bytearray)): > if isinstance(a, str): > a = a.encode() > a += _sha512(a).digest() > a = int.from_bytes(a, 'big') > super().seed(a) > > IOW, a crypto hash is _appended_ to the string, but no info from the > original string is lost (but, if you ask me, this particular step is > useless - it adds no "new entropy"). The whole mess is converted to a > giant integer, again with no loss of input information. And every bit > of the giant integer affects what `super().seed(a) does`. As far as I'm concerned this maps to case closed. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 11 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2015-09-18: PyCon UK 2015 ... 7 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From tim.peters at gmail.com Fri Sep 11 20:52:08 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 11 Sep 2015 13:52:08 -0500 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: <55F30D04.4010001@egenix.com> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <55F2CF6B.40301@egenix.com> <55F30D04.4010001@egenix.com> Message-ID: [Tim] >> ... >> 2. If they knew they were targeting MT, and had 624 consecutive 32-bit >> outputs, they could compute MT's full internal state essentially >> instantly. [Marc-Andre] > How would they do that ? MT's period is too large for > things like rainbow tables. It's not trivial to figure out how to do this, but once you do, it works ;-) No search, or tables, of any kind are required. It's just simple (albeit non-obvious!) bit-fiddling to invert MT's state-to-output transformations to get the state back. Here's a very nice writeup: https://jazzy.id.au/2010/09/22/cracking_random_number_generators_part_3.html From abarnert at yahoo.com Fri Sep 11 22:27:02 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 11 Sep 2015 13:27:02 -0700 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: <20150911134948.GX19373@ando.pearwood.info> References: <72597E4F-4E74-412D-8ED3-442E832232EF@yahoo.com> <20150911134948.GX19373@ando.pearwood.info> Message-ID: <9A57E7BB-4314-4929-B7F5-51764779F5D2@yahoo.com> On Sep 11, 2015, at 06:49, Steven D'Aprano wrote: > >> On Thu, Sep 10, 2015 at 04:08:09PM +1000, Chris Angelico wrote: >> On Thu, Sep 10, 2015 at 11:50 AM, Andrew Barnert via Python-ideas >> wrote: >>> Of course it adds the cost of making the module slower, and also >>> more complex. Maybe a better solution would be to add a >>> random.set_default_instance function that replaced all of the >>> top-level functions with bound methods of the instance (just like >>> what's already done at startup in random.py)? 
That's simple, and >>> doesn't slow down anything, and it seems like it makes it more clear >>> what you're doing than setting random.inst. >> >> +1. A single function call that replaces all the methods adds a >> minuscule constant to code size, run time, etc, and it's no less >> readable than assignment to a module attribute. > > Making monkey-patching the official, recommended way to choose a PRNG is > a risky solution, to put it mildly. But that's not the proposal. The proposal is to make explicitly passing around an instance the official, recommended way to choose a PRNG; monkey-patching is only the official, recommended way to quickly get legacy code working: once you see the warning about the potential problem and decide that the problem doesn't affect you, you write one standard line of code at the top of your main script instead of rewriting all of your modules and patching or updating every third-party module you use. As I said later, I think my later suggestion of just having a singleton DeterministicRandom instance (or even a submodule with the same interface) that you can explicitly import in place or random serves the same needs well enough, and is even simpler, and is more flexible (in particular, it can also be used for novices' "my first game" programs), so I'm no longer suggesting this. But that doesn't mean there's any benefit to mischaracterizing the suggestion (especially if Chris or anyone else still supports it even though I don't). From mal at egenix.com Fri Sep 11 22:44:46 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 11 Sep 2015 22:44:46 +0200 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <55F2CF6B.40301@egenix.com> <55F30D04.4010001@egenix.com> Message-ID: <55F33D3E.7000904@egenix.com> On 11.09.2015 20:52, Tim Peters wrote: > [Tim] >>> ... >>> 2. If they knew they were targeting MT, and had 624 consecutive 32-bit >>> outputs, they could compute MT's full internal state essentially >>> instantly. > > [Marc-Andre] >> How would they do that ? MT's period is too large for >> things like rainbow tables. > > It's not trivial to figure out how to do this, but once you do, it > works ;-) No search, or tables, of any kind are required. It's just > simple (albeit non-obvious!) bit-fiddling to invert MT's > state-to-output transformations to get the state back. Here's a very > nice writeup: > > https://jazzy.id.au/2010/09/22/cracking_random_number_generators_part_3.html Indeed very nice. Thanks for the pointer. I wonder why untwister doesn't use this. I gave it 1000 32-bit integers, so it should have enough information to recover the seed in a short while, but it's still trying to find the seed. Oh, and it now shows: 5 days 21 hours left. I stopped it there. Anyone up for a random.recover_seed() function ? ;-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 11 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2015-09-18: PyCon UK 2015 ... 7 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! 
:::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From random832 at fastmail.us Fri Sep 11 23:12:00 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 11 Sep 2015 17:12:00 -0400 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> Message-ID: <1442005920.3575903.381296913.53B6421C@webmail.messagingengine.com> On Wed, Sep 9, 2015, at 17:02, Nathaniel Smith wrote: > Keeping that promise in mind, an alternative would be to keep both > generators around, use the cryptographically secure one by default, and > switch to MT when someone calls > > seed(1234, generator="INSECURE LEGACY MT") > > But this would justifiably get us crucified by the security community, > because the above call would flip the insecure switch for your entire > program, including possibly other modules that were depending on random > to > provide secure bits. I just realized, OpenBSD has precisely this functionality, for the rand/random/rand48 functions, in the "_deterministic" versions of their respective seed functions. So that's probably not a terrible path to go down: Make a Random class that uses a CSPRNG and/or os.urandom until/unless it is explicitly seeded. Use that class for the global instance. We could probably skip the "make a separate function name to show you really mean it" because unlike C, Python has never encouraged explicitly seeding with the {time, pid, four bytes from /dev/random} when one doesn't actually want determinism. (The default seed in C for rand/random is *1*; for rand48 it is an implementation-defined, but specified to be constant, value). 
For completeness, have getstate return a tuple of a boolean (for which mode it is in) and whatever state Random returns. setstate can accept either this tuple, or for compatibility whatever Random uses. From encukou at gmail.com Fri Sep 11 23:48:54 2015 From: encukou at gmail.com (Petr Viktorin) Date: Fri, 11 Sep 2015 23:48:54 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <1442005920.3575903.381296913.53B6421C@webmail.messagingengine.com> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <1442005920.3575903.381296913.53B6421C@webmail.messagingengine.com> Message-ID: On Fri, Sep 11, 2015 at 11:12 PM, wrote: > On Wed, Sep 9, 2015, at 17:02, Nathaniel Smith wrote: >> Keeping that promise in mind, an alternative would be to keep both >> generators around, use the cryptographically secure one by default, and >> switch to MT when someone calls >> >> seed(1234, generator="INSECURE LEGACY MT") >> >> But this would justifiably get us crucified by the security community, >> because the above call would flip the insecure switch for your entire >> program, including possibly other modules that were depending on random >> to >> provide secure bits. > > I just realized, OpenBSD has precisely this functionality, for the > rand/random/rand48 functions, in the "_deterministic" versions of their > respective seed functions. So that's probably not a terrible path to go > down: > > Make a Random class that uses a CSPRNG and/or os.urandom until/unless it > is explicitly seeded. Use that class for the global instance. We could > probably skip the "make a separate function name to show you really mean > it" because unlike C, Python has never encouraged explicitly seeding > with the {time, pid, four bytes from /dev/random} when one doesn't > actually want determinism. 
(The default seed in C for rand/random is > *1*; for rand48 it is an implementation-defined, but specified to be > constant, value). > > For completeness, have getstate return a tuple of a boolean (for which > mode it is in) and whatever state Random returns. setstate can accept > either this tuple, or for compatibility whatever Random uses. Calling getstate() means you want to call setstate() at some point in the future, and have deterministic results. Getting the CSRNG state is dangerous (since it would allow replaying), and it's not even useful (since system entropy gets mixed in occasionally). Instead, in this scheme, getstate() should activate the deterministic RNG (seeding it if it's the first use), and return its state. setstate() would then also switch to the Twister, and seed it. From mal at egenix.com Sat Sep 12 00:59:01 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 12 Sep 2015 00:59:01 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F33D3E.7000904@egenix.com> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <55F2CF6B.40301@egenix.com> <55F30D04.4010001@egenix.com> <55F33D3E.7000904@egenix.com> Message-ID: <55F35CB5.6000701@egenix.com> On 11.09.2015 22:44, M.-A. Lemburg wrote: > On 11.09.2015 20:52, Tim Peters wrote: >> [Tim] >>>> ... >>>> 2. If they knew they were targeting MT, and had 624 consecutive 32-bit >>>> outputs, they could compute MT's full internal state essentially >>>> instantly. >> >> [Marc-Andre] >>> How would they do that ? MT's period is too large for >>> things like rainbow tables. >> >> It's not trivial to figure out how to do this, but once you do, it >> works ;-) No search, or tables, of any kind are required.
It's just >> simple (albeit non-obvious!) bit-fiddling to invert MT's >> state-to-output transformations to get the state back. Here's a very >> nice writeup: >> >> https://jazzy.id.au/2010/09/22/cracking_random_number_generators_part_3.html > > Indeed very nice. Thanks for the pointer. > > I wonder why untwister doesn't use this. I gave it 1000 32-bit > integers, so it should have enough information to recover the > seed in a short while, but it's still trying to find the seed. > Oh, and it now shows: 5 days 21 hours left. I stopped it there. > > Anyone up for a random.recover_seed() function ? ;-) Turns out this will have to be named random.recover_state(). Getting at the seed is too difficult, esp. for strings in Python 3, and not really worth the effort anyway. While implementing this, I found that there's a bit more trickery involved due to the fact that the MT RNG in Python writes the 624 words internal state in batches - once every 624 times the .getrandbits() function is called. So you may need up to 624*2 - 1 output values to determine a correct array of internal state values. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 12 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2015-09-18: PyCon UK 2015 ... 6 days to go 2015-10-21: Python Meeting Duesseldorf ... 39 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From random832 at fastmail.us Sat Sep 12 01:12:38 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 11 Sep 2015 19:12:38 -0400 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <1442005920.3575903.381296913.53B6421C@webmail.messagingengine.com> Message-ID: <1442013158.85026.381453473.0F3F82A3@webmail.messagingengine.com> On Fri, Sep 11, 2015, at 17:48, Petr Viktorin wrote: > Calling getstate() means you want to call setstate() at some point in > the future, and have deterministic results. Getting the CSRNG state is > dangerous (since it would allow replaying), and it's not even useful > (since system entropy gets mixed in occasionally). > Instead, in this scheme, getstate() should activate the deterministic > RNG (seeding it if it's the first use), and return its state. > setstate() would then also switch to the Twister, and seed it. My thinking was that "CSRNG is enabled" should be regarded as a single state of the "magic switching RNG". The alternative would be that calling getstate on a magic switching RNG that is not already in deterministic mode is an error. From tim.peters at gmail.com Sat Sep 12 03:19:19 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 11 Sep 2015 20:19:19 -0500 Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To: <55F35CB5.6000701@egenix.com> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <55F2CF6B.40301@egenix.com> <55F30D04.4010001@egenix.com> <55F33D3E.7000904@egenix.com> <55F35CB5.6000701@egenix.com> Message-ID: [Tim, on recovering MT state from outputs] >>> https://jazzy.id.au/2010/09/22/cracking_random_number_generators_part_3.html [Marc-Andre] >> Indeed very nice. Thanks for the pointer. >> >> I wonder why untwister doesn't use this. I gave it 1000 32-bit >> integers, so it should have enough information to recover the >> seed in a short while, but it's still trying to find the seed. >> Oh, and it now shows: 5 days 21 hours left. I stopped it there. As you went on to discover, while the writeup gives enough to convince you it's possible, there are always details ;-) > Turns out this will have to be named random.recover_state(). > > Getting at the seed is too difficult, esp. for strings in Python 3, > and not really worth the effort anyway. It's flatly impossible to ever know what the seed was, unless you _also_ know exactly how many times MT was invoked before the first output you captured. Think about that a bit, and I'm sure you'll see that's obvious. Even if you did know how many times, it would still be impossible without more assumptions, since seed arguments can contain any number of bits. > While implementing this, I found that there's a bit more trickery > involved due to the fact that the MT RNG in Python writes the > 624 words internal state in batches - once every 624 times > the .getrandbits() function is called. > > So you may need up to 624*2 - 1 output values to determine a > correct array of internal state values. Don't be too sure about that. 
From an information-theoretic view, "it's obvious" that 624 32-bit outputs is enough - indeed, that's 31 more bits than the internal state actually has. You don't need to reproduce Python's current internal MT state exactly, you only need to create _a_ MT state that will produce the same values forever after. Specifically, the index of the "current" vector element is an artifact of the implementation, and doesn't need to be reproduced. You're free to set that index to anything you like in _your_ MT state - the real goal is to get the same results. From tim.peters at gmail.com Sat Sep 12 05:23:42 2015 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 11 Sep 2015 22:23:42 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <55F2CF6B.40301@egenix.com> <55F30D04.4010001@egenix.com> <55F33D3E.7000904@egenix.com> <55F35CB5.6000701@egenix.com> Message-ID: [Marc-Andre] ... >> While implementing this, I found that there's a bit more trickery >> involved due to the fact that the MT RNG in Python writes the >> 624 words internal state in batches - once every 624 times >> the .getrandbits() function is called. >> >> So you may need up to 624*2 - 1 output values to determine a >> correct array of internal state values. [Tim] > Don't be too sure about that. From an information-theoretic view, > "it's obvious" that 624 32-bit outputs is enough - indeed, that's 31 > more bits than the internal state actually has. You don't need to > reproduce Python's current internal MT state exactly, you only need to > create _a_ MT state that will produce the same values forever after. 
> Specifically, the index of the "current" vector element is an artifact
> of the implementation, and doesn't need to be reproduced. You're free
> to set that index to anything you like in _your_ MT state - the real
> goal is to get the same results.

Concrete proof of concept. First code to reconstruct state from 624
consecutive 32-bit outputs:

    def invert(transform, output, n=100):
        guess = output
        for i in range(n):
            newguess = transform(guess)
            if newguess == output:
                return guess
            guess = newguess
        raise ValueError("%r not invertible in %s tries" %
                         (output, n))

    t1 = lambda y: y ^ (y >> 11)
    t2 = lambda y: y ^ ((y << 7) & 0x9d2c5680)
    t3 = lambda y: y ^ ((y << 15) & 0xefc60000)
    t4 = lambda y: y ^ (y >> 18)

    def invert_mt(y):
        y = invert(t4, y)
        y = invert(t3, y)
        y = invert(t2, y)
        y = invert(t1, y)
        return y

    def guess_state(vec):
        assert len(vec) == 624
        return (3,
                tuple(map(invert_mt, vec)) + (624,),
                None)

Now we can try it:

    import random
    for i in range(129):
        random.random()

That loop was just to move MT into "the middle" of its internal
vector. Now grab values:

    vec = [random.getrandbits(32) for i in range(624)]

Note that the `guess_state()` function above _always_ sets the index
to 624. When it becomes obvious _why_ it does so, all mysteries will
vanish ;-)

Now create a distinct generator and force its state to the deduced state:

    newrand = random.Random()
    newrand.setstate(guess_state(vec))

And some quick sanity checks:

    for i in range(1000000):
        assert random.random() == newrand.random()
    for i in range(1000000):
        assert random.getrandbits(32) == newrand.getrandbits(32)

The internal states are _not_ byte-for-byte identical. But they don't
need to be. The artificial `index` bookkeeping variable allows
hundreds of distinct spellings of _semantically_ identical states.
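[A hedged side note, not part of the archived thread: the stdlib's own defense against exactly this state-recovery-and-replay attack is random.SystemRandom, which draws every bit from os.urandom() and deliberately exposes no state for guess_state() to reconstruct. A minimal sketch:]

```python
import random

# SystemRandom pulls its bits from os.urandom(); there is no
# user-visible Mersenne Twister state vector to capture or replay.
sysrand = random.SystemRandom()
token = sysrand.getrandbits(128)
assert 0 <= token < 2 ** 128

# Unlike random.Random, state access is explicitly unsupported, so
# the attack demonstrated above has nothing to work with.
try:
    sysrand.getstate()
except NotImplementedError:
    state_exposed = False
else:
    state_exposed = True
assert not state_exposed
```

[The same holds for seeding: SystemRandom.seed() is a no-op, so past and future outputs cannot be reproduced even by the process that generated them.]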
From tritium-list at sdamon.com Sat Sep 12 05:29:15 2015 From: tritium-list at sdamon.com (Alexander Walters) Date: Fri, 11 Sep 2015 23:29:15 -0400 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <55F2CF6B.40301@egenix.com> <55F30D04.4010001@egenix.com> <55F33D3E.7000904@egenix.com> <55F35CB5.6000701@egenix.com> Message-ID: <55F39C0B.9090600@sdamon.com> My final thoughts on this entire topic are these: The suggestions made here, and in the other thread, are pointless API-breaking changes that do not affect the stated target audience (people who actually need secure random numbers but are not getting them correctly - they will still find a way to do it wrong; changing the API won't fix that). The net effect is a longer support burden on 2.x - this proposes another porting headache for NO reason. From mal at egenix.com Sat Sep 12 13:31:48 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 12 Sep 2015 13:31:48 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <55F2CF6B.40301@egenix.com> <55F30D04.4010001@egenix.com> <55F33D3E.7000904@egenix.com> <55F35CB5.6000701@egenix.com> Message-ID: <55F40D24.8080008@egenix.com> On 12.09.2015 05:23, Tim Peters wrote: > [Marc-Andre] > ... >>> While implementing this, I found that there's a bit more trickery >>> involved due to the fact that the MT RNG in Python writes the >>> 624 words internal state in batches - once every 624 times >>> the .getrandbits() function is called.
>>> >>> So you may need up to 624*2 - 1 output values to determine a >>> correct array of internal state values. > > [Tim] >> Don't be too sure about that. From an information-theoretic view, >> "it's obvious" that 624 32-bit outputs is enough - indeed, that's 31 >> more bits than the internal state actually has. You don't need to >> reproduce Python's current internal MT state exactly, you only need to >> create _a_ MT state that will produce the same values forever after. >> Specifically, the index of the "current" vector element is an artifact >> of the implementation, and doesn't need to be reproduced. You're free >> to set that index to anything you like in _your_ MT state - the real >> goal is to get the same results. > > Concrete proof of concept. First code to reconstruct state from 624 > consecutive 32-bit outputs: > > def invert(transform, output, n=100): > guess = output > for i in range(n): > newguess = transform(guess) > if newguess == output: > return guess > guess = newguess > raise ValueError("%r not invertible in %s tries" % > (output, n)) > > t1 = lambda y: y ^ (y >> 11) > t2 = lambda y: y ^ ((y << 7) & 0x9d2c5680) > t3 = lambda y: y ^ ((y << 15) & 0xefc60000) > t4 = lambda y: y ^ (y >> 18) > > def invert_mt(y): > y = invert(t4, y) > y = invert(t3, y) > y = invert(t2, y) > y = invert(t1, y) > return y > > def guess_state(vec): > assert len(vec) == 624 > return (3, > tuple(map(invert_mt, vec)) + (624,), > None) > > Now we can try it: > > import random > for i in range(129): > random.random() > > That loop was just to move MT into "the middle" of its internal > vector. Now grab values: > > vec = [random.getrandbits(32) for i in range(624)] > > Note that the `guess_state()` function above _always_ sets the index > to 624. 
When it becomes obvious _why_ it does so, all mysteries will > vanish ;-) > > Now create a distinct generator and force its state to the deduced state: > > newrand = random.Random() > newrand.setstate(guess_state(vec)) > > And some quick sanity checks: > > for i in range(1000000): > assert random.random() == newrand.random() > for i in range(1000000): > assert random.getrandbits(32) == newrand.getrandbits(32) > > The internal states are _not_ byte-for-byte identical. But they don't > need to be. The artificial `index` bookkeeping variable allows > hundreds of distinct spellings of _semantically_ identical states. It's a rolling index, yes, but when creating the vector of output values, the complete internal state array will have undergone a recalc at one of the iterations. The guess_state(vec) function will thus return an internal state vector that is half state of the previous recalc run, half new recalc run, it is not obvious to me why you would still be able to get away with not synchronizing to the next recalc in order to have a complete state from the current recalc. Let's see... The values in the state array are each based on a) previous state[i] b) state[(i + 1) % 624] c) state[(i + 397) % 624] Since the calculation is forward looking, your trick will only work if you can make sure that i + 397 doesn't wrap around into the previous state before you trigger the recalc in newrand. Which is easy, of course, since you can control the current index of newrand and force it to do a recalc with the next call to .getrandbits() by setting it to 624. 
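[The a)/b)/c) recurrence just described is the standard MT19937 update. A sketch (mine, not from the thread), written from the published Matsumoto/Nishimura algorithm rather than CPython's C source, shows why a simple in-place loop with modular indexing reproduces the batch "recalc" exactly - including the wrap-around into words already regenerated in the same pass - and checks it against the interpreter:]

```python
import random

# Standard MT19937 "recalc" of all 624 state words.  Each new word
# mixes state[i], state[(i + 1) % 624] and state[(i + 397) % 624];
# because the update is done in place, wrapped indices see words
# that were already regenerated during this pass, matching the C code.
N, M = 624, 397
UPPER, LOWER = 0x80000000, 0x7fffffff
MATRIX_A = 0x9908b0df

def recalc(state):
    for i in range(N):
        y = (state[i] & UPPER) | (state[(i + 1) % N] & LOWER)
        state[i] = state[(i + M) % N] ^ (y >> 1) ^ (MATRIX_A if y & 1 else 0)

# Check against CPython: force the index to 624 so the very next draw
# triggers a full batch recalc inside the C implementation.
r = random.Random(42)
version, internal, gauss = r.getstate()
words = list(internal[:-1])                      # 624 state words
r.setstate((version, tuple(words) + (624,), gauss))

expected = list(words)
recalc(expected)                                 # our prediction
r.getrandbits(32)                                # performs the recalc in C
assert r.getstate()[1][:-1] == tuple(expected)
```

[The assertion passing is exactly the "half old state, half new state" worry resolved: predicting the whole next batch requires nothing beyond the 624 words and the published recurrence.]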
Clever indeed :-)

Here's a better way to do the inversion without guess work:

    # 32-bits all set
    ALL_BITS_SET = 0xffffffffL

    def undo_bitshift_right_xor(value, shift, mask=ALL_BITS_SET):

        # Set shift high order bits; there's probably a better way to
        # do this, but this does the trick for now
        decoding_mask = (ALL_BITS_SET << (32 - shift)) & ALL_BITS_SET
        decoded_part = 0
        result = 0
        while decoding_mask > 0:
            decoded_part = (value ^ (decoded_part & mask)) & decoding_mask
            result |= decoded_part
            decoded_part >>= shift
            decoding_mask >>= shift
        return result

    def undo_bitshift_left_xor(value, shift, mask=ALL_BITS_SET):

        # Set shift low order bits
        decoding_mask = ALL_BITS_SET >> (32 - shift)
        decoded_part = 0
        result = 0
        while decoding_mask > 0:
            decoded_part = (value ^ (decoded_part & mask)) & decoding_mask
            result |= decoded_part
            decoded_part = (decoded_part << shift) & ALL_BITS_SET
            decoding_mask = (decoding_mask << shift) & ALL_BITS_SET
        return result

    def recover_single_state_value(value):

        value = undo_bitshift_right_xor(value, 18)
        value = undo_bitshift_left_xor(value, 15, 0xefc60000L)
        value = undo_bitshift_left_xor(value, 7, 0x9d2c5680L)
        value = undo_bitshift_right_xor(value, 11)
        return value

    def guess_state(data):

        if len(data) < 624:
            raise TypeError('not enough data to recover state')

        # Only work with the 624 last entries
        data = data[-624:]
        state = [recover_single_state_value(x)
                 for x in data]
        return (3,
                tuple(state) + (624,),
                None)

This is inspired by the work of James Roper, but uses a slightly
faster approach for the undo functions. Not that it matters much.
It was fun, that's what matters :-)

Oh, in Python 3, you need to remove the 'L' after the constants.
Too bad that it doesn't recognize those old annotations anymore.

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 12 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ...
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2015-09-18: PyCon UK 2015 ... 6 days to go 2015-10-21: Python Meeting Duesseldorf ... 39 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Sat Sep 12 13:35:48 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 12 Sep 2015 13:35:48 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F40D24.8080008@egenix.com> References: <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <55F2CF6B.40301@egenix.com> <55F30D04.4010001@egenix.com> <55F33D3E.7000904@egenix.com> <55F35CB5.6000701@egenix.com> <55F40D24.8080008@egenix.com> Message-ID: <55F40E14.6000907@egenix.com> On 12.09.2015 13:31, M.-A. Lemburg wrote: > On 12.09.2015 05:23, Tim Peters wrote: >> [Marc-Andre] >> ... >>>> While implementing this, I found that there's a bit more trickery >>>> involved due to the fact that the MT RNG in Python writes the >>>> 624 words internal state in batches - once every 624 times >>>> the .getrandbits() function is called. >>>> >>>> So you may need up to 624*2 - 1 output values to determine a >>>> correct array of internal state values. >> >> [Tim] >>> Don't be too sure about that. From an information-theoretic view, >>> "it's obvious" that 624 32-bit outputs is enough - indeed, that's 31 >>> more bits than the internal state actually has. 
You don't need to >>> reproduce Python's current internal MT state exactly, you only need to >>> create _a_ MT state that will produce the same values forever after. >>> Specifically, the index of the "current" vector element is an artifact >>> of the implementation, and doesn't need to be reproduced. You're free >>> to set that index to anything you like in _your_ MT state - the real >>> goal is to get the same results. >> >> Concrete proof of concept. First code to reconstruct state from 624 >> consecutive 32-bit outputs: >> >> def invert(transform, output, n=100): >> guess = output >> for i in range(n): >> newguess = transform(guess) >> if newguess == output: >> return guess >> guess = newguess >> raise ValueError("%r not invertible in %s tries" % >> (output, n)) >> >> t1 = lambda y: y ^ (y >> 11) >> t2 = lambda y: y ^ ((y << 7) & 0x9d2c5680) >> t3 = lambda y: y ^ ((y << 15) & 0xefc60000) >> t4 = lambda y: y ^ (y >> 18) >> >> def invert_mt(y): >> y = invert(t4, y) >> y = invert(t3, y) >> y = invert(t2, y) >> y = invert(t1, y) >> return y >> >> def guess_state(vec): >> assert len(vec) == 624 >> return (3, >> tuple(map(invert_mt, vec)) + (624,), >> None) >> >> Now we can try it: >> >> import random >> for i in range(129): >> random.random() >> >> That loop was just to move MT into "the middle" of its internal >> vector. Now grab values: >> >> vec = [random.getrandbits(32) for i in range(624)] >> >> Note that the `guess_state()` function above _always_ sets the index >> to 624. When it becomes obvious _why_ it does so, all mysteries will >> vanish ;-) >> >> Now create a distinct generator and force its state to the deduced state: >> >> newrand = random.Random() >> newrand.setstate(guess_state(vec)) >> >> And some quick sanity checks: >> >> for i in range(1000000): >> assert random.random() == newrand.random() >> for i in range(1000000): >> assert random.getrandbits(32) == newrand.getrandbits(32) >> >> The internal states are _not_ byte-for-byte identical. 
But they don't >> need to be. The artificial `index` bookkeeping variable allows >> hundreds of distinct spellings of _semantically_ identical states. > > It's a rolling index, yes, but when creating the vector of output > values, the complete internal state array will have undergone > a recalc at one of the iterations. > > The guess_state(vec) function will thus return an internal > state vector that is half state of the previous recalc run, > half new recalc run, it is not obvious to me why you would > still be able to get away with not synchronizing to the next > recalc in order to have a complete state from the current recalc. > > Let's see... > > The values in the state array are each based on > > a) previous state[i] > b) state[(i + 1) % 624] > c) state[(i + 397) % 624] > > Since the calculation is forward looking, your trick will only > work if you can make sure that i + 397 doesn't wrap around > into the previous state before you trigger the recalc in > newrand. > > Which is easy, of course, since you can control the current > index of newrand and force it to do a recalc with the next > call to .getrandbits() by setting it to 624. 
> > Clever indeed :-) > > Here's a better way to do the inversion without guess work: > > # 32-bits all set > ALL_BITS_SET = 0xffffffffL > > def undo_bitshift_right_xor(value, shift, mask=ALL_BITS_SET): > > # Set shift high order bits; there's probably a better way to > # do this, but this does the trick for now > decoding_mask = (ALL_BITS_SET << (32 - shift)) & ALL_BITS_SET > decoded_part = 0 > result = 0 > while decoding_mask > 0: > decoded_part = (value ^ (decoded_part & mask)) & decoding_mask > result |= decoded_part > decoded_part >>= shift > decoding_mask >>= shift > return result > > def undo_bitshift_left_xor(value, shift, mask=ALL_BITS_SET): > > # Set shift low order bits > decoding_mask = ALL_BITS_SET >> (32 - shift) > decoded_part = 0 > result = 0 > while decoding_mask > 0: > decoded_part = (value ^ (decoded_part & mask)) & decoding_mask > result |= decoded_part > decoded_part = (decoded_part << shift) & ALL_BITS_SET > decoding_mask = (decoding_mask << shift) & ALL_BITS_SET > return result > > def recover_single_state_value(value): > > value = undo_bitshift_right_xor(value, 18) > value = undo_bitshift_left_xor(value, 15, 0xefc60000L) > value = undo_bitshift_left_xor(value, 7, 0x9d2c5680L) > value = undo_bitshift_right_xor(value, 11) > return value > > def guess_state(data): Hmm, the name doesn't fit anymore, better call it: def recover_state(data): > if len(data) < 624: > raise TypeError('not enough data to recover state') > > # Only work with the 624 last entries > data = data[-624:] > state = [recover_single_state_value(x) > for x in data] > return (3, > tuple(state) + (624,), > None) > > This is inspired by the work of James Roper, but uses a slightly > faster approach for the undo functions. Not that it matters much. > It was fun, that's what matters :-) > > Oh, in Python 3, you need to remove the 'L' after the constants. > Too bad that it doesn't recognize those old annotations anymore. 
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Sep 12 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2015-09-18: PyCon UK 2015 ... 6 days to go 2015-10-21: Python Meeting Duesseldorf ... 39 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From tim.peters at gmail.com Sun Sep 13 03:00:17 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 12 Sep 2015 20:00:17 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F40D24.8080008@egenix.com> References: <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <55F2CF6B.40301@egenix.com> <55F30D04.4010001@egenix.com> <55F33D3E.7000904@egenix.com> <55F35CB5.6000701@egenix.com> <55F40D24.8080008@egenix.com> Message-ID: [Marc-Andre, puzzling over Tim's MT state-recovering hack] > ... > Since the calculation is forward looking, your trick will only > work if you can make sure that i + 397 doesn't wrap around > into the previous state before you trigger the recalc in > newrand. > > Which is easy, of course, since you can control the current > index of newrand and force it to do a recalc with the next > call to .getrandbits() by setting it to 624. 
> > Clever indeed :-) I'll suggest a different way to look at it: suppose you wanted to reproduce the state at _the start_ of the 624 values captured instead. Well, we'd do exactly the same thing, except set the index to 0. Then it's utterly obvious that your MT instance would spit out exactly the same 624 outputs as the ones captured. That's all the internals do when the index starts at 0: march through the state vector one word at a time, spitting out the tempered version of whichever 32-bit word is current. And increment the index each time (the only _mutation_ of any part of the MT internals). At the end of that, the only change to the internals is that the index would be left at 624. Which is exactly what "my code" sets it to. It acts exactly the same as if we had just finished generating the 624 captured outputs. Since we (in our heads) just reproduced enough bits to cover the entire internal state, it must be the case that we'll continue to reproduce all future outputs too. The "wrap around" is a red herring ;-) > Here's a better way to do the inversion without guess work: "Better" depends. Despite the variable named "guess" in the code, it's not guessing about anything ;-) It's a single function that doesn't care (and can't even be told) whether a left or right shift is being used, what the shift count is, whether a mask is in use, or even what the word size is. In those senses it's "better": it can be used without change for "anything like this", including, e.g., the 64-bit variant of MT. Just paste the C tempering lines into the lambdas. Nothing about the inversion function needs to change. But why it works efficiently is far from obvious. It _can_ take as many (but not more) iterations as there are bits in a word, but that's almost never needed. IIRC, it can never require more than 8 iterations to invert any of the tempering functions in the 32-bit MT, and, e.g., always inverts the very weak "lambda y: y ^ (y >> 18)" 32-bit MT transform on the first try. 
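[Tim's claim that the generic inverter works untouched for "anything like this", including the 64-bit variant of MT, can be checked directly. A sketch (mine, not from the thread): the tempering constants below are taken from the published mt19937-64 reference implementation; everything else is plain stdlib:]

```python
def invert(transform, output, n=100):
    # Tim's generic fixed-point inverter: iterate the transform
    # itself until it maps our guess back to the captured output.
    guess = output
    for _ in range(n):
        newguess = transform(guess)
        if newguess == output:
            return guess
        guess = newguess
    raise ValueError("%r not invertible in %s tries" % (output, n))

# Tempering steps of the 64-bit Mersenne Twister (mt19937-64),
# pasted from the reference C implementation.  Note the first one
# masks a *right* shift - the inverter doesn't care.
u1 = lambda y: y ^ ((y >> 29) & 0x5555555555555555)
u2 = lambda y: y ^ ((y << 17) & 0x71d67fffeda60000)
u3 = lambda y: y ^ ((y << 37) & 0xfff7eee000000000)
u4 = lambda y: y ^ (y >> 43)

def temper64(y):
    return u4(u3(u2(u1(y))))

def untemper64(y):
    for t in (u4, u3, u2, u1):
        y = invert(t, y)
    return y

import random
rng = random.Random(2015)
for _ in range(1000):
    raw = rng.getrandbits(64)
    assert untemper64(temper64(raw)) == raw
```

[Each tempering map is a bijection whose iteration order is a small power of two, which is why the fixed-point loop terminates in a handful of tries regardless of word size.]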
Nevertheless, you can - as you did - be more efficient by writing distinct inversion functions for "left shift" and "right shift" cases, and wiring in the word size. But the expense of deducing the state is just plain trivial here either way. We're not consuming days or hours here, we're not even consuming an appreciable fraction of a second at Python speed :-) From stephen at xemacs.org Sun Sep 13 03:53:05 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 13 Sep 2015 10:53:05 +0900 Subject: [Python-ideas] Round division In-Reply-To: References: <20150911031304.GT19373@ando.pearwood.info> Message-ID: <87mvwrytj2.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > On Fri, Sep 11, 2015 at 9:18 AM, Emile van Sebille wrote: > > Wow -- I'm glad I work predominately in business environments and keep > > amounts in pennies. The only time I need to round anything is to the > > nearest cent. > > > > I thought any programmer worth their salt would round down (i.e. trunc()) > and transfer the fractional penny to their own account? :-) Hate to tell you, but the accountants and even the SEC caught on to that one four decades ago. From stephen at xemacs.org Sun Sep 13 17:47:31 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 14 Sep 2015 00:47:31 +0900 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> Tim Peters writes: > [M.-A. Lemburg] > >> I'm pretty sure people doing crypto will know and most others > >> simply don't care :-) > > [Stephen J. Turnbull ] > > Which is why botnets have millions of nodes. 
> > I'm not a security wonk, but I'll bet a life's salary ;-) we'd have > botnets just as pervasive if every non-crypto RNG in the world were > banned - or had never existed. I am in violent agreement with you on that point and most others.[1] However, the analogy was not intended to be so direct as to imply that "insecure" RNGs are responsible for botnets, merely that not caring is. I agree I twisted MAL's words a bit -- he meant most people have no technical need for crypto, and so don't care, I suppose. But then, "doing crypto" (== security) is like "speaking prose": a lot of folks doing it don't realize that's what they're doing -- and they don't care, either. > So long as end users are allowed to run programs, that problem will > never go away. s/users/programmers/ and s/run/write/, and we get a different analogy that is literally correct -- but fails in an important dimension. One user's mistake adds one node to the botnet, and that's about as bad as one user's mistake gets in terms of harm to third parties. But one programmer's (or system administrator's) mistake can put many, perhaps millions, at risk. Personally I doubt that justifies an API break here, even if we can come up with attacks where breaking the PRNG would be cost-effective compared to "social engineering" or theft of physical media. I think it does justify putting quite a bit of thought into ways to make it easier for naive programmers to do the "safe" thing even if they technically don't need it. I will say that IMO the now-traditional API was a very unfortunate choice. If you have a CSPRNG that just generates "uniform random numbers" and has no user-visible APIs for getting or setting state, it's immediately obvious to the people who know they need access to state what they need to do -- change "RNG" implementation. The most it might cost them is rerunning an expensive base case simulation with a more appropriate implementation that provides the needed APIs. 
On the other hand, if you have something like the MT that "shouldn't be allowed anywhere near a password", it's easy to ignore the state access APIs, and call it the same way that you would call a CSPRNG. In fact that's what's documented as correct usage, as Paul Moore points out. Thus, programmers who are using a PRNG whose parameters can be inferred from its output, and should not be doing so, generally won't know it until the (potentially widespread) harm is done. It would be nice if it wasn't so easy for them to use the MT. Footnotes: [1] I think "agree with Tim" is a pretty safe default. From njs at pobox.com Mon Sep 14 04:54:51 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 13 Sep 2015 19:54:51 -0700 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <20150910015505.GO19373@ando.pearwood.info> Message-ID: [This is getting fairly off-topic for python-ideas (since AFAICT there is no particular reason right now to add a new deterministic generator to the stdlib), so CC'ing numpy-discussion and I'd suggest followups be directed to there alone.] On Thu, Sep 10, 2015 at 10:41 AM, Robert Kern wrote: > On 2015-09-10 04:56, Nathaniel Smith wrote: >> >> On Wed, Sep 9, 2015 at 8:35 PM, Tim Peters wrote: >>> >>> There are some clean and easy approaches to this based on >>> crypto-inspired schemes, but giving up crypto strength for speed. If >>> you haven't read it, this paper is delightful: >>> >>> http://www.thesalmons.org/john/random123/papers/random123sc11.pdf >> >> >> It really is! 
As AES acceleration instructions become more common
>> (they're now standard IIUC on x86, x86-64, and even recent ARM?), even
>> just using AES in CTR mode becomes pretty compelling -- it's fast,
>> deterministic, provably equidistributed, *and* cryptographically
>> secure enough for many purposes.
>
> I'll also recommend the PCG paper (and algorithm) as the author's
> cross-PRNGs comparisons have been bandied about in this thread already. The
> paper lays out a lot of the relevant issues and balances the various
> qualities that are important: statistical quality, speed, and security (of
> various flavors).
>
> http://www.pcg-random.org/paper.html
>
> I'm actually not that impressed with Random123. The core idea is nice and
> clean, but the implementation is hideously complex.

I'm curious what you mean by this? In terms of the code, or...?

I haven't looked at the code, but the simplest generator they
recommend in the paper is literally just

    def rng_stream(seed):
        counter = 0
        while True:
            # AES128 takes a 128 bit key and 128 bits of data and
            # returns 128 bits of encrypted data
            val = AES128(key=seed, data=counter)
            yield low_64_bits(val)
            yield high_64_bits(val)
            counter += 1

which gives a 64-bit generator with a period of 2^129. They benchmark
it as faster than the Mersenne Twister on modern CPUs (<2
cycles-per-byte on recent x86, x86-64, ARMv8), it requires less
scratch space, is incredibly simple to work with -- you can
parallelize it, get independent random streams, etc., in a totally
trivial way -- and has a *way* stronger guarantee of
random-looking-ness than merely passing TestU01.

The downsides are that it does still require 176 bytes of read-only
scratch storage (used to cache an expanded version of the "key"), the
need for a modern CPU to get that speed, and that it doesn't play well
with GPUs. So they also provide a set of three more ad hoc generators
designed to solve these problems. I'm not as convinced about these,
but hey, they pass TestU01...
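[AES128, low_64_bits and high_64_bits above are pseudocode, and the stdlib ships no AES primitive. A hedged, runnable sketch of the same counter-mode shape, with SHA-256 standing in for AES128 purely because hashlib is in the stdlib (slower, and the security argument differs, but the structure is identical):]

```python
import hashlib
import struct

def ctr_stream(seed):
    # Counter-mode construction: feed (seed, counter) through a fixed
    # keyed transformation and slice each 256-bit digest into four
    # 64-bit outputs.  The counter, not hidden state, drives the
    # stream, which is what makes parallel/independent streams trivial.
    counter = 0
    while True:
        block = hashlib.sha256(seed + struct.pack("<Q", counter)).digest()
        for i in range(0, 32, 8):
            yield int.from_bytes(block[i:i + 8], "little")
        counter += 1

gen = ctr_stream(b"my-seed")
draws = [next(gen) for _ in range(4)]

# Deterministic given the seed: a fresh generator replays the stream,
replay = ctr_stream(b"my-seed")
assert draws == [next(replay) for _ in range(4)]

# while a different seed yields an unrelated stream.
other = ctr_stream(b"other-seed")
assert draws != [next(other) for _ in range(4)]
```

[To jump ahead or partition work across processes, you start each worker at a different counter offset - no state hand-off needed, unlike the Twister.]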
The PCG paper does a much better job of all the other stuff *around* making a good RNG -- it has the nice website, clear comparisons, nice code, etc. -- which is definitely important. But to me the increase in speed and reduction in memory use doesn't seem worth it given how fast these generators are to start with + the nice properties of counter mode + and cryptographic guarantees of randomness that you get from AES, for code that's generally targeting non-embedded non-GPU systems. -n -- Nathaniel J. Smith -- http://vorpus.org From tim.peters at gmail.com Mon Sep 14 05:34:58 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 13 Sep 2015 22:34:58 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <20150910015505.GO19373@ando.pearwood.info> Message-ID: [Robert Kern ] >> ... >> I'll also recommend the PCG paper (and algorithm) as the author's >> cross-PRNGs comparisons have been bandied about in this thread already. The >> paper lays out a lot of the relevant issues and balances the various >> qualities that are important: statistical quality, speed, and security (of >> various flavors). >> >> http://www.pcg-random.org/paper.html >> >> I'm actually not that impressed with Random123. The core idea is nice and >> clean, but the implementation is hideously complex. [Nathaniel Smith ] > I'm curious what you mean by this? In terms of the code, or...? 
>
> I haven't looked at the code, but the simplest generator they
> recommend in the paper is literally just
>
>     def rng_stream(seed):
>         counter = 0
>         while True:
>             # AES128 takes a 128 bit key and 128 bits of data and
>             # returns 128 bits of encrypted data
>             val = AES128(key=seed, data=counter)
>             yield low_64_bits(val)
>             yield high_64_bits(val)
>             counter += 1

I assume it's because if you expand what's required to _implement_
AES128() in C, it does indeed look pretty hideously complex. On HW
implementing AES primitives, of course the code can be much simpler.

But to be fair, if integer multiplication and/or addition weren't
implemented in HW, and we had to write C code emulating them via
bit-level fiddling, the code for any of the PCG algorithms would look
hideously complex too. But being fair isn't much fun ;-)

From tim.peters at gmail.com Mon Sep 14 07:29:52 2015
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 14 Sep 2015 00:29:52 -0500
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

...

[Tim]
>> I'm not a security wonk, but I'll bet a life's salary ;-) we'd have
>> botnets just as pervasive if every non-crypto RNG in the world were
>> banned - or had never existed.

[Stephen J. Turnbull ]
> I am in violent agreement with you on that point and most others.[1]
> However, the analogy was not intended to be so direct as to imply that
> "insecure" RNGs are responsible for botnets, merely that not caring
> is.
I agree I twisted MAL's words a bit -- he meant most people have > no technical need for crypto, and so don't care, I suppose. But then, > "doing crypto" (== security) is like "speaking prose": a lot of folks > doing it don't realize that's what they're doing -- and they don't > care, either. I don't know that it's true, though. Crypto wonks are like lawyers that way, always worrying about the worst possible case "in theory". In my personal life, I've had to tell lawyers "enough already - I'm not paying another N thousand dollars to insert another page about what happens in case of nuclear war". Crypto wonks have no limit on the costs they'd like to impose either - a Security State never does. >> So long as end users are allowed to run programs, that problem will >> never go away. > s/users/programmers/ and s/run/write/, and we get a different analogy > that is literally correct -- but fails in an important dimension. One > user's mistake adds one node to the botnet, and that's about as bad as > one user's mistake gets in terms of harm to third parties. Not really. The best social engineering is for a bot to rummage through your email address book and send copies of itself to people you know, appearing to be a thoroughly legitimate email from you. Add a teaser to invite the recipient to click on the attachment, and response rate can be terrific. And my ISP (like many others) happens to provide a free industrial-strength virus/malware scanner/cleaner program. I doubt that's because they actually care about me ;-) Seems more likely they don't want to pay the costs of hosting millions of active bots. > But one programmer's (or system administrator's) mistake can put many, > perhaps millions, at risk. What I question is whether this has anything _plausible_ to do with Python's PRNG. 
> Personally I doubt that justifies an API break here, even if we can
> come up with attacks where breaking the PRNG would be cost-effective
> compared to "social engineering" or theft of physical media. I think
> it does justify putting quite a bit of thought into ways to make it
> easier for naive programmers to do the "safe" thing even if they
> technically don't need it.

I remain unclear on what "the danger" is thought to be, such that
replacing with a CSPRNG could plausibly prevent it.

For example, I know how to deduce the MT's internal state from outputs
(and posted working code for doing so, requiring a small fraction of a
second given 624 consecutive 32-bit outputs). But it's not an easy
problem _unless_ you have 624 consecutive 32-bit outputs. It's far
beyond the ken of script kiddies. If it's the NSA, they can demand you
turn over everything anyway, or just plain steal it ;-)

Consider one of these "password" examples:

    import string, random
    alphabet = string.ascii_letters + string.digits
    print(random.choice(alphabet))

Suppose that prints 'c'. What have you learned? Surprisingly, perhaps,
very little. You learned that one 32-bit output of MT had 0b000010 as
its first 6 bits. You know nothing about its other 26 bits. And you
don't know _which_ MT 32-bit output: internally, .choice() consumes as
many 32-bit outputs as it needs until it finds one whose first six
bits are less than 62 (len(alphabet)).

So all you've learned about MT is that, at the time .choice() was
called:

- 0 or more 32-bit outputs x were such that (x >> 26) >= 62.
- Then one 32-bit output x had (x >> 26) == 2.

This isn't much to go on. To deduce the whole state easily, you need
to know 19,968 consecutive output bits (624*32).
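Given those 624 consecutive 32-bit outputs, the recovery Tim alludes to is mechanical: MT19937's output tempering is invertible, so each observed output can be untempered back into a raw state word, and the full state installed in a clone. A minimal sketch of that standard approach, assuming CPython's state-tuple layout for Random.setstate (version 3, 624 words plus an index):

```python
import random

def _undo_xor_rshift(y, shift):
    # Invert y ^= y >> shift: the top `shift` bits are already correct,
    # and each pass recovers the next `shift` bits below them.
    x = y
    for _ in range(32 // shift):
        x = y ^ (x >> shift)
    return x

def _undo_xor_lshift_and(y, shift, mask):
    # Invert y ^= (y << shift) & mask, working upward from the low bits.
    x = y
    for _ in range(32 // shift):
        x = y ^ ((x << shift) & mask)
    return x & 0xFFFFFFFF

def untemper(y):
    # Invert MT19937's output tempering to recover the raw state word.
    y = _undo_xor_rshift(y, 18)
    y = _undo_xor_lshift_and(y, 15, 0xEFC60000)
    y = _undo_xor_lshift_and(y, 7, 0x9D2C5680)
    y = _undo_xor_rshift(y, 11)
    return y

# Observe 624 consecutive 32-bit outputs from a "victim" generator...
victim = random.Random()  # seeded from os.urandom by default
observed = [victim.getrandbits(32) for _ in range(624)]

# ...recover the state and install it in a clone.  CPython's state
# tuple is (version, 624 words + index, gauss_next).
clone = random.Random()
clone.setstate((3, tuple(untemper(v) for v in observed) + (624,), None))

# The clone now predicts the victim's future output exactly.
assert [clone.getrandbits(32) for _ in range(100)] == \
       [victim.getrandbits(32) for _ in range(100)]
```

Note that with enough consecutive outputs, strong seeding does not help: the state is recovered from the outputs themselves, never guessed from the seed.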
Get more and more letters from the password generator, and you learn
more and more about the first 6 bits of an unknowable number of MT
outputs consumed, but learn nothing whatsoever about any of the lower
26 bits of any of them, and learn nothing but a range constraint on
the first 6 bits of an unknowable number of outputs that were skipped
over. Sure, every clue reveals _something_. In theory ;-)

Note that, as explained earlier in these messages, Python's default
_seeding_ of MT is already computationally infeasible to attack
(urandom() is already used to set the entire massive internal state).
_That_ I did worry about in the past. So in the above I'm not worried
at all that an attacker exploited poor default seeding to know there
were only a few billion possible MT states _before_ `c` was generated.
All MT states are possible, and MT's state space is large beyond
comprehension (let alone calculation).

Would the user _really_ be better off using .urandom()? I don't know.
Since a crypto wonk will rarely recommend doing anything _other_ than
using urandom() directly, I bet they'd discourage using .choice() at
all, even if it is built on urandom(). Then the user will write their
own implementation of .choice(), something like:

    u = int.from_bytes(urandom(n), "big")  # for some n
    letter = alphabet[int(u / 2.0**(8*n) * len(alphabet))]

If they manage to get that much right, _now_ they've introduced a
statistical bias unless len(alphabet) is a power of 2. If we're
assuming they're crypto morons, chances are good they're not rock
stars at this kind of thing either ;-)

> I will say that IMO the now-traditional API was a very unfortunate
> choice.

Ah, but I remember 1990 ;-) Python's `random` API was far richer than
"the norm" in languages like C and FORTRAN at the time. It was a
delight!
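Coming back to that hand-rolled .choice(): the textbook repair for its bias is rejection sampling -- draw again whenever the random value falls in the uneven remainder of the range -- which is what any bias-free .choice() built on urandom() has to do. A minimal sketch (the function name is mine):

```python
import os
import string

alphabet = string.ascii_letters + string.digits  # 62 symbols, not a power of 2

def choice_rejection(seq):
    """Bias-free choice over os.urandom via rejection sampling.

    Draw one byte and reject values >= the largest multiple of len(seq)
    that fits in 256, so every index is exactly equally likely.
    (Sketch only; assumes len(seq) <= 256.)
    """
    n = len(seq)
    limit = 256 - (256 % n)  # e.g. 248 for n == 62
    while True:
        b = os.urandom(1)[0]
        if b < limit:
            return seq[b % n]

print(choice_rejection(alphabet))
```

For contrast, in the one-byte case the scaling construction maps 256 byte values onto 62 letters, so some letters get 5 byte values and others only 4 -- a small but real bias. Rejection sampling trades a bounded amount of wasted randomness for none at all.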
Judging it by standards that didn't become trendy until much later is only fair now ;-) > If you have a CSPRNG that just generates "uniform random > numbers" and has no user-visible APIs for getting or setting state, > it's immediately obvious to the people who know they need access to > state what they need to do -- change "RNG" implementation. I don't recall any language _at the time_ that did so. > The most it might cost them is rerunning an expensive base > case simulation with a more appropriate implementation that > provides the needed APIs. As above, I'm not sure a real crypto wonk would endorse a module that provided any more than a bare-bones API, forcing the user to build everything else out of one or two core primitives. Look at, e.g., how tiny the OpenBSD arc4random API is. I do applaud it for offering arc4random_uniform(uint32_t upper_bound) That's exactly what's needed to, e.g., build a bias-free .choice() (provided you have fewer than 2**32-1 elements to choose from). > On the other hand, if you have something like the MT that "shouldn't > be allowed anywhere near a password", As above, I think that's a weak claim. The details matter. As an example of a strong claim: you should never, ever use MT to produce the keystream for a stream cipher. But only a crypto wonk would be trying to generate a keystream to begin with. Or a user who did use MT for that purpose is probably so clueless they'd forget the xor and just send the plaintext ;-) > it's easy to ignore the state access APIs, and call it the same way > that you would call a CSPRNG. In fact that's what's documented as > correct usage, as Paul Moore points out. Thus, programmers who > are using a PRNG whose parameters can be inferred from its output, > and should not be doing so, generally won't know it until the > (potentially widespread) harm is done. It would be nice if it wasn't > so easy for them to use the MT. 
And yet nobody so far has produced a single example of any harm done
in any of the near-countless languages that supply non-crypto RNGs. I
know, my lawyer gets annoyed too when I point out there hasn't been a
nuclear war ;-)

Anyway, if people want to pursue this, I suggest adding a _new_ module
doing exactly whatever it is certified crypto experts say is necessary.
We can even give it a name shorter than "random" to encourage its use.
That's all most users really care about anyway ;-)

> Footnotes:
> [1] I think "agree with Tim" is a pretty safe default.

It's not always optimal, but, yes, you could do worse ;-)

From njs at pobox.com Mon Sep 14 08:38:25 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 13 Sep 2015 23:38:25 -0700
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To:
References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Sun, Sep 13, 2015 at 10:29 PM, Tim Peters wrote:
> [Stephen J. Turnbull ]
>> it's easy to ignore the state access APIs, and call it the same way
>> that you would call a CSPRNG. In fact that's what's documented as
>> correct usage, as Paul Moore points out. Thus, programmers who
>> are using a PRNG whose parameters can be inferred from its output,
>> and should not be doing so, generally won't know it until the
>> (potentially widespread) harm is done. It would be nice if it wasn't
>> so easy for them to use the MT.
>
> And yet nobody so far has produced a single example of any harm done
> in any of the near-countless languages that supply non-crypto RNGs.
> I know, my lawyer gets annoyed too when I point out there hasn't been
> a nuclear war ;-)

Here you go:

https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf

They present real-world attacks on PHP applications that use something
like the "password generation" code we've been talking about as a way
to generate cookies and password reset nonces, including in particular
the case of applications that use a strongly-seeded Mersenne Twister
as their RNG:

"We develop a suite of new techniques and tools to mount attacks
against all PRNGs of the PHP core system even when it is hardened with
the Suhosin patch [which adds strong seeding] and apply our techniques
to create practical exploits for a number of the most popular PHP
applications (including Mediawiki, Gallery, osCommerce and Joomla)
focusing on the password reset functionality. Our exploits allow an
attacker to completely take over arbitrary user accounts."

"Section 5.3: ... In this section we give a description of the
Mersenne Twister generator and present an algorithm that allows the
recovery of the internal state of the generator even when the output
is truncated. Our algorithm also works in the presence of non
consecutive outputs ..."

Out of curiosity, I tried searching github for "random cookie
language:python". The 5th hit (out of ~100k) was a web project that
appears to use this insecure method to generate session cookies:

https://github.com/bytasv/bbapi/blob/34e294becb22bae6e685f2e742b7ffdb53a83bcb/bbapi/utils/cookie.py
https://github.com/bytasv/bbapi/blob/34e294becb22bae6e685f2e742b7ffdb53a83bcb/bbapi/api/router.py#L56-L66

(Fortunately this project doesn't appear to actually have any login or
permissions functionality, so I don't think this is an actual
CVE-worthy bug, but that's just luck -- I'm sure there are plenty of
projects that start out looking like this one and then add security
features without revisiting how they generate session ids.)
There's a reason security people are so Manichean about these kinds of things. If something is not intended to be secure or used in security-sensitive ways, then fine, no worries. But if it is, then there's no point in trying to mess around with "probably mostly secure" -- either solve the problem right or don't bother. (See also: the time Python wasted trying to solve hash randomization without actually solving hash randomization [1].) If Tim Peters can get fooled into thinking something like using MT to generate session ids is "probably mostly secure", then what chance do the rest of us have? NB this isn't an argument for *whether* we should make random cryptographically strong by default; it's just an argument against wasting time debating whether it's already "secure enough". It's not secure. Maybe that's okay, maybe it's not. For the record though I do tend to agree with the idea that it's not okay, because it's an increasingly hostile world out there, and secure-random-by-default makes whole classes of these issues just disappear. It's not often that you get to fix thousands of bugs with one commit, including at least some with severity level "all your users' private data just got uploaded to bittorrent". I like Nick's proposal here: https://code.activestate.com/lists/python-ideas/35842/ as probably the most solid strategy for implementing that idea -- the only projects that would be negatively affected are those that are using the seeding functionality of the global random API, which is a tiny fraction, and the effect on those projects is that they get nudged into using the superior object-oriented API. -n [1] https://lwn.net/Articles/574761/ -- Nathaniel J. Smith -- http://vorpus.org From stephen at xemacs.org Mon Sep 14 10:30:47 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 14 Sep 2015 17:30:47 +0900 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <878u89z9l4.fsf@uwakimon.sk.tsukuba.ac.jp> Tim Peters writes: > > "doing crypto" (== security) is like "speaking prose": a lot of folks > > doing it don't realize that's what they're doing -- and they don't > > care, either. > > I don't know that it's true, though. Crypto wonks are like lawyers > that way, always worrying about the worst possible case "in > theory". Well, my worst possible case "in theory" was that a documented MTA parameter would simply not be implemented and not error when I configured it to a non-default value -- but that's how yours truly ended up running an open relay (Smail 3.1.100 I think it was, and I got it from Debian so it wasn't like I was using alpha code). That's what taught me to do functional tests. :-) So yes, I do think there are a lot of folks out there working with software without realizing that there are any risks involved. Life being life, I'd bet on some of them being programmers working with RNG. > In my personal life, I've had to tell lawyers "enough already - I'm > not paying another N thousand dollars to insert another page about > what happens in case of nuclear war". But see, that's my main point. Analogies to *anybody's* personal life are irrelevant when we're talking about a bug that could be fixed *once* and save *millions* of users from being exploited. If the wonks are right, it's a big deal, big enough to balance the low probability of them being right. 
;-) > The best social engineering is for a bot to rummage through your > email address book and send copies of itself to people you know, > appearing to be a thoroughly legitimate email from you. Add a > teaser to invite the recipient to click on the attachment, and > response rate can be terrific. Sure, but that's not what happened at AOL and Yahoo! AFAIK (of course they're pretty cagey about details). It seems that a single leak or a small number of leaks at each company exposed millions of address books. (I hasten to add that I doubt the Mersenne Twister had anything to do with the leaks.) > What I question is whether this has anything _plausible_ to do with > Python's PRNG. Me too. People who claim some expertise think so, though. > Would the user _really_ be better off using .urandom()? I don't know. > Since a crypto wonk will rarely recommend doing anything _other_ than > using urandom() directly, I bet they'd discourage using .choice() at > all, That's not unfair, but if they did, I'd go find myself another crypto wonk. But who cares about me? What matters is that Guido would, too. > Judging [the random module] by standards that didn't become trendy > until much later is only fair now ;-) You're not the only one who, when offered a choice between fair and fun, chooses the latter. ;-) > We can even give it a name shorter than "random" to encourage its > use. That's all most users really care about anyway ;-) That's beyond "unfair"! From mal at egenix.com Mon Sep 14 12:37:52 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 14 Sep 2015 12:37:52 +0200 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <55F6A380.4070609@egenix.com> On 14.09.2015 08:38, Nathaniel Smith wrote: > If Tim Peters can get fooled > into thinking something like using MT to generate session ids is > "probably mostly secure", then what chance do the rest of us have? > I don't think that Tim can get fooled into believing he is a crypto wonk ;-) The thread reveals another misunderstanding: Broken code doesn't get any better when you change the context in which it is run. By fixing the RNG used in such broken code and making it harder to run attacks, you are only changing the context in which the code is run. The code itself still remains broken. Code which uses the output from an RNG as session id without adding any additional security measures is broken, regardless of what kind of RNG you are using. I bet such code will also take any session id it receives as cookie and trust it without applying extra checks on it. Rather than trying to fix up the default RNG in Python by replacing it with a crypto RNG, it's better to open bug reports to get the broken software fixed. Replacing the default Python RNG with a new unstudied crypto one, will likely introduce problems into working code which rightly assumes the proven statistical properties of the MT. Just think of the consequences of adding unwanted bias to simulations. This is far more likely to go unnoticed than a session highjack due to a broken system and can easily cost millions (or earn you millions - it's all probability after all :-)). 
Now, pointing people who write broken code to a new module which provides a crypto RNG probably isn't much better either. They'd feel instantly secure because it says "crypto" on the box and forget about redesigning their insecure protocol as well. Nothing much you can do about that, I'm afraid. Too easy sometimes is too easy indeed ;-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 14 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Connect Remote DB-API ... http://connect.egenix.com/ >>> mxODBC Python Database Interface ... http://mxodbc.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ ________________________________________________________________________ 2015-09-14: Released mxODBC Plone/Zope DA 2.2.3 http://egenix.com/go84 2015-09-18: PyCon UK 2015 ... 4 days to go 2015-09-26: Python Meeting Duesseldorf Sprint 2015 12 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From njs at pobox.com Mon Sep 14 14:26:50 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 14 Sep 2015 05:26:50 -0700 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F6A380.4070609@egenix.com> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> Message-ID: On Mon, Sep 14, 2015 at 3:37 AM, M.-A. 
Lemburg wrote: > On 14.09.2015 08:38, Nathaniel Smith wrote: >> If Tim Peters can get fooled >> into thinking something like using MT to generate session ids is >> "probably mostly secure", then what chance do the rest of us have? >> > > I don't think that Tim can get fooled into believing he is a > crypto wonk ;-) > > The thread reveals another misunderstanding: > > Broken code doesn't get any better when you change the context > in which it is run. As an aphorism this sounds nice, but logically it makes no sense. If the broken thing about your code is that it assumes that the output of the RNG is unguessable, and you change the context by making the output of the RNG unguessable, then now the code it isn't broken. The code would indeed remain broken when run under e.g. older interpreters, but this is not an argument that we should make sure that it stays broken in the future. > By fixing the RNG used in such broken code and making it > harder to run attacks, you are only changing the context in which > the code is run. The code itself still remains broken. > > Code which uses the output from an RNG as session id without adding > any additional security measures is broken, regardless of what kind > of RNG you are using. I bet such code will also take any session id > it receives as cookie and trust it without applying extra checks > on it. Yes, that's... generally the thing you do with session cookies? They're shared secret string that you use as keys into some sort of server-side session database? What extra checks need to be applied? > Rather than trying to fix up the default RNG in Python by replacing > it with a crypto RNG, it's better to open bug reports to get the > broken software fixed. > > Replacing the default Python RNG with a new unstudied crypto one, > will likely introduce problems into working code which rightly > assumes the proven statistical properties of the MT. > > Just think of the consequences of adding unwanted bias to simulations. 
> This is far more likely to go unnoticed than a session highjack due
> to a broken system and can easily cost millions (or earn you
> millions - it's all probability after all :-)).

I'm afraid you just don't understand what you're talking about here.
When it comes to adding bias to simulations, all crypto RNGs have
*better* statistical properties than MT. A crypto RNG which was merely
as statistically-well-behaved as MT would be considered totally
broken, because MT doesn't even pass black-box tests of randomness
like TestU01.

> Now, pointing people who write broken code to a new module which
> provides a crypto RNG probably isn't much better either. They'd feel
> instantly secure because it says "crypto" on the box and forget
> about redesigning their insecure protocol as well. Nothing much you
> can do about that, I'm afraid.

Yes, improving the RNG only helps with some problems, not others; it
might merely make a system harder to attack, rather than impossible to
attack. But giving people unguessable random numbers by default does
solve real problems.

-n

--
Nathaniel J. Smith -- http://vorpus.org

From jbvsmo at gmail.com Mon Sep 14 14:26:33 2015
From: jbvsmo at gmail.com (João Bernardo)
Date: Mon, 14 Sep 2015 09:26:33 -0300
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To: <55F6A380.4070609@egenix.com>
References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com>
Message-ID:

Quick fix!
The problem with MT would be someone having all 624 32-bit numbers
from the state. So, every now and then, the random generator should
run twice and discard one of the outputs.
Do this about 20 times for each 624 calls and no brute force can find
the state. Thanks for your attention ;)

João Bernardo
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cory at lukasa.co.uk Mon Sep 14 14:31:24 2015
From: cory at lukasa.co.uk (Cory Benfield)
Date: Mon, 14 Sep 2015 13:31:24 +0100
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To:
References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com>
Message-ID:

On 14 September 2015 at 13:26, João Bernardo wrote:
> Quick fix!
> The problem with MT would be someone having all 624 32-bit numbers
> from the state. So, every now and then, the random generator should
> run twice and discard one of the outputs.

'Every now and then': what's that? Is it a deterministic interval or a
random one? If a random one, where does the random number come from:
MT? If deterministic, it's trivial to include the effect in your
calculations.

More generally, what you're doing here is gaining *information* about
the state. You don't have to know it perfectly, just to reduce the
space of possible states down. Even if you threw 95% of the results of
MT away, each output I watch lets me reduce the space of possible
states the MT is in. This is not a fix.
From antoine at python.org Mon Sep 14 14:59:00 2015
From: antoine at python.org (Antoine Pitrou)
Date: Mon, 14 Sep 2015 12:59:00 +0000 (UTC)
Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux
References:
Message-ID:

Nick Coghlan writes:
>
> On 11 September 2015 at 02:05, Brett Cannon wrote:
> > +1 for deprecating module-level functions and putting everything into
> > classes to force a choice
>
> -1000, as this would be a *huge* regression in Python's usability for
> educational use cases. (Think 7-8 year olds that are still learning to
> read, not teenagers or adults with more fully developed vocabularies)

Fully agreed with Nick. That this is being seriously considered shows
a massive disregard for usability. Python is not C++, it places
convenience first.

Besides, a deterministic RNG is a feature: you can reproduce exactly a
random sequence by re-using the same seed, which helps fix rare
input-dependent failures (we actually have a good example of that in
CPython development with `regrtest -r`). Good luck debugging such
issues when using a RNG which reseeds itself in a random (!) way.

Lastly, the premise of this discussion is idealistic in the first
place. If someone doesn't realize their code is security-sensitive,
there are other mistakes they will make than simply choosing the wrong
RNG. If you want to help people generate secure passwords, best would
be perhaps to write a password-generating (or more generally
secret-generating, for different kinds of secrets: passwords, session
ids, etc.) library.

Regards

Antoine.

From cory at lukasa.co.uk Mon Sep 14 15:29:11 2015
From: cory at lukasa.co.uk (Cory Benfield)
Date: Mon, 14 Sep 2015 14:29:11 +0100
Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux
In-Reply-To:
References:
Message-ID:

On 14 September 2015 at 13:59, Antoine Pitrou wrote:
>
> Lastly, the premise of this discussion is idealistic in the first place.
> If someone doesn't realize their code is security-sensitive, there > are other mistakes they will make than simply choosing the wrong > RNG. If you want to help people generate secure passwords, best would > be perhaps to write a password-generating (or more generally > secret-generating, for different kinds of secrets: passwords, session > ids, etc.) library. Is your argument that there are lots of ways to get security wrong, and for that reason we shouldn't try to fix any of them? After all, I could have made this argument against PEP 466, or against the deprecation of SHA1 in TLS certificates, or against any security improvement ever made that simply changed defaults. The fact that there are secure options available is not a good excuse for leaving the insecure ones as the defaults. And let's be clear, this is not a theoretical error that people don't hit in real life. Investigating your last comment, Antoine, I googled "python password generator". The results: - The first one is a StackOverflow question which incorrectly uses random.choice (though seeded from os.urandom, which is an improvement). The answer to that says to just use os.urandom everywhere, but does not provide sample code. Only the third answer gets so far as to provide sample code, and it's way overkill. - The second option, entitled "A Better Password Generator", incorrectly uses random.randrange. This code is *aimed at beginners*, and is kindly handing them a gun to point at their own foot. - The third one uses urandom, which is fine - The fourth, an XKCD-based password generator, uses SystemRandom *if available* but then falls back to the MT approach, which is an unexpected decision, but there we go. 
- The fifth, from "pythonforbeginners.com", incorrectly uses random.choice
- The sixth goes into an intensive discussion about 'password strength', including a discussion about the 'bit strength' of the password, despite the fact that they use random.randint, which means that the analysis about bit strength is totally flawed.
- For the seventh we get a security.stackexchange question with the first answer saying not to use Random, though the questioner does use it and no sample code is provided.
- The eighth is a library that "generates randomized strings of characters". It attempts to use SystemRandom but falls back silently if it's unavailable.

At this point I gave up. Of that list of 8 responses, three are completely wrong, two provide sample code that is wrong with no correct sample code to be found on the page, two attempt to do the right thing but will fall into a silent failure mode if they can't, and only one is unambiguously correct.

Similarly, a quick search of GitHub for Python repositories that contain random.choice and the string 'password' returns 40,000 results.[0] Even if 95% of them are safe, that leaves 2000 people who wrote wrong code and uploaded it to GitHub.

It is disingenuous to say that only people who know enough write security-critical code. They don't. The reason for this is that most people don't know they don't know enough. And for those people, Python's default approach screws them over, and then they write blog posts which screw over more people.

If the Python standard library would like to keep the insecure default of random.random, that's totally fine, but we shouldn't pretend that the resulting security failures aren't our fault: they absolutely are.
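[Editor's note: the failure mode described above, and the fix the thread keeps pointing at, can be sketched with nothing but today's stdlib. The function names and ALPHABET constant are made up for illustration; this is not code from any of the cited pages.]

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits

def weak_password(length=16):
    # The pattern most of the search results teach: random.choice uses the
    # module-level Mersenne Twister, whose internal state can be recovered
    # from observed outputs -- so these "passwords" are predictable to an
    # attacker who sees enough of them.
    return "".join(random.choice(ALPHABET) for _ in range(length))

def better_password(length=16):
    # The recommended fix: SystemRandom reads from os.urandom(), so there
    # is no recoverable userspace state to attack.
    rng = random.SystemRandom()
    return "".join(rng.choice(ALPHABET) for _ in range(length))
```

Note that seeding the global generator makes `weak_password` fully reproducible, which is exactly the property you do not want in a secret, while `better_password` is unaffected by any seeding.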
[0]: https://github.com/search?l=python&q=random.choice+password&ref=searchresults&type=Code&utf8=%E2%9C%93


From ncoghlan at gmail.com Mon Sep 14 15:32:17 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 14 Sep 2015 23:32:17 +1000
Subject: [Python-ideas] Globally configurable random number generation
Message-ID:

This is an expansion of the random module enhancement idea I previously posted to Donald's thread: https://mail.python.org/pipermail/python-ideas/2015-September/035969.html

I'll write it up as a full PEP later, but I think it's just as useful in this form for now.

= Defining the problem =

We're moving into an era where the easiest way to publish software is as a web application, with "deployment" to client systems done at runtime via a web browser. It's regularly the case that "learn to program" classes (especially those aimed at adults picking up programming for the first time) will introduce folks to both a web development framework and how to deploy web applications on a developer focused service with a free hosting tier, like Heroku or OpenShift.

It's also the case that we live in an era where there's a lot of well-intentioned-but-actually-bad advice on the internet when it comes to generating security sensitive tokens, and the folks receiving that advice through forums like Stack Overflow aren't necessarily ever going to see the "don't do that" guidance in the standard library's random module documentation, or the docs for the cryptography library, or the docs for a web framework like Flask, Django or Pyramid.

One of the ways we know many of the folks doing web development often don't take admonitions in documentation seriously is that one of the most popular web servers for Python on these kinds of services is Django's "runserver", even though Django's docs specifically say only to use that for local development.
It isn't OK to say "the developers deserve the consequences that come to them", as in many cases it isn't the developers that suffer the consequences, but the users of their applications.

One reason we know weak RNGs can be a problem in practice is because the same kind of concern exists in PHP web applications, and https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf shows how the relative predictability of password reset tokens can be used to compromise administrator accounts.

Rather than playing whack-a-mole with individual web applications (many of which will be written by inexperienced developers), or attempting to demonstrate that a deterministic PRNG is "secure enough" for these use cases (when the research on PHP and deterministic PRNGs in general indicates that it isn't), it is proposed to migrate Python to a default random implementation that *is* known to be secure enough for these kinds of use cases.

At the same time, deterministic random number generation is still desirable in many situations, and we also don't want folks learning Python in the future to be required to take a crash course in web application security theory first. Thus, it is also proposed that the abstraction used to present these differences to end users minimise the references to the underlying security concepts.

A key outcome of this proposal is that it will retroactively upgrade a lot of existing instructions on the internet for generating default passwords and other sensitive tokens in Python from "actively harmful" to "not necessarily ideal, but at least not wrong if you're using Python 3.6+".

This *is* a compatibility break for the sake of correcting default behaviours that are fine when developing applications for local use, but problematic from a network service security perspective, just as happened with the introduction of hash randomisation.
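[Editor's note: for readers wondering how "relative predictability" becomes a practical attack, MT19937's output tempering is invertible, so 624 consecutive 32-bit outputs suffice to clone the generator. A minimal sketch against CPython's own random module follows; the tempering constants are MT19937's published parameters, and it assumes the observed outputs start at a twist boundary, e.g. a freshly seeded generator.]

```python
import random

def untemper(y):
    # Invert MT19937's output tempering, recovering the raw state word.
    y ^= y >> 18
    y ^= (y << 15) & 0xEFC60000
    x = y
    for _ in range(5):                 # peel off the "<< 7" step, 7 bits at a time
        x = y ^ ((x << 7) & 0x9D2C5680)
    y = x
    x = y
    for _ in range(3):                 # peel off the ">> 11" step
        x = y ^ (x >> 11)
    return x & 0xFFFFFFFF

# "Victim" generator the attacker can only observe.
victim = random.Random(20150914)
observed = [victim.getrandbits(32) for _ in range(624)]

# Rebuild the internal state and load it into a clone
# (CPython's state tuple is 624 words plus an index).
state = tuple(untemper(v) for v in observed) + (624,)
clone = random.Random()
clone.setstate((3, state, None))

# The clone now predicts the victim's future output exactly.
predicted = [clone.getrandbits(32) for _ in range(10)]
actual = [victim.getrandbits(32) for _ in range(10)]
assert predicted == actual
```

No brute force is involved, which is the point Steven makes later in the thread: the attack solves for the state rather than searching for it.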
Unlike the hash randomisation change, this one is readily addressed in old versions on a case by case basis, so it is only proposed to make the change in a future feature release of Python, not in any current maintenance releases.

= Core abstraction =

The core concept of this proposal involves classifying random number generators in Python as follows:

* seedable
* seedless
* system

These terms are chosen to make sense to folks that have *no idea* about the way different kinds of random number generator work and how that affects their security properties, but do know whether or not they need to be able to pass in a particular fixed seed in order to regenerate the same series of outputs.

The guidance to Python users is then:

* we use the seedless RNG by default as it provides the best balance of speed and security
* if you need to be able to exactly reproduce output sequences, use the seedable RNG
* if you know you're doing security sensitive work, use the system RNG directly to eliminate Python's seedless RNG as a potential source of vulnerabilities

Importantly, there are relatively simple answers to the following two questions (which could be added to the Design FAQ):

Q: Why isn't the seedable RNG the default random implementation (any more)?

A: The same properties that make it possible to provide an explicit seed to the seedable RNG and get a predictable series of outputs make it inappropriate for tasks like generating session IDs and password reset tokens in web applications. Since folks continued to use the default RNG for those cases, even after years of the core development team, web framework developers and security engineers saying "Don't do that, use the system RNG instead", we eventually changed the default behaviour to just make those cases OK.

Q: Why isn't the system RNG the default implementation?
A: Due to the way operating systems work, calling into the kernel to get a random number is always going to be slower than generating one within the Python runtime. The default seedless generator provides most of the same benefits as using the system RNG directly, but is an order of magnitude faster as it doesn't need to call into the kernel as often.

= Proposed change for Python 3.6 =

* add a random.SeedlessRandom API that omits the seed(), getstate() and setstate() methods and uses a cryptographically secure PRNG internally (such as the ChaCha20 algorithm implemented by OpenBSD)
* rename random.Random to random.SeedableRandom
* make random.Random a subclass of SeedableRandom that deprecates seed(), getstate() and setstate()
* deprecate the seed(), getstate() and setstate() methods on SystemRandom
* expose the global SeedableRandom instance as random.seedable_random
* expose a global SeedlessRandom instance as random.seedless_random
* expose a global SystemRandom instance as random.system_random
* provide a random.set_default_instance() API that makes it possible to specify the instance used by the module level methods
* the module level seed(), getstate(), and setstate() functions will throw RuntimeError if the corresponding method is missing from the default instance

In 3.6, "random.set_default_instance(random.seedless_random)" will opt in to the CSPRNG when using the module level functions process wide, while "from random import seedless_random as random" will do so on a module by module basis. "from random import system_random as random" also becomes available as a simple upgrade path for security sensitive modules.

Appropriate helpers would be added to the six and future projects to allow single source Python 2/3 projects to easily cope with the change in behaviour when using the seeded RNG for its intended purposes.
For many projects, compatibility code will consist of the following lines in a compatibility module:

    try:
        from random import seedable_random as random
    except ImportError:
        import random

It would also be desirable for the seedless random number generator to be made available as a PyPI package for use on older Python versions.

= Proposed change for Python 3.7 =

* random.Random becomes an alias for random.SeedlessRandom
* the default instance changes to be random.seedless_random

In 3.7, "random.set_default_instance(random.seedable_random)" will opt back in to the deterministic PRNG when using the module level functions process wide, while "from random import seedable_random as random" will do so on a module by module basis.

= Seedable random number generation =

This is what we have today. The MT random implementation supports explicit seeding, state retrieval, and state restoration. It doesn't automatically mix in additional system entropy as it operates. This is the right choice for use cases like computer games, map generation, and randomising the order of test execution, as in these situations, it's desirable to be able to reproduce a past sequence exactly.

= Seedless random number generators =

This is the key proposed new addition: a cryptographically secure, non-deterministic, userspace PRNG. It's faster than the system RNG as it avoids the need to make a system API call. The "seedless" name comes from the fact that the inability to feed in a fixed seed is the most obvious API difference relative to deterministic RNGs, and hence provides a mental hook for people to remember which is which, without needing to know the relevant background security theory (which is arcane enough to be opaque even to developers with decades of experience and hence isn't something we want to be inflicting on folks in the process of learning to program).
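[Editor's note: the reproducibility property that makes the seedable generator right for games, map generation and test ordering — and wrong for secrets — is easy to demonstrate with the API that already exists today.]

```python
import random

rng = random.Random(2015)                  # explicit seed
run1 = [rng.random() for _ in range(5)]

rng.seed(2015)                             # same seed -> identical sequence
run2 = [rng.random() for _ in range(5)]
assert run1 == run2

# State capture/restore gives the same guarantee mid-stream.
state = rng.getstate()
a = rng.random()
rng.setstate(state)
assert rng.random() == a
```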
= System random number generator =

The only proposed change here is providing a default instance to enable the "from random import system_random as random" pattern.

Regards, Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia


From steve at pearwood.info Mon Sep 14 15:32:33 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 14 Sep 2015 23:32:33 +1000
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To:
References: <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com>
Message-ID: <20150914133232.GA31152@ando.pearwood.info>

On Mon, Sep 14, 2015 at 09:26:33AM -0300, João Bernardo wrote:
> Quick fix!
> The problem with MT would be someone having all 624 32-bit numbers from
> the state. So, every now and then, the random generator should run twice
> and discard one of the outputs.

No, that's not good enough. You can skip a few outputs, and still recover the internal state. The more outputs are skipped, the harder it becomes, but still possible.

> Do this about 20 times for each 624 calls and no brute force can find the
> state. Thanks for your attention ;)

This is not brute force. The recovery attack does not try generating every possible internal state. The MT is a big, complicated equation (technically, it is called a recurrence relation), but being an equation, it is completely deterministic. Given enough values, we can build another equation which can be solved to give the internal state of the MT equation.

Are you suggesting that every time you call random.random(), it should secretly generate 20 random numbers and throw away all but the last?

(1) That would break backwards compatibility for those who need it.
The output of random() is stable across versions:

[steve at ando ~]$ for vers in 2.4 2.5 2.6 2.7 3.1 3.2 3.3 3.4; do
> python$vers -c "from random import *; seed(25); print(random())";
> done
0.37696230239
0.37696230239
0.37696230239
0.37696230239
0.37696230239
0.376962302390386
0.376962302390386
0.376962302390386

(There's a change in the printable output starting in 3.2, but the numbers themselves are the same.)

(2) it would make the random number generator twenty times slower than it is now, and MT is already not very fast;

(3) most importantly, I don't think that would even solve the problem. I personally don't know how, but I would predict that somebody with more maths skills than me would be able to still recover the internal state. They would just have to collect more values.

--
Steve


From rosuav at gmail.com Mon Sep 14 15:32:55 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 14 Sep 2015 23:32:55 +1000
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To:
References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com>
Message-ID:

On Mon, Sep 14, 2015 at 10:26 PM, Nathaniel Smith wrote:
>> Code which uses the output from an RNG as session id without adding
>> any additional security measures is broken, regardless of what kind
>> of RNG you are using. I bet such code will also take any session id
>> it receives as cookie and trust it without applying extra checks
>> on it.
>
> Yes, that's... generally the thing you do with session cookies?
> They're shared secret string that you use as keys into some sort of
> server-side session database? What extra checks need to be applied?
Some systems check to see if the session was created by the same IP address. That can help, but it also annoys legitimate users who change their IP addresses.

ChrisA


From ncoghlan at gmail.com Mon Sep 14 15:35:00 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 14 Sep 2015 23:35:00 +1000
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To:
References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On 14 September 2015 at 16:38, Nathaniel Smith wrote:
> I like Nick's proposal here:
> https://code.activestate.com/lists/python-ideas/35842/
> as probably the most solid strategy for implementing that idea -- the
> only projects that would be negatively affected are those that are
> using the seeding functionality of the global random API, which is a
> tiny fraction, and the effect on those projects is that they get
> nudged into using the superior object-oriented API.

I started a new thread breaking that out into more of a proto-PEP (including your reference to the PHP research - thanks for that!).

Regards, Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia


From skrah at bytereef.org Mon Sep 14 15:43:58 2015
From: skrah at bytereef.org (Stefan Krah)
Date: Mon, 14 Sep 2015 13:43:58 +0000 (UTC)
Subject: [Python-ideas] Globally configurable random number generation
References:
Message-ID:

Nick Coghlan writes:
> = Core abstraction =
>
> The core concept of this proposal involves classifying random number
> generators in Python as follows:
>
> * seedable
> * seedless
> * system
>
> These terms are chosen to make sense to folks that have *no idea*
> about the way different kinds of random number generator work and how
> that affects their security properties, but do know whether or not
> they need to be able to pass in a particular fixed seed in order to
> regenerate the same series of outputs.
>
> The guidance to Python users is then:
>
> * we use the seedless RNG by default as it provides the best balance
> of speed and security
> * if you need to be able to exactly reproduce output sequences, use
> the seedable RNG
> * if you know you're doing security sensitive work, use the system RNG
> directly to eliminate Python's seedless RNG as a potential source of
> vulnerabilities

Sorry, -1 on this. Theo proposed a simple API like:

    arc4random()
    arc4random_uniform()

Go has:

    https://golang.org/pkg/math/rand/
    https://golang.org/pkg/crypto/rand/

These are sane, unambiguously named APIs. I wish Python had more of those. If people must have their CSPRNG, please let's leave the random module alone and introduce a crypto module like Go.
Stefan Krah


From donald at stufft.io Mon Sep 14 15:51:29 2015
From: donald at stufft.io (Donald Stufft)
Date: Mon, 14 Sep 2015 09:51:29 -0400
Subject: [Python-ideas] Globally configurable random number generation
In-Reply-To:
References:
Message-ID:

On September 14, 2015 at 9:33:27 AM, Nick Coghlan (ncoghlan at gmail.com) wrote:
>
> * seedable
> * seedless
> * system
>
> These terms are chosen to make sense to folks that have *no idea*
> about the way different kinds of random number generator work and how
> that affects their security properties, but do know whether or not
> they need to be able to pass in a particular fixed seed in order to
> regenerate the same series of outputs.

I don't love the "seedable" and "seedless" names here, but I don't have a better suggestion for the userspace CSPRNG one because its security properties are a bit nuanced. People doing security sensitive things like generating keys for cryptography should still use something based on os.urandom, so it's mostly about providing a safety net that will "probably" [1] be safe. Probably something like random.ProbablySecureRandom is a bad name :)

> * provide a random.set_default_instance() API that makes it possible
> to specify the instance used by the module level methods

I think this particular bit is a bad idea, it makes an official API that makes it really hard for an auditor to come into a code base and determine if the use of random is correct or not. Given that going back to the MT based algorithm is fairly trivial (and could even be mechanical), what's the long term benefit here?

[1] The safety of userspace CSPRNGs is a debated topic by security experts, however I think any of them would be hard pressed to think it's a bad idea to have a userspace CSPRNG as a safety net for folks who, for whatever reason, didn't know to use os.urandom/random.SystemRandom and instead to make them more likely to be safe, or at the very least, harder to attack.
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From random832 at fastmail.com Mon Sep 14 16:16:00 2015
From: random832 at fastmail.com (Random832)
Date: Mon, 14 Sep 2015 10:16:00 -0400
Subject: [Python-ideas] Globally configurable random number generation
In-Reply-To:
References:
Message-ID: <1442240160.186432.383146609.471F1B7D@webmail.messagingengine.com>

On Mon, Sep 14, 2015, at 09:51, Donald Stufft wrote:
> I think this particular bit is a bad idea, it makes an official API
> that makes it really hard for an auditor to come into a code base and
> determine if the use of random is correct or not.

It's no worse than what OpenBSD itself has done with the C api for rand/random/rand48. At some point you've got to balance it with the realities of making backwards compatibility easy to achieve for the applications that really do need it with either a few lines change or none at all. And anyway, the auditor would *know* that if they see a module-level function being called, they need to do the extra work to find out what mode the module-level RNG is in (i.e. yes/no is there anywhere at all in the codebase that changes it from the secure default?)

It's not an "official API", it's an escape hatch for allowing a minimal change to existing code that needs the old behavior.

> Given that going back to the MT based algorithm is fairly trivial (and
> could even be mechanical) what's the long term benefit here?

I don't see how it's trivial/mechanical, *without* the exact feature being discussed.


From sturla.molden at gmail.com Mon Sep 14 16:21:11 2015
From: sturla.molden at gmail.com (Sturla Molden)
Date: Mon, 14 Sep 2015 16:21:11 +0200
Subject: [Python-ideas] Globally configurable random number generation
In-Reply-To:
References:
Message-ID:

On 14/09/15 15:43, Stefan Krah wrote:
> These are sane, unambiguously named APIs. I wish Python had more
> of those.
> If people must have their CSPRNG, please let's leave
> the random module alone and introduce a crypto module like Go.

In a perfect world, every programmer would know the difference between PRNGs for numerical simulation and entropy sources for cryptography. Those that do will still use os.urandom or just read from /dev/urandom or /dev/random for cryptography. Those that do know the need for mathematical precision when simulating samples from a given distribution. Those that do know the need for a fixed seed because a Monte Carlo simulation should be exactly reproducible in a scientific context.

The problem is users who have no idea that the Mersenne Twister is constructed for producing random deviates that are great for numerical simulation -- and that the Mersenne Twister is very weak for cryptography. Using os.urandom as the default entropy source has the opposite effect. It is not constructed for being mathematically precise, it is slow, and it does not allow for a fixed seed and exact reproducibility. Whatever we do, there is someone who is going to shoot themselves in the foot.

A crypto module would perhaps be great, but it does not solve anything. Someone who uses random.random instead of os.urandom is likely to use random.random instead of a PRNG in a crypto module as well. Mostly this is about propagating knowledge of random number generators to new developers and science students.

Sturla


From skrah at bytereef.org Mon Sep 14 16:24:53 2015
From: skrah at bytereef.org (Stefan Krah)
Date: Mon, 14 Sep 2015 14:24:53 +0000 (UTC)
Subject: [Python-ideas] Globally configurable random number generation
References: <1442240160.186432.383146609.471F1B7D@webmail.messagingengine.com>
Message-ID:

Random832 writes:
> It's no worse than what OpenBSD itself has done with the C api for
> rand/random/rand48.

These functions aren't used widely in scientific computing.
> It's not an "official API", it's an escape hatch for allowing a minimal
> change to existing code that needs the old behavior.

It's yet another case split to keep in the back of one's mind.

Stefan Krah


From donald at stufft.io Mon Sep 14 16:40:50 2015
From: donald at stufft.io (Donald Stufft)
Date: Mon, 14 Sep 2015 10:40:50 -0400
Subject: [Python-ideas] Globally configurable random number generation
In-Reply-To: <1442240160.186432.383146609.471F1B7D@webmail.messagingengine.com>
References: <1442240160.186432.383146609.471F1B7D@webmail.messagingengine.com>
Message-ID:

On September 14, 2015 at 10:16:49 AM, Random832 (random832 at fastmail.com) wrote:
> On Mon, Sep 14, 2015, at 09:51, Donald Stufft wrote:
> > I think this particular bit is a bad idea, it makes an official API
> > that makes it really hard for an auditor to come into a code base and
> > determine if the use of random is correct or not.
>
> It's no worse than what OpenBSD itself has done with the C api for
> rand/random/rand48. At some point you've got to balance it with the
> realities of making backwards compatibility easy to achieve for the
> applications that really do need it with either a few lines change or
> none at all. And anyway, the auditor would *know* that if they see a
> module-level function called they need to do the extra work to find
> out what mode the module-level RNG is in (i.e. yes/no is there anywhere
> at all in the codebase that changes it from the secure default?)
>
> It's not an "official API", it's an escape hatch for allowing a minimal
> change to existing code that needs the old behavior.
>
> > Given that going back to the MT based algorithm is fairly trivial (and
> > could even be mechanical) what's the long term benefit here?
>
> I don't see how it's trivial/mechanical, *without* the exact feature
> being discussed.

Easily, you change your:

    import random

to

    from random import seeded_random as random

And then all of your code that used random.foo works without any further modification. If you were importing the individual functions, you can either change your code to use random.foo or you can do:

    from random import seeded_random as _random
    random = _random.random
    randint = _random.randint

If you want to do this in cross language code, then you can combine this with a try/except block like:

    try:
        from random import seeded_random as random
    except ImportError:
        import random

Either way, trivial and mechanical. It doesn't require much thought, it just requires some pretty simple changes.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


From random832 at fastmail.com Mon Sep 14 16:45:05 2015
From: random832 at fastmail.com (Random832)
Date: Mon, 14 Sep 2015 10:45:05 -0400
Subject: [Python-ideas] Globally configurable random number generation
In-Reply-To: References: <1442240160.186432.383146609.471F1B7D@webmail.messagingengine.com>
Message-ID: <1442241905.192420.383177025.382A3D7E@webmail.messagingengine.com>

On Mon, Sep 14, 2015, at 10:24, Stefan Krah wrote:
> Random832 writes:
> > It's no worse than what OpenBSD itself has done with the C api for
> > rand/random/rand48.
>
> These functions aren't used widely in scientific computing.

I don't see how that's relevant, when what I'm talking about is "providing an API that switches them from secure mode to insecure/deterministic mode"


From skrah at bytereef.org Mon Sep 14 16:50:13 2015
From: skrah at bytereef.org (Stefan Krah)
Date: Mon, 14 Sep 2015 14:50:13 +0000 (UTC)
Subject: [Python-ideas] Globally configurable random number generation
References:
Message-ID:

Sturla Molden writes:
> On 14/09/15 15:43, Stefan Krah wrote:
> > These are sane, unambiguously named APIs. I wish Python had more
> > of those.
> > If people must have their CSPRNG, please let's leave
> > the random module alone and introduce a crypto module like Go.
>
> A crypto module would perhaps be great, but it does not solve anything.
> Someone who uses random.random instead of os.urandom is likely to use
> random.random instead of a PRNG in a crypto module as well. Mostly this
> is about propagating knowledge of random number generators to new
> developers and science students.

The sentiments in the original thread (which has now been renamed two times), seem to have been lost:

Theo:
=====
"chacha arc4random is really fast. if you were to create such an API in python, maybe this is how it will go: say it becomes arc4random in the back end. i am unsure what advice to give you regarding a python API name. in swift, they chose to use the same prefix "arc4random" (id = arc4random(), id = arc4random_uniform(1..n)"; it is a little bit different than the C API. google has tended to choose other prefixes. we admit the name is a bit strange, but we can't touch the previous attempts like drand48.... I do suggest you have the _uniform and _buf versions. Maybe apple chose to stick to arc4random as a name simply because search engines tend to give above average advice for this search string?"

Theo:
=====
"that opens /dev/urandom or uses the getrandom system call depending on system. it also has support for the windows entropy API. it pulls data into a large buffer, a cache. then each subsequent call, it consumes some, until it runs out, and has to do a fresh read. it appears to not clean the buffer behind itself, probably for performance reasons, so the memory is left active. (forward secrecy violated) i don't think they are doing the best they can... i think they should get forward secrecy and higher performance by having an in-process chacha. but you can sense the trend."

So the original thread is about:
================================
- Implementing a possibly faster (and allegedly more secure) chacha20-random.
- Possibly using the naming scheme of Swift.
- Being careful with os.urandom(), as there are some pitfalls that the OpenBSD libcrypto (allegedly) solves.

I see nothing about magically repurposing random.random() functions.

Stefan Krah


From steve at pearwood.info Mon Sep 14 16:50:46 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 15 Sep 2015 00:50:46 +1000
Subject: [Python-ideas] Globally configurable random number generation
In-Reply-To: <1442240160.186432.383146609.471F1B7D@webmail.messagingengine.com>
References: <1442240160.186432.383146609.471F1B7D@webmail.messagingengine.com>
Message-ID: <20150914145046.GC31152@ando.pearwood.info>

On Mon, Sep 14, 2015 at 10:16:00AM -0400, Random832 wrote:
> On Mon, Sep 14, 2015, at 09:51, Donald Stufft wrote:
> > I think this particular bit is a bad idea, it makes an official API
> > that makes it really hard for an auditor to come into a code base and
> > determine if the use of random is correct or not.
>
> It's no worse than what OpenBSD itself has done with the C api for
> rand/random/rand48. At some point you've got to balance it with the
> realities of making backwards compatibility easy to achieve for the
> applications that really do need it with either a few lines change or
> none at all. And anyway, the auditor would *know* that if they see a
> module-level function called they need to do the extra work to find
> out what mode the module-level RNG is in (i.e. yes/no is there anywhere
> at all in the codebase that changes it from the secure default?)
>
> It's not an "official API", it's an escape hatch for allowing a minimal
> change to existing code that needs the old behavior.

Of course it is an official API. It's a documented public function (or rather, it will be if Nick's suggestion is accepted) in the standard library. That makes it an official API. The *whole purpose of it* is to give a standard API for what Python can already do: monkey-patch the random module. E.g.
we can do this now:

    import random
    random.random = lambda: 9
    random.uniform = lambda a, b: 9

but if you do that, you know you're on thin ice.

I don't entirely agree with everything Donald has said, but I agree that providing this API would be harmful. It would mean that any arbitrary module you import (directly or indirectly) could swap out the secure CSPRNG you're relying on for an insecure PRNG, and you would never know. (Yes, they could do that now, this is Python. But they won't, because there's no official API for swapping out the default PRNG.)

--
Steve


From donald at stufft.io Mon Sep 14 16:57:47 2015
From: donald at stufft.io (Donald Stufft)
Date: Mon, 14 Sep 2015 10:57:47 -0400
Subject: [Python-ideas] Globally configurable random number generation
In-Reply-To:
References:
Message-ID:

On September 14, 2015 at 10:50:46 AM, Stefan Krah (skrah at bytereef.org) wrote:
> The sentiments in the original thread (which has now been renamed two
> times), seem to have been lost:

I've actually talked to Theo and I believe he's read my summary of his proposal and he didn't mention anything amiss. He did mention that he wasn't aware of the number of APIs that we had in random.py that built on top of the RNG. As far as I can tell from talking to him, he focused on that particular thing because he became aware of the issue via the recent issue with getentropy on Solaris, and I believe he assumed that our APIs were similar to C in what we provided.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From p.f.moore at gmail.com Mon Sep 14 17:01:02 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 14 Sep 2015 16:01:02 +0100 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 14 September 2015 at 14:29, Cory Benfield wrote: > Is your argument that there are lots of ways to get security wrong, > and for that reason we shouldn't try to fix any of them? This debate seems to repeatedly degenerate into this type of accusation. Why is backward compatibility not being taken into account here? To be clear, the proposed change *breaks backward compatibility* and while that's allowed in 3.6, just because it is allowed, doesn't mean we have free rein to break compatibility - any change needs a good justification. The arguments presented here are valid up to a point, but every time anyone tries to suggest a weak area in the argument, the "we should fix security issues" trump card gets pulled out. For example, as this is a compatibility break, it'll only be allowed into 3.6+ (I've not seen anyone suggest that this is sufficiently serious to warrant breaking compatibility on older versions). Almost all of those SO questions, and google hits, are probably going to be referenced by people who are using 2.7, or maybe some version of 3.x earlier than 3.6 (at what stage do we allow for the possibility of 3.x users who are *not* on the latest release?) So is a solution which won't impact most of the people making the mistake, worth it? I fully expect the response to this to be "just because it'll take time, doesn't mean we should do nothing". Or "even if it just fixes it for one or two people, it's still worth it". 
But *that's* the argument I don't find compelling - not that a fix won't help some situations, but that because it's security, (a) all the usual trade-off calculations are irrelevant, and (b) other proposed solutions (such as education, adding specialised modules like a "shared secret" library, etc.) are off the table. Honestly, this type of debate doesn't do the security community much good - there's too little willingness to compromise, and as a result the more neutral participants (which, frankly, is pretty much anyone who doesn't have a security agenda to promote) end up pushed into a "reject everything" stance simply as a reaction to the black-and-white argument style. Paul From storchaka at gmail.com Mon Sep 14 17:04:35 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 14 Sep 2015 18:04:35 +0300 Subject: [Python-ideas] Globally configurable random number generation In-Reply-To: References: Message-ID: On 14.09.15 16:32, Nick Coghlan wrote: > * make random.Random a subclass of SeedableRandom that deprecates > seed(), getstate() and setstate() I would make seed() and setstate() switch to the seedable algorithm. If you don't use seed() or setstate(), it is not important that the algorithm is changed. If you use seed() or setstate(), you expect reproducible behavior. > * random.Random becomes an alias for random.SeedlessRandom This breaks compatibility with the data pickled in older Python versions. > In 3.7, "random.set_default_instance(random.seedable_random)" will opt > back in to the deterministic PRNG when using the module level > functions process wide, while "from random import seedable_random as > random" will do so on a module by module basis. What to do with "from random import random" deep in a third-party module? It caches random.random in the module dictionary.
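Serhiy's switch-on-seed idea can be sketched in a few lines (AutoSwitchRandom is a hypothetical name, not part of any concrete proposal): an instance hands out non-deterministic numbers until seed() or setstate() is called, and from that point falls back to the deterministic Mersenne Twister, so callers who asked for reproducibility still get it:

```python
import random

class AutoSwitchRandom:
    """Hypothetical sketch: non-deterministic by default, switching to
    the seedable MT algorithm only when seed()/setstate() is called."""

    def __init__(self):
        self._rng = random.SystemRandom()  # backed by os.urandom()

    def seed(self, a=None):
        # Callers who seed expect reproducible output, so fall back to
        # the deterministic Mersenne Twister from here on.
        self._rng = random.Random(a)

    def setstate(self, state):
        mt = random.Random()
        mt.setstate(state)
        self._rng = mt

    def random(self):
        return self._rng.random()

rng = AutoSwitchRandom()
rng.seed(42)
# After seeding, the stream matches a plain seeded Random instance.
assert rng.random() == random.Random(42).random()
```

Note that this sketch does nothing for Serhiy's last objection: a module that did "from random import random" has already bound the old function object, and no later rebinding of the module attribute can reach that cached name.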
From skrah at bytereef.org Mon Sep 14 17:16:04 2015 From: skrah at bytereef.org (Stefan Krah) Date: Mon, 14 Sep 2015 15:16:04 +0000 (UTC) Subject: [Python-ideas] Globally configurable random number generation References: Message-ID: Donald Stufft writes: > > On September 14, 2015 at 10:50:46 AM, Stefan Krah (skrah at ...) wrote: > > > The sentiments in the original thread (which has now been renamed > > two > > times), seem to have been lost: > > I've actually talked to Theo and I believe he's read my summary of his proposal > and he didn't mention anything amiss. He did mention that he wasn't aware of > the number of APIs that we had in random.py that built on top of the RNG. > > As far as I can tell from talking to him, he focused on that particular thing > because he became aware of the issue via the recent issue with getentropy on > Solaris, and I believe he assumed that our APIs were similar to C in what we > provided. That addresses pretty little of what I wrote, and I'd prefer to hear anything directly from him. Your summaries have a tendency to be highly biased. Stefan Krah From donald at stufft.io Mon Sep 14 17:17:34 2015 From: donald at stufft.io (Donald Stufft) Date: Mon, 14 Sep 2015 11:17:34 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On September 14, 2015 at 11:01:36 AM, Paul Moore (p.f.moore at gmail.com) wrote: > On 14 September 2015 at 14:29, Cory Benfield wrote: > > Is your argument that there are lots of ways to get security wrong, > > and for that reason we shouldn't try to fix any of them? > > This debate seems to repeatedly degenerate into this type of accusation. > > Why is backward compatibility not being taken into account here? To be > clear, the proposed change *breaks backward compatibility* and while > that's allowed in 3.6, just because it is allowed, doesn't mean we > have free rein to break compatibility - any change needs a good > justification.
> The arguments presented here are valid up to a point, > but every time anyone tries to suggest a weak area in the argument, > the "we should fix security issues" trump card gets pulled out. How has it not been taken into account? The current proposal (best summed up by Nick in the other thread) will not break compatibility for anyone except those calling the functions that are specifically about setting a seed or getting/setting the current state. In looking around I don't see a lot of people using those particular functions, so most people likely won't notice the change at all, and for those who do, there is a very trivial change they can make to their code to cope with the change. > > For example, as this is a compatibility break, it'll only be allowed > into 3.6+ (I've not seen anyone suggest that this is sufficiently > serious to warrant breaking compatibility on older versions). Almost > all of those SO questions, and google hits, are probably going to be > referenced by people who are using 2.7, or maybe some version of 3.x > earlier than 3.6 (at what stage do we allow for the possibility of 3.x > users who are *not* on the latest release?) So is a solution which > won't impact most of the people making the mistake, worth it? > > I fully expect the response to this to be "just because it'll take > time, doesn't mean we should do nothing". Or "even if it just fixes it > for one or two people, it's still worth it". But *that's* the argument > I don't find compelling - not that a fix won't help some situations, > but that because it's security, (a) all the usual trade-off > calculations are irrelevant, and (b) other proposed solutions (such as > education, adding specialised modules like a "shared secret" library, > etc) are off the table. We can't go back in time and fix those versions, that is true.
However, one of the biggest groups of people who are most likely to be helped by this change is new and inexperienced developers who don't fully grasp the security-sensitive nature of whatever they are doing with random. That group of people is also more likely to be using Python 3.x than experienced programmers. > > Honestly, this type of debate doesn't do the security community much > good - there's too little willingness to compromise, and as a result > the more neutral participants (which, frankly, is pretty much anyone > who doesn't have a security agenda to promote) end up pushed into a > "reject everything" stance simply as a reaction to the black and white > argument style. > If I/we were not willing to compromise, I'd be pushing for it to use SystemRandom everywhere, because that removes all of the possibly problematic parts of using a user-space CSPRNG like the one being proposed. However, I/we are willing to compromise by sacrificing possible security in order to not regress things where we can; in particular, a user-space CSPRNG is being proposed over SystemRandom because it will provide you with random numbers almost as fast as MT will. However, when proposing this possible compromise, we are met with people refusing to meet us in the middle. There are some folks who are trying to propose other middle grounds, and there will undoubtedly be some discussion around which ones are the best. We've gone from suggesting replacing the default random with SystemRandom (a lot slower than MT), to removing the default altogether, to deprecating the default and replacing it with a fast user-space CSPRNG. However, folks who don't want to see it change at all have thus far been unwilling to compromise at all.
I'm confused by how you're saying that the security-minded folks have been unwilling to compromise when we've done that repeatedly in this thread, whereas the backwards-compat-minded folks have consistently said "No, it would break compatibility" or "We don't need to change" or "They are probably insecure anyways". Can you explain what compromise you're willing to accept here? If it doesn't involve breaking at least a little compatibility then it's not a compromise, it's you demanding that your opinion is the correct one (which isn't wrong, we're also asserting that our opinion is the correct one, we've just been willing to move the goal posts to try and limit the damage while still getting most of the benefit). ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From donald at stufft.io Mon Sep 14 17:22:51 2015 From: donald at stufft.io (Donald Stufft) Date: Mon, 14 Sep 2015 11:22:51 -0400 Subject: [Python-ideas] Globally configurable random number generation In-Reply-To: References: Message-ID: On September 14, 2015 at 11:16:48 AM, Stefan Krah (skrah at bytereef.org) wrote: > > That addresses pretty little of what I wrote, and I'd prefer to > hear anything directly from him. Your summaries have a tendency > to be highly biased. > Well, he's expressed that he's unlikely to participate in this discussion because he doesn't use Python and thus doesn't have any skin in the game. He just saw an opportunity to try and improve the "ambient" security of applications written in Python and thought he'd reach out to see if there was any interest in it on our end. I'd ask him personally, but given that I'm "biased" you'll have to manage to ask him on your own.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From graffatcolmingov at gmail.com Mon Sep 14 17:32:48 2015 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Mon, 14 Sep 2015 10:32:48 -0500 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On Mon, Sep 14, 2015 at 10:01 AM, Paul Moore wrote: > On 14 September 2015 at 14:29, Cory Benfield wrote: >> Is your argument that there are lots of ways to get security wrong, >> and for that reason we shouldn't try to fix any of them? > > This debate seems to repeatedly degenerate into this type of accusation. > > Why is backward compatibility not being taken into account here? To be > clear, the proposed change *breaks backward compatibility* and while > that's allowed in 3.6, just because it is allowed, doesn't mean we > have free rein to break compatibility - any change needs a good > justification. The arguments presented here are valid up to a point, > but every time anyone tries to suggest a weak area in the argument, > the "we should fix security issues" trump card gets pulled out. > > For example, as this is a compatibility break, it'll only be allowed > into 3.6+ (I've not seen anyone suggest that this is sufficiently > serious to warrant breaking compatibility on older versions). Almost > all of those SO questions, and google hits, are probably going to be > referenced by people who are using 2.7, or maybe some version of 3.x > earlier than 3.6 (at what stage do we allow for the possibility of 3.x > users who are *not* on the latest release?) So is a solution which > won't impact most of the people making the mistake, worth it? So people who are arguing that the defaults shouldn't be fixed on Python 2.7 are likely the same people who also argued that PEP 466 was a terrible, awful, end-of-the-world type change. 
Yes, it broke things (like eventlet), but the net benefit for users who can get onto Python 2.7.9 (and later) is immense. Now I'm not arguing that we should do the same to the random module, but a backport (that is part of the stdlib) would probably be a good idea, on the same principle of allowing users to opt into security early. > I fully expect the response to this to be "just because it'll take > time, doesn't mean we should do nothing". Or "even if it just fixes it > for one or two people, it's still worth it". But *that's* the argument > I don't find compelling - not that a fix won't help some situations, > but that because it's security, (a) all the usual trade-off > calculations are irrelevant, and (b) other proposed solutions (such as > education, adding specialised modules like a "shared secret" library, > etc) are off the table. They're not irrelevant. I personally think they're of a lower impact to the discussion, but the reality is that the people who are educating others are few and far between. If there are public domain works, free tutorials, etc. that all advocate using a module in the standard library and no one can update those, they still exist and are still recommendations. People prefer free to correct when possible because there's nothing free to correct them (until they get hacked or worse). Do we have a team in the Python community that goes out to educate people for free on security-related best practices? I haven't seen them. The best we have is a few people on crufty mailing lists like this one trying to make an impact, because education is a much larger and harder-to-solve problem than making something secure by default. Perhaps instead of bickering like fools on a mailing list, we could all be spending our time better educating others. That said, I can't make that decision for you just like you can't make that for me.
> Honestly, this type of debate doesn't do the security community much > good - there's too little willingness to compromise, and as a result > the more neutral participants (which, frankly, is pretty much anyone > who doesn't have a security agenda to promote) end up pushed into a > "reject everything" stance simply as a reaction to the black and white > argument style. Except you seem to have missed many of the compromises being discussed and conceded by the security-minded folks. Personally, names that describe the outputs of the algorithms make much more sense to me than "Seedless" and "Seeded", but no one has really bothered to shave that yak further out of a desire to compromise and make things better as a whole. Much of the lack of gradation has come from the opponents of this change, who seem to think of security as a step function where a subjective measurement of "good enough for me" counts as secure. From skrah at bytereef.org Mon Sep 14 17:35:19 2015 From: skrah at bytereef.org (Stefan Krah) Date: Mon, 14 Sep 2015 15:35:19 +0000 (UTC) Subject: [Python-ideas] Globally configurable random number generation References: Message-ID: Donald Stufft writes: > On September 14, 2015 at 11:16:48 AM, Stefan Krah (skrah at ...) wrote: > > > > That addresses pretty little of what I wrote, and I'd prefer to > > hear anything directly from him. Your summaries have a tendency > > to be highly biased. > > > > Well, he's expressed that he's unlikely to participate in this discussion > because he doesn't use Python and thus doesn't have any skin in the game. He > just saw an opportunity to try and improve the "ambient" security of > applications written in Python and thought he'd reach out to see if there was > any interest in it on our end. > > I'd ask him personally, but given that I'm "biased" you'll have to manage to > ask him on your own. No one has asked you to do anything.
Ironically, this is another example of how you manage to put a spin on basically anything you respond to. Stefan Krah From sturla.molden at gmail.com Mon Sep 14 17:39:42 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 14 Sep 2015 17:39:42 +0200 Subject: [Python-ideas] Globally configurable random number generation In-Reply-To: <1442241905.192420.383177025.382A3D7E@webmail.messagingengine.com> References: <1442240160.186432.383146609.471F1B7D@webmail.messagingengine.com> <1442241905.192420.383177025.382A3D7E@webmail.messagingengine.com> Message-ID: On 14/09/15 16:45, Random832 wrote: >> These functions aren't used widely in scientific computing. > > I don't see how that's relevant, when what I'm talking about is > "providing an API that switches them from secure mode to > insecure/deterministic mode" It is not just a matter of security versus determinism. It is also a matter of numerical accuracy. The distribution of the output sequence must be proven and be as close as possible to the distribution of interest. MT19937 is loved by scientists because it emulates sampling from the uniform distribution so well. Faster alternatives exist, more secure alternatives too. But when we simulate a stochastic process we also care about numerical accuracy. MT19937 is considered state of the art for this purpose. It does not seem that the issue of numerical accuracy is appreciated in this debate. Cryptographers just want random bits that cannot be predicted. Numerical accuracy is not their primary concern. If you replace MT19937 with "something more secure" you likely also lose its usefulness for scientific computing.
Sturla From donald at stufft.io Mon Sep 14 17:41:58 2015 From: donald at stufft.io (Donald Stufft) Date: Mon, 14 Sep 2015 11:41:58 -0400 Subject: [Python-ideas] Globally configurable random number generation In-Reply-To: References: <1442240160.186432.383146609.471F1B7D@webmail.messagingengine.com> <1442241905.192420.383177025.382A3D7E@webmail.messagingengine.com> Message-ID: On September 14, 2015 at 11:40:53 AM, Sturla Molden (sturla.molden at gmail.com) wrote: > On 14/09/15 16:45, Random832 wrote: > > >> These functions aren't used widely in scientific computing. > > > > I don't see how that's relevant, when what I'm talking about is > > "providing an API that switches them from secure mode to > > insecure/deterministic mode" > > It is not just a matter of security versus determinism. It is also a > matter of numerical accuracy. The distribution of the output sequence > must be proven and be as close as possible to the distribution of interest. > > MT19937 is loved by scientists because it emulates sampling from the > uniform distribution so well. Faster alternatives exist, more secure > alternatives too. But when we simulate a stochastic process we also care > about numerical accuracy. MT19937 is considered state of the art for > this purpose. > > It does not seem that the issue of numerical accuracy is appreciated in > this debate. Cryptographers just want random bits that cannot be > predicted. Numerical accuracy is not their primary concern. If you > replace MT19937 with "something more secure" you likely also lose its > usefulness for scientific computing. > Nobody is suggesting removing MT, just making it so you have to explicitly opt in to it.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From robert.kern at gmail.com Mon Sep 14 17:50:15 2015 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 14 Sep 2015 16:50:15 +0100 Subject: [Python-ideas] Globally configurable random number generation In-Reply-To: References: <1442240160.186432.383146609.471F1B7D@webmail.messagingengine.com> <1442241905.192420.383177025.382A3D7E@webmail.messagingengine.com> Message-ID: On 2015-09-14 16:39, Sturla Molden wrote: > On 14/09/15 16:45, Random832 wrote: > >>> These functions aren't used widely in scientific computing. >> >> I don't see how that's relevant, when what I'm talking about is >> "providing an API that switches them from secure mode to >> insecure/deterministic mode" > > It is not just a matter of security versus determinism. It is also a matter of > numerical accuracy. The distribution of the output sequence must be proven and > be as close as possible to the distribution of interest. > > MT19937 is loved by scientists because it emulates sampling from the uniform > distribution so well. Faster alternatives exist, more secure alternatives too. > But when we simulate a stochastic process we also care about numerical accuracy. > MT19937 is considered state of the art for this purpose. Actually, it's well behind the state of the art as it fails BigCrush. The proposed alternative does better in this regard. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From random832 at fastmail.com Mon Sep 14 17:52:10 2015 From: random832 at fastmail.com (Random832) Date: Mon, 14 Sep 2015 11:52:10 -0400 Subject: [Python-ideas] Globally configurable random number generation In-Reply-To: References: <1442240160.186432.383146609.471F1B7D@webmail.messagingengine.com> <1442241905.192420.383177025.382A3D7E@webmail.messagingengine.com> Message-ID: <1442245930.209341.383250457.5F839815@webmail.messagingengine.com> On Mon, Sep 14, 2015, at 11:39, Sturla Molden wrote: > It does not seem that the issue of numerical accuracy is appreciated in > this debate. Cryptographers just want random bits that cannot be > predicted. Numerical accuracy is not their primary concern. If you > replace MT19937 with "something more secure" you likely also lose its > usefulness for scientific computing. Who is doing scientific computing but not using the seeding functions? From sturla.molden at gmail.com Mon Sep 14 17:53:53 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 14 Sep 2015 17:53:53 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <55F0BD0F.10508@sdamon.com> Message-ID: On 10/09/15 03:55, Tim Peters wrote: > Would your answer change if a crypto generator were _faster_ than MT? > MT isn't speedy by modern standards, and is cache-hostile (about 2500 > bytes of mutable state). > > Not claiming a crypto hash _would_ be faster. But it is possible. Speed is not the main matter of concern. MT19937 is not very fast, it is very accurate. It is used in scientific computing when we want to simulate sampling from a given distribution as accurately as possible. Its strength is in the distribution of numbers it generates, not in its security or speed. MT19937 allows us to produce a very precise simulation of a stochastic process. The alternatives cannot compare in numerical quality, though they might be faster or more secure, or both.
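The repeatability Sturla is describing is easy to demonstrate: re-seeding MT19937 regenerates an identical sample stream, which is exactly the property a non-seedable default would give up (a minimal illustration, not tied to any particular proposal in this thread):

```python
import random

rng = random.Random(2015)                     # an explicit MT19937 instance
first_run = [rng.gauss(0.0, 1.0) for _ in range(5)]

rng.seed(2015)                                # restore the initial state
second_run = [rng.gauss(0.0, 1.0) for _ in range(5)]

assert first_run == second_run                # the "simulation" is repeatable
```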
When we use MT19937 in scientific computing we deliberately sacrifice speed for accuracy. A crypto hash might be faster, but will it be more accurate? Accuracy means how well the generated sequence emulates sampling from a perfect uniform distribution. MT19937 does not have any real competition in this game. Sturla From antoine at python.org Mon Sep 14 17:55:30 2015 From: antoine at python.org (Antoine Pitrou) Date: Mon, 14 Sep 2015 15:55:30 +0000 (UTC) Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux References: Message-ID: Donald Stufft writes: > > How has it not been taken into account? The current proposal (best summed up > by Nick in the other thread) will not break compatibility for anyone except > those calling the functions that are specifically about setting a seed or > getting/setting the current state. That's a pretty big "except". Paul's and my concern is about compatibility breakage, saying "it doesn't break compatibility except..." sounds like a lot of empty rhetoric. > In looking around I don't see a lot of > people using those particular functions Given that when you "look around" you only end up looking around amongst the Web developer crowd, I may not be surprised. You know, when I "look around" I don't see a lot of people using the random module to generate passwords. Your anecdote would be more valuable than other people's? > However, one of > the biggest groups of people who are most likely to be helped by this change is > new and inexperienced developers who don't fully grasp the security sensitive > nature of whatever they are doing with random. Yes, because generating passwords is a common and reasonable task for new and inexperienced developers? Really? Again, why don't you propose a dedicated API for that? That's what we did for constant-time comparisons. That's what people did for password hashing. That's what other people did for cryptography. I haven't seen a reasonable rebuttal to this.
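The kind of dedicated API Antoine is describing needs very little code. A hedged sketch (the helper and its name are hypothetical; nothing like it existed in the stdlib at the time), built on the SystemRandom class the random module already ships:

```python
import string
from random import SystemRandom

_sysrand = SystemRandom()  # backed by os.urandom(); cannot be seeded

def generate_password(length=16,
                      alphabet=string.ascii_letters + string.digits):
    """Hypothetical helper: a purpose-built, always-secure API sidesteps
    the argument over what the module-level functions should default to."""
    return "".join(_sysrand.choice(alphabet) for _ in range(length))

token = generate_password()  # varies per call, drawn from OS entropy
```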
Why would generating passwords be any different from all those use cases? After all, if you provide a convenient API people should flock to it, instead of cumbersomely reinventing the wheel... That's what libraries are for. > However, I/we > are willing to compromise by sacrificing possible security in order to not > regress things where we can, in particular a user-space CSPRNG is being > proposed over SystemRandom because it will provide you with random numbers > almost as fast as MT will. Really, it's not so much a performance issue as a compatibility issue. The random module provides, by default, a *deterministic* stream of random numbers. That os.urandom() may be a tad slower isn't very important when you're generating one number at a time and processing it with a slow interpreter (besides, MT itself is hardly the fastest PRNG out there). That os.urandom() doesn't give you a way to seed it once and get predictable results is a big *regression* if made the default RNG in the random module. And the same can be said for a user-space CSPRNG, as far as I understand the explanations here. > However, when proposing this possible compromise, we are met with people > refusing to meet us in the middle. See, people are fed up with the incompatibilities arising "in the name of the public good" in each new feature release of Python. When the "middle" doesn't sound much more desirable than the "extreme", I don't see why I should call it a "compromise". Some people have to support code in 4 different Python versions and further gratuitous breakage in the stdlib doesn't help. Yes, they can change their code. Yes, they can use the "six" module, the "future" module or whatever new bandaid exists on PyPI. Still they must change their code in one way or another because it was deemed "necessary" to break compatibility to solve a concern that doesn't seem grounded in any reasonable analysis. Python 3 was there to break compatibility. Not Python 3.4. Not Python 3.5. Not Python 3.6.
(in case you're wondering, trying to make all published code on the Internet secure by appropriately changing the interpreter's "behaviour" to match erroneous expectations - even *documented* as erroneous - is *not* reasonable - no matter how hard you try, there will always be occurrences of broken code that people copy and paste around) > Can you explain what > compromise you're willing to accept here? Let's rephrase this: are *you* willing to accept an admittedly "insecure by default" compromise? No, you aren't, evidently. There's no evidence that you would agree to leave the top-level random functions intact, even if a new UserSpaceSecureRandom class was added to the module, right? So why would we accept a compatibility-breaking compromise? Because we are more "reasonable" than you? (which in this context really reads: more willing to quit the discussion because of boredom, exhaustion, lack of time or any other quite human reason; which, btw, sums up a significant part of what the dynamics of python-ideas have become: "victory of the most obstinate") Yeah, that's always what you are betting on, because it's not like *you* will ever be reasonable except if it's the last resort for getting something accepted. And that's why every discussion about security with security-minded (read: "obsessed") people is a massive annoyance, even if at the end it succeeds in reaching a "compromise", after 500+ excruciating backs and forths on a mailing list. Regards Antoine. From p.f.moore at gmail.com Mon Sep 14 17:57:01 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 14 Sep 2015 16:57:01 +0100 Subject: [Python-ideas] Globally configurable random number generation In-Reply-To: References: Message-ID: On 14 September 2015 at 14:32, Nick Coghlan wrote: > I'll write it up as a full PEP later, but I think it's just as useful > in this form for now. Please provide costs and benefits.
At the moment, the proposal takes an implied stance that fixing security issues warrants disruption to users (and in particular to users with *no* security requirements). I appreciate that there's the usual 2-release long deprecation process, and that the only visible disruption is to the state/seed APIs. But I'd like to see that expanded on a little more, precisely to convince those people who *aren't* automatically convinced by "there's a security issue" arguments, that the trade-offs have been properly analyzed. For example, in terms of costs: 1. The module API is more complex and harder to teach. 2. The new API deliberately introduces a global state setting API. 3. People using "from random import choice" can't use the "simple upgrade" recommendation "from random import system_random as random". The benefits seem to be solely: 1. Users of code written based on bad advice will be protected from the consequences (as long as the code runs on a sufficiently new version of Python). (I'm serious - that's how the benefit statement reads to me. Although I agree it'd be nice if I worded it a bit more unemotionally, I genuinely don't know how to without either overstating it or making it a paragraph long...) I'm not trying to say that the cost/benefit analysis doesn't justify the change (I'm currently unconvinced, and trying to remain open in spite of the over-abundance of security rhetoric in the thread), just that it's a key point of the debate here, and it's not captured in your summary/pre-PEP. Paul From cory at lukasa.co.uk Mon Sep 14 18:06:03 2015 From: cory at lukasa.co.uk (Cory Benfield) Date: Mon, 14 Sep 2015 17:06:03 +0100 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 14 September 2015 at 16:55, Antoine Pitrou wrote: > Python 3 was there to break compatibility. Not Python 3.4. Not Python 3.5. > Not Python 3.6. 
To clarify: your position is that we cannot break backward compatibility in Python 3.6? From cory at lukasa.co.uk Mon Sep 14 18:00:45 2015 From: cory at lukasa.co.uk (Cory Benfield) Date: Mon, 14 Sep 2015 17:00:45 +0100 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 14 September 2015 at 16:01, Paul Moore wrote: > Why is backward compatibility not being taken into account here? To be > clear, the proposed change *breaks backward compatibility* and while > that's allowed in 3.6, just because it is allowed, doesn't mean we > have free rein to break compatibility - any change needs a good > justification. The arguments presented here are valid up to a point, > but every time anyone tries to suggest a weak area in the argument, > the "we should fix security issues" trump card gets pulled out. What makes you think that I didn't take it into account? I did: and then rejected it. On a personal level, I believe that defaulting to more secure is worth backward compatibility breaks. I believe that a major reason for the overwhelming prevalence of security vulnerabilities in modern software is that we are overly attached to making people's lives *easy* at the expense of making them *safe*. I believe that software communities in general are too concerned about keeping the stuff that people used around for far too long, and not concerned enough about pushing users to make good choices. The best example of this is OpenSSL. When compiled from source naively (e.g. ./config && make && make install), OpenSSL includes support for SSLv3, SSL Compression, and SSLv2, all of which are known-broken options. To clarify, SSLv2 has been deprecated for security reasons since 1996, but a version of OpenSSL 1.0.2d you build today will happily enable *and use* it. Hell, OpenSSL's own build instructions include this note[0]: > OpenSSL has been around a long time, and it carries around a lot of > cruft.
> For example, from above, SSLv2 is enabled by default. SSLv2 is > completely broken, and you should disable it during configuration. Why is it that users who do not read the wiki (most of them) get an insecure build? Backwards compatibility is why. This is necessarily a reductio ad absurdum type of argument, because I'm trying to make a rhetorical point: I believe that sacrificing security on the altar of backwards compatibility is a bad idea in the long term, and I want to discourage it as best I can. I appreciate your desire to maintain backward compatibility, Paul, I really do. And I think it is probably for the best that people like you work on projects like CPython, while people like me work outside the standard library. However, that won't stop me trying to drag the stdlib towards more secure defaults: it just might make it futile. From antoine at python.org Mon Sep 14 18:15:55 2015 From: antoine at python.org (Antoine Pitrou) Date: Mon, 14 Sep 2015 16:15:55 +0000 (UTC) Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux References: Message-ID: Cory Benfield writes: > > On 14 September 2015 at 16:55, Antoine Pitrou wrote: > > Python 3 was there to break compatibility. Not Python 3.4. Not Python 3.5. > > Not Python 3.6. > > To clarify: your position is that we cannot break backward > compatibility in Python 3.6? It is. Not breaking backward compatibility in feature releases (except 3.0, which was a deliberate special case) is a very long-standing policy, and it is so because users have a much better time with such a policy, especially when people have to maintain code that's compatible across multiple versions (again, the 2->3 transition is a special case, which justifies the existence of tools such as "six", and has incidentally created a lot of turmoil in the community that has only recently begun to recede).
Of course, fixing a bug is not necessarily breaking compatibility (although sometimes we may even refuse to fix a bug because the impact on working code would be too large). But changing or removing a documented behaviour that people rely on definitely is. We do break feature compatibility, from time to time, in exceptional and generally discussed-at-length cases, but there is a sad pressure recently to push for more compatibility breakage - and, strangely, always in the name of "security". (also note that some library modules such as asyncio are or were temporarily exempted from the compatibility requirements, because they are in very active development; the random module evidently isn't part of them) Regards Antoine. From bussonniermatthias at gmail.com Mon Sep 14 18:21:08 2015 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 14 Sep 2015 09:21:08 -0700 Subject: [Python-ideas] Globally configurable random number generation In-Reply-To: References: Message-ID: > On Sep 14, 2015, at 06:51, Donald Stufft wrote: > > I don't love the "seedable" and "seedless" names here, but I don't have a > better suggestion for the userspace CSPRNG one because its security properties > are a bit nuanced. People doing security sensitive things like generating keys > for cryptography should still use something based on os.urandom, so it's mostly > about providing a safety net that will "probably" [1] be safe. > Probably > something like random.ProbablySecureRandom is a bad name :) Yes, but unsecureRandom for the unsecure one (which obviously is insecure) is not unreasonable. (unsafe can be shorter) -- M Also seedless does not mean secure: https://xkcd.com/221/ :-) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From graffatcolmingov at gmail.com Mon Sep 14 18:25:09 2015 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Mon, 14 Sep 2015 11:25:09 -0500 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On September 14, 2015 at 11:08:39 AM, Cory Benfield (cory at lukasa.co.uk) wrote: > On 14 September 2015 at 16:01, Paul Moore wrote: > > Why is backward compatibility not being taken into account here? To be > > clear, the proposed change *breaks backward compatibility* and while > > that's allowed in 3.6, just because it is allowed, doesn't mean we > > have free rein to break compatibility - any change needs a good > > justification. The arguments presented here are valid up to a point, > > but every time anyone tries to suggest a weak area in the argument, > > the "we should fix security issues" trump card gets pulled out. > > What makes you think that I didn't take it into account? I did: and > then rejected it. On a personal level, I believe that defaulting to > more secure is worth backward compatibility breaks. I believe that a > major reason for the overwhelming prevalence of security > vulnerabilities in modern software is because we are overly attached > to making people's lives *easy* at the expense of making them *safe*. > I believe that software communities in general are too concerned about > keeping the stuff that people used around for far too long, and not > concerned enough about pushing users to make good choice. > > The best example of this is OpenSSL. When compiled from source naively > (e.g. ./config && make && make install), OpenSSL includes support for > SSLv3, SSL Compression, and SSLv2, all of which are known-broken > options. To clarify, SSLv2 has been deprecated for security reasons > since 1996, but a version of OpenSSL 1.0.2d you build today will > happily enable *and use* it. 
Hell, OpenSSL's own build instructions > include this note[0]: > > > OpenSSL has been around a long time, and it carries around a lot of > > cruft. For example, from above, SSLv2 is enabled by default. SSLv2 is > > completely broken, and you should disable it during configuration. > > Why is it that users who do not read the wiki (most of them) get an > insecure build? Backwards compatibility is why. So I will counter this with what I am fully expecting to be the response: People use distributions that compile and configure OpenSSL for them, e.g., `apt-get install openssl` (not obviously the example that works, but you get the idea). That said, last year, Debian, Ubuntu, Fedora, and other distributions all started compiling openssl without SSLv3 as an available symbol, which broke backwards compatibility and TONS of python projects (eventlet, urllib3, requests, etc.). Why did it break backwards compatibility? Because they knew that they were responsible for the security of their users and expecting users to recompile OpenSSL themselves with the correct flags was unrealistic. Their users come from a wide range of people: - System administrators - Desktop users (if you believe anyone actually uses linux on the desktop ;)) - Researchers - Developers - etc. > This is necessarily a reductio ad absurdum type of argument, because > I'm trying to make a rhetorical point: I believe that sacrificing > security on the altar of backwards compatibility is a bad idea in the > long term, and I want to discourage it as best I can. > > I appreciate your desire to maintain backward compatibility, Paul, I > really do. And I think it is probably for the best that people like > you work on projects like CPython, while people like me work outside > the standard library. However, that won't stop me trying to drag the > stdlib towards more secure defaults: it just might make it futile. That said, I'd also like to combat the idea that security experts won't use random.
Currently Helios, a piece of voting software (that anyone can deploy), uses the random module (https://github.com/benadida/helios-server/blob/b07c43dee5f51ce489b6fcb7b719457255c3a8b8/helios/utils.py). They use it to generate passwords: https://github.com/benadida/helios-server/blob/b07c43dee5f51ce489b6fcb7b719457255c3a8b8/helios/models.py#L944 and https://github.com/benadida/helios-server/blob/b07c43dee5f51ce489b6fcb7b719457255c3a8b8/helios/management/commands/load_voter_files.py#L55 Ben Adida is a security professional who has written papers on creating secure voting systems, but even he uses the random module arguably incorrectly in what should be secure software. The argument that anyone who knows they need secure random functions will use them is clearly invalidated. Not everyone who knows they should be generating securely random things is aware that the random module is insufficient for their needs. Perhaps that code was written before the big red box was added to the documentation and so it was ineffective. Perhaps Ben googled and found that everyone else was using random for passwords (as people have shown is easy to find in this discussion several times). That said, your arguments are easily reduced to "No language should protect its users from themselves", which is equivalent to Python's "We're all consenting adults" philosophy. In that case, we're absolutely safe from any blame for the horrible problems that users inflict on themselves. Anyone that used urllib2/httplib/etc. from the standard library to talk to a site over HTTPS (prior to PEP 466) is to blame because they didn't read the source and know that their sensitive information was easily intercepted by anyone on their network. Clearly, that's their fault. This makes core language development so much easier, doesn't it? Place all the blame on the users for the sake of X (where in this discussion X is the holy grail of backwards compatibility).
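[Editorial note: the change being argued over in cases like Helios is, mechanically, tiny: draw from an os.urandom-backed generator (random.SystemRandom, in the stdlib since Python 2.4) instead of the seedable module-level Mersenne Twister. A minimal sketch of the two variants, with hypothetical helper names rather than the actual Helios code:]

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits

def weak_password(length=12):
    # Module-level random is Mersenne Twister: deterministic, seedable,
    # and recoverable from observed outputs -- unsuitable for secrets.
    return "".join(random.choice(ALPHABET) for _ in range(length))

def strong_password(length=12):
    # SystemRandom draws from os.urandom() and cannot be seeded.
    sr = random.SystemRandom()
    return "".join(sr.choice(ALPHABET) for _ in range(length))
```

[The two call sites look identical, which is exactly why the mistake is so easy to make: nothing about the code signals which variant is safe.]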
From cody.piersall at gmail.com Mon Sep 14 18:28:49 2015 From: cody.piersall at gmail.com (Cody Piersall) Date: Mon, 14 Sep 2015 11:28:49 -0500 Subject: [Python-ideas] Globally configurable random number generation In-Reply-To: References: Message-ID: On Mon, Sep 14, 2015 at 8:32 AM, Nick Coghlan wrote: > > This is an expansion of the random module enhancement idea I > previously posted to Donald's thread: > https://mail.python.org/pipermail/python-ideas/2015-September/035969.html > > I'll write it up as a full PEP later, but I think it's just as useful > in this form for now. > > [snip] > > * expose a global SystemRandom instance as random.system_random > * provide a random.set_default_instance() API that makes it possible > to specify the instance used by the module level methods > * the module level seed(), getstate(), and setstate() functions will > throw RuntimeError if the corresponding method is missing from the > default instance One problem that people (I can't remember who) have pointed out about random.set_default_instance() is that any imported module in the same process can change the default RNG from secure to insecure at a distance. One way to solve this is to ensure that set_default_instance() can be called only once; if it is called more than once, a RuntimeError could be raised. I think the logging module does something like this for setting the logging level? I think the only way that this really would make sense would be to require set_default_instance() to be called before any of the module level functions. The first time a module level function is called, you could default to selecting the CSRNG. If you call one of the seeded API functions (getstate, setstate, seed) before the other module-level functions, the instance could default to the deterministic RNG, but that might be confusing to debug.
I could imagine people getting really confused if this program worked:

    import random
    random.seed(1234)
    random.random()

but this program failed:

    import random
    random.random()
    random.seed(1234)  # would raise a RuntimeError
    random.random()    # would not be reached

I'm not crazy about the idea of changing the default instance based on the first module level function called; that might be a terrible idea. But I _do_ think it's a good idea not to let the default instance change throughout the life of the program. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cory at lukasa.co.uk Mon Sep 14 18:36:43 2015 From: cory at lukasa.co.uk (Cory Benfield) Date: Mon, 14 Sep 2015 17:36:43 +0100 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 14 September 2015 at 17:15, Antoine Pitrou wrote: > Cory Benfield writes: >> >> On 14 September 2015 at 16:55, Antoine Pitrou wrote: >> > Python 3 was there to break compatibility. Not Python 3.4. Not Python 3.5. >> > Not Python 3.6. >> >> To clarify: your position is that we cannot break backward >> compatibility in Python 3.6? > > It is. Not breaking backward compatibility in feature releases > (except 3.0, which was a deliberate special case) is a very long > standing policy, and it is so because users have a much better > time with such a policy, especially when people have to maintain > code that's compatible accross multiple versions (again, the 2->3 > transition is a special case, which justifies the existence of > tools such as "six", and has incidently created a lot of turmoil > in the community that has only recently begin to recede). This neatly resolves the problem. I have no further input to the discussion.
From sturla.molden at gmail.com Mon Sep 14 18:56:15 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 14 Sep 2015 18:56:15 +0200 Subject: [Python-ideas] Globally configurable random number generation In-Reply-To: References: <1442240160.186432.383146609.471F1B7D@webmail.messagingengine.com> <1442241905.192420.383177025.382A3D7E@webmail.messagingengine.com> Message-ID: On 14/09/15 17:50, Robert Kern wrote: > Actually, it's well behind the state of the art as it fails BigCrush. > The proposed alternative does better in this regard. Is that one of the PCGs? Or Arc4Random, ChaCha20 or XorShift64/32? The latter three fail on k-dimensional equidistribution; MT does not. Some of the PCGs do too, but some should be as good as MT. Not sure if that is worse or better than failing some parts of BigCrush. Which PCG would you recommend, by the way? Sturla From g.brandl at gmx.net Mon Sep 14 18:59:50 2015 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 14 Sep 2015 18:59:50 +0200 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 09/10/2015 07:00 PM, Nick Coghlan wrote: >> +0 for deprecating the seed-related functions and saying "the stdlib uses >> what it uses as a RNG and you have to live with it if you don't make your own >> choice" and switching to a crypto-secure RNG. > > However, this I'm +1 on. People *do* use the module level APIs > inappropriately, and we can get them to a much safer place, while > nudging folks that genuinely need deterministic randomness towards an > alternative API. I agree. Deprecating (and eventually removing) the 4 seed-related functions seems like the least intrusive, but still effective, solution to this issue. Georg From mal at egenix.com Mon Sep 14 19:15:48 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 14 Sep 2015 19:15:48 +0200 Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> Message-ID: <55F700C4.4030900@egenix.com> [This is getting off-topic, so I'll stop after this reply] On 14.09.2015 14:26, Nathaniel Smith wrote: > On Mon, Sep 14, 2015 at 3:37 AM, M.-A. Lemburg wrote: >> On 14.09.2015 08:38, Nathaniel Smith wrote: >>> If Tim Peters can get fooled >>> into thinking something like using MT to generate session ids is >>> "probably mostly secure", then what chance do the rest of us have? >>> >> >> I don't think that Tim can get fooled into believing he is a >> crypto wonk ;-) >> >> The thread reveals another misunderstanding: >> >> Broken code doesn't get any better when you change the context >> in which it is run. > > As an aphorism this sounds nice, but logically it makes no sense. If > the broken thing about your code is that it assumes that the output of > the RNG is unguessable, and you change the context by making the > output of the RNG unguessable, then now the code isn't broken. It's still broken, because it's making wrong assumptions about the documented context, and given that it did in the first place, this suggests that it is not the only aspect of the code being broken (pure speculation, but experience shows that bugs usually run around in groups ;-)).
>> >> Code which uses the output from an RNG as session id without adding >> any additional security measures is broken, regardless of what kind >> of RNG you are using. I bet such code will also take any session id >> it receives as cookie and trust it without applying extra checks >> on it. > > Yes, that's... generally the thing you do with session cookies? > They're shared secret strings that you use as keys into some sort of > server-side session database? What extra checks need to be applied? You will at least want to add checks that the session id string was indeed generated by the server and not some bot trying to find valid session ids, e.g. by signing the session id and checking the sig on incoming requests. Other things you can do: fold timeouts into the id, add IP addresses, browser sigs, request sequence numbers. You also need to make sure that the session ids are taken from a large enough set to make it highly unlikely that someone can simply guess a valid id, in case the number of active sessions is significant compared to the universe of possible ids, e.g. 32-bit ids are great for database indexes, but a pretty bad idea if you have millions of active sessions. >> Rather than trying to fix up the default RNG in Python by replacing >> it with a crypto RNG, it's better to open bug reports to get the >> broken software fixed. >> >> Replacing the default Python RNG with a new unstudied crypto one, >> will likely introduce problems into working code which rightly >> assumes the proven statistical properties of the MT. >> >> Just think of the consequences of adding unwanted bias to simulations. >> This is far more likely to go unnoticed than a session hijack due >> to a broken system and can easily cost millions (or earn you >> millions - it's all probability after all :-)). > > I'm afraid you just don't understand what you're talking about here. > > When it comes to adding bias to simulations, all crypto RNGs have > *better* statistical properties than MT.
A crypto RNG which was merely > as statistically-well-behaved as MT would be considered totally > broken, because MT doesn't even pass black-box tests of randomness > like TestU01. I am well aware that MT doesn't satisfy all empirical tests and also that it is not a CSPRNG (see the code Tim and I discussed in this thread showing how easy it is to synchronize to an existing MT RNG if you can gain knowledge of 624 output values). However, it has been extensively studied and it is proven to be equidistributed, which is a key property needed for it to be used as the basis for other derived probability distributions (as is done by the random module). For CSPRNGs you can empirically test properties, but, due to their nature, you cannot prove properties such as equidistribution - even though they usually will pass standard frequency tests. For real-life purposes, you're probably right that they are not biased. I'm a mathematician, though, so I like provable more than empirical :-) The main purpose of CSPRNGs is producing output which you cannot guess, not to produce output which has provable distribution qualities. They do this in a more efficient way than having to wait for enough entropy to be collected - basically making true random number generators practically usable. There's a new field which appears to be popular these days: "Chaotic Pseudo Random Number Generators" (CPRNGs). These are based on chaotic systems and are great for making better use of available entropy. I'm sure we'll have something similar to the MT for these chaotic systems come out of this research in a while and then Python should follow this by implementing it in a new module. Until then, I think it's worthwhile using the existing rand code in OpenSSL and exposing this through the ssl module: https://www.openssl.org/docs/man1.0.1/crypto/rand.html It interfaces to platform hardware true RNGs where available, and falls back to an SHA-1-based 1k-pool generator where needed.
It's being used for SSL session keys, key generation, etc., trusted by millions of people and passes the NIST tests. This paper explains the algorithm in more detail: http://webpages.uncc.edu/yonwang/papers/lilesorics.pdf The downside of the OpenSSL implementation is that it can fail if there isn't enough entropy available. Here's a slightly better algorithm, but it's just one of many which you can find when searching for CPRNGs: https://eprint.iacr.org/2012/471.pdf >> Now, pointing people who write broken code to a new module which >> provides a crypto RNG probably isn't much better either. They'd feel >> instantly secure because it says "crypto" on the box and forget >> about redesigning their insecure protocol as well. Nothing much you >> can do about that, I'm afraid. > > Yes, improving the RNG only helps with some problems, not others; it > might merely make a system harder to attack, rather than impossible to > attack. But giving people unguessable random numbers by default does > solve real problems. Drop the "by default" and I agree, as will probably everyone else in this thread :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 14 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ 2015-09-14: Released mxODBC Plone/Zope DA 2.2.3 http://egenix.com/go84 2015-09-18: PyCon UK 2015 ... 4 days to go 2015-09-26: Python Meeting Duesseldorf Sprint 2015 12 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From Nikolaus at rath.org Mon Sep 14 20:32:26 2015 From: Nikolaus at rath.org (Nikolaus Rath) Date: Mon, 14 Sep 2015 11:32:26 -0700 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F700C4.4030900@egenix.com> (M.'s message of "Mon, 14 Sep 2015 19:15:48 +0200") References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> Message-ID: <87oah4rgw5.fsf@thinkpad.rath.org> On Sep 14 2015, "M.-A. Lemburg" wrote: >>> Code which uses the output from an RNG as session id without adding >>> any additional security measures is broken, regardless of what kind >>> of RNG you are using. I bet such code will also take any session id >>> it receives as cookie and trust it without applying extra checks >>> on it. >> >> Yes, that's... generally the thing you do with session cookies? >> They're shared secret string that you use as keys into some sort of >> server-side session database? What extra checks need to be applied? > > You will at least want to add checks that the session id string was > indeed generated by the server and not some bot trying to > find valid session ids, e.g. by signing the session id and > checking the sig on incoming requests. The chance of a bot hitting a valid (randomly generated) session key by chance should be just as high as the bot generating a correctly signed session key by chance, if I'm not mistaken. (Assuming, of course, that the completely random key has the same number of bits as the other key + signature.)
Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F "Time flies like an arrow, fruit flies like a Banana." From sturla.molden at gmail.com Mon Sep 14 21:07:40 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 14 Sep 2015 21:07:40 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F700C4.4030900@egenix.com> References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> Message-ID: On 14/09/15 19:15, M.-A. Lemburg wrote: > I am well aware that MT doesn't satisfy all empirical tests > and also that it is not a CSPRNG > However, it has been extensively studied and it is proven to be > equidistributed which is a key property needed for it to be used as > basis for other derived probability distributions (as it done by the > random module). And with this criterion, only MT and certain PCG generators are acceptable. Those are (to my knowledge) the only ones with proven equidistribution. Sturla From p.f.moore at gmail.com Mon Sep 14 21:14:05 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 14 Sep 2015 20:14:05 +0100 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 14 September 2015 at 16:32, Ian Cordasco wrote: >> I fully expect the response to this to be "just because it'll take >> time, doesn't mean we should do nothing". Or "even if it just fixes it >> for one or two people, it's still worth it".
But *that's* the argument >> I don't find compelling - not that a fix won't help some situations, >> but that because it's security, (a) all the usual trade-off >> calculations are irrelevant, and (b) other proposed solutions (such as >> education, adding specialised modules like a "shared secret" library, >> etc) are off the table. > > They're not irrelevant. I personally think they're of a lower impact > to the discussion, but the reality is that the people who are > educating others are few and far between. If there are public domain > works, free tutorials, etc. that all advocate using a module in the > standard library and no one can update those, they still exist and are > still recommendations. People prefer free to correct when possible > because there's nothing free to correct them (until they get hacked or > worse). Do we have a team in the Python community that goes out to > educate for free people on security related best practices? I haven't > seen them. The best we have is a few people on crufty mailing lists > like this one trying to make an impact because education is a much > larger and harder to solve problem than making something secure by > default. > > Perhaps instead of bickering like fools on a mailing list, we could > all be spending our time better educating others. You may well be right. Personally, I'm pretty sick of the way all of these debates degenerate into content-free reiteration of the same old points, and unwillingness to hear other people's views. Here's a point - it seems likely that the people arguing for this change are of the opinion that I'm not appreciating their position. (For the record, I'm not being deliberately obstructive in case anyone thought otherwise. In my view at least, I don't understand the security guys' position). Assuming that's the case, then I'm probably one of the people who needs educating. But I don't feel like anyone's trying to educate me, just that I'm being browbeaten until I give in. 
Education != indoctrination. > That said, I can't > make that decision for you just like you can't make that for me. Indeed. Personally, I spend quite a lot of time in my day job (closed source corporate environment) trying to educate people in sane security practices, usually ones I have learned from people in communities like this one. One of the biggest challenges I have is stopping people from viewing security as "an annoying set of rules that get in the way of what I'm trying to do". But you would not believe the sorts of things I see routinely - I'm not willing to give examples or even outlines on a public mailing list because I can't assess whether such information could be turned into an exploit. I can say, though, that crypto-safe RNGs is *not* a relevant factor :-) At its best, good security practice should *help* people write reliable, easy to use systems. Or at a minimum, not get in the way. But the PR message needs always to be "I understand the constraints you're dealing with", not "you must do this for your own good". Otherwise the "follow the rules until the auditors go away" attitude just gets reinforced. Hence my focus on seeing proof that breakages are justified *in the context of the target audience I am responsible for*. Conversely, you're right that I can't force anyone else to try to educate people in good security practices, however much better than me at it I might think they are. In actual fact, though, I think a lot of people do a lot of good work educating others - as I say, most of what I've learned has been from lists like these. >> Honestly, this type of debate doesn't do the security community much >> good - there's too little willingness to compromise, and as a result >> the more neutral participants (which, frankly, is pretty much anyone >> who doesn't have a security agenda to promote) end up pushed into a >> "reject everything" stance simply as a reaction to the black and white >> argument style. 
> > Except you seem to have missed much of the compromises being discussed > and conceded by the security minded folks. OK, you have a point - there have been changes to the proposals. But there are fundamental points that have (as far as I can see) never been acknowledged. As a result, the changes feel less like compromises based on understanding each other's viewpoints, and more like repeated attempts to push something through, even if it's not what was originally proposed. (I *know* this is an emotional position - please understand I'm fed up and not always managing to word things objectively). Specifically, I have been told that I can't argue my "convenience" over the weight of all the other people who could fall into security traps with the current API. Let's review that, shall we? * My argument is that breaking backward compatibility needs to be justified. People have different priorities. "Security risks should be fixed" isn't (IMO) a free pass. Why should it be? "Windows compatibility issues should be fixed" isn't a free pass. "PyPy/Jython compatibility issues should be fixed" isn't a free pass. Forcing me to adjust my priorities so that I care about security when I don't want (or IMO need) to isn't acceptable. * The security arguments seem to be largely in the context of web application development (cookies, passwords, shared secrets, ...) That's not the only context that matters. * As I said above, in my experience, a compatibility break "to make things more secure" is seen as equating security with inconvenience, and can actually harm attempts to educate users in better security practices. * In many environments, reproducibility of random streams is important. I'm not an expert on those fields, although I've hit some situations where seeding is a requirement. As far as I am aware, most of those situations have no security implications. So for them, the PEP is all cost, no benefit. Sure the cost is small, but it's non-zero. 
How come the web application development community is the only one whose voice gets heard? Is it because the fact that they *are* public-facing, and frequently open-source, means that data is available? So "back it up with facts or we won't believe you" becomes a debating stance? I'm not arguing that everyone should be allowed to climb up on their soapbox and rant - but I would like to think that bringing a different perspective to the table could be treated with respect and genuine attempts to understand. And "in my experience" is viewed as an offer of information, not as an attempt to bluff on a worthless hand. Just to be clear, I think the current proposal (Nick's pre-PEP) is relatively unobtrusive, and unlikely to cause serious compatibility issues. I'm uncomfortable with the fact that it feels like yet another "imposition in the name of security", and while I'm only one person I feel that I'm not alone. I'm concerned that the people pushing security seem unable to recognise that people becoming sick of such changes is a PR problem they need to address, but that's their issue not mine. So I'm unlikely to vote against the proposal, but I'll feel sad if it's accepted without a more balanced discussion than we've currently had. On the meta-issue of how debates like this are conducted, I think people probably need to listen more than they talk. I'm as guilty as anyone else here. But in particular, when multiple people all end up responding to rebut *every* counter-argument, essentially with the same response, maybe it's time to think "we're in the majority here, let's stop talking so much and see if we're missing anything from what the people with other views are saying". He who shouts loudest isn't always right. Not necessarily wrong, either, but sometimes it's bloody hard to tell one way or the other, if they won't shut up long enough to analyze the objections. 
> Personally, names that > describe the outputs of the algorithms make much more sense to me than > "Seedless" and "Seeded" but no one has really bothered to shave that > yak further out of a desire to compromise and make things better as a > whole. I'm frankly long past caring. I think we'll end up with whatever was on the table when people got too tired to argue any more. > Much of the lack of gradation has come from the opponents to > this change who seem to think of security as a step function where a > subjective measurement of "good enough for me" counts as secure. Wait, what? It's *me* that's claiming that security is a yes/no thing??? When all I'm hearing is "education isn't sufficient", "dedicated libraries aren't sufficient", "keeping a deterministic RNG as default isn't an option"? And when I'm suggesting that fixing the PRNG use in code that misuses a PRNG may not be the only security issue with that code? I knew the two sides weren't communicating, but this statement staggers me. We have clearly misunderstood each other even more fundamentally than I had thought possible :-( Thinking hard about the implications of what you said there, I start to see why you might have misinterpreted my stance as the black and white one. But I have absolutely no idea how to explain to you that I find your stance equally (and before I took the time to think through what your statement implied, even more) so. There's little more I can say. I'm going to take my own advice now, and stop talking. I'll keep listening, in the hope that either this post or something else will somehow break the logjam, but right now I'm not sure I have much hope of that. 
Paul From robert.kern at gmail.com Mon Sep 14 21:25:14 2015 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 14 Sep 2015 20:25:14 +0100 Subject: [Python-ideas] Globally configurable random number generation In-Reply-To: References: <1442240160.186432.383146609.471F1B7D@webmail.messagingengine.com> <1442241905.192420.383177025.382A3D7E@webmail.messagingengine.com> Message-ID: On 2015-09-14 17:56, Sturla Molden wrote: > On 14/09/15 17:50, Robert Kern wrote: > >> Actually, it's well behind the state of the art as it fails BigCrush. >> The proposed alternative does better in this regard. > > Is that one of the PCGs? Or Arc4Random, ChaCha20 or XorShift64/32? The alternative proposed in this thread is ChaCha20. > The three latter fails on k-dimensional equi-distribution, MT does not. Some of > the PCGs do too, but some should be as good as MT. Not sure if that is worse or > better than failing some parts of BigCrush. There is a reason that exact k-dimensional equidistribution for such a large k is not tested even in BigCrush. It's a nifty feature useful in a few applications, but not for simulations. It is important that the PRNG is *well*-distributed, but exact equidistribution is mostly neither here nor there. It can be trivially implemented by statistically bad PRNGs, like a simple counter. Obtaining it requires implementing an astronomically long period (and consequent growth in the state size) that adds significant costs without any realizable improvement to the statistics. If I'm drawing millions of numbers, k=623 is not much better than k=1, provided that the generator is otherwise good. > Which PCG would you recommend, by the way? Probably pcg64 (128-bit state, 64-bit output). Having the 64-bit output is nice so you only have to draw one value to make a uniform(0,1) double, and a period of 2**128 is nice and roomy without being excessively large. 
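The single-draw double construction mentioned here can be sketched in a few lines. This is illustrative only, not PCG itself: `random.getrandbits(64)` merely stands in for a pcg64 output, and `uint64_to_unit_double` is a hypothetical helper name. The idea is that a double has a 53-bit mantissa, so keeping the top 53 bits of one 64-bit draw and scaling gives a uniform value in [0, 1):

```python
import random

def uint64_to_unit_double(x):
    # Keep the top 53 bits of a 64-bit draw and scale by 2**-53,
    # giving a uniformly distributed double in [0, 1).
    return (x >> 11) * (1.0 / (1 << 53))

# random.getrandbits(64) stands in for a single pcg64 output here.
u = uint64_to_unit_double(random.getrandbits(64))
assert 0.0 <= u < 1.0
```

With a 32-bit generator such as MT19937 you would instead need two draws to fill the 53 bits, which is the convenience being pointed out.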
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From random832 at fastmail.com Mon Sep 14 21:25:52 2015 From: random832 at fastmail.com (Random832) Date: Mon, 14 Sep 2015 15:25:52 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: <1442258752.255504.383449553.23795E2B@webmail.messagingengine.com> On Mon, Sep 14, 2015, at 15:14, Paul Moore wrote: > * My argument is that breaking backward compatibility needs to be > justified. I don't think it does. I think that there needs to be a long roadmap of deprecation and provided workarounds for *almost any* backwards-compatibility-breaking change, but that special justification beyond "is this a good feature" is only needed for ignoring that roadmap, not for deprecating/replacing a feature in line with it. No-one, as far as I have seen in this thread to date, has actually put a timeline on this change. No-one's talking about getting rid of the global functions in 3.5.1, or in 3.6, or in 3.7. So with that in mind I can only conclude that the people against making the change are against *ever* making it *at all* - and certainly a lot of the arguments they're making have to do with nebulous educational use-cases (class instances are hard, let's use mutable global state) rather than backwards compatibility. Would you likewise have been against every single thing that Python 3 did? From p.f.moore at gmail.com Mon Sep 14 21:26:49 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 14 Sep 2015 20:26:49 +0100 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 14 September 2015 at 17:00, Cory Benfield wrote: > What makes you think that I didn't take it into account? I did: and > then rejected it. 
On a personal level, I believe that defaulting to > more secure is worth backward compatibility breaks. I believe that a > major reason for the overwhelming prevalence of security > vulnerabilities in modern software is because we are overly attached > to making people's lives *easy* at the expense of making them *safe*. > I believe that software communities in general are too concerned about > keeping the stuff that people used around for far too long, and not > concerned enough about pushing users to make good choices. OK. In *my* experience, systems with appallingly bad security practices run for many years with no sign of an exploit. The vulnerabilities described in this thread pale into insignificance compared to many I have seen. On the other hand, I regularly see systems not being upgraded because the cost of confirming that there are no regressions (much less the cost of making fixes for deliberate incompatibilities) is deemed too high. I'm not trying to justify those things, nor am I trying to say that my experience is in any way "worth more" than yours. These aren't all Python systems. But the culture where such things occur is real, and I have no reason to believe that I'm the only person in this position. (But as it's in-house closed-source, it's essentially impossible to get any good view of how common it is). Paul From donald at stufft.io Mon Sep 14 22:23:17 2015 From: donald at stufft.io (Donald Stufft) Date: Mon, 14 Sep 2015 16:23:17 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On September 14, 2015 at 3:14:45 PM, Paul Moore (p.f.moore at gmail.com) wrote: > > Here's a point - it seems likely that the people arguing for this > change are of the opinion that I'm not appreciating their position. > (For the record, I'm not being deliberately obstructive in case anyone > thought otherwise. In my view at least, I don't understand the > security guys' position). 
Assuming that's the case, then I'm probably > one of the people who needs educating. But I don't feel like anyone's > trying to educate me, just that I'm being browbeaten until I give in. > > Education != indoctrination. For the record, I'm not sure what part you don't understand. I'm happy to try and explain it, but I think I'm misunderstanding what you're not understanding or something because I personally feel like I did explain what I think you're misunderstanding. Part of the problem (probably) here is that there isn't an exact person we're trying to protect here. The general gist is that if you use the deterministic APIs in a security sensitive situation, then you may be vulnerable depending on exactly what you're doing. We think that in particular, the API of the random module will lead inexperienced or un(der)informed developers to use the API in situations that it's not appropriate and from that, have an insecure piece of software they wrote. We're people who think that the defaults of the software should be "generally" secure (as much so as is reasonable) and that if you want to do something that isn't safe then you should explicitly opt in to that (the flipside is, things shouldn't be so locked down as to be unusable without having to turn off all of the security knobs, this is where the "generally" in generally secure comes into play). A particularly nasty side effect of this is that it's almost never the people who wrote this software who are harmed by it being broken and it's almost always their users who didn't have anything to do with it. So essentially the goal is to try and make it harder for people to accidentally misuse the random module. If that doesn't answer your confusion, if you can try to reword it to get it through my thick skull better, I'm happy to continue to try and answer it (on or off list). > > At its best, good security practice should *help* people write > reliable, easy to use systems. Or at a minimum, not get in the way. 
> But the PR message needs always to be "I understand the constraints > you're dealing with", not "you must do this for your own good". > Otherwise the "follow the rules until the auditors go away" attitude > just gets reinforced. Hence my focus on seeing proof that breakages > are justified *in the context of the target audience I am responsible > for*. Right, and this is actually trying to do that. By removing a possibly dangerous default and making the default safer. Defaults matter a lot in security (and sadly, a lot of software doesn't have safe defaults) because a lot of software will never use anything but the defaults. > > Conversely, you're right that I can't force anyone else to try to > educate people in good security practices, however much better than me > at it I might think they are. In actual fact, though, I think a lot of > people do a lot of good work educating others - as I say, most of what > I've learned has been from lists like these. > > >> Honestly, this type of debate doesn't do the security community much > >> good - there's too little willingness to compromise, and as a result > >> the more neutral participants (which, frankly, is pretty much anyone > >> who doesn't have a security agenda to promote) end up pushed into a > >> "reject everything" stance simply as a reaction to the black and white > >> argument style. > > > > Except you seem to have missed much of the compromises being discussed > > and conceded by the security minded folks. > > OK, you have a point - there have been changes to the proposals. But > there are fundamental points that have (as far as I can see) never > been acknowledged. As a result, the changes feel less like compromises > based on understanding each other's viewpoints, and more like repeated > attempts to push something through, even if it's not what was > originally proposed. (I *know* this is an emotional position - please > understand I'm fed up and not always managing to word things > objectively). 
I think part of this is that a lot of the folks proposing these changes are also sensitive to the backwards compatibility needs and have already baked that into their thoughts. We don't generally come into these with "scorched earth" suggestions of fixing some situation where security could be improved but instead try and figure out a decent balance of security and not breaking things to try and cover most of the ground with as little cost as possible. My very first email in this particular thread (that started this thread) was the first one I had with a fully solid proposal in it. The last paragraph in that proposal asked the question "Do we want to protect users by default?" My next email presents two possible options depending on which we considered to be "less" breaking, either deprecating the module scoped functions completely or change their defaults to something secure and mentioned that if we can't change the default, the user-land CSPRNG probably isn't a useful addition because its benefit is primarily in being able to make it the default option. I don't see anyone who is talking about making a change not also talking about what areas of backwards compatibility it would actually break. I think part of this too is that security is a bit weird, it's not a boolean property but there are particular bars you need to pass before it's an actual solution to the problem. So for a lot of us, we'll figure out that bar and draw a line in the sand and say "If this proposal crosses this line, then doing nothing is better than doing something" because it'd just be churn for churn's sake at that point. That's why you'll see particular points that we essentially won't give up, because if they are given up we might as well do nothing. In this particular instance, the point is that the API of the random module leads people to use it incorrectly, so unless we address that, we might as well just leave it alone. 
> > Specifically, I have been told that I can't argue my "convenience" > over the weight of all the other people who could fall into security > traps with the current API. Let's review that, shall we? I think I was the one who said that to you, and I'd like to explain why I said it (beyond the fact I was riled up). Essentially I had in my mind something like what Nick has proposed, which you've said later on you think is relatively unobtrusive, and unlikely to cause serious compatibility issues, which I agree with. Then I saw you arguing against what I felt was a pretty mundane API break that was fairly trivial to work around, and it signaled to me that you were saying that having to type a few extra letters was a bridge too far. This reads to me like someone saying "Well I know how to use it correctly, it's their own fault if others don't". I'm not saying that's what you actually think but that's how it read to me. > > * My argument is that breaking backward compatibility needs to be > justified. People have different priorities. "Security risks should be > fixed" isn't (IMO) a free pass. Why should it be? "Windows > compatibility issues should be fixed" isn't a free pass. "PyPy/Jython > compatibility issues should be fixed" isn't a free pass. Forcing me to > adjust my priorities so that I care about security when I don't want > (or IMO need) to isn't acceptable. The justification is essentially that it will protect some people with minimal impact to others. The main impact will be that people who actually need a deterministic RNG will need to use something like ``random.seeded_random`` instead of just ``random`` and importantly, this will break in a fairly obvious manner instead of the silently wrong situation for people who are currently using the top level API incorrectly. As a bit of a divergence, the "silently wrong" part is why defaults tend to matter a lot in security. 
Unless you're well versed in it, most people don't think about it and since it "works" they don't inquire further. Something that is security sensitive that always "works" (as in, doesn't raise an error) is broken which is the inverse of how most people think about software. To put it another way, it's the job of security sensitive APIs to break things, ideally only in cases where it's important to break, but unless you're actually testing that it breaks in those attack scenarios, secure and insecure look exactly the same. > * The security arguments seem to be largely in the context of web > application development (cookies, passwords, shared secrets, ...) > That's not the only context that matters. You're right it's not the only context that matters, however it's often brought up for a few reasons: * Security largely doesn't matter for software that doesn't accept or send input from some untrusted source which narrows security down to be mostly network based applications. * The HTTP protocol is "eating the world" and we're seeing more and more things using it as their communication protocol (even for things that are not traditional browser based applications). * Traditional Web Applications/Sites are a pretty large target audience for Python and in particular a lot of the security folks come from that world because the web is a hostile place. But you can replace web application with anything that an untrusted user can interact with over any protocol and the argument is basically the same. > * As I said above, in my experience, a compatibility break "to make > things more secure" is seen as equating security with inconvenience, > and can actually harm attempts to educate users in better security > practices. Sadly, I don't think this is fully resolvable :( It is the nature of security that its purpose is to take something that otherwise "works" and make it no longer work because it doesn't satisfy the constraints of the security system. 
> * In many environments, reproducibility of random streams is > important. I'm not an expert on those fields, although I've hit some > situations where seeding is a requirement. As far as I am aware, most > of those situations have no security implications. So for them, the > PEP is all cost, no benefit. Sure the cost is small, but it's > non-zero. Right, and I don't think anyone is saying this isn't an important use case, just that if you need a deterministic RNG and you don't get one, that is a fairly obvious problem but if you need a CSPRNG and you don't get one, that is not obvious. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From donald at stufft.io Mon Sep 14 22:36:16 2015 From: donald at stufft.io (Donald Stufft) Date: Mon, 14 Sep 2015 16:36:16 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On September 14, 2015 at 3:27:22 PM, Paul Moore (p.f.moore at gmail.com) wrote: > On 14 September 2015 at 17:00, Cory Benfield wrote: > > What makes you think that I didn't take it into account? I did: and > > then rejected it. On a personal level, I believe that defaulting to > > more secure is worth backward compatibility breaks. I believe that a > > major reason for the overwhelming prevalence of security > > vulnerabilities in modern software is because we are overly attached > > to making people's lives *easy* at the expense of making them *safe*. > > I believe that software communities in general are too concerned about > > keeping the stuff that people used around for far too long, and not > > concerned enough about pushing users to make good choices. > > OK. In *my* experience, systems with appallingly bad security > practices run for many years with no sign of an exploit. The > vulnerabilities described in this thread pale into insignificance > compared to many I have seen. What does "no sign of an exploit" mean? 
Does it mean that if there was an exploit that the attackers didn't put metaphorical giant signs up to say that "Zero Cool" was here? Or is there an active security team running IDS software to ensure that there wasn't a breach? I ask because in my experience, "no sign of an exploit" is often synonymous with "we've never really looked to see if we were exploited, but we haven't noticed anything". This is a dangerous way to look at it, because a lot of exploitation is being done by organized crime where they don't want you to notice that you were exploited because they want to make you part of a botnet or to silently steal data or whatever you have. For these, if they get detected that is a bad thing because they lose that node in their botnet (or whatever). It's a very rare exploit that gets publicly exposed like the Ashley Madison hacks, they are just the ones that get the most attention because they are bombastic and public. > On the other hand, I regularly see > systems not being upgraded because the cost of confirming that there > are no regressions (much less the cost of making fixes for deliberate > incompatibilities) is deemed too high. Absolutely! However, I think these systems largely don't upgrade *at all* and are still on whatever version of $LANG they originally wrote the software for. These systems tend to be so regression averse that they don't even risk bug fixes because that might cause a regression. For these people, it doesn't really matter what we do because they aren't going to upgrade anyways, and they keep Red Hat in business by paying them for Python 2.4 until the heat death of the universe. I think the more likely case for concern is people who do upgrade and are willing to tolerate some regression in order to stay somewhat current. These people will push back against *massive* breakage (as seen with the Python 3.x migration taking forever) but are often perfectly fine dealing with small breakages. 
As someone who does write software that supports a lot of versions (currently, 6-7 versions of CPython alone is my standard depending if you count pre-releases or not) having to tweak import statements doesn't even really register in my "give a damn" meter, nor did it for the folks I know who are in similar situations (though this is admittedly a biased and small sample). > > I'm not trying to justify those things, nor am I trying to say that my > experience is in any way "worth more" than yours. These aren't all > Python systems. But the culture where such things occur is real, and I > have no reason to believe that I'm the only person in this position. > (But as it's in-house closed-source, it's essentially impossible to > get any good view of how common it is). > I think maybe a problem here is a difference in how we look at the data. It seems that you might focus on the probability of you personally (or the things you work on) getting attacked and thus benefiting from these changes, whereas I, and I suspect the others like me, think about the probability of *anyone* being attacked. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From sturla.molden at gmail.com Mon Sep 14 22:39:42 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 14 Sep 2015 22:39:42 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> Message-ID: On 14/09/15 21:07, Sturla Molden wrote: > And with this criterion, only MT and certain PCG generators are > acceptable. 
Those are (to my knowledge) the only ones with proven > equidistribution. Just to explain, for those who do not know... Equidistribution means that the numbers are "uniformly distributed", or specifically that "the proportion of the sequence that fall within an interval is proportional to the length of the interval". With one-dimensional equidistribution, the deviates are uniformly distributed on a line. With two-dimensional equidistribution, the deviates are uniformly distributed in a square. With three-dimensional equidistribution, the deviates are uniformly distributed in a cube. k-dimensional equi-distribution generalizes this up to a k-dimensional space. Let us say you want to simulate a shooter firing a gun at a target. Every bullet is aimed at the target and hits in a slightly different place. The shooter is unbiased, but there will be some random jitter. The probability of hitting the target should be proportional to its size, right? Perhaps! Mersenne Twister MT19937 (used in Python) is proven to have 623-dimensional equidistribution. Certain PCG generators are proven to have equidistribution of arbitrary dimensionality. Your simulation of the shooter will agree with common sense if you pick one of these. With other generators, there is no such k-dimensional equidistribution. Your simulation of the shooter will disagree with common sense. Which is correct? Common sense. From a mathematical point of view, this is so important that anything other than Mersenne Twister or PCG is not worth considering in a Monte Carlo simulation. Sturla From robert.kern at gmail.com Mon Sep 14 22:45:25 2015 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 14 Sep 2015 21:45:25 +0100 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: References: <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> Message-ID: On 2015-09-14 20:07, Sturla Molden wrote: > On 14/09/15 19:15, M.-A. Lemburg wrote: > >> I am well aware that MT doesn't satisfy all empirical tests >> and also that it is not a CSPRNG > >> However, it has been extensively studied and it is proven to be >> equidistributed which is a key property needed for it to be used as >> basis for other derived probability distributions (as is done by the >> random module). > > And with this criterion, only MT and certain PCG generators are acceptable. > Those are (to my knowledge) the only ones with proven equidistribution. Do not confuse k-dimensional equidistribution with "equidistribution". The latter property (how uniformly a single draw is distributed) is the one that the derived probability distributions rely upon, not the former. Funny story: MT is provably *not* strictly equidistributed; it produces exactly 624 fewer 0s than it does any other uint32 if you run it over its entire period. Not that it really matters, practically speaking. FWIW, lots of PRNGs can prove either property. To Nate's point, I think he is primarily thinking of counter-mode block ciphers when he talks of CSPRNGs, and they are trivially proved to be equidistributed. The counter is obviously equidistributed, and the symmetric encryption function is a bijective function from counter to output. However, not all CSPRNGs are constructed alike. In particular, ChaCha20 is a stream cipher rather than a block cipher, and I think Marc-Andre is right that it would be difficult to prove equidistribution. 
Proving substantial *non*-equidistribution could eventually happen though, as it did to ARC4, which prompted its replacement with ChaCha20 in OpenBSD, IIRC. And all that said, provable equidistribution (much less provable k-dimensional equidistribution) doesn't make a good PRNG. A simple counter satisfies equidistribution, but it is a terrible PRNG. The empirical tests are more important IMO. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From sturla.molden at gmail.com Mon Sep 14 22:49:14 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 14 Sep 2015 22:49:14 +0200 Subject: [Python-ideas] Globally configurable random number generation In-Reply-To: References: <1442240160.186432.383146609.471F1B7D@webmail.messagingengine.com> <1442241905.192420.383177025.382A3D7E@webmail.messagingengine.com> Message-ID: On 14/09/15 21:25, Robert Kern wrote: >> Which PCG would you recommend, by the way? > > Probably pcg64 (128-bit state, 64-bit output). Having the 64-bit output > is nice so you only have to draw one value to make a uniform(0,1) > double, and a period of 2**128 is nice and roomy without being > excessively large. Thanks :) Sturla From sturla.molden at gmail.com Mon Sep 14 22:56:05 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 14 Sep 2015 22:56:05 +0200 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: References: <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> Message-ID: On 14/09/15 22:45, Robert Kern wrote: > On 2015-09-14 20:07, Sturla Molden wrote: >> On 14/09/15 19:15, M.-A. Lemburg wrote: >> >>> I am well aware that MT doesn't satisfy all empirical tests >>> and also that it is not a CSPRNG >> >>> However, it has been extensively studied and it is proven to be >>> equidistributed which is a key property needed for it to be used as >>> basis for other derived probability distributions (as it done by the >>> random module). >> >> And with this criterion, only MT and certain PCG generators are >> acceptable. >> Those are (to my knowledge) the only ones with proven equidistribution. > > Do not confuse k-dimensional equidistribution with "equidistribution". > The latter property (how uniformly a single draw is distributed) is the > one that the derived probability distributions rely upon, not the > former. Yes, there was something fishy about this. k-dimensional equidistribution matters if we simulate a k-dimensional tuple, as I understand it. Sturla From sturla.molden at gmail.com Mon Sep 14 22:59:02 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 14 Sep 2015 22:59:02 +0200 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: References: <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> Message-ID: On 14/09/15 22:45, Robert Kern wrote: > Funny story: MT is provably *not* strictly equidistributed; it > produces a exactly 624 fewer 0s than it does any other uint32 if you run > it over its entire period. Not that it really matters, practically > speaking. I probably would not live long enough to see it ;) Sturla From robert.kern at gmail.com Mon Sep 14 23:19:09 2015 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 14 Sep 2015 22:19:09 +0100 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> Message-ID: On 2015-09-14 21:56, Sturla Molden wrote: > On 14/09/15 22:45, Robert Kern wrote: >> On 2015-09-14 20:07, Sturla Molden wrote: >>> On 14/09/15 19:15, M.-A. Lemburg wrote: >>> >>>> I am well aware that MT doesn't satisfy all empirical tests >>>> and also that it is not a CSPRNG >>> >>>> However, it has been extensively studied and it is proven to be >>>> equidistributed which is a key property needed for it to be used as >>>> basis for other derived probability distributions (as it done by the >>>> random module). >>> >>> And with this criterion, only MT and certain PCG generators are >>> acceptable. >>> Those are (to my knowledge) the only ones with proven equidistribution. 
>> >> Do not confuse k-dimensional equidistribution with "equidistribution". >> The latter property (how uniformly a single draw is distributed) is the >> one that the derived probability distributions rely upon, not the >> former. > > Yes, there was something fishy about this. k-dimensional equidistribution > matters if we simulate a k-dimensional tuple, as I understand it. Yeah, but we do that every time we draw k numbers in a simulation at all. And we usually draw millions. In that case, perfect k=623-dimensional equidistribution is not really any better than k=1, provided that the PRNG is otherwise good. The requirement for a good PRNG for simulation work is that it be *well* distributed in reasonable dimensions, not that it be *exactly* equidistributed for some k. And well-distributedness is exactly what is tested in TestU01. It is essentially a collection of simulations designed to expose known statistical flaws in PRNGs. So to your earlier question as to which is more damning, failing TestU01 or not being perfectly 623-dim equidistributed, failing TestU01 is. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From mal at egenix.com Tue Sep 15 00:09:13 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 15 Sep 2015 00:09:13 +0200 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: References: <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> Message-ID: <55F74589.4030805@egenix.com> On 14.09.2015 23:19, Robert Kern wrote: > On 2015-09-14 21:56, Sturla Molden wrote: >> On 14/09/15 22:45, Robert Kern wrote: >>> On 2015-09-14 20:07, Sturla Molden wrote: >>>> On 14/09/15 19:15, M.-A. Lemburg wrote: >>>> >>>>> I am well aware that MT doesn't satisfy all empirical tests >>>>> and also that it is not a CSPRNG >>>> >>>>> However, it has been extensively studied and it is proven to be >>>>> equidistributed which is a key property needed for it to be used as >>>>> basis for other derived probability distributions (as it done by the >>>>> random module). >>>> >>>> And with this criterion, only MT and certain PCG generators are >>>> acceptable. >>>> Those are (to my knowledge) the only ones with proven equidistribution. >>> >>> Do not confuse k-dimensional equidistribution with "equidistribution". >>> The latter property (how uniformly a single draw is distributed) is the >>> one that the derived probability distributions rely upon, not the >>> former. >> >> Yes, there was something fishy about this. k-dimensional equidistribution >> matters if we simulate a k-dimensional tuple, as I understand it. > > Yeah, but we do that every time we draw k numbers in a simulation at all. And we usually draw > millions. In that case, perfect k=623-dimensional equidistribution is not really any better than > k=1, provided that the PRNG is otherwise good. 
Depends on your use case, but the fact that you can prove it
is what really matters - well, at least for me :-)

> The requirement for a good PRNG for simulation work is that it be *well* distributed in reasonable
> dimensions, not that it be *exactly* equidistributed for some k. And well-distributedness is exactly
> what is tested in TestU01. It is essentially a collection of simulations designed to expose known
> statistical flaws in PRNGs. So to your earlier question as to which is more damning, failing TestU01
> or not being perfectly 623-dim equidistributed, failing TestU01 is.

TestU01 includes tests which PRNGs of the MT type have trouble
passing, since they are linear. This makes them poor choices for
crypto applications, but does not have much effect on simulations
using only a tiny part of the available period.

For MT there's an enhanced version called SFMT which performs better
in this respect:

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/M062821.pdf

(the paper also discusses the linear dependencies)

See http://www.jstatsoft.org/v50/c01/paper for a discussion of
MT vs. SFMT.

You can also trick TestU01 into having all tests pass by applying
a non-linear transformation (though I don't really see the point
in doing this).

The WELL family of generators is a newer development, which provides
even better characteristics:

http://www.iro.umontreal.ca/~lecuyer/myftp/papers/wellrng.pdf

Also note that by seeding the MT in Python with truly random data,
the shortcomings of MT w/r to having problems escaping "zeroland"
(many 0 bits in the seed) are mostly avoided.

Anyway, it's been an interesting discussion, but I think it's time
to let go :-)

Here's a first cut at an implementation of the idea to use OpenSSL's
rand API as a basis for an RNG. It even supports monkey patching
the random module, though I don't think that's good design.

"""
RNG based on OpenSSL's rand API.

Marc-Andre Lemburg, 2015.

License: MIT
"""
# Needs OpenSSL installed: pip install egenix-pyopenssl
from OpenSSL import rand

import random, struct, binascii

# Number of bits in an IEEE float
BITS_IN_FLOAT = 53

# Scale to apply to RNG output to make uniform
UNIFORM_SCALING = 2 ** -BITS_IN_FLOAT

### Helpers

# Unpacker
def str2long(value):
    value_len = len(value)
    if value_len <= 4:
        if value_len < 4:
            value = '\0' * (4 - value_len) + value
        return struct.unpack('>L', value)[0]
    elif value_len <= 8:
        if value_len < 8:
            value = '\0' * (8 - value_len) + value
        return struct.unpack('>Q', value)[0]
    return long(binascii.hexlify(value), 16)

###

class OpenSSLRandom(random.Random):

    """ RNG using the OpenSSL rand API, which provides a
        cross-platform cryptographically secure RNG.
    """
    def random(self):

        """ Return a random float from [0.0, 1.0).
        """
        return (str2long(rand.bytes(7)) >> 3) * UNIFORM_SCALING

    def getrandbits(self, bits):

        """ Return an integer with the given number of random bits.
        """
        if bits <= 0:
            raise ValueError('bits must be >0')
        if bits != int(bits):
            raise TypeError('bits must be an integer')
        # Get enough bytes for the requested number of bits
        numbytes = (bits + 7) // 8
        x = str2long(rand.bytes(numbytes))
        # Truncate bits, if needed
        return x >> (numbytes * 8 - bits)

    def seed(self, value=None):

        """ Feed entropy to the RNG.

            value may be None, an integer or a string. If None,
            2.5k bytes of data are read from /dev/urandom and fed
            into the RNG.
        """
        if value is None:
            try:
                value = random._urandom(2500)
                entropy = 2500
            except NotImplementedError:
                return
        if isinstance(value, (int, long)):
            # Convert the integer to a hex string for rand.add()
            value = '%x' % value
            entropy = len(value)
        else:
            value = str(value)
            entropy = len(value)
        # Let's be conservative regarding the available entropy in
        # value
        rand.add(value, entropy // 2)

    def _notimplemented(self, *args, **kwds):
        raise NotImplementedError(
            'OpenSSL RNG does not implement this method')

    getstate = _notimplemented
    setstate = _notimplemented

### Testing

def install_as_default_rng():

    """ Monkey patch the random module
    """
    _inst = OpenSSLRandom()
    random._inst = _inst
    for attr in ('seed', 'random', 'uniform', 'triangular',
                 'randint', 'choice', 'randrange',
                 'sample', 'shuffle', 'normalvariate',
                 'lognormvariate', 'expovariate', 'vonmisesvariate',
                 'gammavariate', 'gauss', 'betavariate',
                 'paretovariate', 'weibullvariate',
                 'getstate', 'setstate', 'jumpahead', 'getrandbits',
                 ):
        setattr(random, attr, getattr(_inst, attr))

def _test():
    # Install
    install_as_default_rng()
    # Now run the random module tests
    random._test()

###

if __name__ == '__main__':
    _test()

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Sep 14 2015)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...           http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...           http://zope.egenix.com/
________________________________________________________________________
2015-09-14: Released mxODBC Plone/Zope DA 2.2.3   http://egenix.com/go84
2015-09-18: PyCon UK 2015 ...                               4 days to go
2015-09-26: Python Meeting Duesseldorf Sprint 2015        12 days to go

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/


From tjreedy at udel.edu  Tue Sep 15 00:31:24 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 14 Sep 2015 18:31:24 -0400
Subject: [Python-ideas] Globally configurable random number generation
In-Reply-To: 
References: 
Message-ID: 

On 9/14/2015 11:04 AM, Serhiy Storchaka wrote:
> On 14.09.15 16:32, Nick Coghlan wrote:
>> * make random.Random a subclass of SeedableRandom that deprecates
>> seed(), getstate() and setstate()

An alternate proposal is to initialize the module so that random uses
something more 'secure' than MT.  Then...

> I would make seed() and setstate() switch to a seedable algorithm.

In particular, to MT.  Also switch on a getstate() call.

> If you don't use seed() or setstate(), it is not important that the
> algorithm is changed. If you use seed() or setstate(), you expect
> reproducible behavior.

There is more than one possible internal implementation. But for any of
them, the change should be invisible to callers. (Representations and
introspection results would be a different matter.)

I understand that the docs currently say that random uses MT. But I
wonder if any version of the above could be used in current versions, so
as to immediately "upgrade a lot of existing instructions on the
internet" and code that follows such instructions.

--
Terry Jan Reedy


From p.f.moore at gmail.com  Tue Sep 15 00:39:25 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 14 Sep 2015 23:39:25 +0100
Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux
In-Reply-To: 
References: 
Message-ID: 

(The rest of your emails, I'm going to read fully and digest before
responding. Might take a day or so.)

On 14 September 2015 at 21:36, Donald Stufft wrote:
> I think maybe a problem here is a difference in how we look at the data.
It > seems that you might focus on the probability of you personally (or the things > you work on) getting attacked and thus benefiting from these changes, whereas > I, and I suspect the others like me, think about the probability of *anyone* > being attacked. This may be true, in some sense. But I'm not willing to accept that you are thinking about everyone, but I'm somehow selfishly only thinking of myself. If that's what you were implying, then frankly it's a pretty offensive way of disregarding my viewpoint. Knowing you, I'm sure that's *not* how you meant it - but do you see how easy it is for the way you word something to make it nearly impossible for me to see past your wording to get to the actual meaning of what you're trying to say? I didn't even consciously notice the implication myself, at first. I simply started writing a pretty argumentative rebuttal, because I felt that somehow I needed to correct what you said, but I couldn't quite say why. Looking at the reality of what I focus on, I'd say it's more like this. I mistrust arguments that work on the basis that "someone, somewhere, might do X bad thing, therefore we must all pay cost Y". The reasons are complex (and I don't know that I fully understand all of my thought processes here) but some aspects that immediately strike me are: * The probability of X isn't really quantified. I may win the lottery, but I don't quit my job - the probability is low. The probability of X matters. * My experience of the probability of X happening varies wildly from that of whoever's making the point. Who is right? Why must one of us "win" and be right? Can't it simply be that my data implies that over the full data set, the actual probability of X is lower than you thought? * The people paying cost Y are not the cause of, nor are they impacted by, X (except in an abstract "we all suffer if bad things happen" sense). 
I believe in the general principle of "you pay for what you use", so to me you're arguing for the wrong people to be made to pay. Hopefully, those are relatively objective measures. More subjectively, * It's way too easy to say "if X happens once, we have a problem". If you take the stance that we have to prevent X from *ever* happening, you allow yourself the freedom to argue with vague phrases like "might", while leaving the burden of absolute proofs on me. (In the context of RNG proposals, this is where arguments like "let's implement a secure secret library" get dismissed - they still leave open the possibility of *someone* using an inappropriate RNG, so "they don't solve the issue" - even if they reduce the chance of that happening by a certain amount - and neither you nor I can put a figure on how much, so let's not try). * There's little evidence that I can see of preventative security measures having improved things. Maybe this is because it's an "arms race" situation, and keeping up is all we can hope for. Maybe it's because it's hard to demonstrate a lack of evidence, so the demand for evidence is unreasonable. I don't know. * For many years I ran my PC with no anti-virus software. I never got a virus. Does that prove anything? Probably not. The anti-virus software on my work PC is the source of *far* more issues than I have ever seen caused by a virus. Does *that* prove anything? Again, probably not. But my experience with at least *that* class of pressure to implement security is that the cure is worse than the disease. Where does that leave the burden of proof? Again, I don't know, but my experience should at least be considered as relevant data. * Everyone I have ever encountered in a work context (as opposed to in open-source communities) seems to me to be in a similar situation to mine. I believe I'm speaking for them, but because it's a closed-source in house environment, I've got no public data to back my comments. 
And totally subjective, * I'm extremely tired of the relentless pressure of "we need to do X, because security". While the various examples of X may all have ended up being essentially of no disadvantage to me, feeling obliged to read, understand, and comment on the arguments presented every time, gets pretty wearing. * I can't think of a single occasion where we *don't* do X. That may well be confirmation bias, but again subjectively, it feels like nobody's listening to the objections. I get that the original proposals get modified, but if never once has the result been "you're right, the cost is too high, we'll not do X" then that puts security-related proposals in a pretty unique position. Finally, in relation to that last point, and one thing I think is a key difference in our thinking. I do *not* believe that security proposals (as opposed to security bug fixes) are different from any other type of proposal. I believe that they should be subject to all the same criteria for acceptance that anything else is. I suspect that you don't agree with that stance, and believe that security proposals should be held to different standards (e.g., a demonstrated *probability* of benefit is sufficient, rather than evidence of actual benefit being needed). But please speak for yourself on this - I'm not trying to put words into your mouth, it's just my impression. All of which is completely unrelated to either the default RNG for the Python stdlib, or whether I understand and/or accept the security arguments presented here (for clarity, I believe I understand them, I just don't accept them). 
Paul From emile at fenx.com Tue Sep 15 00:50:25 2015 From: emile at fenx.com (Emile van Sebille) Date: Mon, 14 Sep 2015 15:50:25 -0700 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 9/14/2015 3:39 PM, Paul Moore wrote: > * Everyone I have ever encountered in a work context (as opposed to in > open-source communities) seems to me to be in a similar situation to > mine. I believe I'm speaking for them, but because it's a > closed-source in house environment, I've got no public data to back my > comments. You can certainly speak for me. It's much easier to guard the gates than everything inside the walls. Emile From abarnert at yahoo.com Tue Sep 15 01:10:19 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 14 Sep 2015 16:10:19 -0700 Subject: [Python-ideas] Globally configurable random number generation In-Reply-To: References: Message-ID: On Sep 14, 2015, at 06:32, Nick Coghlan wrote: > > This is an expansion of the random module enhancement idea I > previously posted to Donald's thread: > https://mail.python.org/pipermail/python-ideas/2015-September/035969.html Since I suggested the set_default_instance and the singleton instances that can be imported in place of the module, I'm obviously happy with those parts. However, I think you still haven't solved the problem with my proposal that you set out to solve. The main difference is that I wanted to deprecate (and eventually make it an error) to use the top-level functions without calling set_default_instance, while you want to allow them and gradually shift the semantics from using the seeded to the seedless PRNG. As I understand it, the reason for this is that you want to make it possible for someone to write "from random import choice", and not get a warning or error telling them they need to call set_default_instance or import one of the singletons instead. 
But then you're encouraging people to write code that's broken in 3.6
and earlier--and that's also potentially broken in 3.7 if used together
with any code that calls set_default_instance (because that can't
retroactively fix anything from-imported before the call). So, it takes
18 more months to provide any benefit, and it adds an extra cost.

Maybe the suggestion of not allowing set_default_instance to be called
more than once and/or after any other functions is sufficient, but I'm
not sure that it is.

What about this change: replace the three singleton instances with
three modules, so we can tell people (and 2to3 and similar mechanical
tools) to replace "from random import choice" with "from
random.seedless_random import choice"? Would that be acceptable for
novices? (And, if so, would that mean we no longer need the
set_default_instance and can just flat-out deprecate the top-level
functions in random?)

If that's not sufficient because the name is too long/too nested, could
we just flatten the names out, so it's "from seedless_random import
choice", and then the deprecation process is just making random an
alias for seeded_random and then switching it to seedless_random later?
(I don't think there's any official cross-platform way to alias modules
like that, and having to do some ugly sys.modules munging to force them
to be the same instance, or using a special module finder just for this
case, etc. is obviously ugly, but it may be worth doing anyway.)

One nice advantage of this is that it's dead-easy to backport; if I
need seeded_random, I can write code that works for 2.6+/3.3+ by just
depending on seeded_random from PyPI...
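The "sys.modules munging" mentioned above can be sketched in a few lines.
This is only an illustration of the aliasing mechanism, not the proposal
itself: `seeded_random` is the hypothetical module name from the thread,
and today's `random` module stands in as the alias target.

```python
# Sketch: aliasing one module under a second name via sys.modules.
# "seeded_random" is a hypothetical name; random is used as a stand-in.
import random
import sys

# Register the existing module object under the second name ...
sys.modules['seeded_random'] = random

# ... so importing either name yields the identical module instance.
import seeded_random

assert seeded_random is random
assert seeded_random.Random is random.Random
```

Because the import system consults the sys.modules cache before running
any finders, the second import never touches the filesystem; both names
are bound to the same object, which is what makes the deprecation-by-
aliasing idea workable (and also why it is fragile against reload()).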
From ncoghlan at gmail.com Tue Sep 15 02:04:19 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 15 Sep 2015 10:04:19 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 15 September 2015 at 01:32, Ian Cordasco wrote: > On Mon, Sep 14, 2015 at 10:01 AM, Paul Moore wrote: >> On 14 September 2015 at 14:29, Cory Benfield wrote: >>> Is your argument that there are lots of ways to get security wrong, >>> and for that reason we shouldn't try to fix any of them? >> >> This debate seems to repeatedly degenerate into this type of accusation. >> >> Why is backward compatibility not being taken into account here? To be >> clear, the proposed change *breaks backward compatibility* and while >> that's allowed in 3.6, just because it is allowed, doesn't mean we >> have free rein to break compatibility - any change needs a good >> justification. The arguments presented here are valid up to a point, >> but every time anyone tries to suggest a weak area in the argument, >> the "we should fix security issues" trump card gets pulled out. >> >> For example, as this is a compatibility break, it'll only be allowed >> into 3.6+ (I've not seen anyone suggest that this is sufficiently >> serious to warrant breaking compatibility on older versions). Almost >> all of those SO questions, and google hits, are probably going to be >> referenced by people who are using 2.7, or maybe some version of 3.x >> earlier than 3.6 (at what stage do we allow for the possibility of 3.x >> users who are *not* on the latest release?) So is a solution which >> won't impact most of the people making the mistake, worth it? > > So people who are arguing that the defaults shouldn't be fixed on > Python 2.7 are likely the same people who also argued that PEP 466 was > a terrible, awful, end-of-the-world type change. 
Yes it broke things > (like eventlet) but the net benefit for users who can get onto Python > 2.7.9 (and later) is immense. They don't even have to get onto 2.7.9 per se - the RHEL 7.2 beta just shipped with Robert Kuska's backport of those changes (minus the eventlet breaking internal API change), so it will also filter out through the RHEL/CentOS ecosystem via 7.x and SCLs. (We also looked at a Python 2.6 backport, but decided it was too much work for not enough benefit - folks really need to just upgrade to RHEL/CentOS 7 already, or at least switch to using Software Collections for their Python runtime needs). Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From donald at stufft.io Tue Sep 15 02:14:32 2015 From: donald at stufft.io (Donald Stufft) Date: Mon, 14 Sep 2015 20:14:32 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On September 14, 2015 at 6:39:28 PM, Paul Moore (p.f.moore at gmail.com) wrote: > (The rest of your emails, I'm going to read fully and digest before > responding. Might take a day or so.) > > On 14 September 2015 at 21:36, Donald Stufft wrote: > > I think maybe a problem here is a difference in how we look at the data. It > > seems that you might focus on the probability of you personally (or the things > > you work on) getting attacked and thus benefiting from these changes, whereas > > I, and I suspect the others like me, think about the probability of *anyone* > > being attacked. > > This may be true, in some sense. But I'm not willing to accept that > you are thinking about everyone, but I'm somehow selfishly only > thinking of myself. If that's what you were implying, then frankly > it's a pretty offensive way of disregarding my viewpoint. 
> Knowing you,
> I'm sure that's *not* how you meant it - but do you see how easy it is
> for the way you word something to make it nearly impossible for me to
> see past your wording to get to the actual meaning of what you're
> trying to say? I didn't even consciously notice the implication
> myself, at first. I simply started writing a pretty argumentative
> rebuttal, because I felt that somehow I needed to correct what you
> said, but I couldn't quite say why.

No, I don't mean it in the way of you being selfish. I'm not quite sure
of the right wording here; essentially, the probability of an event
happening to a particular individual vs the probability of an event
occurring at all. To use your lottery example, I *think*, and perhaps
I'm wrong, that you're looking at it in terms of: the chance of any
particular person participating in the lottery winning the lottery is
low, so why should each of these people, as individuals, make plans for
how to get the money when they win the lottery, because as individuals
they are unlikely to win. Whereas I flip it around and think that
someone, somewhere is likely going to win the lottery, so the lottery
system should make plans for how to get them the money when they win.
I'm not sure of the right "name" for each type, and I don't want to
continue to try and ham-fist it, because I don't mean it in an
offensive or an "I'm better than you" way and I fear putting my foot in
my mouth again :(

>
> Looking at the reality of what I focus on, I'd say it's more like
> this. I mistrust arguments that work on the basis that "someone,
> somewhere, might do X bad thing, therefore we must all pay cost Y".
> The reasons are complex (and I don't know that I fully understand all
> of my thought processes here) but some aspects that immediately strike
> me are:
>
> * The probability of X isn't really quantified. I may win the lottery,
> but I don't quit my job - the probability is low. The probability of X
> matters.
> * My experience of the probability of X happening varies wildly from
> that of whoever's making the point. Who is right? Why must one of us
> "win" and be right? Can't it simply be that my data implies that over
> the full data set, the actual probability of X is lower than you
> thought?
> * The people paying cost Y are not the cause of, nor are they impacted
> by, X (except in an abstract "we all suffer if bad things happen"
> sense). I believe in the general principle of "you pay for what you
> use", so to me you're arguing for the wrong people to be made to pay.
>
> Hopefully, those are relatively objective measures. More subjectively,
>
> * It's way too easy to say "if X happens once, we have a problem". If
> you take the stance that we have to prevent X from *ever* happening,
> you allow yourself the freedom to argue with vague phrases like
> "might", while leaving the burden of absolute proofs on me. (In the
> context of RNG proposals, this is where arguments like "let's
> implement a secure secret library" get dismissed - they still leave
> open the possibility of *someone* using an inappropriate RNG, so "they
> don't solve the issue" - even if they reduce the chance of that
> happening by a certain amount - and neither you nor I can put a figure
> on how much, so let's not try).

Just to be clear, I don't think that "If X happens once, it's a
problem" is a reasonable belief, and I don't personally have that
belief. It's a sliding scale where we need to figure out where the
right solution for Python is for each particular problem. I certainly
wouldn't want to use a language that took the approach that if X can
ever happen, we need to prevent X.

I have seen a number of users incorrectly use the random.py module
often enough that I think the danger is "real". I also think that, if
this were a brand new module, it would be a no-brainer (but perhaps I'm
wrong) for the default, module-level API to be safe by default.
Going off that assumption, then, I think the question is really just
"Is it worth it?" not "does this make more sense than the current?".

> * There's little evidence that I can see of preventative security
> measures having improved things. Maybe this is because it's an "arms
> race" situation, and keeping up is all we can hope for. Maybe it's
> because it's hard to demonstrate a lack of evidence, so the demand for
> evidence is unreasonable. I don't know.

By preventive security measures, do you mean things like PEP 466? I
don't quite know how to accurately state it, but I'm certain that PEP
466 directly improved the security of the entire internet (and
continues to do so as it propagates).

> * For many years I ran my PC with no anti-virus software. I never got
> a virus. Does that prove anything? Probably not. The anti-virus
> software on my work PC is the source of *far* more issues than I have
> ever seen caused by a virus. Does *that* prove anything? Again,
> probably not. But my experience with at least *that* class of pressure
> to implement security is that the cure is worse than the disease.
> Where does that leave the burden of proof? Again, I don't know, but my
> experience should at least be considered as relevant data.

Antivirus is a particularly bad example of security software :/ It's a
massive failing of the security industry that they exist in the state
they do. There's a certain bias here though, because it is the job of
security-sensitive code to "break" things (as in, take otherwise valid
input and make it not work). In an ideal world, security software just
sits there doing "nothing" from the POV of someone who isn't a security
engineer, and then will, often through no fault of their own, pop up
and make things go kabloom because it detected something insecure
happening. This means that for most people, the only interaction they
have with something designed to protect them is when it steps in to
make things stop working.
It is relevant data, but I think it goes back to the different ways of
looking at things (what is the individual chance of an event happening,
vs the chance of an event happening across the entire population). This
might also be why you'll see the backwards-compat folks focus more on
experience-driven data and security folks focus more on hypotheticals
about what could happen.

> * Everyone I have ever encountered in a work context (as opposed to in
> open-source communities) seems to me to be in a similar situation to
> mine. I believe I'm speaking for them, but because it's a
> closed-source in house environment, I've got no public data to back my
> comments.
>
> And totally subjective,
>
> * I'm extremely tired of the relentless pressure of "we need to do X,
> because security". While the various examples of X may all have ended
> up being essentially of no disadvantage to me, feeling obliged to
> read, understand, and comment on the arguments presented every time,
> gets pretty wearing.

I'm not sure what to do about this :( On one side, you're not obligated
to read, understand, and comment on everything that's raised, but I
totally understand why you do, because I do too. I'm not sure how to
help this without saying that people who care about security shouldn't
bring it up either?

> * I can't think of a single occasion where we *don't* do X. That may
> well be confirmation bias, but again subjectively, it feels like
> nobody's listening to the objections. I get that the original
> proposals get modified, but if never once has the result been "you're
> right, the cost is too high, we'll not do X" then that puts
> security-related proposals in a pretty unique position.

Off the top of my head, I remember the on-by-default hash randomization
for Python 2.x (or the actual secure hash randomization, since 2.x
still has the one where it is trivial to recover the original seed).
I don't actually remember that many cases where python-dev chose to
break backwards compatibility for security. The only ones I can think
of are:

* The hash randomization on Python 3.x (sort of? Only if you depended
  on dict ordering, which wasn't a guarantee anyways).
* The HTTPS improvements where we switched Python to default to
  verifying certificates.
* The backports of several security features to 2.7 (backport of 3.4's
  ssl module, hmac.compare_digest, os.urandom's persistent FD,
  hashlib.pbkdf2_hmac, hashlib.algorithms_guaranteed,
  hashlib.algorithms_available).

There are probably things that I'm not thinking of, but the hash
randomization only broke things if you were depending on dict/set
having ordering, which isn't a promised property of dict/set. The
backports of security features were done in a pretty minimally invasive
way, where they would (ideally) only break things if you relied on
those names *not* existing on Python 2.7 (which was a nonzero but small
set). The HTTPS verification is the main thing I can think of where
python-dev actually broke backwards compatibility in an obvious way for
people relying on something that was documented to work a particular
way.

Are there examples I'm not remembering (probably!)? It doesn't feel to
me like two sort-of backwards incompatible changes and one backwards
incompatible change in the lifetime of Python is really that much.

Is there some crossover between distutils-sig maybe? I've focused a lot
more on pushing security on that side of things, both because it
personally affects me more and because I think insecure defaults there
are a lot worse than insecure defaults in any particular module in the
Python standard library.

>
> Finally, in relation to that last point, and one thing I think is a
> key difference in our thinking. I do *not* believe that security
> proposals (as opposed to security bug fixes) are different from any
> other type of proposal.
> I believe that they should be subject to all
> the same criteria for acceptance that anything else is. I suspect that
> you don't agree with that stance, and believe that security proposals
> should be held to different standards (e.g., a demonstrated
> *probability* of benefit is sufficient, rather than evidence of actual
> benefit being needed). But please speak for yourself on this - I'm not
> trying to put words into your mouth, it's just my impression.

Well, I think that all proposals are based on what the probability is that they're going to help some particular percentage of people, and whether they're going to help enough people to be worth the cost. What I think is special about security is the cost of *not* doing something. Security "fails open" in that if someone does something insecure, it's not going to raise an exception or give different results or something like that. It's going to appear to "work" (in that you get the results you expect) while the user is silently insecure.

Compare this to, well, let's pretend that there was never a deterministic RNG in the standard library. If a scientist or a game designer inappropriately used random.py, they'd pretty quickly learn that they couldn't give the RNG a seed, and even if it was a CSPRNG with an "add_seed" method that might confuse them, it'd be pretty obvious on the second execution of their program that it's giving them a different result.

I think that the bar *should* be lower for something that silently or subtly does the "wrong" thing vs something that obviously and loudly does the wrong thing, particularly when the downside of doing the "wrong" thing is as potentially disastrous as it is with security.
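The contrast being drawn here can be seen directly in the stdlib (a small illustrative snippet, not part of any proposal):

```python
import random

# A seeded, deterministic RNG reproduces the same stream on every run --
# the property simulations and games rely on, and one whose absence
# becomes obvious on the second execution of a program.
rng = random.Random(42)
first = [rng.random() for _ in range(3)]
rng.seed(42)
assert [rng.random() for _ in range(3)] == first

# SystemRandom (backed by os.urandom) accepts seed() but silently
# ignores it, so its output cannot be reproduced across runs.
srng = random.SystemRandom()
srng.seed(42)
a = srng.random()
srng.seed(42)
b = srng.random()
assert a != b  # equal only with vanishingly small probability
```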
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From donald at stufft.io Tue Sep 15 02:18:50 2015 From: donald at stufft.io (Donald Stufft) Date: Mon, 14 Sep 2015 20:18:50 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On September 14, 2015 at 8:14:33 PM, Donald Stufft (donald at stufft.io) wrote: > > Security "fails open" in that if someone does something insecure, > it's not > going to raise an exception or give different results or something > like that. This should read: Security "fails open" in that if someone uses an API that allows something insecure to happen (like not validating HTTPS) it's not going to raise an exception or give different results or something like that. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From ncoghlan at gmail.com Tue Sep 15 02:36:52 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 15 Sep 2015 10:36:52 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 15 September 2015 at 01:01, Paul Moore wrote: > I fully expect the response to this to be "just because it'll take > time, doesn't mean we should do nothing". Or "even if it just fixes it > for one or two people, it's still worth it". This may be at the core of the disagreement, as we're not talking "one or two people", we're talking tens of millions. While wearing my "PSF Director" hat, I spend a lot of time talking to professional educators, and recently organised the first "Python in Education" miniconf at PyCon Australia. 
If you look at the inroads we're making across primary, secondary and tertiary education, as well as through workshops like Software Carpentry and DjangoGirls, a *lot* of people around the world are going to be introduced to text based programming over the coming decades by way of Python. That level of success brings with it a commensurate level of responsibility: if we're setting those students up for future security failures, that's *on us* as language designers, not on them for failing to learn to avoid traps we've accidentally laid for them (because *we* previously didn't know any better). Switching back to my "security wonk" hat, the historical approach to computer security has been "secure settings are opt in, so only qualified experts should be allowed to write security sensitive software". What we've learned as an industry (the hard way) is that this approach *doesn't work*. The main reason it doesn't work is the one that was part of the rationale for the HTTPS changes in PEP 476: when security failures are silent by default, you generally don't find out that you forgot to flip the "I need this to be secure" switch until *after* the system you're responsible for has been compromised (with whatever consequences that may have for your users). The law of large numbers then tells us that even if (for example) only 1 in 1000 people forget to flip the "be secure" switch when they needed it (or don't even know that the switch *exists*), it's a practical certainty that when you have millions of programmers using your language (and you don't climb to near the top of the IEEE rankings without that), you're going to be hitting that failure mode regularly as a collective group. We have the power to mitigate that harm permanently *just by changing the default behaviour of the random module*. However, that has a cost: it causes problems for some current users for the sake of better serving future users. 
That's what transition strategy design is about, and I'll take that up in the other thread. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Sep 15 03:07:46 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 15 Sep 2015 11:07:46 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 15 September 2015 at 08:50, Emile van Sebille wrote: > On 9/14/2015 3:39 PM, Paul Moore wrote: >> >> * Everyone I have ever encountered in a work context (as opposed to in >> open-source communities) seems to me to be in a similar situation to >> mine. I believe I'm speaking for them, but because it's a >> closed-source in house environment, I've got no public data to back my >> comments. > > You can certainly speak for me. It's much easier to guard the gates than > everything inside the walls. Historically, yes, but relying solely on perimeter defence is becoming less and less viable as the workforce decentralises, and we see more people using personal devices and untrusted networks to connect to work systems (whether that's their home network or the local coffee shop), as well as relying on public web services rather than internal applications. Enterprise IT is simply *wrong* in the way we currently go about a lot of things, and the public web service sector is showing us all how to do it right. Facilitating that transition is a key part of my day job in Red Hat's Developer Experience team (it's getting a bit off topic, but for a high level company perspective on that: http://www.redhat-cloudstrategy.com/towards-a-frictionless-it-whether-you-like-it-or-not/). 
And for folks tempted to think "this is just about the web", for a non-web related example of what we as an industry have unleashed through our historical "security is optional" mindset: http://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/ That's an article on remotely hacking the UConnect system in a Jeep Cherokee to control all sorts of systems that had no business being connected to the internet in the first place. The number of SCADA industrial control systems accessible through the internet is frankly terrifying - one of the reasons we can comfortably assume most humans are either nice or lazy is because we *don't* see most of the vulnerabilities that are lying around being exploited. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Sep 15 04:07:38 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 15 Sep 2015 12:07:38 +1000 Subject: [Python-ideas] Globally configurable random number generation In-Reply-To: References: Message-ID: On 15 September 2015 at 09:10, Andrew Barnert wrote: > On Sep 14, 2015, at 06:32, Nick Coghlan wrote: >> >> This is an expansion of the random module enhancement idea I >> previously posted to Donald's thread: >> https://mail.python.org/pipermail/python-ideas/2015-September/035969.html > > Since I suggested the set_default_instance and the singleton instances that can be imported in place of the module, I'm obviously happy with those parts. > > However, I think you still haven't solved the problem with my proposal that you set out to solve. > > The main difference is that I wanted to deprecate (and eventually make it an error) to use the top-level functions without calling set_default_instance, while you want to allow them and gradually shift the semantics from using the seeded to the seedless PRNG. 
> > As I understand it, the reason for this is that you want to make it possible for someone to write "from random import choice", and not get a warning or error telling them they need to call set_default_instance or import one of the singletons instead. > > But then you're encouraging people to write code that's broken in 3.6 and earlier--and that's also potentially broken in 3.7 if used together with any code that calls set_default_instance (because that can't retroactive fix anything from-imported before the call). So, it takes 18 more months to provide any benefit, and it adds an extra cost. This entire problem is one that I put in the "fix it eventually" category, rather than "fix it urgently" - folks really are better off learning to use things like cryptography.io for security sensitive software, so this change is just about harm mitigation given that it's inevitable that a non-trivial proportion of the millions of current and future Python developers won't do that. Since there's really only one transition I want to enable (seedable -> seedless as the default RNG), I now think the "switch implicitly as needed" is a better idea than a permanent support API for switching the default instance - I'd just add a deprecation warning to that behaviour, with the intent of removing it some time after 2.7 goes EOL. 
I also realised based on Paul's comments that we really do want "random.seedable" and "random.seedless" submodules, since that allows proper interaction with the import system in constructs like "from random.seedable import randint". That would make the proposed change for Python 3.6:

* add a random.SeedlessRandom API that omits the seed(), getstate() and setstate() methods and uses a cryptographically secure PRNG internally (such as the ChaCha20 algorithm implemented by OpenBSD)
* deprecate the seed(), getstate() and setstate() methods on SystemRandom
* convert random to a pseudo-module with "seedless", "seedable" and "system" submodules (keeping most code in __init__ for easy pickle compatibility)
* these would each work like the current top-level random module - a default instance, with bound methods as module level callables
* random._inst would be an alias for random.seedless._inst by default
* the top level random functions would change to be functions lazily looking up methods on random._inst, rather than bound methods
* if you call the module level seed(), getstate(), or setstate() methods, and random._inst is set to random.seedless._inst, it will issue a deprecation warning recommending the direct use of "random.seedable" and switch random._inst to refer to random.seedable._inst instead

Compared to my original proposal, the seedable MT RNG retains the random.Random name, so any code already using explicit instances is entirely unaffected by the proposed change. This means the only code that will receive a deprecation warning is code calling the module level seed(), getstate() and setstate() functions, and that warning will just recommend importing "random.seedable" rather than importing "random".

The API used to replace the default instance at runtime for backwards compatibility purposes becomes private rather than public, so we only need to support our specific reasons for doing that, rather than supporting it as a general feature.
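A rough sketch of how the lazy lookup and implicit switching described above could fit together; the `_inst` name and the switching behaviour come from the proposal itself, but everything else here (including using SystemRandom as a stand-in for the eventual CSPRNG default) is illustrative rather than an actual patch:

```python
import random
import warnings

_seedless = random.SystemRandom()  # stand-in for the proposed CSPRNG default
_seedable = random.Random()        # the traditional seedable MT generator
_inst = _seedless

def _delegate(name):
    # Module-level functions resolve the method on the *current* default
    # instance at call time, rather than binding it at import time.
    def wrapper(*args, **kwargs):
        return getattr(_inst, name)(*args, **kwargs)
    wrapper.__name__ = name
    return wrapper

randint = _delegate("randint")
choice = _delegate("choice")

def seed(a=None):
    # Calling seed() while the seedless default is active issues a
    # deprecation warning and switches to the seedable generator.
    global _inst
    if _inst is _seedless:
        warnings.warn(
            "seeding the default RNG; switching to the seedable generator "
            "(consider importing random.seedable directly)",
            DeprecationWarning, stacklevel=2)
        _inst = _seedable
    _inst.seed(a)
```

With this shape, code that never seeds gets the secure default, while legacy code that calls seed() transparently falls back to deterministic behaviour, with the warning pointing at the explicit submodule.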
Future security audits would focus on the use of the module-level seed(), getstate() and setstate() functions (since they'd trigger the deterministic RNG process-wide), and it would also still be encouraged to use random.SystemRandom() or os.urandom() for security-sensitive use cases (since that's both version-independent, and immune to other modules modifying the active default RNG).

Regards,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From random832 at fastmail.com Tue Sep 15 04:30:28 2015
From: random832 at fastmail.com (Random832)
Date: Mon, 14 Sep 2015 22:30:28 -0400
Subject: [Python-ideas] Globally configurable random number generation
References: 
Message-ID: 

Nick Coghlan writes:
> Compared to my original proposal, the seedable MT RNG retains the random.Random name, so any code already using explicit instances is entirely unaffected by the proposed change.

So, if you use random.Random() without seeding, you still get "MT seeded from os.urandom"?

From ncoghlan at gmail.com Tue Sep 15 04:39:38 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 15 Sep 2015 12:39:38 +1000
Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux
In-Reply-To: 
References: 
Message-ID: 

On 15 September 2015 at 08:39, Paul Moore wrote:
> * I can't think of a single occasion where we *don't* do X. That may
> well be confirmation bias, but again subjectively, it feels like
> nobody's listening to the objections. I get that the original
> proposals get modified, but if never once has the result been "you're
> right, the cost is too high, we'll not do X" then that puts
> security-related proposals in a pretty unique position.

Most of the time, when the cost of change is clearly too high, we simply *don't ask*.
hmac.compare_digest() is an example of that, where having a time-constant comparison operation readily available in the standard library is important from a security perspective, but having standard equality comparisons be as fast as possible is obviously more important from a language design perspective.

Historically, it was taken for granted that backwards compatibility concerns would always take precedence over improving security defaults, but the never-ending cascade of data breaches involving personally identifiable information is proving that we, as a collective industry, are *doing something wrong*: http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/

A lot of the problems we need to address are operational ones, as we upgrade the industry from a "perimeter defence" mindset to a "defence in depth" mindset, and hence we have things like continuous integration, continuous deployment, application and service sandboxing, containerisation, infrastructure-as-code, immutable infrastructure, etc, etc, etc. That side of things is mostly being driven by infrastructure software vendors (whether established ones or startups), where we have the fortunate situation that the security benefits are tied in together with a range of operational efficiency and capability benefits [1].

However, there's also increasing recognition that some of the problems are due to the default behaviours of the programming languages we use to *create* applications, and in particular the fact that many security issues involve silent failure modes.
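To make the compare_digest() point concrete, a minimal sketch of the intended usage (the key and message values here are invented for illustration):

```python
import hashlib
import hmac

SECRET_KEY = b"server-side-secret"  # illustrative value only

def sign(message: bytes) -> str:
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def verify(message: bytes, signature: str) -> bool:
    # compare_digest runs in time independent of how many leading
    # characters match; a plain == comparison returns at the first
    # mismatch and thereby leaks timing information to an attacker.
    return hmac.compare_digest(sign(message), signature)

good = sign(b"payload")
assert verify(b"payload", good)
assert not verify(b"payload", "0" * 64)
```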
Sometimes the right answer to those is to turn the silent failure into a noisy failure (as with certificate verification in PEP 476), other times it is about turning the silent failure into a silent success (as is being proposed for the random module API), and yet other times it is simply about lowering the barriers to someone doing the right thing once they're alerted to the problem (as with the introduction of hmac.compare_digest() and ssl.create_default_context(), and their backports to the Python 2.7 series).

At a lower level, languages like Go and Rust are challenging some of the assumptions in the still-dominant C-based memory management model for systems programming. Rust in particular is interesting in that it has a much richer compile-time-enforced concept of memory ownership than C does, while still aiming to keep the necessary runtime support very light.

Regards,
Nick.

[1] For folks wanting more background on some of the factors driving this shift, I highly recommend Google's "BeyondCorp" research paper: http://research.google.com/pubs/pub43231.html

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Tue Sep 15 05:22:27 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 15 Sep 2015 13:22:27 +1000
Subject: [Python-ideas] Globally configurable random number generation
In-Reply-To: 
References: 
Message-ID: 

On 15 September 2015 at 12:30, Random832 wrote:
> Nick Coghlan writes:
>> Compared to my original proposal, the seedable MT RNG retains the
>> random.Random name, so any code already using explicit instances is
>> entirely unaffected by the proposed change.
>
> So, if you use random.Random() without seeding, you still get "MT seeded
> from os.urandom"?

Yes, with the revised proposal, only the module-level functions would change their behaviour to use a CSPRNG by default.
If you trawl the various cryptographically unsound password generation recipes, they're almost all using the module level functions, so changing the meaning of random.Random itself would add a lot of additional pain for next to no gain. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Tue Sep 15 05:53:36 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 15 Sep 2015 13:53:36 +1000 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> Message-ID: <20150915035334.GF31152@ando.pearwood.info> On Mon, Sep 14, 2015 at 10:19:09PM +0100, Robert Kern wrote: > The requirement for a good PRNG for simulation work is that it be *well* > distributed in reasonable dimensions, not that it be *exactly* > equidistributed for some k. And well-distributedness is exactly what is > tested in TestU01. It is essentially a collection of simulations designed > to expose known statistical flaws in PRNGs. So to your earlier question as > to which is more damning, failing TestU01 or not being perfectly 623-dim > equidistributed, failing TestU01 is. I'm confused here. Isn't "well-distributed" a less-strict test than "exactly equidistributed"? MT is (almost) exactly k-equidistributed up to k = 623, correct? So how does it fail the "well-distributed" test? 
-- Steve From abarnert at yahoo.com Tue Sep 15 06:03:05 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 14 Sep 2015 21:03:05 -0700 Subject: [Python-ideas] Globally configurable random number generation In-Reply-To: References: Message-ID: <305B13C9-BA39-4133-8BDC-794E82EBF254@yahoo.com> On Sep 14, 2015, at 20:22, Nick Coghlan wrote: > >> On 15 September 2015 at 12:30, Random832 wrote: >> Nick Coghlan writes: >>> Compared to my original proposal, the seedable MT RNG retains the >>> random.Random name, so any code already using explicit instances is >>> entirely unaffected by the proposed change. >> >> So, if you use random.Random() without seeding, you still get "MT seeded >> from os.urandom"? > > Yes, with the revised proposal, only the module level functions would > change their behaviour to use a CSPRNG by default. If you trawl the > various cryptographically unsound password generation recipes, they're > almost all using the module level functions, so changing the meaning > of random.Random itself would add a lot of additional pain for next to > no gain. That definitely makes things simpler. The only part of my set_default_instance patch that was at all tricky was how to make sure Random instances worked the same as top-level functions, but still providing a way to explicitly select one (hence renaming the base class to DeterministicRandom, making a new subclass UnsafeRandom that subclasses it and added the warning, and making both Random and the top-level functions point at that). If we don't need that, then your simpler solution makes more sense. Also, while I'm not 100% sold on the auto-switching and the delegate-at-call-time wrappers, I'll play with them and see, and if they do work, then you're definitely right that your second version does solve your problem with my proposal, so it doesn't matter whether your first version did anymore. First, on delegating top-level function: have you tested the performance? 
Is MT so slow that an extra lookup and function call don't matter? One quick thought on auto-switching vs. explicitly setting the instance before any functions have been called: if I get you to install a plugin that calls random.seed(), I've now changed your app to use seeded random numbers. And it might even still pass security tests, because it doesn't switch until someone hits some API that activates the plugin. Is that a realistic danger for any realistic apps? If so, doesn't that potentially make 3.6 more dangerous than 3.5? For another: I still think we should be getting people to explicitly use seeded_random or system_random (or seedless_random, if they need speed as well as "probably secure") or explicit class instances (which are a bigger change, but more backward compatible once you've made it) as often as possible, even if random does eventually turn into seedless_random. The fact that many apps will never actually issue a deprecation warning or any other signal that anything is changing may be leaning over too far toward convenience. I realize the benefit of having books and course materials written for 3.4 continue to work in 3.8, but really, if those books are giving people bad ideas, removing any incentive for anyone to change the next edition may not be a good idea. And finally: it _seems like_ people who want MT for simulation/game/science stuff will have a pretty easy time finding the migration path, but I'm having a really hard time coming up with a convincing argument. Does anyone have a handful of science guys they can hack up a system for and test them empirically? Because if you can establish that fact, I think the naysayers have very little reason left to say nay, and a consensus would surely be better than having that horribly contentious thread end with "too bad, overruled, the PEP has been accepted". From stephen at xemacs.org Tue Sep 15 06:34:44 2015 From: stephen at xemacs.org (Stephen J. 
Turnbull)
Date: Tue, 15 Sep 2015 13:34:44 +0900
Subject: [Python-ideas] Globally configurable random number generation
In-Reply-To: <1442245930.209341.383250457.5F839815@webmail.messagingengine.com>
References: <1442240160.186432.383146609.471F1B7D@webmail.messagingengine.com> <1442241905.192420.383177025.382A3D7E@webmail.messagingengine.com> <1442245930.209341.383250457.5F839815@webmail.messagingengine.com>
Message-ID: <871te0z4ez.fsf@uwakimon.sk.tsukuba.ac.jp>

Random832 writes:
> Who is doing scientific computing but not using the seeding functions?

People whose papers should be rejected.

From ncoghlan at gmail.com Tue Sep 15 06:53:48 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 15 Sep 2015 14:53:48 +1000
Subject: [Python-ideas] Globally configurable random number generation
In-Reply-To: <305B13C9-BA39-4133-8BDC-794E82EBF254@yahoo.com>
References: <305B13C9-BA39-4133-8BDC-794E82EBF254@yahoo.com>
Message-ID: 

On 15 September 2015 at 14:03, Andrew Barnert wrote:
> Also, while I'm not 100% sold on the auto-switching and the delegate-at-call-time wrappers, I'll play with them and see, and if they do work, then you're definitely right that your second version does solve your problem with my proposal, so it doesn't matter whether your first version did anymore.
>
> First, on delegating top-level function: have you tested the performance?

If folks are in a situation where the performance impact of the additional layer of indirection is a problem, they can switch to using random.Random explicitly, or import from random.seedable rather than the top level random module.

> One quick thought on auto-switching vs. explicitly setting the instance before any functions have been called: if I get you to install a plugin that calls random.seed(), I've now changed your app to use seeded random numbers.
> And it might even still pass security tests, because it doesn't switch until someone hits some API that activates the plugin. Is that a realistic danger for any realistic apps? If so, doesn't that potentially make 3.6 more dangerous than 3.5?

This isn't an applicable concern, as we already provide zero runtime protections against hostile monkeypatching of other modules (by design choice). You can subvert even os.urandom in a hostile plugin:

    def not_random(num_bytes):
        return b'A' * num_bytes

    import os
    os.urandom = not_random

Once "potentially hostile code running in the current process" is part of your threat model, CPython is out of the running, and even PyPy's sandboxing capabilities rely on running the potentially hostile code in a separate process. IronPython and Jython can rely on CLR/JVM sandboxing, but that's still a case of delegating the problem to the host platform, rather than trying to solve it at the Python level.

> For another: I still think we should be getting people to explicitly use seeded_random or system_random (or seedless_random, if they need speed as well as "probably secure") or explicit class instances (which are a bigger change, but more backward compatible once you've made it) as often as possible, even if random does eventually turn into seedless_random. The fact that many apps will never actually issue a deprecation warning or any other signal that anything is changing may be leaning over too far toward convenience. I realize the benefit of having books and course materials written for 3.4 continue to work in 3.8, but really, if those books are giving people bad ideas, removing any incentive for anyone to change the next edition may not be a good idea.
Forcing people to make choices they're ill-equipped to make just because we think they "should" know enough to make a wise decision is one of the leading causes of user-hostile software (consider the respective user experiences of an HTTP site and an HTTPS site with a self-signed certificate). People are busy, and life is full of decisions that need to be made where there's no good default, so when we're able to deliver a good default that fails *noisily* when it's the wrong answer, that's what we should be aiming for.

> And finally: it _seems like_ people who want MT for simulation/game/science stuff will have a pretty easy time finding the migration path, but I'm having a really hard time coming up with a convincing argument. Does anyone have a handful of science guys they can hack up a system for and test them empirically? Because if you can establish that fact, I think the naysayers have very little reason left to say nay, and a consensus would surely be better than having that horribly contentious thread end with "too bad, overruled, the PEP has been accepted".

Given the general lack of investment in sustaining engineering for scientific software, I think the naysayers are right on that front, which is why I switched my proposal to give them a transparent upgrade path - I was originally thinking primarily of the educational and gaming use cases, and hadn't considered randomised simulations in the scientific realm.

Regards,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From tim.peters at gmail.com Tue Sep 15 07:49:15 2015
From: tim.peters at gmail.com (Tim Peters)
Date: Tue, 15 Sep 2015 00:49:15 -0500
Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To: 
References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

[Tim]
>> And yet nobody so far has produced a single example of any harm done
>> in any of the near-countless languages that supply non-crypto RNGs. I
>> know, my lawyer gets annoyed too when I point out there hasn't been a
>> nuclear war ;-)

[Nathaniel Smith ]
> Here you go:
> https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf

The most important question I have about that is from its Appendix D, where they try to give a secure "all-purpose" token generator. If openssl_random_pseudo_bytes is available, they use just that and call it done. Otherwise they go on to all sorts of stuff. My question is, even if /dev/urandom is available, they're _not_ content to use that alone. They continue to mix it up with all other kinds of silly stuff. So why do they trust urandom less than OpenSSL's gimmick?

It's important to me because, far as I'm concerned, os.urandom() is already Python's way to spell "crypto random" (yes, it would be better to have one guaranteed to be available on all Python platforms).

> They present real-world attacks on PHP applications that use something
> like the "password generation" code we've been talking about as a way
> to generate cookies and password reset nonces, including in particular
> the case of applications that use a strongly-seeded Mersenne Twister
> as their RNG:

I couldn't help but note ;-) that at least 3 of the apps had already attempted to repair bug reports filed against their insecure password-reset schemes at least 5 years ago. You can lead a PHP'er to water, but ... ;-)

> ...
> "Section 5.3: ... In this section we give a description of the
> Mersenne Twister generator and present an algorithm that allows the
> recovery of the internal state of the generator even when the output
> is truncated. Our algorithm also works in the presence of non
> consecutive outputs ..."
In this section we give a description of the > Mersenne Twister generator and present an algorithm that allows the > recovery of the internal state of the generator even when the output > is truncated. Our algorithm also works in the presence of non > consecutive outputs ..." It's cute, but I doubt anyone but the authors had the patience - or knowledge - to write a solver dealing with tens of thousands of picky equations over about 20000 variables. They don't explain enough about the details for a script kiddie to do it. Even very bright hackers attack what's easiest to topple, like poor seeding - that's just easy brute force. It remained unclear to me _exactly_ what "in the presence of non consecutive outputs" is supposed to mean. In the only examples, they knew exactly how many times MT was called. "Non consecutive" in all those contexts appeared to mean "but we couldn't observe_any_ output bits in some cases - the ones we could know something about were sometimes non-consecutive". So in the MT output sequence, they had no knowledge of _some_ of the outputs, but they nevertheless knew exactly _which_ of the outputs they were wholly ignorant about. That's no problem for the equation solver. They just skip adding any equations for the affected bits, keep collecting more outputs and potentially "wrap around", probably leading to an overdetermined system in the end. But Python doesn't work the way PHP does here. As explained in another message, in Python you can have _no idea_ how many MT outputs are consumed by a single .choice() call. In the PHP equivalent, you always consume exactly one MT output. PHP's method suffers statistical bias, but under the covers Python uses an accept/reject method to avoid that. Any number of MT outputs may be (invisibly!) consumed before "accept" is reached, although typically only one or two. 
You can deduce some of the leading MT output bits from the .choice() result, but _only_ for the single MT output .choice() reveals anything about. About the other MT outputs it may consume, you can't even know that some _were_ skipped over, let alone how many. Best I can tell, that makes a huge difference to whether their solver is even applicable to cracking idiomatic "password generators" in Python. You can't know which variables correspond to the bits you can deduce. You could split the solver into multiple instances to cover all feasible possibilities (for how many MT outputs may have been invisibly consumed), but the number of solver instances needed then grows exponentially with the number of outputs you do see something about. In the worst case (31 bits are truncated), they need over 19000 outputs to deduce the state. Even a wildly optimistic "well, let's guess no more than 1 MT output was invisibly rejected each time" leads to over 2**19000 solver clones then. Sure, there's doubtless a far cleverer way to approach that. But unless another group of PhDs looking to level up in Security World burns their grant money to tackle it, that's yet another attack that will never be seen in the real world ;-) > Out of curiosity, I tried searching github for "random cookie > language:python". The 5th hit (out of ~100k) was a web project that > appears to use this insecure method to generate session cookies: > https://github.com/bytasv/bbapi/blob/34e294becb22bae6e685f2e742b7ffdb53a83bcb/bbapi/utils/cookie.py > https://github.com/bytasv/bbapi/blob/34e294becb22bae6e685f2e742b7ffdb53a83bcb/bbapi/api/router.py#L56-L66 And they all use .choice(), which is overwhelmingly the most natural way to do this kind of thing in Python. > ... > There's a reason security people are so Manichean about these kinds of > things. If something is not intended to be secure or used in > security-sensitive ways, then fine, no worries. 
But if it is, then > there's no point in trying to mess around with "probably mostly > secure" -- either solve the problem right or don't bother. (See also: > the time Python wasted trying to solve hash randomization without > actually solving hash randomization [1].) If Tim Peters can get fooled > into thinking something like using MT to generate session ids is > "probably mostly secure", then what chance do the rest of us have? > As above, I'm still not much worried about .choice(). Even if I were, I'd be happy to leave it at "use .choice() from a random.SystemRandom instance instead". Unless there's some non-obvious (to me) reason these authors appear to be unhappy with urandom. > NB this isn't an argument for *whether* we should make random > cryptographically strong by default; it's just an argument against > wasting time debating whether it's already "secure enough". It's not > secure. Maybe that's okay, maybe it's not. _I_ would use SystemRandom. But, as you can tell, I'm extremely paranoid ;-) > For the record though I do tend to agree with the idea that it's not > okay, because it's an increasingly hostile world out there, and > secure-random-by-default makes whole classes of these issues just > disappear. It's not often that you get to fix thousands of bugs with > one commit, including at least some with severity level "all your > users' private data just got uploaded to bittorrent". > > I like Nick's proposal here: > https://code.activestate.com/lists/python-ideas/35842/ > as probably the most solid strategy for implementing that idea -- the > only projects that would be negatively affected are those that are > using the seeding functionality of the global random API, which is a > tiny fraction, and the effect on those projects is that they get > nudged into using the superior object-oriented API. Have you released software used by millions of people? If not, you have no idea how ticked off users get by needing to change anything. 
But Guido does ;-) Why not add a new "saferandom" module and leave it at that? Encourage newbies to use it. Nobody's old code ever breaks. But nobody's old code is saved from problems it likely didn't have anyway ;-) > -n > > [1] https://lwn.net/Articles/574761/ From njs at pobox.com Tue Sep 15 09:36:09 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 15 Sep 2015 00:36:09 -0700 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <20150915035334.GF31152@ando.pearwood.info> References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> Message-ID: On Mon, Sep 14, 2015 at 8:53 PM, Steven D'Aprano wrote: > On Mon, Sep 14, 2015 at 10:19:09PM +0100, Robert Kern wrote: > >> The requirement for a good PRNG for simulation work is that it be *well* >> distributed in reasonable dimensions, not that it be *exactly* >> equidistributed for some k. And well-distributedness is exactly what is >> tested in TestU01. It is essentially a collection of simulations designed >> to expose known statistical flaws in PRNGs. So to your earlier question as >> to which is more damning, failing TestU01 or not being perfectly 623-dim >> equidistributed, failing TestU01 is. > > I'm confused here. Isn't "well-distributed" a less-strict test than > "exactly equidistributed"? MT is (almost) exactly k-equidistributed up > to k = 623, correct? So how does it fail the "well-distributed" test? No, "well-distributed" means "distributed like a true random stream", which is a strong and somewhat fuzzy requirement. "k-equidistributed" is one particular way of operationalizing this requirement, but it's not a very good one. The idea of k-equidistribution is historically important because it spurred the development of generators that avoided the really terrible flaws of early designs like Unix rand, but it's not terribly relevant to modern RNG design.
Here's how it works. Formally speaking, a randomized algorithm or Monte Carlo simulation or similar can be understood as a function mapping an infinite bitstring to some output value, F : R -> O. If we sample infinite bitstrings R uniformly at random (using a theoretical "true" random number generator), and then apply F to each bitstring, then this produces some probability distribution over output values O. Now given some *pseudo* random number generator, we consider our generator to be successful if repeatedly running F(sample from this RNG) gives us the same distribution over output values as if we had repeatedly run F(true random sample). So for example, you could have a function F that counts up how many times it sees a zero or a one in the first 20,000 entries in the bitstring, and you expect those numbers to come up at ~10,000 each with some distribution around that. If your RNG is such that you reproduce that distribution, then you pass this function. Note that this is a little counterintuitive: if your RNG is such that over the first 20,000 entries it always produces *exactly* 10,000 zeros and 10,000 ones, then it fails this test. The Mersenne Twister will pass this test. Or you could have a function F that takes the first 19937 bits from the random stream, uses them to construct a model of the internal state of a Mersenne Twister, predicts the next 1000 bits, and returns True if they match and False if they don't match. On a real random stream this function returns True with probability 2^-1000; on an MT random stream it returns True with probability 1. So the MT fails this test. OTOH this is obviously a pretty artificial example. The only case the scientists actually care about is the one where F is "this simulation right here that I'm trying to run before the conference deadline".
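The zeros-and-ones counting function F described above can be written out concretely (a toy illustration only, nothing like a real TestU01 battery):

```python
import math
import random

def f_count_ones(bits):
    """The example test function F: count the ones among the given bits."""
    return sum(bits)

n = 20_000
rng = random.Random(12345)   # seeded MT, so this particular run is repeatable
ones = f_count_ones(rng.getrandbits(1) for _ in range(n))

# A true random source gives Binomial(n, 1/2): mean n/2 = 10,000 with
# standard deviation sqrt(n)/2 ~= 71.  MT reproduces that spread; a
# generator that returned *exactly* 10,000 ones on every run would fail.
sigma = math.sqrt(n) / 2
print(ones, sigma)
```

Running this with different seeds scatters the count around 10,000 with roughly the binomial spread, which is what "passing" this particular F means.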
But since scientists don't really want to design a new RNG for every simulation, we instead try to design our RNGs such that for all "likely" or "reasonable" functions F, they'll probably work ok. In practice this means that we write down a bunch of explicit test functions F inside a test battery like TestU01, run the functions on the RNG stream, and if their output is indistinguishable in distribution from what it would be for a true random stream then we say they pass. And we hope that this will probably generalize to the simulation we actually care about. Cryptographers are worried about the exact same issue -- they want RNGs that have the property that for all functions F, the behavior is indistinguishable from true randomness. But unlike the scientists, they're not content to say "eh, I checked a few functions and it seemed to work on those, probably the ones I actually care about are okay too". The cryptographers consider it a failure if an adversary with arbitrary computing power, full knowledge of the RNG algorithm, plus other magic powers like the ability to influence the RNG seeding, can invent *any* function F that acts differently (produces a different distribution over outputs) when fed the input from the RNG as compared to a true random stream. The only rule for them is that the function has to be one that you can actually implement on a computer that masses, like, less than Jupiter, and only has 1000 years to run. And as far as we can tell, modern crypto RNGs succeed at this very well. Obviously the thing the scientists worry about is a *strict* subset of what the cryptographers are worried about. This is why it is silly to worry that a crypto RNG will cause problems for a scientific simulation. The cryptographers take the scientists' real goal -- the correctness of arbitrary programs like e.g. a monte carlo simulation -- *much* more seriously than the scientists themselves do. 
(This is because scientists need RNGs to do their real work, whereas for cryptographers RNGs are their real work.) Compared to this, k-dimensional equidistribution is a red herring: it requires that you have a RNG that repeats itself after a while, and within each repeat it produces a uniform distribution over bitstrings of some particular length. By contrast, a true random bitstring does not repeat itself, and it gives a uniform distribution over bitstrings of *arbitrary* length. In this regard, crypto RNGs are like true random bitstrings, not like k-equidistributed RNGs. This is a good thing. k-equidistribution doesn't really hurt (theoretically it introduces flaws, but for realistic designs they don't really matter), but if randomness is what you want then crypto RNGs are better. I-should-really-get-a-blog-shouldn't-I'ly-yrs, -n -- Nathaniel J. Smith -- http://vorpus.org From njs at pobox.com Tue Sep 15 10:42:26 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 15 Sep 2015 01:42:26 -0700 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Sep 14, 2015 at 10:49 PM, Tim Peters wrote: > [Tim] >>> And yet nobody so far has produced a single example of any harm done >>> in any of the near-countless languages that supply non-crypto RNGs.
I >>> know, my lawyer gets annoyed too when I point out there hasn't been a >>> nuclear war ;-) > > [Nathaniel Smith ] >> Here you go: >> https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf > > The most important question I have about that is from its Appendix D, > where they try to give a secure "all-purpose" token generator. If > openssl_random_pseudo_bytes is available, they use just that and call > it done. Otherwise they go on to all sorts of stuff. > > My question is, even if /dev/urandom is available, they're _not_ > content to use that.alone. They continue to mix it up with all other > kinds of silly stuff. So why do they trust urandom less than > OpenSLL's gimmick? It's important to me because, far as I'm > concerned, os.urandom() is already Python's way to spell "crypto > random" (yes, it would be better to have one guaranteed to be > available on all Python platforms). Who knows why they wrote the code in that exact way. /dev/urandom is fine. >> They present real-world attacks on PHP applications that use something >> like the "password generation" code we've been talking about as a way >> to generate cookies and password reset nonces, including in particular >> the case of applications that use a strongly-seeded Mersenne Twister >> as their RNG: > > I couldn't help but note ;-) that at least 3 of the apps had already > attempted to repair bug reports filed against their insecure > password-reset schemes at least 5 years ago. You can lead a PHP'er to > water, but ... ;-) > >> ... >> "Section 5.3: ... In this section we give a description of the >> Mersenne Twister generator and present an algorithm that allows the >> recovery of the internal state of the generator even when the output >> is truncated. Our algorithm also works in the presence of non >> consecutive outputs ..." 
> > It's cute, but I doubt anyone but the authors had the patience - or > knowledge - to write a solver dealing with tens of thousands of picky > equations over about 20000 variables. They don't explain enough about > the details for a script kiddie to do it. Even very bright hackers > attack what's easiest to topple, like poor seeding - that's just easy > brute force. > > It remained unclear to me _exactly_ what "in the presence of non > consecutive outputs" is supposed to mean. In the only examples, they > knew exactly how many times MT was called. "Non consecutive" in all > those contexts appeared to mean "but we couldn't observe _any_ output > bits in some cases - the ones we could know something about were > sometimes non-consecutive". So in the MT output sequence, they had no > knowledge of _some_ of the outputs, but they nevertheless knew exactly > _which_ of the outputs they were wholly ignorant about. > > That's no problem for the equation solver. They just skip adding any > equations for the affected bits, keep collecting more outputs and > potentially "wrap around", probably leading to an overdetermined > system in the end. > > But Python doesn't work the way PHP does here. As explained in > another message, in Python you can have _no idea_ how many MT outputs > are consumed by a single .choice() call. In the PHP equivalent, you > always consume exactly one MT output. PHP's method suffers > statistical bias, but under the covers Python uses an accept/reject > method to avoid that. Any number of MT outputs may be (invisibly!) > consumed before "accept" is reached, although typically only one or > two.
This led me to look at the implementation of Python's choice(), and it's interesting; I hadn't realized that it was using such an inefficient method. (To make a random selection between, say, 36 items, it rounds up to 64 = 2**6, draws a 32-bit sample from MT, discards 26 of the bits (!) to get a number between 0-63, and then repeats until this number happens to fall in the 0-35 range, so it rejects with probability ~0.45. A more efficient algorithm is the one that it uses if getrandbits is not available, where it uses all 32 bits and only rejects with probability (2**32 % 36) / (2**32) = ~1e-9.) I guess this does add a bit of obfuscation. OTOH the amount of obfuscation is very sensitive to the size of the password alphabet. If I use uppercase + lowercase + digits, that gives me 62 options, so I only reject with probability 1/32, and I can expect that any given 40-character session key will contain zero skips with probability ~0.28, and that reveals 240 bits of seed. I don't have time right now to go look up the MT equations to see how easy it is to make use of such partial information, but there certainly are lots of real-world weaponized exploits that begin with something like "first, gather 10**8 session keys...". I certainly wouldn't trust it. Also, if I use the base64 or hex alphabets, then the probability of rejection is 0, and I can deterministically read off bits from the underlying MT state. (Alternatively, if someone in the future makes the obvious optimization to choice(), then it will basically stop rejecting in practice, and again it becomes trivial to read off all the bits from the underlying MT state.) The point of "secure by default" is that you don't have to spend all these paragraphs doing the math to try and guess whether some RNG usage might maybe be secure; it just is secure. > Best I can tell, that makes a huge difference to whether their solver > is even applicable to cracking idiomatic "password generators" in > Python. 
You can't know which variables correspond to the bits you can > deduce. You could split the solver into multiple instances to cover > all feasible possibilities (for how many MT outputs may have been > invisibly consumed), but the number of solver instances needed then > grows exponentially with the number of outputs you do see something > about. In the worst case (31 bits are truncated), they need over > 19000 outputs to deduce the state. Even a wildly optimistic "well, > let's guess no more than 1 MT output was invisibly rejected each time" > leads to over 2**19000 solver clones then. Your "wildly optimistic" estimate is wildly conservative under realistic conditions. How confident are you that the rest of your analysis is totally free of similar errors? Would you be willing to bet, say, the public revelation of every website you've visited in the last 5 years on it? > Sure, there's doubtless a far cleverer way to approach that. But > unless another group of PhDs looking to level up in Security World > burns their grant money to tackle it, that's yet another attack that > will never be seen in the real world ;-) Grant money is a drop in the bucket of security research funding these days. Criminals and governments have very deep pockets, and it's well documented that there are quite a few people with PhDs who make their living by coming up with exploits and then auctioning them on the black market. BTW, it looks like that PHP paper was an undergraduate project. You don't need a PhD to solve linear equations :-). >> Out of curiosity, I tried searching github for "random cookie >> language:python".
The 5th hit (out of ~100k) was a web project that >> appears to use this insecure method to generate session cookies: >> https://github.com/bytasv/bbapi/blob/34e294becb22bae6e685f2e742b7ffdb53a83bcb/bbapi/utils/cookie.py >> https://github.com/bytasv/bbapi/blob/34e294becb22bae6e685f2e742b7ffdb53a83bcb/bbapi/api/router.py#L56-L66 > > And they all use .choice(), which is overwhelmingly the most natural > way to do this kind of thing in Python. > >> ... >> There's a reason security people are so Manichean about these kinds of >> things. If something is not intended to be secure or used in >> security-sensitive ways, then fine, no worries. But if it is, then >> there's no point in trying to mess around with "probably mostly >> secure" -- either solve the problem right or don't bother. (See also: >> the time Python wasted trying to solve hash randomization without >> actually solving hash randomization [1].) If Tim Peters can get fooled >> into thinking something like using MT to generate session ids is >> "probably mostly secure", then what chance do the rest of us have? >> > > As above, I'm still not much worried about .choice(). Even if I were, > I'd be happy to leave it at "use .choice() from a random.SystemRandom > instance instead". Unless there's some non-obvious (to me) reason > these authors appear to be unhappy with urandom. No, SystemRandom.choice is certainly fine. But people clearly don't use it, so its fine-ness doesn't matter that much in practice... -n -- Nathaniel J. Smith -- http://vorpus.org From mal at egenix.com Tue Sep 15 10:45:34 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 15 Sep 2015 10:45:34 +0200 Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To: References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> Message-ID: <55F7DAAE.5010401@egenix.com> On 15.09.2015 09:36, Nathaniel Smith wrote: > > [Using empirical tests to check RNGs] > > Obviously the thing the scientists worry about is a *strict* subset of > what the cryptographers are worried about. I think this explains why we cannot make ends meet: A scientist wants to be able to *repeat* a simulation in exactly the same way without having to store GBs of data (or send them to colleagues to have them verify the results). Crypto RNGs cannot provide this feature per design. What people designing PRNGs are after is to improve the statistical properties of these PRNGs while still maintaining the repeatability of the output. > This is why it is silly to > worry that a crypto RNG will cause problems for a scientific > simulation. The cryptographers take the scientists' real goal -- the > correctness of arbitrary programs like e.g. a monte carlo simulation > -- *much* more seriously than the scientists themselves do. (This is > because scientists need RNGs to do their real work, whereas for > cryptographers RNGs are their real work.) Yes, cryptographers are the better folks, understood. These arguments are not really helpful. They are not even arguments. It's really simple: If you don't care about being able to reproduce your simulation results, you can use a crypto RNG, otherwise you can't. > Compared to this, k-dimensional equidistribution is a red herring: it > requires that you have a RNG that repeats itself after a while, and > within each repeat it produces a uniform distribution over bitstrings > of some particular length. k-dim equidistribution is a way to measure how well your PRNG behaves, because it describes in analytical terms how far you can get with increasing the linear complexity of your RNG output.
The goal is not to design a PRNG with a specific k, but to increase k as far as possible, given the RNG's deterministic limitations. It's also not something you require of a PRNG, it's simply a form of analytical measurement, just like the tests in TestU01 or the NIST test set are statistical measurements for various aspects of RNGs. Those statistical tests are good at detecting flaws of certain kinds, but they are not perfect. If you know the tests, you can work around them and have your RNG appear to pass them, e.g. you can trick a statistical test for linear dependencies by applying a non-linear transform. That doesn't make the RNGs better, but it apparently is a way to convince some people of the quality of your RNG. > By contrast, a true random bitstring does > not repeat itself, and it gives a uniform distribution over bitstrings > of *arbitrary* length. In this regard, crypto RNGs are like true > random bitstrings, not like k-equidistributed RNGs. This is a good > thing. k-equidistribution doesn't really hurt (theoretically it > introduces flaws, but for realistic designs they don't really matter), > but if randomness is what you want then crypto RNGs are better. If you can come up with a crypto RNG that allows repeating the results, I think you'd have us all convinced, otherwise it doesn't really make sense to compare apples and oranges, and insisting that orange juice is better for you than apple juice ;-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 15 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ 2015-09-14: Released mxODBC Plone/Zope DA 2.2.3 http://egenix.com/go84 2015-09-18: PyCon UK 2015 ...
3 days to go 2015-09-26: Python Meeting Duesseldorf Sprint 2015 11 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From rosuav at gmail.com Tue Sep 15 10:58:01 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 15 Sep 2015 18:58:01 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On Tue, Sep 15, 2015 at 6:23 AM, Donald Stufft wrote: >> * The security arguments seem to be largely in the context of web >> application development (cookies, passwords, shared secrets, ...) >> That's not the only context that matters. > > You're right it's not the only context that matters, however it's often brought > up for a few reasons: > > * Security largely doesn't matter for software that doesn't accept or send > input from some untrusted source which narrows security down to be mostly > network based applications. > > * The HTTP protocol is "eating the world" and we're seeing more and more things > using it as their communication protocol (even for things that are not > traditional browser based applications). > > * Traditional Web Applications/Sites are a pretty large target audience for > Python and in particular a lot of the security folks come from that world > because the web is a hostile place. To add to that: Web application development is a *huge* area (every man and his dog wants a web site, and more than half of them want logins and users and so on), which means that the number of non-experts writing security-sensitive code is higher there than in a lot of places. The only other area I can think of that would be comparably popular would be mobile app development - and a lot of the security concerns there are going to be in a web context anyway. 
Is it fundamentally insecure to receive passwords over an encrypted HTTP connection and use those to verify user identities? I don't think so (although I'm no expert) - it's what you do with them afterward that matters (improperly hashing - or, worse, using a reversible transformation). Why are so many people advised not to do user authentication at all, but to tie in with one of the auth APIs like Google's or Facebook's? Because it's way easier to explain how to get that right than to explain how to get security/encryption right. How bad is it, really, to tell everyone "use random.SystemRandom for anything sensitive", and leave it at that? ChrisA From ncoghlan at gmail.com Tue Sep 15 12:14:35 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 15 Sep 2015 20:14:35 +1000 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 15 September 2015 at 18:58, Chris Angelico wrote: > How bad is it, really, to tell everyone "use random.SystemRandom for > anything sensitive", and leave it at that? That's the status quo, and has been for a long time. If it was ever going to work in terms of discouraging folks from using the module-level functions for security sensitive tasks, it would have worked by now. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Tue Sep 15 13:04:49 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 15 Sep 2015 12:04:49 +0100 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 14 September 2015 at 23:39, Paul Moore wrote: > (The rest of your emails, I'm going to read fully and digest before > responding. Might take a day or so.) Point by point responses exhaust and frustrate me, and don't really serve much purpose other than to perpetuate the debate. So I'm going to make some final points, and then stop.
This is based on having read the various emails responding to my earlier comments. If it looks like I haven't read something, please assume I have but either you didn't get your point across, or maybe I simply don't agree with you. Why now? -------- First of all, the big question for me is why now? The random module has been around in its current form for many, many years. Security issues are not new, maybe they are slowly increasing, but there's been no step change. The only thing that seems to have changed is that someone (Theo) has drawn attention to the random module. So I feel that the onus is on the people proposing change to address that. Show me the evidence that we've had an actual problem for many years, and demonstrate that it's a good job we spotted it at last, and now have a chance to fix it. Explain to me what has been going wrong all these years that I'd never even noticed. Arguments that people are misusing the module aren't sufficient in themselves - they've (presumably) been doing that for years. In all that time, who was hacked? Who lost data? As a result of random.random being a PRNG rather than being crypto-secure? I'm not asking for an unassailable argument, just acknowledgement that it's *your* job to address that question, and not mine to persuade you that "we've been alright so far" is a compelling reason to reject your proposal. Incorrect code on SO etc ------------------------ As regards people picking up insecure code snippets from the internet and using them, there's no news there. I can look round and find hundreds of bits of incorrect code in any area you want. People copy/paste garbage code all the time. To my embarrassment, I've done it myself in the past :-( But I'm reminded of https://xkcd.com/386/ - "somebody is wrong on the internet!"
This proposal, and in particular the suggestion that we need to retrospectively make the code snippets quoted here secure, strikes me as a huge exercise in trying to correct all the people who are wrong on the internet. There's certainly value in "safe by default" APIs, I don't disagree with that, but I honestly fail to see how quoting incorrect code off the internet is a compelling argument for anything. Millions of users are affected ------------------------------ The numbers game is also a frustrating exercise here. We keep hearing that "millions of users are affected by bad code", that scans of Google almost immediately find sites with vulnerabilities. But I don't see anyone pointing at a single documented case of an actual exploit caused by Python's random module. There's no bug report. There's no security alert notification. How are those millions of users affected? Their level of risk is increased? Who can tell that? Are any of the sites identified holding personal data? Not all websites on the internet are *worth* hacking. And I feel that expressing that view is somehow frowned on. That "it doesn't matter" is an unacceptable view to hold. And so, the responses to my questions feel personal, they feel like criticisms of me personally, that I'm being unprofessional. I don't want to make this a big deal, but the code of conduct says "we're tactful when approaching differing views", and it really doesn't feel like that. I understand that the whole security thing is a numbers game. And that it's about assessing risk. But what risk is enough to trigger a response? A 10% increased chance of any given website being hacked? 5%? 1%? Again, I'm not asking to use the information to veto a change. I'm asking to *understand your position*. To better assess your arguments, so that I can be open to persuasion, and to *agree* with you, if your arguments are sound. Furthermore, should we not take into account other languages and approaches at this point? 
Isn't PHP a well-known "soft target"? Isn't phishing and social engineering the best approach to hacking these days, rather than cracking RNGs? I don't know, and I look to security experts for advice here. So please explain for me, how are you assessing the risks, and why do you judge this specific risk high enough to warrant a response? The impression I get is that the security view is that *any* risk, no matter how small, once identified, warrants a response. "Do nothing" is never an option. If that's your position, then I'm sorry, but I simply don't agree with you. I don't want to live in a world that paranoid, and I'm unsure how to get past this point to have a meaningful dialog. History, and security's "bad rep" --------------------------------- Donald asked if I was experiencing some level of spill-over from distutils-sig, where there has *also* been a lot of security churn (far more than here). Yes, I am. No doubt about that. On distutils-sig, and pip in particular, it's clear to see a lot of frustration from users with the long-running series of security changes. The tone of bug reports is frustrated and annoyed. Users want a break from being forced to make changes. Outside of Python, and speaking purely from my own experience in the corporate world, security is pretty uniformly seen as an annoying overhead, and a block on actually getting the job done. You can dismiss that as misguided, but it's a fact. "We need to do this for security" is a direct challenge to people to dismiss it as unnecessary, and often to immediately start looking for ways to bypass the requirement "so that it doesn't get in the way". I try not to take that attitude in this sort of debate, but at the same time, I do try to *represent* that view and ask for help in addressing it. The level of change in core Python is far less than on distutils-sig, and has been relatively isolated from "non-web" areas. 
People understand (and are grateful for) increases in "secure by default" behaviour in code like urllib and ssl. They know that these are places where security is important, where getting it right is harder than you'd think, and where trusting experts to do the hard thinking for you is important. But things like hash randomisation and the random module are less obviously security related. The feedback from hash randomisation focused on "why did you break my code?". It wasn't a big deal, people were relying on undocumented behaviour and accepted that, but they did see it as a breakage from a security fix. I expect the same to be true with the random module, but with the added dimension that we're proposing changing documented behaviour this time. As a result of similar arguments applying to every security change, and those arguments never *really* seeming to satisfy people, there's a lot of reiterated debate. And that's driving interested but non-expert people away from contributing to the discussion. So we end up with a lack of checks and balances because people without a vested interest in tightening security "tune out" of the debates. I see that as a problem. But ultimately, if we can't find a better way of running these discussions, I don't know how we fix it. I certainly can't continue being devil's advocate every time. Anyway, that's me done on this thread. I hope I've added more benefit than cost to the discussion. Thanks to everyone for responding to my questions - even if we all felt like we were both just repeating the same thing, it's a lot of effort doing so and I appreciate your time. Paul From robert.kern at gmail.com Tue Sep 15 13:21:57 2015 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 15 Sep 2015 12:21:57 +0100 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: <20150915035334.GF31152@ando.pearwood.info> References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> Message-ID: On 2015-09-15 04:53, Steven D'Aprano wrote: > On Mon, Sep 14, 2015 at 10:19:09PM +0100, Robert Kern wrote: > >> The requirement for a good PRNG for simulation work is that it be *well* >> distributed in reasonable dimensions, not that it be *exactly* >> equidistributed for some k. And well-distributedness is exactly what is >> tested in TestU01. It is essentially a collection of simulations designed >> to expose known statistical flaws in PRNGs. So to your earlier question as >> to which is more damning, failing TestU01 or not being perfectly 623-dim >> equidistributed, failing TestU01 is. > > I'm confused here. Isn't "well-distributed" a less-strict test than > "exactly equidistributed"? MT is (almost) exactly k-equidistributed up > to k = 623, correct? So how does it fail the "well-distributed" test? k=623 is a tiny number of dimensions for testing "well-distributedness". You should be able to draw millions of values without detecting significant correlations. Perfect k-dim equidistribution is not a particularly useful metric on its own (at least for simulation work). You can't just say "PRNG A has a bigger k than PRNG B therefore PRNG A is better". You need a minimum period to even possibly reach a certain k, and that period goes up exponentially with k. Given two PRNGs that have the same period, but one has a much smaller k than the other, *then* you can start making inferences about relative quality (again for simulation work; ChaCha20 has a long period but no guarantees of k that I am aware of, but its claim to fame is security, not simulation work). Astronomical periods have costs, so you only want to pay for what is actually worth it, so it's certainly a good thing that the MT has a k near its upper bound. 
PRNGs with shorter, but still roomy periods like 2**128 are not worse merely because their ks are necessarily smaller. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From njs at pobox.com Tue Sep 15 13:41:30 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 15 Sep 2015 04:41:30 -0700 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F7DAAE.5010401@egenix.com> References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> <55F7DAAE.5010401@egenix.com> Message-ID: On Tue, Sep 15, 2015 at 1:45 AM, M.-A. Lemburg wrote: > On 15.09.2015 09:36, Nathaniel Smith wrote: >> >> [Using empirical tests to check RNGs] >> >> Obviously the thing the scientists worry about is a *strict* subset of >> what the cryptographers are worried about. > > I think this explains why we cannot make ends meet: > > A scientist wants to be able to *repeat* a simulation in exactly the > same way without having to store GBs of data (or send them to colleagues > to have them verify the results). > > Crypto RNGs cannot provide this feature per design. > > What people designing PRNGs are after is to improve the statistical > properties of these PRNGs while still maintaining the repeatability > of the output. > >> This is why it is silly to >> worry that a crypto RNG will cause problems for a scientific >> simulation. The cryptographers take the scientists' real goal -- the >> correctness of arbitrary programs like e.g. a monte carlo simulation >> -- *much* more seriously than the scientists themselves do. (This is >> because scientists need RNGs to do their real work, whereas for >> cryptographers RNGs are their real work.) > > Yes, cryptographers are the better folks, understood.
These arguments > are not really helpful. They are not even arguments. Err... I think we're arguing past each other. (Hint: I'm a scientist, not a cryptographer ;-).) My email was *only* trying to clear up the argument that keeps popping up about whether or not a cryptographic RNG could introduce bias in simulations etc., as compared to the allegedly-better-behaved Mersenne Twister. (As in e.g. your comment upthread that "[MT] is proven to be equidistributed which is a key property needed for it to be used as basis for other derived probability distributions".) This argument is incorrect -- equidistribution is not a guarantee that an RNG will produce good results when deriving other probability distributions, and in general cryptographic RNGs will produce as-or-better results than MT in terms of correctness of output. On this particular axis, using a cryptographic RNG is not at all dangerous. Obviously this is only one of the considerations in choosing an RNG; the quality of the randomness is totally orthogonal to considerations like determinism. (Cryptographers also have deterministic RNGs -- they call them "stream ciphers" -- and these will also meet or beat MT in any practically relevant test of correctness for the same reasons I outlined, despite not being provably equidistributed. Of course there are then yet other trade-offs like speed. But that's not really relevant to this thread, because no-one is proposing replacing MT as the standard deterministic RNG in Python; I'm just trying to be clear about how one judges the quality of randomness that an RNG produces.) -n -- Nathaniel J. Smith -- http://vorpus.org From sturla.molden at gmail.com Tue Sep 15 13:54:45 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 15 Sep 2015 13:54:45 +0200 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> Message-ID: On 15/09/15 09:36, Nathaniel Smith wrote: > Obviously the thing the scientists worry about is a *strict* subset of > what the cryptographers are worried about. This is why it is silly to > worry that a crypto RNG will cause problems for a scientific > simulation. The cryptographers take the scientists' real goal -- the > correctness of arbitrary programs like e.g. a monte carlo simulation > -- *much* more seriously than the scientists themselves do. No. Cryptographers care about predictability, not the exact distribution. Any distribution can be considered randomness with a given entropy, but not any distribution is uniform. Only the uniform distribution is uniform. That is where our needs fail to meet. Cryptographers damn any RNG that allow the internal state to be reconstructed. Scientists damn any RNG that do not produce the distribution of interest. Sturla From skrah at bytereef.org Tue Sep 15 14:08:53 2015 From: skrah at bytereef.org (Stefan Krah) Date: Tue, 15 Sep 2015 12:08:53 +0000 (UTC) Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux References: Message-ID: Paul Moore writes: [snip well-reasoned paragraphs] I want to add that the dichotomy between "security-minded" and "non-security-minded" that has been used for rhetoric purposes has no basis in reality. Several "non-security-minded" devs (of the kind who have *actually* contributed a lot of code to CPython) have a pretty good grasp of cryptography and just don't like security theater. Stefan Krah From sturla.molden at gmail.com Tue Sep 15 14:09:07 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 15 Sep 2015 14:09:07 +0200 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: <55F7DAAE.5010401@egenix.com> References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> <55F7DAAE.5010401@egenix.com> Message-ID: On 15/09/15 10:45, M.-A. Lemburg wrote: > k-dim equidistribution is a way to measure how well your > PRNG behaves, because it describes in analytical terms how > far you can get with increasing the linear complexity of your > RNG output. Yes and no. Conceptually it means that k subsequent samples will have exactly zero correlation. But any PRNG that produces detectable correlation between samples 623 steps apart is junk anyway. The MT have proven equidistribution for k=623, but many have measured equidistribution for far longer periods than that. Numerical computations are subject to rounding error and truncation error whatever you do. The question is whether the deviation from k-dim equidistribution will show up in your simulation result or drown in the error terms. Sturla From p.f.moore at gmail.com Tue Sep 15 14:12:30 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 15 Sep 2015 13:12:30 +0100 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On 15 September 2015 at 13:08, Stefan Krah wrote: > I want to add that the dichotomy between "security-minded" and > "non-security-minded" that has been used for rhetoric purposes > has no basis in reality. > > Several "non-security-minded" devs (of the kind who have *actually* > contributed a lot of code to CPython) have a pretty good grasp of > cryptography and just don't like security theater. Agreed, and every time I ended up looking for words for the two "sides", I ended up feeling uncomfortable. There are no "sides" here, just a variety of people with a variety of experiences, who want to feel assured that their voices are being heard. 
Paul From donald at stufft.io Tue Sep 15 14:27:44 2015 From: donald at stufft.io (Donald Stufft) Date: Tue, 15 Sep 2015 08:27:44 -0400 Subject: [Python-ideas] Python's Source of Randomness and the random.py module Redux In-Reply-To: References: Message-ID: On September 15, 2015 at 7:04:52 AM, Paul Moore (p.f.moore at gmail.com) wrote: > On 14 September 2015 at 23:39, Paul Moore wrote: > > (The rest of your emails, I'm going to read fully and digest before > > responding. Might take a day or so.) > > Point by point responses exhaust and frustrate me, and don't really serve much > purpose other than to perpetuate the debate. So I'm going to make some final > points, and then stop. This is based on having read the various emails > responding to my earlier comments. If it looks like I haven't read something, > please assume I have but either you didn't get your point across, or maybe I > simply don't agree with you. > > Why now? > -------- > > First of all, the big question for me is why now? The random module has been > around in its current form for many, many years. Security issues are not new, > maybe they are slowly increasing, but there's been no step change. The only > thing that seems to have changed is that someone (Theo) has drawn attention to > the random module. > > So I feel that the onus is on the people proposing change to address that. > Show me the evidence that we've had an actual problem for many years, and > demonstrate that it's a good job we spotted it at last, and now have a chance > to fix it. Explain to me what has been going wrong all these years that I'd > never even noticed. Arguments that people are misusing the module aren't > sufficient in themselves - they've (presumably) been doing that for years. In > all that time, who was hacked? Who lost data? As a result of random.random > being a PRNG rather than being crypto-secure? 
> > I'm not asking for an unassailable argument, just acknowledgement that it's > *your* job to address that question, and not mine to persuade you that "we've > been alright so far" is a compelling reason to reject your proposal. The answer to "Why now?" is basically because someone brought it up. I realize that's a pretty arbitrary thing but I'm not sure what answer would even be acceptable here. When is an OK time to do it in your eyes? Is it only after there is a public, known attack against the RNG? Is it only when the module is first being added? The sad state of affairs is that it's only been relatively recently that our industry as a whole has really taken security seriously, so there are a lot of things out there that are not well designed from a security POV. We can't go back in time and change the original mistake, but we can repair it going into the future. > > Incorrect code on SO etc > ------------------------ > > As regards people picking up insecure code snippets from the internet and > using them, there's no news there. I can look round and find hundreds of bits > of incorrect code in any area you want. People copy/paste garbage code all the > time. To my embarrassment, I've done it myself in the past :-( > > But I'm reminded of https://xkcd.com/386/ - "somebody is wrong on the > internet!" > > This proposal, and in particular the suggestion that we need to > retrospectively make the code snippets quoted here secure, strikes me as a > huge exercise in trying to correct all the people who are wrong on the > internet. There's certainly value in "safe by default" APIs, I don't disagree > with that, but I honestly fail to see how quoting incorrect code off the > internet is a compelling argument for anything. The argument is basically that security is an important part of API design, and that if you look at what people are doing in practice, it gives you an idea of how people think they should use the API.
It's kind of like looking at a situation like this: https://i.imgur.com/0gnb7Us.jpg and concluding that maybe we should pave that worn down footpath, because people are going to use it anyways. > > Millions of users are affected > ------------------------------ > > The numbers game is also a frustrating exercise here. We keep hearing that > "millions of users are affected by bad code", that scans of Google almost > immediately find sites with vulnerabilities. > > But I don't see anyone pointing at a single documented case of an actual > exploit caused by Python's random module. There's no bug report. There's no > security alert notification. So a big part of this is certainly preventative. It's a relatively recent development that hacking went from individuals or small teams going after big targets to being a business in its own right. There are literally giant office complexes in places like Russia and China filled with employees in cubicles, but they aren't writing software like at a normal company, they are just trawling around the internet, looking for targets, trying to expand botnets, looking for anything and everything they can get their hands on. It's also true that there isn't going to be a big fanfare for *most* actual hacked computers/sites. Most of the time the people running the site simply won't ever know; they'll just be silently hosting malware or having their users' passwords fed into other sites. Very few exploits actually get noticed, and when noticed it's unlikely they get public attention. I'd also suggest that for changes like these, if someone was exploited by this they'd probably look at the documentation for random.py, see that they were accidentally using the module wrong, and then blame themselves and never bother to file a bug report. It is my opinion that it's not really their fault that the API led them to believe that what they were doing was right. > > How are those millions of users affected?
Their level of risk is increased? > Who can tell that? Are any of the sites identified holding personal data? Not > all websites on the internet are *worth* hacking. Actually, all sites on the internet *are* worth hacking, depending on what you call hacking. Malware is constantly being hosted on tiny sites that most wouldn't call "worth" hacking, but malware authors were able to hack in some way and then they uploaded their malware there. If there are user logins it's likely that people reused username and passwords, so if you can get the passwords from one smaller site, it's possible you can use that as a door into a larger, more important site. Plus, there's also the desire for botnets to add more and more nodes into their swarm, they don't care what site you're hosting, they just want the machine. One key problem to the security of the internet as a whole is that there are a lot of small sites without dedicated security teams, or anyone who really knows security at all. These are easy targets for people and most languages and libraries make it far too easy for people to do the wrong thing. > > And I feel that expressing that view is somehow frowned on. That "it doesn't > matter" is an unacceptable view to hold. And so, the responses to my questions > feel personal, they feel like criticisms of me personally, that I'm being > unprofessional. I don't want to make this a big deal, but the code of conduct > says "we're tactful when approaching differing views", and it really doesn't > feel like that. > > I understand that the whole security thing is a numbers game. And that it's > about assessing risk. But what risk is enough to trigger a response? A 10% > increased chance of any given website being hacked? 5%? 1%? Again, I'm not > asking to use the information to veto a change. I'm asking to *understand > your position*. To better assess your arguments, so that I can be open to > persuasion, and to *agree* with you, if your arguments are sound. 
It's basically a gut feeling since we can't get any hard data here. Things like being able to look online and find code in the wild that does this wrong within minutes give us an idea of how likely it is, as does reasoning about what people who don't know the difference between ``random.random()`` and ``random.SystemRandom().random()`` are likely to do, plus a little bit of guessing based on experience with similar situations. Another input into this equation is how likely it is that this change would break someone and, once broken, how easy it will be to fix things. I sadly can't give anything more specific than that here, because it's a bit of an art form crossed with personal biases :( > > Furthermore, should we not take into account other languages and approaches at > this point? Isn't PHP a well-known "soft target"? Isn't phishing and social > engineering the best approach to hacking these days, rather than cracking > RNGs? I don't know, and I look to security experts for advice here. So please > explain for me, how are you assessing the risks, and why do you judge this > specific risk high enough to warrant a response? > > The impression I get is that the security view is that *any* risk, no matter > how small, once identified, warrants a response. "Do nothing" is never an > option. If that's your position, then I'm sorry, but I simply don't agree with > you. I don't want to live in a world that paranoid, and I'm unsure how to get > past this point to have a meaningful dialog. Do nothing is absolutely an option, but most security-focused folks don't take a scorched-earth view of security, so we oftentimes don't bother to even mention a possible change unless we think that doing nothing is the wrong answer. An example: going back to PEP 476, where we enabled TLS verification by default on HTTPS, we limited it to *only* HTTPS even though TLS is used by many other protocols, because it was our opinion that doing nothing for those protocols was the right call.
Those protocols are still insecure by default, but doing something about that by default would break too much for us to be willing to even suggest it. On top of that, we tend to want to prioritize the things we do try to have happen, so we focus on things with the smallest fallout or the biggest upsides and we ignore other things until later. This is probably why there's a perception that doing nothing is never an option: we already self-select what we choose to push forward, because we *do* care about backwards compatibility too. > > History, and security's "bad rep" > --------------------------------- > > Donald asked if I was experiencing some level of spill-over from > distutils-sig, where there has *also* been a lot of security churn (far more > than here). Yes, I am. No doubt about that. On distutils-sig, and pip in > particular, it's clear to see a lot of frustration from users with the > long-running series of security changes. The tone of bug reports is frustrated > and annoyed. Users want a break from being forced to make changes. I think a lot of these changes are paying down technical debt of two decades of (industry-standard) lack of focus on security. It sucks, but when we come out the other side (because hopefully, new APIs and modules will be better designed with security in mind given our new landscape) we should be in a much better situation. On the distutils-sig side, I think that PEP 470 is the last breaking change that I can think of that we'll need to do in the name of security; we've paid down that particular bit of technical debt, and once that lands we'll have a pretty decent story. We still have other kinds of technical debt to pay down though :( > > Outside of Python, and speaking purely from my own experience in the corporate > world, security is pretty uniformly seen as an annoying overhead, and a block > on actually getting the job done. You can dismiss that as misguided, but it's > a fact.
"We need to do this for security" is a direct challenge to people to > dismiss it as unnecessary, and often to immediately start looking for ways to > bypass the requirement "so that it doesn't get in the way". I try not to take > that attitude in this sort of debate, but at the same time, I do try to > *represent* that view and ask for help in addressing it. > > The level of change in core Python is far less than on distutils-sig, and has > been relatively isolated from "non-web" areas. People understand (and are > grateful for) increases in "secure by default" behaviour in code like urllib > and ssl. They know that these are places where security is important, where > getting it right is harder than you'd think, and where trusting experts to do > the hard thinking for you is important. > > But things like hash randomisation and the random module are less obviously > security related. The feedback from hash randomisation focused on "why did you > break my code?". It wasn't a big deal, people were relying on undocumented > behaviour and accepted that, but they did see it as a breakage from a security > fix. I expect the same to be true with the random module, but with the added > dimension that we're proposing changing documented behaviour this time. > > As a result of similar arguments applying to every security change, and those > arguments never *really* seeming to satisfy people, there's a lot of > reiterated debate. And that's driving interested but non-expert people away > from contributing to the discussion. So we end up with a lack of checks and > balances because people without a vested interest in tightening security "tune > out" of the debates. I see that as a problem. But ultimately, if we can't find > a better way of running these discussions, I don't know how we fix it. I > certainly can't continue being devil's advocate every time. Things don't really satisfy people because they oftentimes fundamentally don't care about security.
That is perfectly reasonable, so don't think that I expect everyone to care about security, but they simply don't. However, in my opinion we have a moral obligation to try and do what we reasonably can to protect people. It's a bit like social safety nets: one person might ask why they are being asked to pay taxes when, after all, they never needed government assistance, but by asking every citizen to pay in, we can try to keep people from falling through the cracks. This isn't a social safety net, it's a security safety net. > > Anyway, that's me done on this thread. I hope I've added more benefit than > cost to the discussion. Thanks to everyone for responding to my questions - > even if we all felt like we were both just repeating the same thing, it's a > lot of effort doing so and I appreciate your time. > > Paul > ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From njs at pobox.com Tue Sep 15 14:34:36 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 15 Sep 2015 05:34:36 -0700 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> Message-ID: On Sep 15, 2015 4:57 AM, "Sturla Molden" wrote: > > On 15/09/15 09:36, Nathaniel Smith wrote: > >> Obviously the thing the scientists worry about is a *strict* subset of >> what the cryptographers are worried about. This is why it is silly to >> worry that a crypto RNG will cause problems for a scientific >> simulation. The cryptographers take the scientists' real goal -- the >> correctness of arbitrary programs like e.g. a monte carlo simulation >> -- *much* more seriously than the scientists themselves do. > > > No. Cryptographers care about predictability, not the exact distribution.
Any distribution can be considered randomness with a given entropy, but not any distribution is uniform. Only the uniform distribution is uniform. That is where our needs fail to meet. Cryptographers damn any RNG that allow the internal state to be reconstructed. Scientists damn any RNG that do not produce the distribution of interest. No, this is simply wrong. I promise! ("Oh, sorry, this is contradictions...") For the output of a cryptographic RNG, any deviation from the uniform distribution is considered a flaw. (And as you know, given uniform variates you can construct any distribution of interest.) If I know that you're using a coin that usually comes up heads to generate your passwords, then this gives me a head start in guessing your passwords, and that's considered unacceptable. Or for further evidence, consider: "Scott Fluhrer and David McGrew also showed such attacks which distinguished the keystream of the RC4 from a random stream given a gigabyte of output." -- https://en.m.wikipedia.org/wiki/RC4#Biased_outputs_of_the_RC4 This result is listed on wikipedia because the existence of a program that can detect a deviation from perfect uniformity given a gigabyte of samples and an arbitrarily complicated test statistic is considered a publishable security flaw (and RC4 is generally deprecated because of this and related issues -- this is why openbsd's "arc4random" function no longer uses (A)RC4). -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From skrah at bytereef.org Tue Sep 15 14:36:04 2015 From: skrah at bytereef.org (Stefan Krah) Date: Tue, 15 Sep 2015 12:36:04 +0000 (UTC) Subject: [Python-ideas] =?utf-8?q?Should_our_default_random_number_generat?= =?utf-8?q?or_be=09secure=3F?= References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> Message-ID: Nathaniel Smith writes: > Obviously the thing the scientists worry about is a *strict* subset of > what the cryptographers are worried about. This is why it is silly to > worry that a crypto RNG will cause problems for a scientific > simulation. Do you have links to papers analyzing chacha20 w.r.t statistical properties? The only information that I found is http://www.pcg-random.org/other-rngs.html#id11 "Fewer rounds result in poor statistical performance; ChaCha2 fails statistical tests badly, and ChaCha4 passes TestU01 but sophisticated mathematical analysis has shown it to exhibit some bias. ChaCha8 (and higher) are believed to be good. Nevertheless, ChaCha needs to go to more work to achieve satisfactory statistical quality than many other generators. ChaCha20, being newer, has received less scrutiny from the cryptographic community than Arc4." Stefan Krah From sturla.molden at gmail.com Tue Sep 15 14:51:28 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 15 Sep 2015 14:51:28 +0200 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <20150910015505.GO19373@ando.pearwood.info> Message-ID: On 10/09/15 04:23, Tim Peters wrote: > Now (well, last I > saw) they recommend a parameterized scheme creating a distinct variant > of MT per thread (not just different state, but a different (albeit > related) algorithm) The DCMT use the same algorithm (Mersenne Twister) but with different polynomials. The choice of polynomial is more or less arbitrary. You can search for a set of N polynomials that are (almost) prime to each other, and thus end up with totally independent sequences. Searching for such a set can take some time, so you need to do that in advance and save the result. But once you have a set, each one of them is just as valid as the vanilla MT. PCG also provides independent streams. Sturla From jeremy at jeremysanders.net Tue Sep 15 16:02:43 2015 From: jeremy at jeremysanders.net (Jeremy Sanders) Date: Tue, 15 Sep 2015 16:02:43 +0200 Subject: [Python-ideas] Should our default random number generator be secure? References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> <55F7DAAE.5010401@egenix.com> Message-ID: M.-A. 
Lemburg wrote: > If you can come up with a crypto RNG that allows repeating the > results, I think you'd have us all convinced, otherwise it > doesn't really make sense to compare apples and oranges, > and insisting that orange juice is better for you than > apple juice ;-) According to http://www.pcg-random.org/other-rngs.html This chacha20 implementation is seedable and should be reproducible: https://gist.github.com/orlp/32f5d1b631ab092608b1 ...though I am concerned about the k-dimensional equidistribution as a scientist, and also that if the random number generator is changed without the interface changing, then it may screw up tests and existing codes which rely on a particular sequence of random numbers. J From mal at egenix.com Tue Sep 15 16:20:46 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 15 Sep 2015 16:20:46 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> <55F7DAAE.5010401@egenix.com> Message-ID: <55F8293E.2070307@egenix.com> On 15.09.2015 14:09, Sturla Molden wrote: > On 15/09/15 10:45, M.-A. Lemburg wrote: > >> k-dim equidistribution is a way to measure how well your >> PRNG behaves, because it describes in analytical terms how >> far you can get with increasing the linear complexity of your >> RNG output. > > Yes and no. Conceptually it means that k subsequent samples will have exactly zero correlation. But > any PRNG that produces detectable correlation between samples 623 steps apart is junk anyway. The MT > have proven equidistribution for k=623, but many have measured equidistribution for far longer > periods than that. Numerical computations are subject to rounding error and truncation error > whatever you do. 
The question is whether the deviation from k-dim equidistribution will show up in > your simulation result or drown in the error terms. I guess the answer is: it depends :-) According to the SFMT paper: """ ...it requires 10**28 samples to detect an F2-linear relation with 15 (or more) terms among 521 bits, by a standard statistical test. If the number of bits is increased, the necessary sample size is increased rapidly. Thus, it seems that k(v) of SFMT19937 is sufficiently large, far beyond the level of the observable bias. On the other hand, the speed of the generator is observable. """ http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/M062821.pdf (which again refers to this paper: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/ARTICLES/HONGKONG/hong-fin4.pdf) 10**28 is already a lot of data, but YMMV, of course. Here's a quote for the WELL family of PRNGs: """ The WELL generators mentioned in Table IV successfully passed all the statistical tests included ... TestU01 ..., except those that look for linear dependencies in a long sequence of bits, such as the matrix-rank test ... for very large binary matrices and the linear complexity tests ... This is in fact a limitation of all F2-linear generators, including the Mersenne twister, the TT800, etc. Because of their linear nature, the sequences produced by these generators just cannot have the linear complexity of a truly random sequence. This is definitely unacceptable in cryptology, for example, but is quite acceptable for the vast majority of simulation applications if the linear dependencies are of long range and high order. """ http://www.iro.umontreal.ca/~lecuyer/myftp/papers/wellrng.pdf -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 15 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... 
http://zope.egenix.com/ ________________________________________________________________________ 2015-09-14: Released mxODBC Plone/Zope DA 2.2.3 http://egenix.com/go84 2015-09-18: PyCon UK 2015 ... 3 days to go 2015-09-26: Python Meeting Duesseldorf Sprint 2015 11 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From sturla.molden at gmail.com Tue Sep 15 16:27:37 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 15 Sep 2015 16:27:37 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> Message-ID: On 15/09/15 14:34, Nathaniel Smith wrote: > No, this is simply wrong. I promise! ("Oh, sorry, this is > contradictions...") For the output of a cryptographic RNG, any deviation > from the uniform distribution is considered a flaw. (And as you know, > given uniform variates you can construct any distribution of interest.) > If I know that you're using a coin that usually comes up heads to > generate your passwords, then this gives me a head start in guessing > your passwords, and that's considered unacceptable. The uniform distribution has the highest entropy, yes, but it does not mean that other distributions are unacceptable. The sequence just has to be incredibly hard to predict. A non-uniform distribution will give an adversary a head start, that is true, but if the adversary still cannot complete the brute-force attack before the end of the universe there is little help in knowing this. In scientific computing we do not care about adversaries. We care about the correctness of our numerical result. 
That means we should be fussy about the distribution, not about the predictability or "randomness" of a sequence, nor about adversaries looking to recover the internal state. MT is proven to be uniform (equidistributed) up to 623 dimensions, but it is incredibly easy to recover the internal state. The latter we do not care about. In fact, we can often do even better with "quasi-random" sequences, e.g. Sobol sequences, which are not constructed to produce "uncorrelated" points, but constructed to produce correlated points that are deliberately more uniform than uncorrelated points. Sturla From ncoghlan at gmail.com Tue Sep 15 16:47:34 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 16 Sep 2015 00:47:34 +1000 Subject: [Python-ideas] PEP 504: Using the system RNG by default Message-ID: Hi folks, Based on the feedback in the recent threads, I've written a draft PEP that dispenses with the userspace CSPRNG idea, and instead proposes: * defaulting to using the system RNG for the module level random API in Python 3.6+ * implicitly switching to the deterministic PRNG if you call random.seed(), random.getstate() or random.setstate() (this implicit fallback would trigger a silent-by-default deprecation warning in 3.6, and a visible-by-default runtime warning after 2.7 goes EOL) * providing random.system and random.seedable submodules so you can explicitly opt in to using the one you want without having to manage your own RNG instances That approach would provide a definite security improvement over the status quo, while restricting the compatibility break to a performance regression in applications that use the module level API without calling seed(), getstate() or setstate(). It would also allow the current security warning in the random module documentation to be moved towards the end of the module, in a section dedicated to determinism and reproducibility.
The full PEP should be up shortly at https://www.python.org/dev/peps/pep-0504/, but caching is still a problem when uploading new PEPs, so if that 404s, try http://legacy.python.org/dev/peps/pep-0504/ Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From sturla.molden at gmail.com Tue Sep 15 16:54:13 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 15 Sep 2015 16:54:13 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F8293E.2070307@egenix.com> References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> <55F7DAAE.5010401@egenix.com> <55F8293E.2070307@egenix.com> Message-ID: On 15/09/15 16:20, M.-A. Lemburg wrote: > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/M062821.pdf > (which again refers to this paper: > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/ARTICLES/HONGKONG/hong-fin4.pdf) You seem to be confusing the DCMT with the SFMT which is a fast SIMD friendly Mersenne Twister. The DCMT is intended for using the Mersenne Twister in parallel computing (i.e. one Mersenne Twister per processor). It is not a Mersenne Twister accelerated with parallel hardware. That would be the SFMT. http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dgene.pdf The periods for the DC Mersenne Twisters they report are long enough, e.g. 2^127-1 or 2^521-1, but much shorter than the period of MT19937 (2^19937-1). This does not matter because the period of MT19937 is excessive. In scientific computing, the sequence is long enough for most practical purposes if it is larger than 2^64. 2^127-1 is more than enough, and this is the shortest period DCMT reported in the paper. So do we care? Probably not.
Sturla From sturla.molden at gmail.com Tue Sep 15 17:40:57 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 15 Sep 2015 17:40:57 +0200 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: Message-ID: On 15/09/15 16:47, Nick Coghlan wrote: > * providing random.system and random.seedable submodules so you can > explicitly opt in to using the one you want without having to manage > your own RNG instances I do not think these names are helpful. The purpose was to increase security, not confuse the user even more. What does "seedable" mean? Secure as in ChaCha20? Insecure as in MT19937? Something else? A name like "seedable" does not convey any useful information about the security to an un(der)informed web developer. A name like "random.system" does not convey any information about numerical applicability to an un(der)informed researcher. The module names should rather indicate how the generators are intended to be used. I suggest: random.crypto.* (os.urandom, ChaCha20, Arc4Random) random.numeric.* (Mersenne Twister, PCG, XorShift) Deprecate random.random et al. with a visible warning. That should convey the message. Sturla From tim.peters at gmail.com Tue Sep 15 17:46:04 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 15 Sep 2015 10:46:04 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F7DAAE.5010401@egenix.com> References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> <55F7DAAE.5010401@egenix.com> Message-ID: [M.-A. Lemburg ] > ... > If you can come up with a crypto RNG that allows repeating the > results, I think you'd have us all convinced, otherwise it > doesn't really make sense to compare apples and oranges, > and insisting that orange juice is better for you than > apple juice ;-) For example, run AES in CTR mode. 
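Such a repeatable stream is easy to sketch in pure Python. The class below is illustrative only: SHA-256 in counter mode stands in for AES-CTR (the stdlib ships a hash but no block cipher), and it is not a vetted CSPRNG; the point is only that the same seed reproduces the same stream:

```python
import hashlib
import struct

class HashCTRRandom:
    """Deterministic, seedable stream in the AES-CTR mold.

    SHA-256 over (key || counter) stands in for a block cipher.
    Illustrative only: this is NOT a vetted CSPRNG.
    """

    def __init__(self, seed):
        # Derive a fixed 32-byte key from the caller's seed bytes.
        self._key = hashlib.sha256(seed).digest()
        self._counter = 0

    def _block(self):
        # One "CTR block": hash the key plus an incrementing counter.
        data = self._key + struct.pack("<Q", self._counter)
        self._counter += 1
        return hashlib.sha256(data).digest()

    def random(self):
        # Take 53 bits per block, matching random.random()'s precision.
        k = int.from_bytes(self._block()[:8], "big") >> 11
        return k / (1 << 53)
```

Two instances built from the same seed emit identical sequences, which is exactly the "repeatable crypto RNG" being asked for: replaying a run is just a matter of reusing the seed.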
Remember that we did something related on whatever mailing list it was ;-) discussing the PSF's voting system, to break ties in a reproducible-by-anyone way using some public info ("news") that couldn't be known until after the election ended. My understanding is that ChaCha20 (underlying currently-trendy implementations of arc4random) is not only deterministic, it even _could_ support an efficient jumpahead(n) operation. The specific OpenBSD implementation of arc4random goes beyond just using ChaCha20 by periodically scrambling the state with kernel-obtained "entropy" too, and that makes it impossible to reproduce its sequence. But it would remain a crypto-strength generator without that extra scrambling step. Note that these _can_ be very simple to program. The "Blum Blum Shub" crypto generator from 30 years ago just iteratively squares a "big integer" modulo a (carefully chosen) constant. Not only deterministic, given any integer `i` it's efficient to directly compute the i'th output. It's an expensive generator, though (typically only 1 output bit is derived from each modular squaring operation). From sturla.molden at gmail.com Tue Sep 15 17:45:09 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 15 Sep 2015 17:45:09 +0200 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: Message-ID: On 15/09/15 17:40, Sturla Molden wrote: > random.crypto.* (os.urandom, ChaCha20, Arc4Random) > random.numeric.* (Mersenne Twister, PCG, XorShift) Or even random.security.* The name hierarchy should convey a very clear message.
Sturla From oscar.j.benjamin at gmail.com Tue Sep 15 19:20:18 2015 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 15 Sep 2015 18:20:18 +0100 Subject: [Python-ideas] Globally configurable random number generation In-Reply-To: References: <305B13C9-BA39-4133-8BDC-794E82EBF254@yahoo.com> Message-ID: On 15 September 2015 at 05:53, Nick Coghlan wrote: > On 15 September 2015 at 14:03, Andrew Barnert wrote: >> Also, while I'm not 100% sold on the auto-switching and the delegate-at-call-time wrappers, I'll play with them and see, and if they do work, then you're definitely right that your second version does solve your problem with my proposal, so it doesn't matter whether your first version did anymore. >> >> First, on delegating top-level function: have you tested the performance? Is MT so slow that an extra lookup and function call don't matter? > > If folks are in a situation where the performance impact of the > additional layer of indirection is a problem, they can switch to using > random.Random explicitly, or import from random.seedable rather than > the top level random module. > >> One quick thought on auto-switching vs. explicitly setting the instance before any functions have been called: if I get you to install a plugin that calls random.seed(), I've now changed your app to use seeded random numbers. And it might even still pass security tests, because it doesn't switch until someone hits some API that activates the plugin. Is that a realistic danger for any realistic apps? If so, doesn't that potentially make 3.6 more dangerous than 3.5? The same problem can occur the other way round. Suppose that I want my whole app to be seedable but I have many modules that use "from random import choice" etc. Then in my top-level script I call random.seed and get an error under Python 3.6. So I switch that to use random.seedable but potentially end up with a mix of modules using random.seedable.choice and random.choice. 
It may seem under certain conditions that my app is properly seeded while not under others depending on which particular functions get called. The docs explicitly state that I will always be able to globally seed the module so that my entire non-threaded application is reproducible when using the top-level functions (even across different Python versions for random.random). So it's entirely reasonable to expect that people are using this behaviour and will want a way to revert to it which in the general case would need something like set_default_instance so that every module (including those I don't write myself) uses the same generator. > This isn't an applicable concern, as we already provide zero runtime > protections against hostile monkeypatching of other modules (by design > choice). You can subvert even os.urandom in a hostile plugin: > > def not_random(num_bytes): > return b'A' * num_bytes > import os > os.urandom = not_random It might not be a case of "hostile monkeypatching". Someone might just be trying to fix their code that was broken by the backwards-incompatible change proposed in this discussion. >> For another: I still think we should be getting people to explicitly use seeded_random or system_random (or seedless_random, if they need speed as well as "probably secure") or explicit class instances (which are a bigger change, but more backward compatible once you've made it) as often as possible, even if random does eventually turn into seedless_random. That's fine but seeded_random won't exist in earlier Python versions so it creates another cross-version compatibility problem. Also switching to using your own random instance can be a non-trivial change if more than one module/project is involved. The random module has deliberately provided a convenient place to store that global state which would need to be replaced somehow. 
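Oscar's point about modules doing `from random import choice` is mechanical and worth spelling out (an illustrative snippet, not from the thread): the imported name is bound to the shared instance's method once, at import time, so global seeding reaches it through that shared state, while rebinding the module attribute later does not.

```python
import random
from random import choice  # the name is bound once, at import time

# Seeding goes through the module's shared Random instance, so it
# reaches the early-bound name: identical seeds give identical draws.
random.seed(1234)
first = choice([10, 20, 30, 40])
random.seed(1234)
assert choice([10, 20, 30, 40]) == first

# Rebinding the module attribute later does NOT reach the early-bound
# name, which still points at the original bound method.
original = random.choice
random.choice = lambda seq: seq[0]
assert choice is original
random.choice = original  # undo the monkeypatch
```

This is why a replacement scheme has to swap the state behind the functions (or delegate on each call) rather than swap the functions themselves.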
>> And finally: it _seems like_ people who want MT for simulation/game/science stuff will have a pretty easy time finding the migration path, but I'm having a really hard time coming up with a convincing argument. Does anyone have a handful of science guys they can hack up a system for and test them empirically? Because if you can establish that fact, I think the naysayers have very little reason left to say nay, and a consensus would surely be better than having that horribly contentious thread end with "too bad, overruled, the PEP has been accepted". > > Given the general lack of investment in sustaining engineering for > scientific software, I think the naysayers are right on that front, > which is why I switched my proposal to give them a transparent upgrade > path - I was originally thinking primarily of the educational and > gaming use cases, and hadn't considered randomised simulations in the > scientific realm. TBH when I need to burn thousands of CPU-hours on RNG heavy code I would rather use numpy's random module. It also uses Mersenne Twister but it's a lot faster if you need loads of random numbers. -- Oscar From guido at python.org Tue Sep 15 19:33:47 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 15 Sep 2015 10:33:47 -0700 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: Message-ID: I had to check out of the mega-threads, but I really don't like the outcome (unless this PEP is just the first of several competing proposals). The random module provides a useful interface: a random() function and a large variety of derived functionality useful for statistics programming (e.g. uniform(), choice(), bivariate(), etc.). Many of these have significant mathematical finesse in their implementation.
They are all accessing shared state that is kept in a global variable in the module, and that is a desirable feature (nobody wants to have to pass an extra variable just so you can share the state of the random number generator with some other code). I don't want to change this API and I don't want to introduce deprecation warnings: the API is fine, and the warnings will be as ineffective as the warnings in the documentation. I am fine with adding more secure ways of generating random numbers. But we already have random.SystemRandom(), so there doesn't seem to be a hurry? How about we make one small change instead: a way to change the default instance used by the top-level functions in the random module. Say, random.set_random_generator(). This would require the global functions to use an extra level of indirection, e.g. instead of

    random = _inst.random

we'd change that code to say

    def random():
        return _inst.random()

(and similar for all related functions). I am not worried about the cost of the indirection (and if it turns out too expensive we can reimplement the module in C). Then we could implement

    def set_random_generator(instance):
        global _inst
        _inst = instance

We could also have a function random.use_secure_random() that calls set_random_generator() with an instance of a secure random number generator (maybe just SystemRandom()). We could rig things so that once use_secure_random() has been called, set_random_generator() will throw an exception (to avoid situations where a library module attempts to make the shared random generator insecure in a program that has declared that it wants secure random). It would also be fine for SystemRandom (or at least whatever is used by use_secure_random(), if SystemRandom cannot change for backward compatibility reasons) to raise an exception when seed(), setstate() or getstate() are called. Of course modules are still free to use their own instances of the Random class.
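Guido's fragments above assemble into a self-contained, runnable sketch. Everything here is illustrative, not an actual patch; `rand` stands in for the module-level random() so the example does not shadow the real module:

```python
import random

_inst = random.Random()   # shared default instance behind the module API
_locked = False           # becomes True once use_secure_random() runs

def rand():
    # The extra level of indirection described above: look up the
    # current instance on every call instead of binding its method once.
    return _inst.random()

def set_random_generator(instance):
    global _inst
    if _locked:
        raise RuntimeError("secure random already requested; "
                           "refusing to swap in another generator")
    _inst = instance

def use_secure_random():
    global _locked
    set_random_generator(random.SystemRandom())
    _locked = True
```

After use_secure_random() has run, any later set_random_generator() raises, which is the "declared that it wants secure random" guard from the message.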
But I don't see a reason to mess with the existing interface. -- --Guido van Rossum (python.org/~guido) From mal at egenix.com Tue Sep 15 19:42:38 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 15 Sep 2015 19:42:38 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> <55F7DAAE.5010401@egenix.com> Message-ID: <55F8588E.7010106@egenix.com> On 15.09.2015 17:46, Tim Peters wrote: > [M.-A. Lemburg ] >> ... >> If you can come up with a crypto RNG that allows repeating the >> results, I think you'd have us all convinced, otherwise it >> doesn't really make sense to compare apples and oranges, >> and insisting that orange juice is better for you than >> apple juice ;-) > > For example, run AES in CTR mode. Remember that we did something > related on whatever mailing list it was ;-) discussing the PSF's > voting system, to break ties in a reproducible-by-anyone way using > some public info ("news") that couldn't be known until after the > election ended. Ah, now we're getting somewhere :-) If we accept that non-guessable, but deterministic is a good compromise, then adding a cipher behind MT sounds like a reasonable way forward, even as default. For full crypto strength, people would still have to rely on solutions like /dev/urandom or the OpenSSL one (or reseed the default RNG every now and then). All others get the benefit of non-guessable, but keep the ability to seed the default RNG in Python. Is there some research on this (MT + cipher or hash)? > My understanding is that ChaCha20 (underlying currently-trendy > implementations of arc4random) is not only deterministic, it even > _could_ support an efficient jumpahead(n) operation.
The specific > OpenBSD implementation of arc4random goes beyond just using ChaCha20 > by periodically scrambling the state with kernel-obtained "entropy" > too, and that makes it impossible to reproduce its sequence. But it > would remain a crypto-strength generator without that extra scrambling > step. > > Note that these _can_ be very simple to program. The "Blum Blum Shub" > crypto generator from 30 years ago just iteratively squares a "big > integer" modulo a (carefully chosen) constant. Not only > deterministic, given any integer `i` it's efficient to directly > compute the i'th output. It's an expensive generator, though > (typically only 1 output bit is derived from each modular squaring > operation). IMO, that's a different discussion and we should rely on existing well tested full entropy mixers (urandom or OpenSSL) until the researchers have come up with something like MT for chaotic PRNGs. -- Marc-Andre Lemburg
From donald at stufft.io Tue Sep 15 19:50:12 2015 From: donald at stufft.io (Donald Stufft) Date: Tue, 15 Sep 2015 13:50:12 -0400 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: Message-ID: On September 15, 2015 at 1:34:56 PM, Guido van Rossum (guido at python.org) wrote: > > I am fine with adding more secure ways of generating random numbers. > But we already have random.SystemRandom(), so there doesn't > seem to be a hurry? The problem isn't so much that there isn't a way of securely generating random numbers, but that the module, as it is right now, guides you towards using an insecure source of random numbers rather than a secure one. This means that unless you're familiar with the random module or reading the online documentation you don't really have any idea that ``random.random()`` isn't secure. This is an attractive nuisance for anyone who *doesn't* need deterministic output from their random numbers and leads to situations where people are incorrectly using MT when they should be using SystemRandom because they don't know any better. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From mal at egenix.com Tue Sep 15 19:56:14 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 15 Sep 2015 19:56:14 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> <55F7DAAE.5010401@egenix.com> <55F8293E.2070307@egenix.com> Message-ID: <55F85BBE.2040404@egenix.com> On 15.09.2015 16:54, Sturla Molden wrote: > On 15/09/15 16:20, M.-A.
Lemburg wrote: > >> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/M062821.pdf >> (which again refers to this paper: >> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/ARTICLES/HONGKONG/hong-fin4.pdf) > > You seem to be confusing the DCMT with the SFMT which is a fast SIMD friendly Mersenne Twister. I was talking about the SFMT, which is a variant of the MT for processors with SIMD instruction sets (most CPUs have these nowadays) and which has 32-, 64-bit or floating point output: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html But what I really wanted to reference was the discussion in the SFMT paper about the practical effects of the 623-dim equidistribution (see the end of the first paper I quoted; the discussion references the second paper). > The DCMT is intended for using the Mersenne Twister in parallel computing (i.e. one Mersenne Twister > per processor). It is not a Mersenne Twister accelerated with parallel hardware. That would be the > SFMT. > > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dgene.pdf > > The periods for the DC Mersenne Twisters they report are long enough, e.g. 2^127-1 or 2^521-1, but > much shorter than the period of MT19937 (2^19937-1). This does not matter because the period of > MT19937 is excessive. In scientific computing, the sequence is long enough for most practical > purposes if it is larger than 2^64. 2^127-1 is more than enough, and this is the shortest period > DCMT reported in the paper. So do we care? Probably not. Thanks for the pointers. I wasn't aware of a special MT variant for parallel computing. -- Marc-Andre Lemburg
From tim.peters at gmail.com Tue Sep 15 20:05:44 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 15 Sep 2015 13:05:44 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <1441821254.2853664.379081313.43B3886D@webmail.messagingengine.com> <1441824901.2867280.379140585.4F25667D@webmail.messagingengine.com> <20150909190757.GM19373@ando.pearwood.info> <55F0BF61.6050205@canterbury.ac.nz> <55F13EAF.5040500@egenix.com> <55F1B219.1000502@egenix.com> <87y4gdzp2d.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: ... [Nathaniel Smith ] >>> Here you go: >>> https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf [Tim, on appendix D] >> ... >> My question is, even if /dev/urandom is available, they're _not_ >> content to use that alone. They continue to mix it up with all other >> kinds of silly stuff. So why do they trust urandom less than >> OpenSSL's gimmick? [Nathaniel] > Who knows why they wrote the code in that exact way. /dev/urandom is fine. Presumably _they_ know, yes? When they're presenting a "gold standard" all-purpose token generator, it would be nice if they had taken care to make every step clear to other crypto wonks. Otherwise the rest of us are left wondering if _anyone_ really knows what they're talking about ;-) [on the MT state solver] >> ...
>> It remained unclear to me _exactly_ what "in the presence of non >> consecutive outputs" is supposed to mean. In the only examples, they >> knew exactly how many times MT was called. "Non consecutive" in all >> those contexts appeared to mean "but we couldn't observe _any_ output >> bits in some cases - the ones we could know something about were >> sometimes non-consecutive". So in the MT output sequence, they had no >> knowledge of _some_ of the outputs, but they nevertheless knew exactly >> _which_ of the outputs they were wholly ignorant about. >> >> That's no problem for the equation solver. They just skip adding any >> equations for the affected bits, keep collecting more outputs and >> potentially "wrap around", probably leading to an overdetermined >> system in the end. >> >> But Python doesn't work the way PHP does here. As explained in >> another message, in Python you can have _no idea_ how many MT outputs >> are consumed by a single .choice() call. In the PHP equivalent, you >> always consume exactly one MT output. PHP's method suffers >> statistical bias, but under the covers Python uses an accept/reject >> method to avoid that. Any number of MT outputs may be (invisibly!) >> consumed before "accept" is reached, although typically only one or >> two. You can deduce some of the leading MT output bits from the >> .choice() result, but _only_ for the single MT output .choice() >> reveals anything about. About the other MT outputs it may consume, >> you can't even know that some _were_ skipped over, let alone how many. > This led me to look at the implementation of Python's choice(), and > it's interesting; I hadn't realized that it was using such an > inefficient method. Speed is irrelevant here, but do note that .choice() isn't restricted to picking from less than 2**32 possibilities. >>> random.choice(range(2**62)) 2693408174642551707 Special-casing is done at Python speed, and adding a Python-level branch to ask "is it less than 2**32?"
is typically more expensive than calling MT again. > (To make a random selection between, say, 36 > items, it rounds up to 64 = 2**6, draws a 32-bit sample from MT, > discards 26 of the bits (!) to get a number between 0-63, and then > repeats until this number happens to fall in the 0-35 range, so it > rejects with probability ~0.45. A more efficient algorithm is the one > that it uses if getrandbits is not available, where it uses all 32 > bits and only rejects with probability (2**32 % 36) / (2**32) = > ~1e-9.) I guess this does add a bit of obfuscation. And note that the branch you like better _also_ needs another Python-speed test to raise an exception if the range is "too big". The branch actually used doesn't need that. > OTOH the amount of obfuscation is very sensitive to the size of the > password alphabet. If I use uppercase + lowercase + digits, that gives > me 62 options, so I only reject with probability 1/32, and I can > expect that any given 40-character session key will contain zero skips > with probability ~0.28, and that reveals 240 bits of seed. To be quite clear, nothing can ever be learned about "the seed". All that can be observed is bits from the outputs, and the goal is to deduce "the state". There is an invertible permutation mapping a state word to an output word, but from partial knowledge of N < 32 bits from a single output word you typically can't deduce N bits of the state word from which the output was derived (for example, if you only know the first bit of an output, you can't deduce from that alone the value of any bit in the corresponding state word). That's why they need such a hairy framework to begin with (in the example, you _can_ add equations tying the known value of the first output bit to linear functions of all bits of the state related to the first output bit). About the "240 bits", they need about 80 times more than that to deduce the state. 
0.28 ** 80 is even less than a tenth ;-) > I don't have time right now to go look up the MT equations to see how > easy it is to make use of such partial information, It's robust against only knowing a subset of an output's bits (including none). It's not robust against not knowing _which_ output you're staring at. They label the state bits with variables x_0 through x_19936, and generate equations relating specific state bits to the output bits they can deduce. If they don't know how many times MT was invoked between outputs they know were generated, they can't know _which_ state-bit variables to plug into their equations. Is this output related to (e.g.) at least x_0 through x_31, or is it x_32 through x_63? x_64 through x_95? Take an absurd extreme to illustrate an obvious futility in general: suppose .choice() remembered the size of the last range it was asked to pick from. Then if the next call to .choice() is for the same-sized range, call MT 2**19937-2 times ignoring the outputs, and call it once more. It will get the same result then. "The solver" will deduce exactly the same output bits every time, and will never learn more than that. Eventually, if they're doing enough sanity checking, the best their solver could do is notice that the derived equations (regardless of how they construct them) are inconsistent. The worst it could do is "deduce" a state that's pure fantasy. They can't know that they are in fact seeing an output derived from exactly the same state every time. Unless they read the source code for .choice() and see it's playing this trick. In that case, they would never add to their collection of equations after the first output was dealt with. _Then_ the equations would faithfully reflect the truth: that they learned a tiny bit at first, but never learn more than just that. > but there certainly are lots of real-world weaponized exploits that begin with > something like "first, gather 10**8 session keys...". 
I certainly > wouldn't trust it. And I'm not asking you to. I wouldn't either. I'm expressing skepticism that the solver in this paper is a slam-dunk proof that all existing idiomatic Python password generators are about to cause the world to end ;-) > Also, if I use the base64 or hex alphabets, then the probability of > rejection is 0, and I can deterministically read off bits from the > underlying MT state. I'll readily agree that if .choice(x) is used whenever len(x) is a power of two, then their solver applies directly to such cases. It's in all and only such cases they can know exactly how many MT outputs were consumed, and so know also which specific state-bit variables to use. > (Alternatively, if someone in the future makes > the obvious optimization As above, if what you like better were obviously faster in reality, it would have been written that way already ;-) Honest, this looks like Raymond Hettinger's code, and he typically obsesses over speed. > to choice(), then it will basically stop rejecting in practice, and again > it becomes trivial to read off all the bits from the underlying MT state.) Not trivial. You still need this hairy solver framework, and best I can tell its code wasn't made available. Note that the URL given in the paper gives a 404 error now. It isn't trivial to code it either. > The point of "secure by default" is that you don't have to spend all > these paragraphs doing the math to try and guess whether some RNG > usage might maybe be secure; it just is secure. No argument there! I'm just questioning how worried people "should be" over what actual Python code actually does now. I confess I remain free of outright panic ;-) >> Best I can tell, that makes a huge difference to whether their solver >> is even applicable to cracking idiomatic "password generators" in >> Python. You can't know which variables correspond to the bits you can >> deduce.
You could split the solver into multiple instances to cover >> all feasible possibilities (for how many MT outputs may have been >> invisibly consumed), but the number of solver instances needed then >> grows exponentially with the number of outputs you do see something >> about. In the worst case (31 bits are truncated), they need over >> 19000 outputs to deduce the state. Even a wildly optimistic "well, >> let's guess no more than 1 MT output was invisibly rejected each time" >> leads to over 2**19000 solver clones then. > Your "wildly optimistic" estimate is wildly conservative under > realistic conditions. Eh. > How confident are you that the rest of your > analysis is totally free of similar errors? Would you be willing to bet, > say, the public revelation of every website you've visited in the last > 5 years on it? I couldn't care less if that were revealed. In fact, I'd enjoy the trip down memory lane ;-) > ... > Grant money is a drop in the bucket of security research funding these > days. Criminals and governments have very deep pockets, and it's well > documented that there are quite a few people with PhDs who make their > living by coming up with exploits and then auctioning them on the > black market. Excellent points! Snideness doesn't always pay off for me ;-) > BTW, it looks like that PHP paper was an undergraduate project. You > don't need a PhD to solve linear equations :-). So give 'em their doctorates! I've seen doctoral theses a hundred times less substantial ;-) > ... > No, SystemRandom.choice is certainly fine. But people clearly don't > use it, so it's fine-ness doesn't matter that much in practice... It's just waiting for a real exploit. People writing security papers love to "name & shame". "Gotcha! Gotcha!" Once people see there _is_ "a real problem" (if there is), they'll scramble to avoid being the target of the next name-&-shame campaign.
Before then, they're much too busy trying to erase all traces of every website they've visited in the last 5 years ;-) From mal at egenix.com Tue Sep 15 20:19:05 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 15 Sep 2015 20:19:05 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> <55F7DAAE.5010401@egenix.com> Message-ID: <55F86119.6000000@egenix.com> On 15.09.2015 13:41, Nathaniel Smith wrote: > On Tue, Sep 15, 2015 at 1:45 AM, M.-A. Lemburg wrote: >> On 15.09.2015 09:36, Nathaniel Smith wrote: >>> >>> [Using empirical tests to check RNGs] >>> >>> Obviously the thing the scientists worry about is a *strict* subset of >>> what the cryptographers are worried about. >> >> I think this explains why we cannot make ends meet: >> >> A scientist wants to be able to *repeat* a simulation in exactly the >> same way without having to store GBs of data (or send them to colleagues >> to have them verify the results). >> >> Crypto RNGs cannot provide this feature per design. >> >> What people designing PRNGs are after is to improve the statistical >> properties of these PRNGs while still maintaining the repeatability >> of the output. >> >>> This is why it is silly to >>> worry that a crypto RNG will cause problems for a scientific >>> simulation. The cryptographers take the scientists' real goal -- the >>> correctness of arbitrary programs like e.g. a monte carlo simulation >>> -- *much* more seriously than the scientists themselves do. (This is >>> because scientists need RNGs to do their real work, whereas for >>> cryptographers RNGs are their real work.) >> >> Yes, cryptographers are the better folks, understood. These arguments >> are not really helpful. They are not even arguments. > > Err... I think we're arguing past each other.
(Hint: I'm a scientist, > not a cryptographer ;-).) > > My email was *only* trying to clear up the argument that keeps popping > up about whether or not a cryptographic RNG could introduce bias in > simulations etc., as compared to the allegedly-better-behaved Mersenne > Twister. (As in e.g. your comment upthread that "[MT] is proven to be > equidistributed which is a key property needed for it to be used as > basis for other derived probability distributions".) Ok, thanks for the clarification. > This argument is > incorrect -- equidistribution is not a guarantee that an RNG will > produce good results when deriving other probability distributions, > and in general cryptographic RNGs will produce as-or-better results > than MT in terms of correctness of output. On this particular axis, > using a cryptographic RNG is not at all dangerous. You won't get me to agree on "statistical tests are better than mathematical proofs", so let's call it a day :-) > Obviously this is only one of the considerations in choosing an RNG; > the quality of the randomness is totally orthogonal to considerations > like determinism. > > (Cryptographers also have deterministic RNGs -- they call them "stream > ciphers" -- and these will also meet or beat MT in any practically > relevant test of correctness for the same reasons I outlined, despite > not being provably equidistributed. Of course there are then yet other > trade-offs like speed. But that's not really relevant to this thread, > because no-one is proposing replacing MT as the standard deterministic > RNG in Python; I'm just trying to be clear about how one judges the > quality of randomness that an RNG produces.) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 15 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... 
http://zope.egenix.com/ ________________________________________________________________________ 2015-09-14: Released mxODBC Plone/Zope DA 2.2.3 http://egenix.com/go84 2015-09-18: PyCon UK 2015 ... 3 days to go 2015-09-26: Python Meeting Duesseldorf Sprint 2015 11 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From guido at python.org Tue Sep 15 20:21:20 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 15 Sep 2015 11:21:20 -0700 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: Message-ID: On Tue, Sep 15, 2015 at 10:50 AM, Donald Stufft wrote: > On September 15, 2015 at 1:34:56 PM, Guido van Rossum (guido at python.org) > wrote: > > > I am fine with adding more secure ways of generating random numbers. > > But we already have random.SystemRandom(), so there doesn't > > seem to be a hurry? > > The problem isn't so much that there isn't a way of securely generating > random > numbers, but that the module, as it is right now, guides you towards using > an > insecure source of random numbers rather than a secure one. This means that > unless you're familiar with the random module or reading the online > documentation you don't really have any idea that ``random.random()`` isn't > secure. This is an attractive nuisance for anyone who *doesn't* need > deterministic output from their random numbers and leads to situations > where > people are incorrectly using MT when they should be using SystemRandom > because > they don't know any better. > That feels condescending, as does the assumption that (almost) every naive use of randomness is somehow a security vulnerability. The concept of secure vs. insecure sources of randomness isn't *that* hard to grasp.
-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Tue Sep 15 20:25:39 2015 From: random832 at fastmail.com (Random832) Date: Tue, 15 Sep 2015 14:25:39 -0400 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: Message-ID: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> On Tue, Sep 15, 2015, at 13:33, Guido van Rossum wrote: > I don't want to change this API and I don't want to introduce deprecation > warnings -- the API is fine, and the warnings will be as ineffective as > the > warnings in the documentation. The output of random.random today when it's not seeded / seeded with None isn't _really_ deterministic - you can't reproduce it, after all, without modifying the code (though in principle you could do seed(None)/getstate the first time and then setstate on subsequent executions - it may be worth supporting this use case?) - so changing it isn't likely to affect anyone - anyone needing MT is likely to also be using the seed functions. > random.set_random_generator() What do you think of having calls to seed/setstate(/getstate?) implicitly switch (by whatever mechanism) to MT? This could be done without a deprecation warning, and would allow existing code that relies on reproducible values to continue working without modification? [indirection in global functions]... > (and similar for all related functions). global getstate/setstate should also save/replace the _inst or its type; at least if it's a different type than it was at the time the state was saved. For backwards compatibility in case these are pickled it could use the existing format when _inst is the current MT implementation, and accept these in setstate.
> It would also be fine for SystemRandom (or > at > least whatever is used by use_secure_random(), if SystemRandom cannot > change for backward compatibility reasons) to raise an exception when > seed(), setstate() or getstate() are called. SystemRandom already raises an exception when getstate and setstate are called. From random832 at fastmail.com Tue Sep 15 20:34:04 2015 From: random832 at fastmail.com (Random832) Date: Tue, 15 Sep 2015 14:34:04 -0400 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> References: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> Message-ID: <1442342044.575954.384484633.18794172@webmail.messagingengine.com> On Tue, Sep 15, 2015, at 14:25, Random832 wrote: > I made an editing mistake that may have made it hard to follow my post. This paragraph: > The output of random.random today when it's not seeded / seeded with > None isn't _really_ deterministic - you can't reproduce it, after all, > without modifying the code (though in principle you could do > seed(None)/getstate the first time and then setstate on subsequent > executions - it may be worth supporting this use case?) - so changing it > isn't likely to affect anyone - anyone needing MT is likely to also be > using the seed functions. Should have been _after_ this one: > What do you think of having calls to seed/setstate(/getstate?) > implicitly switch (by whatever mechanism) to MT? This could be done > without a deprecation warning, and would allow existing code that relies > on reproducible values to continue working without modification? 
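[Editorial aside: the "seed(None)/getstate the first time and then setstate on subsequent executions" idea described above can be written out as below. This is a minimal illustration of the approach, not anything the random module provides itself; the state-file name is made up for the example.]

```python
import os
import pickle
import random

STATE_FILE = "rng_state.pickle"  # hypothetical location for the saved state

if os.path.exists(STATE_FILE):
    # Subsequent run: restore the saved MT state and replay the same stream.
    with open(STATE_FILE, "rb") as f:
        random.setstate(pickle.load(f))
else:
    # First run: seed from OS entropy, then capture the state so the
    # run can be reproduced later without storing GBs of output.
    random.seed(None)
    with open(STATE_FILE, "wb") as f:
        pickle.dump(random.getstate(), f)

values = [random.random() for _ in range(5)]
print(values)  # identical on every run once the state file exists
```

Note that this only works for the seedable MT generator; SystemRandom deliberately raises NotImplementedError from getstate/setstate, as pointed out above.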
From guido at python.org Tue Sep 15 20:36:10 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 15 Sep 2015 11:36:10 -0700 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> References: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> Message-ID: On Tue, Sep 15, 2015 at 11:25 AM, Random832 wrote: > On Tue, Sep 15, 2015, at 13:33, Guido van Rossum wrote: > > I don't want to change this API and I don't want to introduce deprecation > > warnings -- the API is fine, and the warnings will be as ineffective as > > the > > warnings in the documentation. > > The output of random.random today when it's not seeded / seeded with > None isn't _really_ deterministic - you can't reproduce it, after all, > without modifying the code (though in principle you could do > seed(None)/getstate the first time and then setstate on subsequent > executions - it may be worth supporting this use case?) Yes, that's how I would do it (better than using a weak seed). > - so changing it > isn't likely to affect anyone - anyone needing MT is likely to also be > using the seed functions. > Or they could just make a lot of random() calls and find their performance down the drain (like what happened in the tracker issue that started all this: http://bugs.python.org/issue25003). > > random.set_random_generator() > > What do you think of having calls to seed/setstate(/getstate?) > implicitly switch (by whatever mechanism) to MT? This could be done > without a deprecation warning, and would allow existing code that relies > on reproducible values to continue working without modification? > I happen to believe that MT's performance is a feature of the (default) API, and this would still be considered breakage (again, as in that issue). [indirection in global functions]... > > (and similar for all related functions).
> > global getstate/setstate should also save/replace the _inst or its type; > at least if it's a different type than it was at the time the state was > saved. For backwards compatibility in case these are pickled it could > use the existing format when _inst is the current MT implementation, and > accept these in setstate. > > > It would also be fine for SystemRandom (or > > at > > least whatever is used by use_secure_random(), if SystemRandom cannot > > change for backward compatibility reasons) to raise an exception when > > seed(), setstate() or getstate() are called. > > SystemRandom already raises an exception when getstate and setstate are > called. > Great! -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Tue Sep 15 21:43:30 2015 From: mertz at gnosis.cx (David Mertz) Date: Tue, 15 Sep 2015 12:43:30 -0700 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> References: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> Message-ID: I commonly use random.some_distribution() as a quick source of "randomness" knowing full well that it's not cryptographic. Moreover, I usually do so initially without setting a seed. The first question I want to answer is "does this random process behave roughly as I expect?" But in the back of my mind is always the thought, "If/when I want to reuse this I'll add a seed for reproducibility". It would never occur to me to reach for the random module if I want to do cryptography. It's a good and well established API that currently exists. Sure, add a submodule random.crypto (or whatever name), but I'm -1 on changing anything whatsoever on the module functions that are well known. 
On Sep 15, 2015 11:26 AM, "Random832" wrote: > On Tue, Sep 15, 2015, at 13:33, Guido van Rossum wrote: > > I don't want to change this API and I don't want to introduce deprecation > > warnings -- the API is fine, and the warnings will be as ineffective as > > the > > warnings in the documentation. > > The output of random.random today when it's not seeded / seeded with > None isn't _really_ deterministic - you can't reproduce it, after all, > without modifying the code (though in principle you could do > seed(None)/getstate the first time and then setstate on subsequent > executions - it may be worth supporting this use case?) - so changing it > isn't likely to affect anyone - anyone needing MT is likely to also be > using the seed functions. > > > random.set_random_generator() > > What do you think of having calls to seed/setstate(/getstate?) > implicitly switch (by whatever mechanism) to MT? This could be done > without a deprecation warning, and would allow existing code that relies > on reproducible values to continue working without modification? > > [indirection in global functions]... > > (and similar for all related functions). > > global getstate/setstate should also save/replace the _inst or its type; > at least if it's a different type than it was at the time the state was > saved. For backwards compatibility in case these are pickled it could > use the existing format when _inst is the current MT implementation, and > accept these in setstate. > > > It would also be fine for SystemRandom (or > > at > > least whatever is used by use_secure_random(), if SystemRandom cannot > > change for backward compatibility reasons) to raise an exception when > > seed(), setstate() or getstate() are called. > > SystemRandom already raises an exception when getstate and setstate are > called.
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Sep 15 22:18:44 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 15 Sep 2015 13:18:44 -0700 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> Message-ID: How about the following. We add a fast secure random generator to the stdlib as an option, and when it has proven its worth a few releases from now we consider again whether the default random() can be made secure without breaking anything. On Tue, Sep 15, 2015 at 12:43 PM, David Mertz wrote: > I commonly use random.some_distribution() as a quick source of > "randomness" knowing full well that it's not cryptographic. Moreover, I > usually do so initially without setting a seed. > > The first question I want to answer is "does this random process behave > roughly as I expect?" But in the back of my mind is always the thought, > "If/when I want to reuse this I'll add a seed for reproducibility". It > would never occur to me to reach for the random module if I want to do > cryptography. > > It's a good and well established API that currently exists. Sure, add a > submodule random.crypto (or whatever name), but I'm -1 on changing anything > whatsoever on the module functions that are well known. > On Sep 15, 2015 11:26 AM, "Random832" wrote: > >> On Tue, Sep 15, 2015, at 13:33, Guido van Rossum wrote: >> > I don't want to change this API and I don't want to introduce >> deprecation >> > warnings -- the API is fine, and the warnings will be as ineffective as >> > the >> > warnings in the documentation.
>> >> The output of random.random today when it's not seeded / seeded with >> None isn't _really_ deterministic - you can't reproduce it, after all, >> without modifying the code (though in principle you could do >> seed(None)/getstate the first time and then setstate on subsequent >> executions - it may be worth supporting this use case?) - so changing it >> isn't likely to affect anyone - anyone needing MT is likely to also be >> using the seed functions. >> >> > random.set_random_generator() >> >> What do you think of having calls to seed/setstate(/getstate?) >> implicitly switch (by whatever mechanism) to MT? This could be done >> without a deprecation warning, and would allow existing code that relies >> on reproducible values to continue working without modification? >> >> [indirection in global functions]... >> > (and similar for all related functions). >> >> global getstate/setstate should also save/replace the _inst or its type; >> at least if it's a different type than it was at the time the state was >> saved. For backwards compatibility in case these are pickled it could >> use the existing format when _inst is the current MT implementation, and >> accept these in setstate. >> >> > It would also be fine for SystemRandom (or >> > at >> > least whatever is used by use_secure_random(), if SystemRandom cannot >> > change for backward compatibility reasons) to raise an exception when >> > seed(), setstate() or getstate() are called. >> >> SystemRandom already raises an exception when getstate and setstate are >> called. 
-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Wed Sep 16 02:43:36 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 15 Sep 2015 19:43:36 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F8588E.7010106@egenix.com> References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> <55F7DAAE.5010401@egenix.com> <55F8588E.7010106@egenix.com> Message-ID: ... [Marc-Andre] > Ah, now we're getting somewhere :-) > > If we accept that non-guessable, but deterministic is a good > compromise, then adding a cipher behind MT sounds like a reasonable > way forward, even as default. > > For full crypto strength, people would still have to rely on > solutions like /dev/urandom or the OpenSSL one (or reseed the > default RNG every now and then). All others get the benefit of > non-guessable, but keep the ability to seed the default RNG in > Python. I expect the only real reason "new entropy" is periodically mixed in to OpenBSD's arc4random() is to guard against the possibility that a weakness in ChaCha20 may be discovered later. If there were any known computationally feasible way whatsoever to distinguish bare-bones ChaCha20's output from a "truly random" sequence, it wouldn't be called "crypto" to begin with. But reseeding MT every now and again is definitely not suitable for crypto purposes.
You would need to reseed at least every 624 outputs, and from a crypto-strength seed source. In which case, why bother with MT at all? You could just as well use the crypto source directly. > Is there some research on this (MT + cipher or hash) ? Oh, sure. MT's creators noted from the start that it would suffice to run MT's outputs through a crypto hash (like your favorite flavor of SHA). That's just as vulnerable to "poor seeding" attacks as plain MT, but it's computationally infeasible to deduce the state from any number of hashed outputs (unless your crypto hash is at least partly invertible, in which case it's not really a crypto hash ;-) ). For other approaches, search for CryptMT. MT's creators suggested a number of other schemes over the years. The simplest throws away the "tempering" part of MT (the 4 lines that map the raw state word into a mildly scrambled output word - not because it needs to be thrown away, but because they think it would no longer be needed given what follows). Then one byte is obtained via grabbing the next MT 32-bit output, folding it into a persistent accumulator via multiplication, and just revealing the top byte:

    accum = some_odd_integer
    while True:
        accum *= random.getrandbits(32) | 1
        yield accum >> 24

I did see one paper suggesting it was possible to distinguish the output of that from a truly random sequence given 2**50 consecutive outputs (but that's all - still no way to deduce the state). From tim.peters at gmail.com Wed Sep 16 02:49:33 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 15 Sep 2015 19:49:33 -0500 Subject: [Python-ideas] Should our default random number generator be secure?
In-Reply-To: References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> <55F7DAAE.5010401@egenix.com> <55F8588E.7010106@egenix.com> Message-ID: [Tim, on CryptMT] > I did see one paper suggesting it was possible to distinguish the > output of that from a truly random sequence given 2**50 consecutive > outputs (but that's all - still no way to deduce the state). Sorry: not 2**50 consecutive outputs (which are bytes), but 2**50 consecutive output bits, so only 2**47 outputs. From tim.peters at gmail.com Wed Sep 16 02:55:00 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 15 Sep 2015 19:55:00 -0500 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> <55F7DAAE.5010401@egenix.com> <55F8588E.7010106@egenix.com> Message-ID: [Tim] > ... > Oh, sure. MT's creators noted from the start that it would suffice to > run MT's outputs through a crypto hash (like your favorite flavor of > SHA). That's just as vulnerable to "poor seeding" attacks as plain > MT, but it's computationally infeasible to deduce the state from any > number of hashed outputs Although what's "computationally feasible" may well have changed since then! These days I expect even a modestly endowed attacker could afford to store an exhaustive table of the 2**32 possible outputs and their corresponding hashes. Then the hashes are 100% invertible via simple lookup, so are no better than not hashing at all. From stephen at xemacs.org Wed Sep 16 03:16:59 2015 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 16 Sep 2015 10:16:59 +0900 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: Message-ID: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > > This is an attractive nuisance for anyone who *doesn't* need > > deterministic output from their random numbers and leads to > > situations where people are incorrectly using MT when they should > > be using SystemRandom because they don't know any better. > > That feels condescending, It is, but it's also accurate: there's plenty of anecdotal evidence that this actually happens, specifically that most of the recipes for password generation on SO silently fall back to a deterministic PRNG if SystemRandom is unavailable, and the rest happily start with random.random. Not only are people apparently doing a wrong thing here, they are eagerly teaching others to do the same. (There's also the possibility that the bad guys are seeding SO with backdoors in this way, I guess.) > as does the assumption that (almost) every naive use of randomness > is somehow a security vulnerability. This is a strawman. None of the advocates of this change makes that assumption. The advocates proceed from the (basically unimpeachable) assumptions that (1) the attacker only has to win once, and (2) they are out there knocking on a lot of doors. Then the questionable assumption is that (3) the attackers are knocking on *this* door. RC4 was at one time one of the best crypto algorithms available, but it also induced the WEP fiasco, and a scramble for a new standard. The question is whether we wait for a "Python security fiasco" to do something about this situation. Waiting *is* an option; the arguments that RNGs won't be a "Python security fiasco" before Python 4 is released are very plausible[1], and the overhead of a compatibility break is not negligible (though Paul Moore himself admits it's probably not huge, either). 
But the risk of a security fiasco (probably in a scenario not mentioned in this thread) is real. The arguments of the opponents of the change amount to "I have confirmed that the probability it will happen to me is very small, therefore the probability it will happen to anyone is small", which is, of course, a fallacy. > The concept of secure vs. insecure sources of randomness isn't > *that* hard to grasp. Once one *tries*. Read some of Paul Moore's posts, and you will discover that the very mention of some practice "improving security" immediately induces a non-trivial subset of his colleagues to start thinking about how to avoid doing it. I am almost not kidding; according to his descriptions, the situation in the trenches is very nearly that bad. Security is evidently hated almost as much as spam. If random.random were to default to an unseedable nondeterministic RNG, the scientific users would very quickly discover that (if not on their own, when their papers get rejected). On the other hand, inappropriate uses are nowhere near so lucky. In the current situation, the programs Just Work Fine (they produce passwords that no human would choose for themselves, for example), and noone is the wiser unless they deliberately seek the information. It seems to me that, given the "in your face" level of discoverability that removing the state-access methods would provide, backward compatibility with existing programs is the only real reason not to move to "secure" randomness by default. In fact "secure" randomness is *higher*-quality for any purpose, including science. Footnotes: [1] Cf. Tim Peters' posts especially, they're few and where the information content is low the humor content is high. ;-) From stephen at xemacs.org Wed Sep 16 03:30:21 2015 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Wed, 16 Sep 2015 10:30:21 +0900 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87oah3xiaa.fsf@uwakimon.sk.tsukuba.ac.jp> Sorry for the self-followup; premature send. Stephen J. Turnbull writes: > In fact "secure" randomness is *higher*-quality for any purpose, > including science. It does need to be acknowledged that scientists need replicability for unscientific reasons: (1) some "scientists" lie (cf. the STAP cell controversy), and (2) as a regression test for their simulation software. But an exact replication of an "honest" simulation is scientifically useless! From stephen at xemacs.org Wed Sep 16 04:22:58 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 16 Sep 2015 11:22:58 +0900 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> Message-ID: <87mvwnxful.fsf@uwakimon.sk.tsukuba.ac.jp> A pseudo-randomly selected recent quote: > It would never occur to me to reach for the random module if I want > to do cryptography. It's sad that so many of the opponents of this change make this kind of comment sooner or later. Security is (rarely) about *any* of *us*! Most of *us* don't need it (if we do, our physical or corporate security has already been compromised), most of *us* understand it, a somewhat smaller fraction of *us* behave in habitually secure ways (at the level we practice oral hygiene, say). That doesn't mean that security has to be #1 always and everywhere in designing Python, but I find it pretty distressing that apparently a lot of people either don't understand or don't care about what's at stake in these kinds of decisions *for the rest of the world*. The reality is that security that is not on by default is not secure. Any break in a dike can flood a whole town.
The flip side is that security has costs, specifically the compatibility break, and since security needs to be on by default, the aggregate burden should be *presumed large* (even a small burden is spread over many users). Nevertheless, I think that the arguments to justify this change are pretty good: (1) The cost of adapting per program seems small, and seems to be restricted to a class of users (software engineers doing regression testing and scientists doing simulations) who probably can easily make the change locally. Nick's proto-PEP is specifically designed so that there will be no cost to naive users (kids writing games etc) who don't need access to state. Caveat: there may be a performance hit for some naive users. That can probably be avoided with an appropriate choice of secure RNG, but that hasn't actually been benchmarked AFAIK. (2) ISTM there are no likely attack vectors due to choice of default RNG in random.random, based on Tim's analysis, but AFAICS he's unwilling to say it's implausible that they exist. (Sorry for the double negative!) I take this to mean that there may be real risk. (3) The anecdotal evidence that the module's current default is frequently misused is strong (the StackOverflow recipes for password generation). Two out of three ain't bad. YMMV, of course. From tim.peters at gmail.com Wed Sep 16 05:14:12 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 15 Sep 2015 22:14:12 -0500 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: <87mvwnxful.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> <87mvwnxful.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: [Stephen J. Turnbull ] > ... > (2) ISTM there are no likely attack vectors due to choice of default > RNG in random.random, based on Tim's analysis, but AFAICS he's > unwilling to say it's implausible that they exist. (Sorry for the > double negative!) I take this to mean that there may be real risk.
Oh, _many_ attacks are possible. Many are even plausible. For example, while Python's _default_ seeding is based on urandom() setting MT's entire massive state (no more secure way exists), a user making up their own seed is quite likely to do so in a way vulnerable to a "poor seeding" attack. "Password generators" should be the least of our worries. Best I can tell, the PHP paper's highly technical MT attack against those has scant chance of working in Python except when random.choice(x) is known to have len(x) a power of 2. Then it's a very powerful attack. But in PHP's idiomatic way of spelling random.choice(x) ("by hand", spelled out in the paper), it's _always_ a very powerful attack. In general, the more technical the attack, the more details matter. It's just no _fun_ to drone on about simple universally applicable brute-force attacks, so I'll continue to drone on about the PHP paper's sophisticated MT state-deducer ;-) From ncoghlan at gmail.com Wed Sep 16 05:40:46 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 16 Sep 2015 13:40:46 +1000 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: Message-ID: On 16 September 2015 at 03:33, Guido van Rossum wrote: > I had to check out of the mega-threads, but I really don't like the outcome > (unless this PEP is just the first of several competing proposals). > > The random module provides a useful interface - a random() function and a > large variety of derived functionality useful for statistics programming > (e.g. uniform(), choice(), bivariate(), etc.). Many of these have > significant mathematical finesse in their implementation. They are all > accessing shared state that is kept in a global variable in the module, and > that is a desirable feature (nobody wants to have to pass an extra variable > just so you can share the state of the random number generator with some > other code).
> > I don't want to change this API and I don't want to introduce deprecation > warnings - the API is fine, and the warnings will be as ineffective as the > warnings in the documentation. The proposed runtime warnings are just an additional harder to avoid nudge for folks that don't read the documentation, so I'd be OK with dropping them from the proposal. However, it also occurs to me there may be a better solution to eliminating them than getting people to change their imports: add a "random.ensure_seedable()" API that flips the default instance to the deterministic RNG without triggering the warning. For applications that genuinely want the determinism, warnings free 3.6+ compatibility would then look like: if hasattr(random, "ensure_seedable"): random.ensure_seedable() > I am fine with adding more secure ways of generating random numbers. But we > already have random.SystemRandom(), so there doesn't seem to be a hurry? > > How about we make one small change instead: a way to change the default > instance used by the top-level functions in the random module. Say, > > random.set_random_generator() That was my previous proposal. The problem with it is that it's much harder to test and support, as you have to allow for the global instance changing multiple times, and in multiple different directions. With the proposal in the PEP, there's only a single idempotent change that's possible: from the system RNG (used by default to eliminate the silent security failure) to the seedable RNG (needed for reproducibility). Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Sep 16 06:00:22 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 16 Sep 2015 14:00:22 +1000 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 16 September 2015 at 11:16, Stephen J.
Turnbull wrote: > Guido van Rossum writes: > > The concept of secure vs. insecure sources of randomness isn't > > *that* hard to grasp. > > Once one *tries*. Read some of Paul Moore's posts, and you will > discover that the very mention of some practice "improving security" > immediately induces a non-trivial subset of his colleagues to start > thinking about how to avoid doing it. I am almost not kidding; > according to his descriptions, the situation in the trenches is very > nearly that bad. Security is evidently hated almost as much as spam. Yep, hence things like http://stopdisablingselinux.com/ SELinux in enforcing mode operates on a very simple principle: we should know what system resources we expect our applications to access, and we should write that down in a form the computer understands so it can protect us against attackers trying to use that application to do something unintended (like steal user information). However, what we've realised as an industry is that effective security systems have to be *transparent* and they have to be *natural*. So in a containerised world, SELinux isolates containers from each other, but if you're writing code that runs *in* the container, you don't need to worry about it - from inside the container, it looks like SELinux isn't running. The traditional security engineering approach of telling people "You're doing it wrong" just encourages them to avoid talking to security people [1], rather than encouraging them to improve their practices [2]. Hence the proposal in PEP 504 - my goal is to make the default behaviour of the random module cryptographically secure, *without* unduly affecting the use cases that need reproducibility rather than cryptographic security, while still providing at least a nudge in the direction of promoting security awareness. Changing the default matters more to me than the nudge, so I'd be prepared to drop that part. Regards, Nick. 
[1] http://sobersecurity.blogspot.com.au/2015/09/everyone-is-afraid-of-us.html [2] http://sobersecurity.blogspot.com.au/2015/09/being-nice-security-person.html -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From njs at pobox.com Wed Sep 16 06:09:49 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 15 Sep 2015 21:09:49 -0700 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> Message-ID: On Sep 15, 2015 1:19 PM, "Guido van Rossum" wrote: > > How about the following. We add a fast secure random generator to the stdlib as an option, and when it has proven its worth a few releases from now we consider again whether the default random() can be made secure without breaking anything. If we have a fast secure RNG, then the standard Random object might as well at least use it by default until someone actually sets or reads the state (and then switch to MT at that point). Until one of these events happens, the two RNGs are indistinguishable, and this would be a 100% backwards compatible change. (It might even make sense to backport to 2.7.) The limitation is that if library A uses the global random object without seeding in a security sensitive context, and library B uses seeding, then a program that just uses library A will be secure, but if it then starts using library B it will become insecure. But this is still better than the current situation where library A is always insecure. The only case where this would actually have a downside compared to status quo (assuming arc4random lives up to its reputation for speed etc) is if people start assuming that the default random object is in fact secure and intentionally choosing to use it in security sensitive situations.
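The switch-on-state-access idea described here could be prototyped roughly as follows. This is an illustrative sketch only, not a proposed implementation: the class name is made up, SystemRandom stands in for the hypothetical fast secure generator, and a real version would also have to divert getrandbits(), getstate() and setstate().

```python
import random

class HybridRandom(random.Random):
    """Draw from the system RNG until someone asks to seed the
    generator, then fall back to the seedable Mersenne Twister."""

    def __init__(self):
        super().__init__()  # note: Random.__init__ calls self.seed()
        # Start out on the nondeterministic path.
        self._secure = random.SystemRandom()

    def seed(self, *args, **kwargs):
        # Any explicit seeding request switches to the deterministic MT.
        self._secure = None
        super().seed(*args, **kwargs)

    def random(self):
        if getattr(self, "_secure", None) is not None:
            return self._secure.random()
        return super().random()

h = HybridRandom()
unpredictable = h.random()  # system RNG until seeded

h.seed(42)  # now behaves like a plain seeded Random
reproducible = [h.random() for _ in range(3)]
```

After the seed() call the instance matches random.Random(42) value for value, so code that relies on reproducibility keeps working unchanged.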
But hopefully people who know enough to realize that this is a decision they need to make will also read the docs where it clearly states that this is only a best-effort kind of hardening mechanism and that using random.Random/the global methods for cryptographic purposes is still a bug. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Sep 16 06:12:44 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 15 Sep 2015 21:12:44 -0700 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: Message-ID: On Tue, Sep 15, 2015 at 8:40 PM, Nick Coghlan wrote: > On 16 September 2015 at 03:33, Guido van Rossum wrote: > > I had to check out of the mega-threads, but I really don't like the > outcome > > (unless this PEP is just the first of several competing proposals). > > > > The random module provides a useful interface - a random() function and a > > large variety of derived functionality useful for statistics programming > > (e.g. uniform(), choice(), bivariate(), etc.). Many of these have > > significant mathematical finesse in their implementation. They are all > > accessing shared state that is kept in a global variable in the module, > and > > that is a desirable feature (nobody wants to have to pass an extra > variable > > just so you can share the state of the random number generator with some > > other code). > > > > I don't want to change this API and I don't want to introduce deprecation > > warnings - the API is fine, and the warnings will be as ineffective as > the > > warnings in the documentation. > > The proposed runtime warnings are just an additional harder to avoid > nudge for folks that don't read the documentation, so I'd be OK with > dropping them from the proposal. Good, because I really don't want the warnings, nor the hack based on whether you call any of the seed/state-related methods.
> However, it also occurs to me there > may be a better solution to eliminating them than getting people to > change their imports: add a "random.ensure_seedable()" API that flips > the default instance to the deterministic RNG without triggering the > warning. > > For applications that genuinely want the determinism, warnings free > 3.6+ compatibility would then look like: > > if hasattr(random, "ensure_seedable"): > random.ensure_seedable() > I don't believe that seedability is the only thing that matters. MT is also over an order of magnitude faster than os.urandom() or SystemRandom. > > I am fine with adding more secure ways of generating random numbers. But > we > > already have random.SystemRandom(), so there doesn't seem to be a hurry? > > > > How about we make one small change instead: a way to change the default > > instance used by the top-level functions in the random module. Say, > > > > random.set_random_generator() > > That was my previous proposal. The problem with it is that it's much > harder to test and support, as you have to allow for the global > instance changing multiple times, and in multiple different > directions. > Actually part of my proposal was a use_secure_random() that was also a one-way flag flip, just in the opposite direction. :-) With the proposal in the PEP, there's only a single idempotent change > that's possible: from the system RNG (used by default to eliminate the > silent security failure) to the seedable RNG (needed for > reproducibility). > I'd be much more comfortable if in 3.6 we only introduced a new way to generate secure random numbers that was as fast as MT. Once that has been in use for a few releases we may have a discussion about whether it's time to make it the default. Security isn't served well by panicky over-reaction. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From guido at python.org Wed Sep 16 06:15:22 2015 From: guido at python.org (Guido van Rossum) Date: Tue, 15 Sep 2015 21:15:22 -0700 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> Message-ID: Clearly I need to mute this thread too. :-( On Tue, Sep 15, 2015 at 9:09 PM, Nathaniel Smith wrote: > On Sep 15, 2015 1:19 PM, "Guido van Rossum" wrote: > > > > How about the following. We add a fast secure random generator to the > stdlib as an option, and when it has proven its worth a few releases from > now we consider again whether the default random() can be made secure > without breaking anything. > > If we have a fast secure RNG, then the standard Random object might as > well at least use it by default until someone actually sets or reads the > state (and then switch to MT at that point). Until one of these events > happens, the two RNGs are indistinguishable, and this would be a 100% > backwards compatible change. (It might even make sense to backport to 2.7.) > > The limitation is that if library A uses the global random object without > seeding in a security sensitive context, and library B uses seeding, then a > program that just uses library A will be secure, but if it then starts > using library B it will become insecure. But this is still better than the > current situation where library A is always insecure. > > The only case where this would actually have a downside compared to status > quo (assuming arc4random lives up to its reputation for speed etc) is if > people start assuming that the default random object is in fact secure and > intentionally choosing to use it in security sensitive situations.
But > hopefully people who know enough to realize that this is a decision they > need to make will also read the docs where it clearly states that this is > only a best-effort kind of hardening mechanism and that using > random.Random/the global methods for cryptographic purposes is still a bug. > > -n > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Wed Sep 16 06:16:37 2015 From: mertz at gnosis.cx (David Mertz) Date: Tue, 15 Sep 2015 21:16:37 -0700 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> Message-ID: Sounds good to me! On Sep 15, 2015 1:19 PM, "Guido van Rossum" wrote: > How about the following. We add a fast secure random generator to the > stdlib as an option, and when it has proven its worth a few releases from > now we consider again whether the default random() can be made secure > without breaking anything. > > On Tue, Sep 15, 2015 at 12:43 PM, David Mertz wrote: > >> I commonly use random.some_distribution() as a quick source of >> "randomness" knowing full well that it's not cryptographic. Moreover, I >> usually do so initially without setting a seed. >> >> The first question I want to answer is "does this random process behave >> roughly as I expect?" But in the back of my mind is always the thought, >> "If/when I want to reuse this I'll add a seed for reproducibility". It >> would never occur to me to reach for the random module if I want to do >> cryptography. >> >> It's a good and well established API that currently exists. Sure, add a >> submodule random.crypto (or whatever name), but I'm -1 on changing anything >> whatsoever on the module functions that are well known. 
>> On Sep 15, 2015 11:26 AM, "Random832" wrote: >> >>> On Tue, Sep 15, 2015, at 13:33, Guido van Rossum wrote: >>> > I don't want to change this API and I don't want to introduce >>> deprecation >>> > warnings - the API is fine, and the warnings will be as ineffective as >>> > the >>> > warnings in the documentation. >>> >>> The output of random.random today when it's not seeded / seeded with >>> None isn't _really_ deterministic - you can't reproduce it, after all, >>> without modifying the code (though in principle you could do >>> seed(None)/getstate the first time and then setstate on subsequent >>> executions - it may be worth supporting this use case?) - so changing it >>> isn't likely to affect anyone - anyone needing MT is likely to also be >>> using the seed functions. >>> >>> > random.set_random_generator() >>> >>> What do you think of having calls to seed/setstate(/getstate?) >>> implicitly switch (by whatever mechanism) to MT? This could be done >>> without a deprecation warning, and would allow existing code that relies >>> on reproducible values to continue working without modification? >>> >>> [indirection in global functions]... >>> > (and similar for all related functions). >>> >>> global getstate/setstate should also save/replace the _inst or its type; >>> at least if it's a different type than it was at the time the state was >>> saved. For backwards compatibility in case these are pickled it could >>> use the existing format when _inst is the current MT implementation, and >>> accept these in setstate. >>> >>> > It would also be fine for SystemRandom (or >>> > at >>> > least whatever is used by use_secure_random(), if SystemRandom cannot >>> > change for backward compatibility reasons) to raise an exception when >>> > seed(), setstate() or getstate() are called. >>> >>> SystemRandom already raises an exception when getstate and setstate are >>> called.
>>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Wed Sep 16 06:19:01 2015 From: mertz at gnosis.cx (David Mertz) Date: Tue, 15 Sep 2015 21:19:01 -0700 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> Message-ID: The below said, I confess I never really liked random.random() as a name. Calling it random.uniform() 20 years ago would have been better. But that's ancient history, and no big deal. On Sep 15, 2015 12:43 PM, "David Mertz" wrote: > I commonly use random.some_distribution() as a quick source of > "randomness" knowing full well that it's not cryptographic. Moreover, I > usually do so initially without setting a seed. > > The first question I want to answer is "does this random process behave > roughly as I expect?" But in the back of my mind is always the thought, > "If/when I want to reuse this I'll add a seed for reproducibility". It > would never occur to me to reach for the random module if I want to do > cryptography. > > It's a good and well established API that currently exists. Sure, add a > submodule random.crypto (or whatever name), but I'm -1 on changing anything > whatsoever on the module functions that are well known. 
> On Sep 15, 2015 11:26 AM, "Random832" wrote: > >> On Tue, Sep 15, 2015, at 13:33, Guido van Rossum wrote: >> > I don't want to change this API and I don't want to introduce >> deprecation >> > warnings - the API is fine, and the warnings will be as ineffective as >> > the >> > warnings in the documentation. >> >> The output of random.random today when it's not seeded / seeded with >> None isn't _really_ deterministic - you can't reproduce it, after all, >> without modifying the code (though in principle you could do >> seed(None)/getstate the first time and then setstate on subsequent >> executions - it may be worth supporting this use case?) - so changing it >> isn't likely to affect anyone - anyone needing MT is likely to also be >> using the seed functions. >> >> > random.set_random_generator() >> >> What do you think of having calls to seed/setstate(/getstate?) >> implicitly switch (by whatever mechanism) to MT? This could be done >> without a deprecation warning, and would allow existing code that relies >> on reproducible values to continue working without modification? >> >> [indirection in global functions]... >> > (and similar for all related functions). >> >> global getstate/setstate should also save/replace the _inst or its type; >> at least if it's a different type than it was at the time the state was >> saved. For backwards compatibility in case these are pickled it could >> use the existing format when _inst is the current MT implementation, and >> accept these in setstate. >> >> > It would also be fine for SystemRandom (or >> > at >> > least whatever is used by use_secure_random(), if SystemRandom cannot >> > change for backward compatibility reasons) to raise an exception when >> > seed(), setstate() or getstate() are called. >> >> SystemRandom already raises an exception when getstate and setstate are >> called.
>> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Wed Sep 16 06:27:41 2015 From: mertz at gnosis.cx (David Mertz) Date: Tue, 15 Sep 2015 21:27:41 -0700 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: <87mvwnxful.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> <87mvwnxful.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sep 15, 2015 7:23 PM, "Stephen J. Turnbull" wrote: > > A pseudo-randomly selected recent quote: > > > It would never occur to me to reach for the random module if I want > > to do cryptography. > That doesn't mean that security has to be #1 always and everywhere in > designing Python, but I find it pretty distressing that apparently a > lot of people either don't understand or don't care about what's at > stake in these kinds of decisions *for the rest of the world*. > The reality is that security that is not on by default is not > secure. Any break in a dike can flood a whole town. This feels somewhere between disingenuous and dishonest. Just like I don't use the random module for cryptography, I also don't use the socket module or the threading module for cryptography. Could a program dealing with sockets have security issues?! Very likely! Could a multithreaded one expose vulnerabilities? Certainly! Should we try to "secure" these modules for users who don't need to or don't know to think about security? Absolutely not! -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Wed Sep 16 06:47:25 2015 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Wed, 16 Sep 2015 13:47:25 +0900 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> <87mvwnxful.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87lhc7x95u.fsf@uwakimon.sk.tsukuba.ac.jp> Tim Peters writes: > [Stephen J. Turnbull ] > > ... > > (2) ISTM there are no likely attack vectors due to choice of default > > RNG in random.random, based on Tim's analysis, but AFAICS he's > > unwilling to say it's implausible that they exist. (Sorry for the > > double negative!) I take this to mean that there may be real risk. > > Oh, _many_ attacks are possible. Many are even plausible. For > example, while Python's _default_ seeding is based on urandom() > setting MT's entire massive state (no more secure way exists), a user > making up their own seed is quite likely to do so in a way vulnerable > to a "poor seeding" attack. I'm not sure what you mean to say, but I don't count that as "due to choice of default RNG". That's foot-shooting of the kind we can't do anything about anyway, and if *that* is what Nick is worried about, I'm worried about Nick. ;-) *I* am more worried about attacks we don't know about yet (or at least haven't been mentioned in this thread), and maybe even haven't been invented yet. I presume Nick is, too. > "Password generators" should be the least of our worries. Best I can > tell, the PHP paper's highly technical MT attack against those has > scant chance of working in Python except when random.choice(x) is > known to have len(x) a power of 2. That's genuinely comforting to read (even though it's the second or third time I've read it ;-). But I'm still nervous about the unknown. 
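For the password-generator worry quoted above, the standard-library fix already exists: random.SystemRandom, which draws from the operating system's entropy pool. A minimal sketch of safe usage (the helper name is made up for illustration):

```python
import string
from random import SystemRandom

# SystemRandom reads from os.urandom(), so observing some outputs
# reveals nothing about past or future ones - unlike the seedable
# Mersenne Twister behind the module-level functions.
_sysrand = SystemRandom()

def make_password(length=16,
                  alphabet=string.ascii_letters + string.digits):
    return "".join(_sysrand.choice(alphabet) for _ in range(length))

pw = make_password()
```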
From ncoghlan at gmail.com Wed Sep 16 07:38:36 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 16 Sep 2015 15:38:36 +1000 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> <87mvwnxful.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 16 September 2015 at 14:27, David Mertz wrote: > > On Sep 15, 2015 7:23 PM, "Stephen J. Turnbull" wrote: >> >> A pseudo-randomly selected recent quote: >> >> > It would never occur to me to reach for the random module if I want >> > to do cryptography. > >> That doesn't mean that security has to be #1 always and everywhere in >> designing Python, but I find it pretty distressing that apparently a >> lot of people either don't understand or don't care about what's at >> stake in these kinds of decisions *for the rest of the world*. >> The reality is that security that is not on by default is not >> secure. Any break in a dike can flood a whole town. > > This feels somewhere between disingenuous and dishonest. Just like I don't > use the random module for cryptography, I also don't use the socket module > or the threading module for cryptography. That's great that you already know not to use the random module for cryptography. Unfortunately, this is a lesson that needs to be taught developer by developer: "don't use the random module for security sensitive tasks". When they ask "Why not?", they get hit with a wall of confusing arcana about brute force search spaces, and cryptographically secure random number generators, and get left with a feeling of dissatisfaction with the explanation because cryptography is one of the areas of computing where our intuitions break down so it takes years to retrain our brains to adopt the relevant mindset. Beginners don't even get that far, as they have to ask "What's a security sensitive task?" 
while they're still at a stage where they're trying to grasp the basic concept of computer generated random numbers (this is a concrete problem with the current situation, as a warning that says "Don't use this for " is equivalent to "Don't use this" if you don't yet know how to identify ""). It's instinctive for humans to avoid additional work when it provides no immediate benefit to us personally. This is a sensible time management strategy, but it's proved to be a serious problem in the context of computer security. An analogy that came up in one of the earlier threads is this: * as an individual lottery ticket holder, assuming you're going to win is a bad assumption * as a lottery operator, assuming someone, somewhere, is going to win is a good assumption Infrastructure security engineers are lottery operators - with millions of software development projects, millions of businesses demanding online web presences, and tens of millions of developers worldwide (with many, many more on the way as computing becomes a compulsory part of schooling), any potential mistake is going to be made and exploited eventually, we just have no way of predicting when or where. Unlike lottery operators (who get to set their prize levels), we also have no way of predicting the severity of the consequences. The *problem* we have is that individual developers are lottery ticket holders - the probability of *our* particular component being the one that gets compromised is vanishingly small, so the incentive to inflict additional work on ourselves to mitigate security concerns is similarly small (although some folks do it anyway out of sheer interest, and some have professional incentives to do so). So let's assume any given component has a 1 in 10000 chance of being compromised (0.01%). We only have to get to 100k components before the aggregate chance of at least one component being compromised rises to almost 100% (around 99.995%).
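The aggregate-risk arithmetic above can be checked directly (the 1-in-10000 per-component figure is the assumption stated in the text):

```python
p = 1e-4     # assumed per-component compromise probability (1 in 10000)
n = 100_000  # number of independent components

# Chance that at least one component is compromised.
aggregate = 1 - (1 - p) ** n
print(f"{aggregate:.3%}")  # ~99.995%
```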
It's at this point the sheer scale of the internet starts working against us - while it's currently estimated that there are only around 30 million developers (both professionals and hobbyists) worldwide, it's further estimated that there are 3 *billion* people with access to the internet. Neither of those numbers is going to suddenly start getting smaller, so we start getting interested in security risks with a lower and lower probability of being exploited. Accordingly, we want "don't worry about security" to be the *right answer* in as many cases as possible - there's always going to be plenty of unavoidable security risks in any software development project, so eliminating the avoidable ones by default makes it easier to focus attention on other areas of potential concern. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Sep 16 07:59:04 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 16 Sep 2015 15:59:04 +1000 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: Message-ID: On 16 September 2015 at 14:12, Guido van Rossum wrote: > Security isn't served well by panicky over-reaction. Proposing a change in 2015 that wouldn't be released to the public until early 2017 or so isn't exactly panicking. (And the thing that changed for me that prompted me to write the PEP was finally figuring out a remotely plausible migration plan to address the backwards compatibility concerns, rather than anything on the security side) As I wrote in the PEP, this kind of problem is a chronic one, not an acute one, where security engineers currently waste a *lot* of their (and other people's) time on remedial firefighting - a security audit (or a breach investigation) detects a vulnerability, high priority issues get filed with affected projects, nobody goes home happy.
Accordingly, my proposal is aimed as much at eliminating the perennial "But *why* can't I use the random module for security sensitive tasks?" argument as it is at anything else. I'd like the answer to that question to eventually be "Sure, you can use the random module for security sensitive tasks, so let's talk about something more important, like why you're collecting and storing all this sensitive personally identifiable information in the first place". Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tim.peters at gmail.com Wed Sep 16 09:23:57 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 16 Sep 2015 02:23:57 -0500 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: <87lhc7x95u.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> <87mvwnxful.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhc7x95u.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: [Stephen J. Turnbull ] >>> (2) ISTM there are no likely attack vectors due to choice of default >>> RNG in random.random, based on Tim's analysis, but AFAICS he's >>> unwilling to say it's implausible that they exist. (Sorry for the >>> double negative!) I take this to mean that there may be real risk. [Tim] >> Oh, _many_ attacks are possible. Many are even plausible. For >> example, while Python's _default_ seeding is based on urandom() >> setting MT's entire massive state (no more secure way exists), a user >> making up their own seed is quite likely to do so in a way vulnerable >> to a "poor seeding" attack. [Stephen] > I'm not sure what you mean to say, That the most obvious and easiest of RNG attacks remain possible regardless of anything that may be done, short of refusing to provide a seedable generator. > but I don't count that as "due to choice of default RNG". That's foot-shooting of the kind we can't do anything about anyway, and if *that* > is what Nick is worried about, I'm worried about Nick.
;-) Oh no, _nobody_ is worried enough to "do something" about it. Not really. Note that in the PHP paper, 10 of the 16 apps scored "full attack" via pure brute force against poor seeding (figure 13, column 4.3). That's probably mostly because the versions of PHP tested inflicted poor _default_ seeding on users. I hope so. But there's no accounting of which apps did and didn't set their own seeds. They did note that "Joomla" attempted to repair a security bug by _removing_ its own seeding, in 2010. Which left it open to PHP's poor default seeding instead - which was nevertheless an improvement. > *I* am more worried about attacks we don't know about yet (or at least > haven't been mentioned in this thread), and maybe even haven't been > invented yet. I presume Nick is, too. Fundamentally, I just don't see the sense in saying that someone who does their own seeding deserves whatever they get, while someone who uses an inappropriate generator in a security context should be saved from themself. I know, I read all the posts about why I'm wrong. I just don't buy it. There's no real substitute for understanding what you're doing, regardless of field. Yes, incompetence can cause great damage. But I'm not sure it does the world a real favor to possibly help a programmer incompetent to do a task keep working in the field a little longer. This isn't the only damage they can cause, and the longer they keep working in an area they don't understand the more damage they can do. The alternative? Learn how to use frickin' SystemRandom. It's not hard. Or get work for which they are competent. >> "Password generators" should be the least of our worries. Best I can >> tell, the PHP paper's highly technical MT attack against those has >> scant chance of working in Python except when random.choice(x) is >> known to have len(x) a power of 2.
> That's genuinely comforting to read (even though it's the second or > third time I've read it ;-) If you read everything I ever wrote, it's the second. Although you may have _inferred_ it before I ever wrote it, from Nathaniel's "if I use the base64 or hex alphabets", instinctively leaping from "hmm ... 2**6 and ... 2**4" to "power of 2". In which case it could feel like the third time. And I used the phrase "power of 2" in a reply to you before, but in a context wholly unrelated to the PHP paper. That may even make it feel like the fourth time. Always happy to clarify ;-) > But I'm still nervous about the unknown. Huh! I've heard humans are prone to that. In which case, there will always be something to be nervous about :-) From mertz at gnosis.cx Wed Sep 16 09:43:59 2015 From: mertz at gnosis.cx (David Mertz) Date: Wed, 16 Sep 2015 00:43:59 -0700 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: Message-ID: On Sep 15, 2015 11:00 PM, "Nick Coghlan" wrote: > "But *why* can't I use the random module for security sensitive > tasks?" argument as it is at anything else. I'd like the answer to > that question to eventually be "Sure, you can use the random module > for security sensitive tasks, so let's talk about something more > important, like why you're collecting and storing all this sensitive > personally identifiable information in the first place". I believe this attitude makes overall security WORSE, not better. Giving a false assurance that simply using a certain cryptographic building block makes your application secure makes it less likely applications will undergo genuine security analysis. Hence I affirmatively PREFER a random module that explicitly proclaims that it is non-cryptographic. Someone who figures out enough to use random.SystemRandom, or a future crypto.random, or the like is more likely to think about why they are doing so, and what doing so does and does NOT assure them of.
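[Editorial aside: for readers following along, the "figures out enough" step David mentions is genuinely small - random.SystemRandom offers the same API as the default generator, just backed by os.urandom(). The token helper below is invented for this sketch, not an API proposed in the thread.]

```python
import random
import string

alphabet = string.ascii_letters + string.digits

def make_token(rng, length=20):
    # Hypothetical helper: same call works with either generator,
    # only the source of randomness differs.
    return "".join(rng.choice(alphabet) for _ in range(length))

mt_token = make_token(random.Random(42))      # Mersenne Twister: same seed, same token
os_token = make_token(random.SystemRandom())  # OS entropy: unseedable, not reproducible

print(mt_token)
print(os_token)
```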
-------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Sep 16 10:11:12 2015 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 16 Sep 2015 09:11:12 +0100 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> <55F7DAAE.5010401@egenix.com> <55F8588E.7010106@egenix.com> Message-ID: On 2015-09-16 01:43, Tim Peters wrote: > ... > > [Marc-Andre] >> Ah, now we're getting somewhere :-) >> >> If we accept that non-guessable, but deterministic is a good >> compromise, then adding a cipher behind MT sounds like a reasonable >> way forward, even as default. >> >> For full crypto strength, people would still have to rely on >> solutions like /dev/urandom or the OpenSSL one (or reseed the >> default RNG every now and then). All others get the benefit of >> non-guessable, but keep the ability to seed the default RNG in >> Python. > > I expect the only real reason "new entropy" is periodically mixed in > to OpenBSD's arc4random() is to guard against that a weakness in > ChaCha20 may be discovered later. If there were any known > computationally feasible way whatsoever to distinguish bare-bones > ChaCha20's output from a "truly random" sequence, it wouldn't be > called "crypto" to begin with. Periodic reseeding also serves to guard against other leaks of information about the underlying state that don't come from breaking through the cipher. If an attacker manages to deduce the state through side channels, timing attacks on the machine, brief physical access, whatever, then reseeding with new entropy will limit the damage rather than blithely continuing on with a compromised state forever. It's an important feature of a CSPRNG. 
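[Editorial aside: the reseeding idea can be illustrated with a toy construction - the class name and reseed schedule here are invented, and real CSPRNG designs such as Fortuna are far more careful about entropy accounting. The point is only that mixing in fresh OS entropy bounds how long a captured state stays useful to an attacker.]

```python
import hashlib
import os

class ToyHashDRBG:
    """Toy hash-based generator with explicit reseeding (illustration only)."""

    def __init__(self):
        self.state = os.urandom(32)
        self.counter = 0

    def reseed(self):
        # Fold fresh OS entropy into the state: an attacker who captured
        # the old state can no longer predict outputs past this point.
        self.state = hashlib.sha256(self.state + os.urandom(32)).digest()

    def next_block(self):
        self.counter += 1
        out = hashlib.sha256(self.state + self.counter.to_bytes(8, "big")).digest()
        if self.counter % 1024 == 0:  # arbitrary reseed schedule for the sketch
            self.reseed()
        return out
```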
Using a crypto output function in your PRNG is a necessary but not sufficient condition for security. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From mal at egenix.com Wed Sep 16 10:21:23 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 16 Sep 2015 10:21:23 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> <55F7DAAE.5010401@egenix.com> <55F8588E.7010106@egenix.com> Message-ID: <55F92683.5040205@egenix.com> On 16.09.2015 02:43, Tim Peters wrote: > ... > > [Marc-Andre] >> Ah, now we're getting somewhere :-) >> >> If we accept that non-guessable, but deterministic is a good >> compromise, then adding a cipher behind MT sounds like a reasonable >> way forward, even as default. >> >> For full crypto strength, people would still have to rely on >> solutions like /dev/urandom or the OpenSSL one (or reseed the >> default RNG every now and then). All others get the benefit of >> non-guessable, but keep the ability to seed the default RNG in >> Python. > > I expect the only real reason "new entropy" is periodically mixed in > to OpenBSD's arc4random() is to guard against that a weakness in > ChaCha20 may be discovered later. If there were any known > computationally feasible way whatsoever to distinguish bare-bones > ChaCha20's output from a "truly random" sequence, it wouldn't be > called "crypto" to begin with. > > But reseeding MT every now and again is definitely not suitable for > crypto purposes. You would need to reseed at least every 624 outputs, > and from a crypto-strength seed source. In which case, why bother > with MT at all? You could just as well use the crypto source > directly. 
> > >> Is there some research on this (MT + cipher or hash) ? > > Oh, sure. MT's creators noted from the start that it would suffice to > run MT's outputs through a crypto hash (like your favorite flavor of > SHA). That's just as vulnerable to "poor seeding" attacks as plain > MT, but it's computationally infeasible to deduce the state from any > number of hashed outputs (unless your crypto hash is at least partly > invertible, in which case it's not really a crypto hash ;-) ). > > For other approaches, search for CryptMT. MT's creators suggested a > number of other schemes over the years. The simplest throws away the > "tempering" part of MT (the 4 lines that map the raw state word into a > mildly scrambled output word - not because it needs to be thrown away, > but because they think it would no longer be needed given what > follows). Then one byte is obtained via grabbing the next MT 32-bit > output, folding it into a persistent accumulator via multiplication, > and just revealing the top byte: > > accum = some_odd_integer > while True: > accum *= random.getrandbits(32) | 1 > yield accum >> 24 > > I did see one paper suggesting it was possible to distinguish the > output of that from a truly random sequence given 2**50 consecutive > outputs (but that's all - still no way to deduce the state). > [Tim, on CryptMT] >> I did see one paper suggesting it was possible to distinguish the >> output of that from a truly random sequence given 2**50 consecutive >> outputs (but that's all - still no way to deduce the state). > > Sorry: not 2**50 consecutive outputs (which are bytes), but 2**50 > consecutive output bits, so only 2**47 outputs. Thanks for the "CryptMT" pointers. I'll do some research after PyCon UK on this.
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/CRYPTMT/index.html A quick glimpse at http://www.ecrypt.eu.org/stream/p3ciphers/cryptmt/cryptmt_p3.pdf suggests that this is a completely new stream cipher, though it uses the typical elements (key + non-linear filter + feedback loop). The approach is interesting, though: they propose a PRNG which can then get used as a stream cipher by XOR'ing the PRNG output with the data stream. So the PRNG implies the cipher, not the other way around as many other approaches to CSPRNGs. That's probably also one of its perceived weaknesses: it's different than the common approach. On 16.09.2015 02:55, Tim Peters wrote: > [Tim] >> ... >> Oh, sure. MT's creators noted from the start that it would suffice to >> run MT's outputs through a crypto hash (like your favorite flavor of >> SHA). That's just as vulnerable to "poor seeding" attacks as plain >> MT, but it's computationally infeasible to deduce the state from any >> number of hashed outputs > > Although what's "computationally feasible" may well have changed since > then! These days I expect even a modestly endowed attacker could > afford to store an exhaustive table of the 2**32 possible outputs and > their corresponding hashes. Then the hashes are 100% invertible via > simple lookup, so are no better than not hashing at all. Simply adding a hash doesn't sound like a good idea. My initial thought was using a (well studied) stream cipher on the output, not just a hash on the output. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 16 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ 2015-09-14: Released mxODBC Plone/Zope DA 2.2.3 http://egenix.com/go84 2015-09-18: PyCon UK 2015 ...
2 days to go 2015-09-26: Python Meeting Duesseldorf Sprint 2015 10 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From cory at lukasa.co.uk Wed Sep 16 10:28:26 2015 From: cory at lukasa.co.uk (Cory Benfield) Date: Wed, 16 Sep 2015 09:28:26 +0100 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> <87mvwnxful.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhc7x95u.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 16 September 2015 at 08:23, Tim Peters wrote: > Fundamentally, I just don't see the sense in saying that someone who > does their own seeding deserves whatever they get, while someone who > uses an inappropriate generator in a security context should be saved > from themself. I know, I read all the posts about why I'm wrong. I > just don't buy it. There's no real substitute for understanding what > you're doing, regardless of field. Yes, incompetence can cause great > damage. But I'm not sure it does the world a real favor to possibly > help a programmer incompetent to do a task keep working in the field a > little longer. This isn't the only damage they can cause, and the > longer they keep working in an area they don't understand the more > damage they can do. The alternative? Learn how to use frickin' > SystemRandom. It's not hard. Or get work for which they are > competent. Because that's never how these things go. You usually don't write a password generator that uses a non-CS PRNG in a security context, get discovered in the short term, and fired/reprimanded/whatever. Instead, one of the following things happens: - you get code review from a reviewer who knows the problem space and spots the problem. It gets fixed, you get educated, you're better prepared for the field. 
- you get code review from a reviewer who knows the problem space but *doesn't* spot the problem because Python isn't their first language. It doesn't get fixed and no-one notices for ten years until the problem is exploited, but you left the company 8 years ago and are now Head of Security Engineering at CoolStartupInc. - you don't get code review, or your reviewer is no better informed on this topic than you are. The problem doesn't get fixed and no-one notices ever because your program isn't exploited, or is only exploited in ways you never find out about because the rest of your security process sucked too, but you never find out about this. This is the ongoing problem with incompetence when it comes to security: the feedback loop is long and the negative event fires rarely, so most programmers never experience it. Most engineers have *never* experienced a security vulnerability in their own project, let alone had one exploited. Thus, most engineers never get the negative feedback loop that tells them that they don't know enough to do the work they're doing. Look at all the people who get this wrong. Consider haveibeenpwned.com for a minute. They list a fraction of the website databases that have been exposed due to security errors. At last count, that list includes (I removed more than half for the sake of length): - Adobe - Ashley Madison - Snapchat - Gawker - NextGenUpdate - Yandex - Forbes - Stratfor - Domino's - Yahoo - Telecom Regulatory Authority of India - Vodafone - Sony - HackingTeam - Bell - Minecraft Forum - UN Internet Governance Forum - Tesco Are you telling me that every engineer responsible for these is not working in the industry any more? I doubt it. In fact, I think most of these places can't even account for which engineer is responsible, and if they can odds are good they left long before the problem was exploited. So you're right, there is no real substitute for knowing what you're doing. 
But we cannot prevent programmers who don't know this stuff from writing the code that does it. We don't get to set the bar. We cannot throw GoReadABookOrTwo exceptions when inexperienced programmers type random.random, much as we would like to. With that said, we *can* construct an environment where a programmer has to have actually tried to hurt themselves. They have to have taken the gun off the desk, loaded it, disabled the safety, pointed it at their foot, and pulled the trigger. At that point we can say that we took all reasonable precautions to stop you doing what you did and you did it anyway: that's entirely on you. If you disable the safety settings, then frankly you are taking on the mantle of an expert: you are claiming you knew more than the person who developed the system, and if you don't then the consequences are on you. But if you use the defaults then you're just doing the most obvious thing, and from my perspective that should not be a punishable offence. From cory at lukasa.co.uk Wed Sep 16 10:29:53 2015 From: cory at lukasa.co.uk (Cory Benfield) Date: Wed, 16 Sep 2015 09:29:53 +0100 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: Message-ID: On 16 September 2015 at 08:43, David Mertz wrote: > Hence I affirmatively PREFER a random module that explicitly proclaims that > it is non-cryptographic. Someone who figures out enough to use > random.SystemRandom, or a future crypto.random, or the like is more likely > to think about why they are doing so, and what doing so does and does NOT > assure them of. And what about those that don't? Is our position here "screw 'em, and also screw their users"?
From ncoghlan at gmail.com Wed Sep 16 10:34:35 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 16 Sep 2015 18:34:35 +1000 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: Message-ID: On 16 September 2015 at 17:43, David Mertz wrote: > > On Sep 15, 2015 11:00 PM, "Nick Coghlan" wrote: >> "But *why* can't I use the random module for security sensitive >> tasks?" argument as it is at anything else. I'd like the answer to >> that question to eventually be "Sure, you can use the random module >> for security sensitive tasks, so let's talk about something more >> important, like why you're collecting and storing all this sensitive >> personally identifiable information in the first place". > > I believe this attitude makes overall security WORSE, not better. Giving a > false assurance that simply using a certain cryptographic building block > makes your application secure makes it less likely applications will fail to > undergo genuine security analysis. > > Hence I affirmatively PREFER a random module that explicitly proclaims that > it is non-cryptographic. Someone who figures out enough to use > random.SystemRandom, or a future crypto.random, or the like is more likely > to think about why they are doing so, and what doing so does and does NOT > assure them off. You're *describing the status quo*. This isn't a new concept, as it's the way our industry has worked since forever: 1. All the security features are off by default 2. The onus is on individual developers to "just know" when the work they're doing is security sensitive 3. 
Once they realise what they're doing is security sensitive (probably because a security engineer pointed it out), the onus is *still* on them to educate themselves as to what to do about it Meanwhile, their manager is pointing at the project schedule demanding to know why the new feature hasn't shipped yet, and they're in turn pointing fingers at the information security team blaming them for blocking the release until the security vulnerabilities have been addressed. And that's the *good* scenario, since the only people it upsets are the people working on the project. In the more typical cases where the security team doesn't exist, gets overruled, or simply has too many fires to try to put out, we get http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/ and http://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/ On the community project side, we take the manager, the product schedule and the information security team out of the picture, so folks never even get to find out that there are any problems with the approach they're taking - they just ship and deploy software, and are mostly protected by the lack of money involved (companies and governments are far more interesting as targets than online communities, so open source projects mainly need to worry about protecting the software distribution infrastructure that provides an indirect attack vector on more profitable targets). Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From njs at pobox.com Wed Sep 16 11:02:36 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 16 Sep 2015 02:02:36 -0700 Subject: [Python-ideas] Should our default random number generator be secure? 
In-Reply-To: <55F92683.5040205@egenix.com> References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> <55F7DAAE.5010401@egenix.com> <55F8588E.7010106@egenix.com> <55F92683.5040205@egenix.com> Message-ID: On Wed, Sep 16, 2015 at 1:21 AM, M.-A. Lemburg wrote: > > > On 16.09.2015 02:43, Tim Peters wrote: >> [Tim, on CryptMT] >>> I did see one paper suggesting it was possible to distinguish the >>> output of that from a truly random sequence given 2**50 consecutive >>> outputs (but that's all - still no way to deduce the state). >> >> Sorry: not 2**50 consecutive outputs (which are bytes), but 2**50 >> consecutive output bits, so only 2**47 outputs. > > Thanks for the "CryptMT" pointers. I'll do some research after PyCon UK > on this. > > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/CRYPTMT/index.html > > A quick glimpse at > > http://www.ecrypt.eu.org/stream/p3ciphers/cryptmt/cryptmt_p3.pdf > > suggests that this is a completely new stream cipher, though it > uses the typical elements (key + non-linear filter + feedback loop). NB that that paper also says that it's patented and requires a license for commercial use. > The approach is interesting, though: they propose a PRNG which > can then get used as a stream cipher by XOR'ing the PRNG output with > the data stream. So the PRNG implies the cipher, not the other way > around as many other approaches to CSPRNGs. > > That's probably also one of its perceived weaknesses: it's different > than the common approach. I think you just described the standard definition of a stream cipher? "Stream cipher" is just the crypto term for a deterministic RNG, that you XOR with data. (However it's not a CSPRNG, because those require seeding schedules and things like that -- check out e.g. Fortuna.) -n -- Nathaniel J.
Smith -- http://vorpus.org From p.f.moore at gmail.com Wed Sep 16 11:42:33 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 16 Sep 2015 10:42:33 +0100 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 16 September 2015 at 05:00, Nick Coghlan wrote: > On 16 September 2015 at 11:16, Stephen J. Turnbull wrote: >> Guido van Rossum writes: >> > The concept of secure vs. insecure sources of randomness isn't >> > *that* hard to grasp. >> >> Once one *tries*. Read some of Paul Moore's posts, and you will >> discover that the very mention of some practice "improving security" >> immediately induces a non-trivial subset of his colleagues to start >> thinking about how to avoid doing it. I am almost not kidding; >> according to his descriptions, the situation in the trenches is very >> nearly that bad. Security is evidently hated almost as much as spam. > > Yep, hence things like http://stopdisablingselinux.com/ > > SELinux in enforcing mode operates on a very simple principle: we > should know what system resources we expect our applications to > access, and we should write that down in a form the computer > understands so it can protect us against attackers trying to use that > application to do something unintended (like steal user information). I don't know if it's still true, but most Oracle database installation instructions state "disable SELinux" as a basic pre-requisite. This is a special case of a more general issue, which is that the "assign only those privileges that you need" principle is impossible to implement when you are working with proprietary software that contains no documentation on what privileges it needs, other than "admin rights". (Actually, I just checked - it looks like the official Oracle docs no longer suggest disabling SELinux. 
But I bet they still don't give you all the information you need to implement a tight security policy without a fair amount of "try it and see what breaks"...) Even in open source, people routinely run "sudo pip install". Not "make the Python site-packages read/write", which is still wrong, but which at least adheres to the principle of least privilege, but "give me root access". How many people get an app for their phone, see "this app needs <list of permissions>" and have any option other than to click "yes" or discard the app? Who does anything with UAC on Windows other than blindly click "yes" or disable it altogether? Not because they don't understand the issues (certainly, many don't, but some do) but rather because there's really no other option? In these contexts, "security" is the name for "those things I have to work around to do what I'm trying to do" - by disabling it, or blindly clicking "yes", or insisting I need admin rights. Or another example. Due to a password expiry policy combined with a lack of viable single sign on, I have to change upwards of 50 passwords at least once every 4 weeks in order to be able to do my job. And the time to do so is considered "overhead" and therefore challenged regularly. So I spend a lot of time looking to see if I can automate password changes (which is *definitely* not good practice). I'm sure others do things like using weak passwords or reusing passwords. Because the best practice simply isn't practical in that context. Nobody in the open source or security good practices communities even has an avenue to communicate with the groups involved in this sort of thing. At least as far as I know. I do what I can to raise awareness, but it's a "grass roots" exercise that typically doesn't reach the people with the means to actually change anything. Of course, nobody in this environment uses Python to build internet-facing web applications, either. So I'm not trying to argue that this should drive the question of the RNG used in Python.
But at the same time, I am trying to sell Python as a good tool for automating business processes, writing administrative scripts and internal applications, etc. So there is a certain link... Sorry - but it's nice to vent sometimes :-) Paul From mal at egenix.com Wed Sep 16 13:38:05 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 16 Sep 2015 13:38:05 +0200 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> <55F7DAAE.5010401@egenix.com> <55F8588E.7010106@egenix.com> <55F92683.5040205@egenix.com> Message-ID: <55F9549D.4020006@egenix.com> On 16.09.2015 11:02, Nathaniel Smith wrote: > On Wed, Sep 16, 2015 at 1:21 AM, M.-A. Lemburg wrote: >> >> >> On 16.09.2015 02:43, Tim Peters wrote: >>> [Tim, on CryptMT] >>>> I did see one paper suggesting it was possible to distinguish the >>>> output of that from a truly random sequence given 2**50 consecutive >>>> outputs (but that's all - still no way to deduce the state). >>> >>> Sorry: not 2**50 consecutive outputs (which are bytes), but 2**50 >>> consecutive output bits, so only 2**47 outputs. >> >> Thanks for the "CryptMT" pointers. I'll do some research after PyCon UK >> on this. >> >> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/CRYPTMT/index.html >> >> A quick glimpse at >> >> http://www.ecrypt.eu.org/stream/p3ciphers/cryptmt/cryptmt_p3.pdf >> >> suggests that this is a completely new stream cipher, though it >> uses the typical elements (key + non-linear filter + feedback loop). > > NB that that paper also says that it's patented and requires a license > for commercial use. Hmm, you're right: """ If CryptMT is selected as one of the recommendable stream ciphers by eSTREAM, then it is free even for commercial use. """ Hasn't happened yet, but perhaps either eSTREAM or the authors will change their minds. 
Too bad they haven't yet, since it's a pretty fast stream cipher :-( Anyway, here's a paper on CryptMT: http://cryptography.gmu.edu/~jkaps/download.php?docid=1083 >> The approach is interesting, though: they propose an PRNG which >> can then get used as stream cipher by XOR'ing the PRNG output with >> the data stream. So the PRNG implies the cipher, not the other way >> around as many other approaches to CSPRNGs. >> >> That's probably also one of its perceived weaknesses: it's different >> than the common approach. > > I think you just described the standard definition of a stream cipher? > "Stream cipher" is just the crypto term for a deterministic RNG, that > you XOR with data. (However it's a not a CSPRNG, because those require > seeding schedules and things like that -- check out e.g. Fortuna.) The standard definition I know reads like this: """ Stream ciphers are an important class of encryption algorithms. They encrypt individual characters (usually binary digits) of a plaintext message one at a time, using an encryption transformation which varies with time. """ (taken from Chap 6.1 Introduction of "Handbook of Applied Cryptography"; http://cacr.uwaterloo.ca/hac/about/chap6.pdf) That's a bit more general than what you describe, since the keystream can pretty much be generated in arbitrary ways. What I wanted to emphasize is that a common way of coming up with a stream cipher is to use an existing block cipher which you then transform into a stream cipher. See e.g. https://www.emsec.rub.de/media/crypto/attachments/files/2011/03/hudde.pdf E.g. take AES run in CTR (counter) mode: it applies AES repeatedly to the values of a simple counter as "RNG". Running MT + AES would result in a similar setup, except that the source would have somewhat better qualities and would be based on standard well studied technology, albeit slower than going straight for a native stream cipher. 
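[Editorial aside: the "PRNG implies the cipher" direction discussed above can be sketched in a few lines. This is a toy demonstration of the XOR construction only - SHA-256 in counter mode stands in for a vetted keystream generator, the function names are invented, and nothing here is production crypto.]

```python
import hashlib

def keystream(key: bytes, nbytes: int) -> bytes:
    # The "PRNG" half: a deterministic byte stream derived from the key,
    # here just SHA-256 over key + counter (a stand-in, not real crypto).
    out = b""
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:nbytes]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    # XOR'ing the PRNG output with the data stream gives the cipher;
    # applying the same operation again decrypts.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

msg = b"attack at dawn"
ct = xor_cipher(b"toy key", msg)
assert xor_cipher(b"toy key", ct) == msg  # same operation encrypts and decrypts
```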
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 16 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ 2015-09-14: Released mxODBC Plone/Zope DA 2.2.3 http://egenix.com/go84 2015-09-18: PyCon UK 2015 ... 2 days to go 2015-09-26: Python Meeting Duesseldorf Sprint 2015 10 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From stephen at xemacs.org Wed Sep 16 14:43:23 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 16 Sep 2015 21:43:23 +0900 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com> <87mvwnxful.fsf@uwakimon.sk.tsukuba.ac.jp> <87lhc7x95u.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <8737yey1p0.fsf@uwakimon.sk.tsukuba.ac.jp> Tim Peters writes: > Fundamentally, I just don't see the sense in saying that someone > who does their own seeding deserves whatever they get, while > someone who uses an inappropriate generator in a security context > should be saved from themself. Strawman, or imprecise quotation if you prefer. Nobody said they *deserve* it AFAICR; I said we can't stop them. Strictly speaking, yes, we could. We could (and *I* think we *should*) make it much less obvious how to do it by removing the seed method and the seed argument to __init__. The problem there is backward compatibility. I don't see that Guido would stand for it. Dis here homeboy not a-gonna stick mah neck out heeya, suh. 
I suspect we might also want to provide helper functions to construct a state from a seed as used by some other simulation package, such as Python 3.4. ;-) Name them and document them as for use in replicating simulations done from those seeds. Nice self-documenting names like "construct_rng_internal_state_from_python_3_4_compatible_seed". There should be one for each version of Python, too (sssh! don't confuse the users with abstractions like "identical implementation"). > There's no real substitute for understanding what you're doing, > regardless of field. Yes, incompetence can cause great damage. > But I'm not sure it does the world a real favor to possibly help a > programmer incompetent to do a task keep working in the field a > little longer. "Think of it as evolution in action." Yeah, I sympathize. But realistically, Darwinian selection will take geological time, no? That is, in almost all cases where disaster strikes, the culprit has long since moved on[1]. Whoever gets the sack is unlikely to be him or her. More likely it will be whoever has been telling the shop that their product is an accident waiting to happen. :-( The way I think about it, though, is a variation on a theme by Nick. Specifically, the more attractive nuisances we can eliminate, the fewer things the uninitiated need to learn. Footnotes: [1] That's especially true in Japan, where I live. "Whodunnit" also gets fuzzed up by the tendency to group work and group think, and a value system that promotes "getting along with others" more than expertise. Child-proof caps are a GoodThang[tm]. 
;-) From ncoghlan at gmail.com Wed Sep 16 14:53:53 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 16 Sep 2015 22:53:53 +1000 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 16 September 2015 at 19:42, Paul Moore wrote: > Nobody in the open source or security good practices communities even > has an avenue to communicate with the groups involved in this sort of > thing. Fortunately, that's no longer the case. Open source based development models are going mainstream, and while there's still a lot of work to do, cases like the US Federal government requiring the creation of open source prototypes as part of a bidding process are incredibly heartening (https://18f.gsa.gov/2015/08/28/announcing-the-agile-BPA-awards/). On the security side, folks are realising that the "You can't do that, it's a security risk" model is a bad one, and hence favoring switching to a model more like "We can help you to minimise your risk exposure while still enabling you to do what you want to do". So while it's going to take time for practices like those described in https://playbook.cio.gov/ to become a description of "the way the IT industry typically works", the benefits are so remarkable that it's a question of "when" rather than "if". > Of course, nobody in this environment uses Python to build > internet-facing web applications, either. So I'm not trying to argue > that this should drive the question of the RNG used in Python. But at > the same time, I am trying to sell Python as a good tool for > automating business processes, writing administrative scripts and > internal applications, etc. So there is a certain link... 
Right, helping Red Hat's Python maintenance team to maintain that kind of balance is one aspect of my day job, hence my interest in https://www.python.org/dev/peps/pep-0493/ as a nicer migration path when backporting the change to verify HTTPS certificates by default. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From robert.kern at gmail.com Wed Sep 16 15:25:22 2015 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 16 Sep 2015 14:25:22 +0100 Subject: [Python-ideas] Should our default random number generator be secure? In-Reply-To: <55F9549D.4020006@egenix.com> References: <87lhcaz5gs.fsf@uwakimon.sk.tsukuba.ac.jp> <55F6A380.4070609@egenix.com> <55F700C4.4030900@egenix.com> <20150915035334.GF31152@ando.pearwood.info> <55F7DAAE.5010401@egenix.com> <55F8588E.7010106@egenix.com> <55F92683.5040205@egenix.com> <55F9549D.4020006@egenix.com> Message-ID: On 2015-09-16 12:38, M.-A. Lemburg wrote: > What I wanted to emphasize is that a common way of coming up > with a stream cipher is to use an existing block cipher which you > then transform into a stream cipher. See e.g. > > https://www.emsec.rub.de/media/crypto/attachments/files/2011/03/hudde.pdf > > E.g. take AES run in CTR (counter) mode: it applies AES repeatedly > to the values of a simple counter as "RNG". Indeed. DE Shaw has done the analysis for you: https://www.deshawresearch.com/resources_random123.html > Running MT + AES would result in a similar setup, except that the > source would have somewhat better qualities and would be based > on standard well studied technology, albeit slower than going > straight for a native stream cipher. Why do you think it would have better qualities? You'll have to redo the analysis that makes MT and AES each so well-studied, and I'm not sure that all of the desirable properties of either will survive the combination. 
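The counter-mode construction quoted in this exchange is easy to sketch. Since the stdlib has no block cipher, SHA-256 stands in below as the fixed-key function applied to successive counter values -- an illustrative sketch of the structure, not a vetted cipher:

```python
import hashlib

def ctr_keystream(key, nonce, n):
    # Counter mode: apply a fixed-key function to counter values
    # 0, 1, 2, ... and concatenate the outputs.  A real design would
    # use AES as the keyed function; SHA-256 is a stand-in here.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

ks = ctr_keystream(b"k" * 16, b"nonce-01", 48)
assert ks == ctr_keystream(b"k" * 16, b"nonce-01", 48)  # deterministic
assert ks != ctr_keystream(b"k" * 16, b"nonce-02", 48)  # nonce matters
```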
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From guido at python.org Wed Sep 16 16:26:05 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 16 Sep 2015 07:26:05 -0700 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: There's still way too much chatter, and a lot that seems just rhetoric. This is not the republican primaries. Yes lots of companies got hacked. What's the evidence that a language's default RNG was involved? IIUC the best practice for password encryption (to make cracking using a large word list harder) is something called bcrypt; maybe next year something else will become popular, but the default RNG seems an unlikely candidate. I know that in the past the randomness of certain protocols was compromised because the seeding used a timestamp that an attacker could influence or guess. But random.py seeds MT from os.urandom(2500). So what's the class of vulnerabilities where the default RNG is implicated? Tim's proposal is simple: create a new module, e.g. safefandom, with the same API as random (less seed/state). That's it. Then it's a simple import change away to do the right thing, and we have years to seed StackOverflow with better information before that code even hits the road. (But a backport to Python 2.7 could be on PyPI tomorrow!) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tim.peters at gmail.com Wed Sep 16 17:47:30 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 16 Sep 2015 10:47:30 -0500 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: [Guido] > There's still way too much chatter, and a lot that seems just rhetoric. This > is not the republican primaries. Which is a shame, since the chatter here is of much higher quality than in the actual primaries ;-) > Yes lots of companies got hacked. What's the evidence that a language's > default RNG was involved? Nobody cares whether there's evidence of actual harm. Just that there _might_ be, and even if none identifiable now, then maybe in the future. There is evidence of actual harm from RNGs doing poor _seeding_ by default, but Python already fixed that (I know, you already know that ;-) ). And this paper, from a few years ago, studying RNG vulnerabilities in PHP apps, is really good: https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf An interesting thing is that several of the apps already had a history of trying to fix security-related holes related to RNG (largely due to PHP's poor default seeding), but remained easily cracked. The primary recommendation there wasn't to make PHP's various PRNGs "crypto by magic", but for core PHP to supply "a standard" crypto RNG for people to use instead. As above, some of the app developers already knew darned well they had a history of RNG-related holes, but simply had no standard way to address it, and didn't have the _major_ expertise needed to roll their own. > IIUC the best practice for password encryption (to > make cracking using a large word list harder) is something called bcrypt; > maybe next year something else will become popular, but the default RNG > seems an unlikely candidate. 
I know that in the past the randomness of > certain protocols was compromised because the seeding used a timestamp that > an attacker could influence or guess. But random.py seeds MT from > os.urandom(2500). So what's the class of vulnerabilities where the default > RNG is implicated? 1. Users doing their own poor seeding. 2. A hypothetical MT state-deducer (seemingly needing to be considerably more sophisticated than the already mondo sophisticated one in the paper above) to be of general use against Python. 3. "Prove there can't be any in the future. Ha! You can't." ;-) > Tim's proposal is simple: create a new module, e.g. safefandom, with the > same API as random (less seed/state). That's it. Then it's a simple import > change away to do the right thing, and we have years to seed StackOverflow > with better information before that code even hits the road. (But a backport > to Python 2.7 could be on PyPI tomorrow!) Which would obviously be fine by me: make the distinction obvious at import time, make "the safe way" dead easy and convenient to use, give it a new name engineered to nudge newbies away from the "unsafe" (by contrast) `random`, and a new name easily discoverable by web search. There's something else here: some of these messages gave pointers to web pages where "security wonks" conceded that specific uses of SystemRandom were fine, but they couldn't recommend it anyway because it's too hard to explain what is or isn't "safe". "Therefore" users should only use urandom() directly. Which is insane, if for no other reason than that users would then invent their own algorithms to convert urandom() results into floats and ints, etc. Then they'll screw up _that_ part. But if "saferandom" were its own module, then over time it could implement its own "security wonk certified" higher level (than raw bytes) methods. I suspect it would never need to change anything from what the SystemRandom class does, but I'm not a security wonk, so I know nothing.
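What such a "saferandom" module might look like is almost trivial to sketch. The module name and exact surface are hypothetical -- this is just the idea from the thread, "the random API, minus seed/state, backed by SystemRandom":

```python
# saferandom.py (hypothetical): same API as random, less seed/state.
import random as _random

_inst = _random.SystemRandom()

# Re-export the bound methods of a SystemRandom instance.
random = _inst.random
randrange = _inst.randrange
randint = _inst.randint
getrandbits = _inst.getrandbits
choice = _inst.choice
sample = _inst.sample
shuffle = _inst.shuffle
uniform = _inst.uniform
# deliberately absent: seed(), getstate(), setstate()

token = getrandbits(128)  # e.g. a hard-to-guess 128-bit value
```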
Regardless, _whatever_ changes certified wonks deemed necessary in the future could be confined to the new module, where incompatibilities would only annoy apps using that module. Ditto whatever doc changes were needed. Also gone would be the inherent confusion from needing to draw distinctions between "safe" and "unsafe" in a single module's docs (which any by-magic scheme would only make worse). However, supplying a powerful and dead-simple-to-use new module would indeed do nothing to help old code entirely by magic. That's a non-goal to me, but appears to be the _only_ deal-breaker goal for the advocates. Which is why none of us is the BDFL ;-) From ncoghlan at gmail.com Wed Sep 16 17:54:24 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 17 Sep 2015 01:54:24 +1000 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 17 September 2015 at 00:26, Guido van Rossum wrote: > There's still way too much chatter, and a lot that seems just rhetoric. This > is not the republican primaries. There was still a fair bit of useful feedback in there, so I pushed a new version of the PEP that addresses it:

* the submodule idea is gone
* the module level API still delegates to random._inst at call time rather than import time
* random._inst is a SystemRandom() instance by default
* there's a new ensure_repeatable() API to switch it back to random.Random()
* seed(), getstate() and setstate() all implicitly call ensure_repeatable()
* the latter issue a warning recommending calling ensure_repeatable() explicitly

The key user experience difference from the status quo is that this allows the "not suitable for security purposes" warning to be moved to a section specifically covering ensure_repeatable(), seed(), getstate() and setstate() rather than automatically applying to the entire random module.
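The PEP changes Nick lists can be demonstrated in a standalone sketch. The names `_inst` and `ensure_repeatable` come from the mail; this demo class only illustrates the call-time delegation, it is not the actual patch:

```python
import random
import warnings

class RandomAPI:
    """Demo of the proposed module behaviour: delegate at call time."""

    def __init__(self):
        self._inst = random.SystemRandom()  # secure by default

    def ensure_repeatable(self):
        # Swap in a deterministic generator only on explicit request.
        if isinstance(self._inst, random.SystemRandom):
            self._inst = random.Random()

    def seed(self, a=None):
        warnings.warn("seed() implies ensure_repeatable()")
        self.ensure_repeatable()
        self._inst.seed(a)

    def random(self):
        return self._inst.random()  # delegates at call time, not import time

api = RandomAPI()
api.seed(42)
first = api.random()
api.seed(42)
assert api.random() == first  # repeatable once explicitly requested
```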
The reason it becomes reasonable to move the warning is that it changes the failure mode from "any use of the module API for security sensitive purposes" is a problem to "any use of the module API for security sensitive purposes is a problem if the application also calls random.ensure_repeatable()". > Yes lots of companies got hacked. What's the evidence that a language's > default RNG was involved? IIUC the best practice for password encryption (to > make cracking using a large word list harder) is something called bcrypt; > maybe next year something else will become popular, but the default RNG > seems an unlikely candidate. I know that in the past the randomness of > certain protocols was compromised because the seeding used a timestamp that > an attacker could influence or guess. But random.py seeds MT from > os.urandom(2500). So what's the class of vulnerabilities where the default > RNG is implicated? Reducing the search space for brute force attacks on things like: * randomly generated default passwords * password reset tokens * session IDs The PHP paper covered an attack on password reset tokens. Python's seeding is indeed much better, and Tim's mathematical skills are infinitely better than mine so I'm never personally going to win a war of equations with him. If you considered a conclusive proof of a break specifically targeting *CPython's* PRNG essential before considering changing the default behaviour (even given the almost entirely backwards compatible approach I'm now proposing), I'd defer the PEP with a note suggesting that practical attacks on security tokens generated with CPython's PRNG may be a topic of potential interest to the security community. The PEP would then stay deferred until someone actually did the research and demonstrated a practical attack. > Tim's proposal is simple: create a new module, e.g. safefandom, with the > same API as random (less seed/state). That's it. 
Then it's a simple import > change away to do the right thing, and we have years to seed StackOverflow > with better information before that code even hits the road. (But a backport > to Python 2.7 could be on PyPI tomorrow!) If folks are reaching for a third party library anyway, we'd be better off pointing them at one of the higher level ones like passlib or cryptography. There's also the aspect that something I'd now like to achieve is to eliminate the security warning that is one of the first things people currently see when they open up the random module documentation: https://docs.python.org/3/library/random.html While I think that warning is valuable given the current default behaviour, it's also inherently user hostile for beginners that actually *do* read the docs, as it raises questions they don't know how to answer: "The pseudo-random generators of this module should not be used for security purposes. Use os.urandom() or SystemRandom if you require a cryptographically secure pseudo-random number generator." Switching the default means that the question to be asked is instead "Do you need repeatability?", which is a *much* easier question, and we only need to ask it in the documentation for ensure_repeatable() and the related functions that call that implicitly. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From donald at stufft.io Wed Sep 16 17:54:30 2015 From: donald at stufft.io (Donald Stufft) Date: Wed, 16 Sep 2015 11:54:30 -0400 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On September 16, 2015 at 11:48:12 AM, Tim Peters (tim.peters at gmail.com) wrote: > > There's something else here: some of these messages gave pointers > to > web pages where "security wonks" conceded that specific uses > of > SystemRandom were fine, but they couldn't recommend it anyway > because > it's too hard to explain what is or isn't "safe".
"Therefore" > users > should only use urandom() directly. Which is insane, if for no > other > reason than that users would then invent their own algorithms > to > convert urandom() results into floats and ints, etc. Then they'll > screw up _that_ part. That was the documentation for PyCA's cryptography module, where the only use of random we needed was for an IV (which you can use the output of os.urandom directly) and for an integer, which you could just use int.from_bytes and the output of os.urandom (i.e. int.from_bytes(os.urandom(20), byteorder="big")). It wasn't so much a general recommendation against random.SystemRandom, just that for our particular use case os.urandom is either by itself fine, or with a tiny bit of code on top of it fine and that's easier to explain than to try to explain how to use the random module safely and just warn against it entirely. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From steve at pearwood.info Wed Sep 16 17:54:13 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 17 Sep 2015 01:54:13 +1000 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: Message-ID: <20150916155412.GJ31152@ando.pearwood.info> On Wed, Sep 16, 2015 at 03:59:04PM +1000, Nick Coghlan wrote: [...] > Accordingly, my proposal is aimed as much at eliminating the perennial > "But *why* can't I use the random module for security sensitive > tasks?" argument as it is at anything else. I'd like the answer to > that question to eventually be "Sure, you can use the random module > for security sensitive tasks, so let's talk about something more > important, like why you're collecting and storing all this sensitive > personally identifiable information in the first place". The answer to that question is *already* "sure you can use the random module". You just have to use it correctly. 
[Aside: do you think that, having given companies and people a "secure by default" solution that will hopefully prevent data breaches, that they will be *more* or *less* open to the idea that they shouldn't be collecting this sensitive information?] We've spent a long time talking about random() as regards to security, but nobody exposes the output of random directly. They use it as a building block to generate tokens and passwords, and *that's* where the breach is occurring. We shouldn't care so much about the building blocks and care more about the high-level tools: the batteries included. Look at the example given by Nathaniel: https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf What was remarkable about this is how many individual factors were involved in the attacks. It wasn't merely an attack on Mersenne Twister, and it is quite possible that had any of the other factors been changed, the attacks would have failed. E.g. the applications used MD5 hashes. What if they had used SHA-1? They leaked sensitive information such as PIDs and exposed the time that the random numbers were generated. They allowed the attackers to get as many connections as they wanted. Someone might argue that none of those other problems would matter if the PRNG was more secure. That's true, up to a point: you never know when somebody will come up with an attack on the CSPRNG. Previous generations of CSPRNGs, including RC4, have been "retired", and we must expect that the current generation will be too. It is a good habit to avoid leaking this sort of information (times, PIDs etc) even if you don't have a concrete attack in place, because you don't know when a concrete attack will be discovered. Today's CSPRNG is tomorrow's hopelessly insecure PRNG, but defence in depth is always useful.
I propose that instead of focusing on changing the building blocks that people will use by default, we provide them with ready made batteries for the most common tasks, and provide a clear example of acceptable practices for making their own batteries. (As usual, the standard lib will provide batteries, and third-party frameworks or libraries can provide heavy-duty nuclear reactors.) I propose:

- The random module's API is left as-is, including the default PRNG. Backwards compatibility is important, code-churn is bad, and there are good use-cases for a non-CSPRNG.

- We add at least one CSPRNG. I leave it to the crypto-wonks to decide which.

- We add a new module, which I'm calling "secrets" (for lack of a better name) to hold best-practice security-related functions. To start with, it would have at least these three functions: one battery, and two building blocks:

  + secrets.token to create password recovery tokens or similar;

  + secrets.random calls the CSPRNG; it just returns a random number (integer?). There is no API for getting or setting the state, setting the seed, or returning values from non-uniform distributions;

  + secrets.choice similarly uses the CSPRNG.

Developers will still have to make a choice: "do I use secrets, or random?" If they're looking for a random token (or password?), the answer is obvious: use secrets, because the battery is already there. For reasons that I will go into below, I don't think that requiring this choice is a bad thing. I think it is a *good* thing. secrets becomes the go-to module for things you want to keep secret. random remains the module you use for games and simulations. If there is interest in this proposed secrets module, I'll write up a proto-PEP over the weekend, and start a new thread for the benefit of those who have muted this one. You can stop reading now. The rest is motivational rather than part of the concrete proposal. Still here? Okay.
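The three functions proposed above can be sketched directly on top of existing primitives. Names follow the mail, except that `randbelow` stands in for the proposed `secrets.random` (which would shadow the module import in a flat script like this); the actual API would be settled in the proto-PEP:

```python
import binascii
import os
import random as _random

_sysrand = _random.SystemRandom()

def token(nbytes=32):
    # The battery: a hex token for password resets and the like.
    return binascii.hexlify(os.urandom(nbytes)).decode("ascii")

def randbelow(n):
    # Building block: CSPRNG integer in [0, n); no seed/state API.
    return _sysrand.randrange(n)

def choice(seq):
    # Building block: CSPRNG-backed choice.
    return _sysrand.choice(seq)

t = token(16)
assert len(t) == 32  # 16 random bytes -> 32 hex digits
```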
I think that it is a good thing to have developers explicitly make a choice between random and secrets. I think it is important that we continue to involve developers in security thinking. I don't believe that "secure by default" is possible at the application level, and that's what really matters. It doesn't matter if the developer uses a "secure by default" CSPRNG if the application leaks information some other way. We cannot possibly hope to solve application security from the bottom-up (although providing good secure tools is part of the solution). I believe that computer security is to the IT world what occupational health and safety is to the farming, building and manufacturing industries (and others). The thing about security is that, like safety, it is not a product. There is no function you can call to turn security on, no secure=True setting. It is a process and a mind-set, and everyone involved needs to think about it, at least a little bit. It took a long time for the blue collar industries to accept that OH&S was something that *everyone* has to be involved in, from the government setting standards to individual workers who have to keep their eyes open while on the job. Like the IT industry today, management's attitude was that safety was a cost that just ate into profits and made projects late (sound familiar?), and the workers' attitude was all too often "it won't happen to us". It takes experience and training and education to recognise dangerous situations on the job, and people die when they don't get that training. It is part of every person's job to think about what they are doing. I don't believe that it is possible to have "zero-thought security" any more than it is possible to have "zero-thought safety". 
The security professionals can help by providing ready-to-use tools, but the individual developers still have to use those tools correctly, and cultivate a properly careful mindset: "If I wanted to break into this application, what information would I look for? How can I stop providing it? Am I using the right tool for this job? How can I check? Where's the security rep?" Until the IT industry treats security as the building industry treats OH&S, attempts to bail out the Titanic with a teacup with bottom-up "safe by default" functions will just encourage a false sense of security. -- Steve From ncoghlan at gmail.com Wed Sep 16 18:05:53 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 17 Sep 2015 02:05:53 +1000 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: <20150916155412.GJ31152@ando.pearwood.info> References: <20150916155412.GJ31152@ando.pearwood.info> Message-ID: On 17 September 2015 at 01:54, Steven D'Aprano wrote: > I propose: > > - The random module's API is left as-is, including the default PRNG. > Backwards compatibility is important, code-churn is bad, and there are > good use-cases for a non-CSPRNG. > > - We add at least one CSPRNG. I leave it to the crypto-wonks to decide > which. > > - We add a new module, which I'm calling "secrets" (for lack of a better > name) to hold best-practice security-related functions. To start with, > it would have at least these three functions: one battery, and two > building blocks: > > + secrets.token to create password recovery tokens or similar; > > + secrets.random calls the CSPRNG; it just returns a random number > (integer?). There is no API for getting or setting the state, > setting the seed, or returning values from non-uniform > distributions; > > + secrets.choice similarly uses the CSPRNG. > > Developers will still have to make a choice: "do I use secrets, or > random?" 
If they're looking for a random token (or password?), the > answer is obvious: use secrets, because the battery is already there. > For reasons that I will go into below, I don't think that requiring this > choice is a bad thing. I think it is a *good* thing. > > secrets becomes the go-to module for things you want to keep secret. > random remains the module you use for games and simulations. > > If there is interest in this proposed secrets module, I'll write up a > proto-PEP over the weekend, and start a new thread for the benefit of > those who have muted this one. Oh, *this* I like (minus the idea of introducing a CSPRNG - random.SystemRandom will be a good choice for this task). "Is it an important secret?" is a question anyone can answer, so simply changing the proposed name addresses all my concerns regarding having to ask people to learn how to answer a difficult question that isn't directly related to what they're trying to do. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Wed Sep 16 18:06:36 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 16 Sep 2015 09:06:36 -0700 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, Sep 16, 2015 at 8:47 AM, Tim Peters wrote: > [Guido] > > There's still way too much chatter, and a lot that seems just rhetoric. > This > > is not the republican primaries. > > Which is a shame, since the chatter here is of much higher quality > than in the actual primaries ;-) > > > > Yes lots of companies got hacked. What's the evidence that a language's > > default RNG was involved? > > Nobody cares whether there's evidence of actual harm. Just that there > _might_ be, and even if none identifiable now, then maybe in the > future. 
> > There is evidence of actual harm from RNGs doing poor _seeding_ by > default, but Python already fixed that (I know, you already know that > ;-) ). > > And this paper, from a few years ago, studying RNG vulnerabilities in > PHP apps, is really good: > > > https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf > > An interesting thing is that several of the apps already had a history > of trying to fix security-related holes related to RNG (largely due to > PHP's poor default seeding), but remained easily cracked. > > The primary recommendation there wasn't to make PHP's various PRNGs > "crypto by magic", but for core PHP to supply "a standard" crypto RNG > for people to use instead. As above, some of the app developers > already knew darned well they had a history of RNG-related holes, but > simply had no standard way to address it, and didn't have the _major_ > expertise needed to roll their own. > > > > IIUC the best practice for password encryption (to > > make cracking using a large word list harder) is something called bcrypt; > > maybe next year something else will become popular, but the default RNG > > seems an unlikely candidate. I know that in the past the randomness of > > certain protocols was compromised because the seeding used a timestamp > that > > an attacker could influence or guess. But random.py seeds MT from > > os.urandom(2500). So what's the class of vulnerabilities where the > default > > RNG is implicated? > > 1. Users doing their own poor seeding. > > 2. A hypothetical MT state-deducer (seemingly needing to be > considerably more sophisticated than the already mondo > sophisticated one in the paper above) to be of general use > against Python. > > 3. "Prove there can't be any in the future. Ha! You can't." ;-) > > > > Tim's proposal is simple: create a new module, e.g. safefandom, with the > > same API as random (less seed/state). That's it. 
Then it's a simple > import > > change away to do the right thing, and we have years to seed > StackOverflow > > with better information before that code even hits the road. (But a > backport > > to Python 2.7 could be on PyPI tomorrow!) > > Which would obviously be fine by me: make the distinction obvious at > import time, make "the safe way" dead easy and convenient to use, give > it anew name engineered to nudge newbies away from the "unsafe" (by > contrast) `random`, and a new name easily discoverable by web search. > > There's something else here: some of these messages gave pointers to > web pages where "security wonks" conceded that specific uses of > SystemRandom were fine, but they couldn't recommend it anyway because > it's too hard to explain what is or isn't "safe". "Therefore" users > should only use urandom() directly. Which is insane, if for no other > reason than that users would then invent their own algorithms to > convert urandom() results into floats and ints, etc. Then they'll > screw up _that_ part. > > But if "saferandom" were its own module, then over time it could > implement its own "security wonk certified" higher level (than raw > bytes) methods. I suspect it would never need to change anything from > what the SystemRandom class does, but I'm not a security wonk, so I > know nothing. Regardless, _whatever_ changes certified wonks deemed > necessary in the future could be confined to the new module, where > incompatibilities would only annoy apps using that module. Ditto > whatever doc changes were needed. Also gone would be the inherent > confusion from needing to draw distinctions between "safe" and > "unsafe" in a single module's docs (which any by-magic scheme would > only make worse). > > However, supplying a powerful and dead-simple-to-use new module would > indeed do nothing to help old code entirely by magic. That's a > non-goal to me, but appears to be the _only_ deal-breaker goal for the > advocates. 
> > Which is why none of us is the BDFL ;-) > So if you or someone else (Chris?) wrote that up in PEP form I'd accept it. I'd even accept adding a warning on calling seed() (but not setstate()). -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From donald at stufft.io Wed Sep 16 18:09:15 2015 From: donald at stufft.io (Donald Stufft) Date: Wed, 16 Sep 2015 12:09:15 -0400 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: <20150916155412.GJ31152@ando.pearwood.info> References: <20150916155412.GJ31152@ando.pearwood.info> Message-ID: On September 16, 2015 at 11:55:48 AM, Steven D'Aprano (steve at pearwood.info) wrote: > > - We add at least one CSPRNG. I leave it to the crypto-wonks to decide > which. We already have a CSPRNG via os.urandom, and importantly we don't have to decide which implementation it is, because the OS provides it and is responsible for it. I am against adding a userspace CSPRNG as anything but a possible implementation detail of making a CSPRNG the default for random.py. If we're not going to change the default, then I think adding a userspace CSPRNG is just adding a different footgun. That's OK though, because os.urandom is a pretty great CSPRNG. > > Developers will still have to make a choice: "do I use secrets, or > random?" If they're looking for a random token (or password?), the > answer is obvious: use secrets, because the battery is already there. > For reasons that I will go into below, I don't think that requiring this > choice is a bad thing. I think it is a *good* thing. Forcing the user to make a choice isn't a bad option from a security point of view. Most people will prefer the secure one by default even if they don't know better. The problem right now is that there *is* a default, and that default is unsafe, so people aren't forced to make a choice up front; they're merely given the option to go and make one later.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From tim.peters at gmail.com Wed Sep 16 18:09:55 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 16 Sep 2015 11:09:55 -0500 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: [Guido] >> ... >> Tim's proposal is simple: create a new module, e.g. safefandom, with the >> same API as random (less seed/state). That's it. Then it's a simple import >> change away to do the right thing, and we have years to seed StackOverflow >> with better information before that code even hits the road. (But a backport >> to Python 2.7 could be on PyPI tomorrow!) [Nick Coghlan ] > If folks are reaching for a third party library anyway, we'd be better > off point them at one of the higher levels ones like passlib or > cryptography. Note that, in context, "saferandom" _would_ be a standard module in a future Python 3 feature release. But it _could_ be used literally tomorrow by anyone who wanted a head start, whether in a current Python 2 or Python 3. And if pieces of `passlib` and/or `cryptography` are thought to be essential for best practice, cool, then `saferandom` could also become a natural home for workalikes. Would you really want to _ever_ put such functions in the catch-all "random" module? The docs would become an incomprehensible mess. From ncoghlan at gmail.com Wed Sep 16 18:21:09 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 17 Sep 2015 02:21:09 +1000 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 17 September 2015 at 02:09, Tim Peters wrote: > [Guido] >>> ... >>> Tim's proposal is simple: create a new module, e.g. safefandom, with the >>> same API as random (less seed/state). That's it. 
Then it's a simple import >>> change away to do the right thing, and we have years to seed StackOverflow >>> with better information before that code even hits the road. (But a backport >>> to Python 2.7 could be on PyPI tomorrow!) > > [Nick Coghlan ] >> If folks are reaching for a third party library anyway, we'd be better >> off point them at one of the higher levels ones like passlib or >> cryptography. > > Note that, in context, "saferandom" _would_ be a standard module in a > future Python 3 feature release. But it _could_ be used literally > tomorrow by anyone who wanted a head start, whether in a current > Python 2 or Python 3. > > And if pieces of `passlib` and/or `cryptography` are thought to be > essential for best practice, cool, then `saferandom` could also become > a natural home for workalikes. Would you really want to _ever_ put > such functions in the catch-all "random" module? The docs would > become an incomprehensible mess. My main objection here was the name, so Steven's suggestion of calling such a module "secrets" with a suitably crafted higher level API rather than replicating the entire random module API made a big difference. We may even be able to finally give hmac.compare_digest a more obvious home as something like "secrets.equal". I'll leave PEP 504 as Draft for now, but I currently expect I'll end up withdrawing it in favour of Steven's idea. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From jsbfox at gmail.com Wed Sep 16 18:36:58 2015 From: jsbfox at gmail.com (Un Do) Date: Wed, 16 Sep 2015 17:36:58 +0100 Subject: [Python-ideas] Add inspect.getenclosed to return/yield source code for nested classes and functions Message-ID: I propose adding a function to the inspect module that will retrieve definitions of classes and functions (standard and lambdas) located inside another function/method. In my opinion this would be a small but nice and useful addition to the standard library.
It can be implemented using a couple of undocumented functions from that module (findsource and getblock) without any performance drawbacks. Example:

In [9]: print(getsource(function))
def function():
    class inner_class():
        def __init__(self):
            return
    # Some code
    # Some more code
    # Even more code
    l = lambda x: 42
    # Ugh code again
    def inner_function(with_argument):
        pass

In [10]: for c in function.__code__.co_consts:
   ....:     if not iscode(c):
   ....:         continue
   ....:     name, starts_line = c.co_name, c.co_firstlineno
   ....:     if not name.startswith('<') or name == '<lambda>':
   ....:         lines, _ = findsource(c)
   ....:         source = ''.join(getblock(lines[starts_line-1:]))
   ....:         print(dedent(source), end='-' * 30 + '\n')
   ....:
class inner_class():
    def __init__(self):
        return
------------------------------
l = lambda x: 42
------------------------------
def inner_function(with_argument):
    pass
------------------------------

What do you think? From mertz at gnosis.cx Wed Sep 16 18:39:53 2015 From: mertz at gnosis.cx (David Mertz) Date: Wed, 16 Sep 2015 09:39:53 -0700 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: Message-ID: The point here is that the closest we can come to PROTECTING users is to avoid making false promises to them. All this talk of "maybe, possibly, secure RNGs" (until they've been analyzed longer) is just building a house on sand. Maybe ChaCha20 is completely free of all exploits... It's new-ish, and no one has found any. The API we really owe users is to create a class random.BelievedSecureIn2015, and let users utilize that if they like. All the rest of the proposals are just invitations to create more security breaches... The specific thing that random.random and MT DOES NOT do. On Sep 16, 2015 1:29 AM, "Cory Benfield" wrote: > On 16 September 2015 at 08:43, David Mertz wrote: > > Hence I affirmatively PREFER a random module that explicitly proclaims > that > > it is non-cryptographic.
Someone who figures out enough to use > > random.SystemRandom, or a future crypto.random, or the like is more > likely > > to think about why they are doing so, and what doing so does and does NOT > > assure them off. > > And what about those that don't? Is our position here "screw 'em, and > also screw their users"? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Sep 16 18:53:38 2015 From: brett at python.org (Brett Cannon) Date: Wed, 16 Sep 2015 16:53:38 +0000 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, 16 Sep 2015 at 09:10 Tim Peters wrote: > [Guido] > >> ... > >> Tim's proposal is simple: create a new module, e.g. safefandom, with the > >> same API as random (less seed/state). That's it. Then it's a simple > import > >> change away to do the right thing, and we have years to seed > StackOverflow > >> with better information before that code even hits the road. (But a > backport > >> to Python 2.7 could be on PyPI tomorrow!) > > [Nick Coghlan ] > > If folks are reaching for a third party library anyway, we'd be better > > off point them at one of the higher levels ones like passlib or > > cryptography. > > Note that, in context, "saferandom" _would_ be a standard module in a > future Python 3 feature release. But it _could_ be used literally > tomorrow by anyone who wanted a head start, whether in a current > Python 2 or Python 3. > +1 on the overall idea, although I would rather the module be named random.safe in the stdlib ("namespaces are one honking great idea" and it helps keep the "safer" version of random near the "unsafe" version in the module index which makes discovery easier). And as long as the version on PyPI stays Python 2/3 compatible people can just rely on the saferandom name until they drop Python 2 support and then just update their imports. 
> > And if pieces of `passlib` and/or `cryptography` are thought to be > essential for best practice, cool, then `saferandom` could also become > a natural home for workalikes. Would you really want to _ever_ put > such functions in the catch-all "random" module? The docs would > become an incomprehensible mess. > So, a PEP for this to propose which random algorithm to use (I have at least heard chacha/ch4random and some AES thing bandied about as being fast)? And if yes to a PEP, who's writing it? And then who is writing the implementation in the end? -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Wed Sep 16 19:06:19 2015 From: mertz at gnosis.cx (David Mertz) Date: Wed, 16 Sep 2015 10:06:19 -0700 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sep 16, 2015 9:54 AM, "Brett Cannon" wrote: > +1 on the overall idea, although I would rather the module be named random.safe in the stdlib ("namespaces are one honking great idea" and it helps keep the "safer" version of random near the "unsafe" version in the module index which makes discovery easier). And as long as the version on PyPI stays Python 2/3 compatible people can just rely on the saferandom name until they drop Python 2 support and then just update their imports. Without repeating my somewhat satirical long name, I think "safe" is a terrible name because it makes a false promise. However, the name "secrets" is a great name. I think a top-level module is better than "random.secrets" because not everything related to secrets is related to randomness. But that detail is minor. Letting the documentation of "secrets" discuss the current state of cryptanalysis on the algorithms and protocols contained therein is the right place for it. With prominent dates attached to those discussions. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From random832 at fastmail.com Wed Sep 16 19:09:43 2015 From: random832 at fastmail.com (Random832) Date: Wed, 16 Sep 2015 13:09:43 -0400 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1442423383.1806663.385468561.043B31B7@webmail.messagingengine.com> On Wed, Sep 16, 2015, at 11:54, Nick Coghlan wrote: > * random._inst is a SystemRandom() instance by default He has a point on the performance issue. The difference between Random and SystemRandom on my machine is significantly more than an order of magnitude. (Calling libc's arc4random with ctypes was roughly in the middle, though I *suspect* a lot of that was due to ctypes overhead). From donald at stufft.io Wed Sep 16 19:17:22 2015 From: donald at stufft.io (Donald Stufft) Date: Wed, 16 Sep 2015 13:17:22 -0400 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: <1442423383.1806663.385468561.043B31B7@webmail.messagingengine.com> References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> <1442423383.1806663.385468561.043B31B7@webmail.messagingengine.com> Message-ID: On September 16, 2015 at 1:10:09 PM, Random832 (random832 at fastmail.com) wrote: > On Wed, Sep 16, 2015, at 11:54, Nick Coghlan wrote: > > * random._inst is a SystemRandom() instance by default > > He has a point on the performance issue. The difference between Random > and SystemRandom on my machine is significantly more than an order of > magnitude. (Calling libc's arc4random with ctypes was roughly in the > middle, though I *suspect* a lot of that was due to ctypes overhead). > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ >? 
I did the benchmark already: https://bpaste.net/show/79cc134a12b1 Using this code: https://github.com/dstufft/randtest However, using anything except for urandom is warned against by most security experts I've talked to. Most of them are only OK with it if it means we're using a CSPRNG by default in the random.py module, but not if it's not the default. Even then, one of them thought that using a userspace CSPRNG instead of urandom was a bad idea (the rest thought it was better than the status quo). They all agreed that splitting the namespace was a good idea. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From chris.barker at noaa.gov Wed Sep 16 19:25:34 2015 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 16 Sep 2015 10:25:34 -0700 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <85h9n482sa.fsf@benfinney.id.au> Message-ID: <5517176680814632774@unknownmsgid> >but it's still depressing how many people >are still writing blog posts and SO answers >and so on that tell people "you need to >install the latest version of Python, 2.7, I teach an intro to python class, and have been advocating python/supporting users of python on OS-X for years. AND I am one of those folks that advocates starting out by installing the latest Python 2.7 (unless you're going with 3). And I don't think I'm going to stop. >because your computer doesn't come with >it" But never for that reason; rather, because I don't think users SHOULD rely on the system Python on OS-X (and probably other systems).
You can google the reasons why -- you'll probably find a fair number of posts with my name on it ;-). Or, if that debate really is relevant to this discussion I could repeat it all here... >and then proceed to give instructions >that will lead to a screwed up PATH Well, I hope I don't do that ;-) -- in fact, the python.org installer has done a pretty nice job with its defaults for years -- the people that get messed up are the ones that try to "fix" it by hand, when they don't know what they are doing (and very few people DO know what they are doing with PATH on OS-X) >and make no mention of virtualenv... OK, I do that ..... But quite deliberately. Virtualenv solves some problems for sure, but NOT the "I can't import something I swear I just installed" problem. In fact, it creates even MORE different "python" and "pip" commands, and a greater need to understand PATH and how each terminal is configured, etc. So no, I don't introduce virtualenv to beginners. But I'll probably start teaching: python -m pip install ...... -Chris From p.f.moore at gmail.com Wed Sep 16 20:08:20 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 16 Sep 2015 19:08:20 +0100 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: <20150916155412.GJ31152@ando.pearwood.info> References: <20150916155412.GJ31152@ando.pearwood.info> Message-ID: On 16 September 2015 at 16:54, Steven D'Aprano wrote: > If there is interest in this proposed secrets module, I'll write up a > proto-PEP over the weekend, and start a new thread for the benefit of > those who have muted this one. I love this idea. The name is perfect, and your motivational discussion fits exactly how I think we should be approaching security. Would it also be worth having secrets.password(alphabet, length) - generate a random password of length "length" from alphabet "alphabet".
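A rough sketch of what that helper could look like, assuming SystemRandom as the backing CSPRNG (the name and signature are the suggestion above, not an existing API):

```python
# Hypothetical secrets.password(alphabet, length): each character is
# drawn independently and uniformly from the alphabet using the OS RNG.
import random
import string

_sysrand = random.SystemRandom()

def password(alphabet=string.ascii_letters + string.digits, length=16):
    # SystemRandom reads from os.urandom, so there is no seed/state to leak
    return ''.join(_sysrand.choice(alphabet) for _ in range(length))
```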
It's not going to cover every use case, but it immediately becomes the obvious answer to all those "how do I generate a password" SO questions people keep pointing at. Also, a backport version could be made available via PyPI. I don't see why the module couldn't use random.SystemRandom as its CSPRNG (and as a result be pure Python) but that can be an implementation detail the security specialists can argue over if they want. No need to expose it here (although if it's useful, republishing (some more of) its API without exposing the implementation, just like the proposed secrets.choice, would be fine). Paul. From wes.turner at gmail.com Wed Sep 16 20:42:01 2015 From: wes.turner at gmail.com (Wes Turner) Date: Wed, 16 Sep 2015 13:42:01 -0500 Subject: [Python-ideas] High time for a builtin function to manage packages (simply)? In-Reply-To: <878u8i3d2k.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1441315622.90616.374217921.66BE1B36@webmail.messagingengine.com> <87vbbr2b28.fsf@uwakimon.sk.tsukuba.ac.jp> <20150904024552.GL19373@ando.pearwood.info> <204080FC-D5B0-44A9-9D9D-582B1491B413@yahoo.com> <20150904172710.GO19373@ando.pearwood.info> <87lhcl2viv.fsf@uwakimon.sk.tsukuba.ac.jp> <87d1xv2e04.fsf@uwakimon.sk.tsukuba.ac.jp> <878u8i3d2k.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: I just found this HTML web-based pip interface: Stallion "easy-to-use Python Package Manager interface with command-line tool" * http://perone.github.io/stallion/ * https://pypi.python.org/pypi/Stallion Qt would not be a reasonable requirement; Tk shouldn't be a requirement. I tend to try and work with repeatable CLI commands; rather than package manager GUIs. On Sep 7, 2015 2:26 AM, "Stephen J. Turnbull" wrote: > Andrew Barnert writes: > > > Tcl/Tk, and Tkinter for all pre-installed Pythons but 2.3, have > > been included with every OS X since they started pre-installing > > 2.5. > > My mistake, it's only MacPorts where I don't have it. 
I used > MacPorts' all-lowercase spelling, which doesn't work in the system > Python. (The capitalized spelling doesn't work in MacPorts.) > > > And it works with all python.org installs for 10.6 or later, all > > Homebrew default installs, standard source builds... Just about > > anything besides MacPorts (which seems to want to build Tkinter > > against its own Tcl/Tk instead of Apple's) > > I recall having problems with trying to build and run against the > system Tcl/Tk in both source and MacPorts, but that was a *long* time > ago (2.6-ish). Trying it now, on my Mac OS X Yosemite system python > 2.7.10, "root=Tkinter.Tk()" creates and displays a window, but doesn't > pop it up. In fact, "root.tkraise()" doesn't, either. Oops. On this > system, IDLE has the same problem with its initial window, and > furthermore complains that Tcl/Tk 8.5.9 is unstable. > > Quite possibly this window-raising issue is Just Me. But based on my > own experience, it is not at all obvious that ensuring availability of > a GUI is possible in the same way we can ensure pip. > > > Also, why do you think Qt would be less of a problem? > > I don't. I think "ensure PyQt" would be a huge burden, much greater > than Tkinter. Bottom line: IMO, at this point in time, if it has to > Just Work, it has to Work Without GUI. (Modulo the possibility that > we can use an HTML server and borrow the display engine from the > platform web browser. I think I already mentioned that, and I think > it's really the way to go. People who *don't* have a web browser > probably can handle "python -m pip ..." without StackOverflow.) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tim.peters at gmail.com Wed Sep 16 20:45:13 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 16 Sep 2015 13:45:13 -0500 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: [Guido, on "saferandom"] > So if you or someone else (Chris?) wrote that up in PEP form I'd accept it. I like Steven D'Aprano's "secrets" idea better, so it won't be me ;-) Indeed, if PHP had a secure "token generator" function built in, the paper in question would have found almost nothing of practical interest to write about. > I'd even accept adding a warning on calling seed() (but not setstate()). Yield the width of an electron, and the universe itself will be too small to contain the eventual consequences ;-) From tim.peters at gmail.com Wed Sep 16 20:55:20 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 16 Sep 2015 13:55:20 -0500 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: [Tim] >> .... >> Note that, in context, "saferandom" _would_ be a standard module in a >> future Python 3 feature release. But it _could_ be used literally >> tomorrow by anyone who wanted a head start, whether in a current >> Python 2 or Python 3. [Brett Cannon ] > +1 on the overall idea, although I would rather the module be named > random.safe in the stdlib ("namespaces are one honking great idea" Ah, grasshopper, there's a reason that one is last in PEP 20. "Flat is better than nested" is the one - and only one - that _obviously_ applies here ;-) > and it helps keep the "safer" version of random near the "unsafe" version > in the module index which makes discovery easier). And as long as the > version on PyPI stays Python 2/3 compatible people can just rely on the > saferandom name until they drop Python 2 support and then just update > their imports. 
I'd much rather see Steven D'Aprano's "secrets" idea pursued: solve "the problems" on their own terms directly. > ... > So, a PEP for this to propose which random algorithm to use (I have at least > heard chacha/ch4random and some AES thing bandied about as being fast)? os.urandom() is the obvious thing to build on, and it's already there. If alternatives are desired (which they may well be - .urandom() is sloooooooow on many systems), that can be addressed later. Before then, speed probably doesn't matter for most plausibly appropriate uses. > And if yes to a PEP, who's writing it? And then who is writing the > implementation in the end? Did you just volunteer? Great! Thanks ;-) OK, Steven already volunteered to write a PEP for his proposal. From mal at egenix.com Wed Sep 16 21:09:27 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 16 Sep 2015 21:09:27 +0200 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: <20150916155412.GJ31152@ando.pearwood.info> References: <20150916155412.GJ31152@ando.pearwood.info> Message-ID: <55F9BE67.50209@egenix.com> On 16.09.2015 17:54, Steven D'Aprano wrote: > I propose: > > - The random module's API is left as-is, including the default PRNG. > Backwards compatibility is important, code-churn is bad, and there are > good use-cases for a non-CSPRNG. > > - We add at least one CSPRNG. I leave it to the crypto-wonks to decide > which. > > - We add a new module, which I'm calling "secrets" (for lack of a better > name) to hold best-practice security-related functions. To start with, > it would have at least these three functions: one battery, and two > building blocks: > > + secrets.token to create password recovery tokens or similar; > > + secrets.random calls the CSPRNG; it just returns a random number > (integer?). There is no API for getting or setting the state, > setting the seed, or returning values from non-uniform > distributions; > > + secrets.choice similarly uses the CSPRNG. 
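To make the shape of the proposal concrete, here is a rough sketch of the three pieces built on existing stdlib primitives (an illustration only: the function bodies are assumptions, not the eventual secrets module, and the proposal itself leaves integer-vs-float for secrets.random open):

```python
# Hypothetical "secrets" sketch: one battery (token) and two building
# blocks (randbits, choice), all backed by the OS CSPRNG.
import binascii
import os
import random as _random

_sysrand = _random.SystemRandom()

def token(nbytes=32):
    # hex-encoded random token, e.g. for a password-recovery URL
    return binascii.hexlify(os.urandom(nbytes)).decode('ascii')

def randbits(k):
    # uniform integer with k random bits; no seed/state API is exposed
    return _sysrand.getrandbits(k)

def choice(seq):
    # uniformly chosen element of a non-empty sequence
    return _sysrand.choice(seq)
```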
+1 on the idea (not sure about the name, though :-)) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 16 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ 2015-09-14: Released mxODBC Plone/Zope DA 2.2.3 http://egenix.com/go84 2015-09-18: PyCon UK 2015 ... 2 days to go 2015-09-26: Python Meeting Duesseldorf Sprint 2015 10 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From tim.peters at gmail.com Wed Sep 16 21:13:27 2015 From: tim.peters at gmail.com (Tim Peters) Date: Wed, 16 Sep 2015 14:13:27 -0500 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: <20150916155412.GJ31152@ando.pearwood.info> References: <20150916155412.GJ31152@ando.pearwood.info> Message-ID: [Steven D'Aprano , on "secrets"] +1 on everything. Glad _that's_ finally over ;-) One tech point: > ... > + secrets.random calls the CSPRNG; it just returns a random number > (integer?). There is no API for getting or setting the state, > setting the seed, or returning values from non-uniform > distributions; The OpenBSD arc4random() has a very sparse API, but gets this part exactly right:

    uint32_t arc4random_uniform(uint32_t upper_bound);

arc4random_uniform() will return a single 32-bit value, uniformly distributed but less than upper_bound. This is recommended over constructions like 'arc4random() % upper_bound' as it avoids "modulo bias" when the upper bound is not a power of two. In the worst case, this function may consume multiple iterations to ensure uniformity; see the source code to understand the problem and solution.
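The same rejection-sampling trick carries over directly to Python. A sketch using getrandbits (the name random_uniform is borrowed from OpenBSD here; it is not an existing stdlib function):

```python
# Rejection sampling a la arc4random_uniform(): draw just enough random
# bits to cover the range, and redraw any out-of-range value rather than
# taking a biased modulus.
import random

_sys = random.SystemRandom()

def random_uniform(upper_bound):
    """Return a uniform integer in [0, upper_bound), with no modulo bias."""
    if upper_bound <= 0:
        raise ValueError("upper_bound must be positive")
    k = upper_bound.bit_length()   # smallest k with 2**k >= upper_bound
    r = _sys.getrandbits(k)        # uniform on [0, 2**k)
    while r >= upper_bound:        # reject and redraw out-of-range draws
        r = _sys.getrandbits(k)
    return r
```

Each draw succeeds with probability greater than 1/2, so the expected number of redraws is below one.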
In Python, there's no point to the uint32_t restrictions, and the function is already implemented for arbitrary bigints via the current (but private) Random._randbelow() method, whose implementation could be simplified for this specific use. That in turn relies on the .getrandbits(number_of_bits) method, which SystemRandom overrides. So getrandbits() is the fundamental primitive, and SystemRandom already implements that based on .urandom() results. An OpenBSD-ish random_uniform(upper_bound) would be a "nice to have", but not essential. > + secrets.choice similarly uses the CSPRNG. Apart from error checking, that's just:

    def choice(seq):
        return seq[self.random_uniform(len(seq))]

random.Random already does that (and SystemRandom inherits it), although spelled with _randbelow(). From srkunze at mail.de Wed Sep 16 21:45:57 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 16 Sep 2015 21:45:57 +0200 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <55F1CAD3.7050602@brenbarn.net> References: <55F0E1F2.6040709@brenbarn.net> <55F1B76E.2030602@mail.de> <55F1CAD3.7050602@brenbarn.net> Message-ID: <55F9C6F5.50903@mail.de> On 10.09.2015 20:24, Brendan Barnwell wrote: > Right, but can't you already do that with ABCs, as in the example in > the docs (https://docs.python.org/2/library/abc.html)? You can write > an ABC whose __subclasshook__ does whatever hasattr checks you want > (and, if you want, checks the type annotations too), and then you can > use isinstance/issubclass to check if a given instance/class "provides > the protocol" described by that ABC. You may well be right. Maybe it's that this kind of "does whatever hasattr checks you want" gets standardized via the protocol base class. Pondering about this idea further, current Python actually gives enough means to do that at runtime. If I rely on method A being present on object b, Python will simply give me an AttributeError and that'll suffice.
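The ABC-based runtime check described above looks roughly like this (a sketch; the class name is made up for illustration):

```python
# An ABC whose __subclasshook__ does the attribute check, so that
# isinstance()/issubclass() act as structural "provides the protocol"
# tests -- no inheritance from the ABC is required.
from abc import ABCMeta

class SupportsClose(metaclass=ABCMeta):
    """isinstance() is True for any object whose class defines close()."""
    @classmethod
    def __subclasshook__(cls, C):
        if cls is SupportsClose:
            # the structural check: is "close" defined anywhere in the MRO?
            return any("close" in B.__dict__ for B in C.__mro__)
        return NotImplemented

class File:
    def close(self):
        pass
```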
So, it's only for the static typechecker again. Best, Sven From srkunze at mail.de Wed Sep 16 22:42:20 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 16 Sep 2015 22:42:20 +0200 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de> Message-ID: <55F9D42C.1080208@mail.de> On 11.09.2015 08:24, Jukka Lehtosalo wrote: > On Thu, Sep 10, 2015 at 9:42 AM, Sven R. Kunze > wrote: > > If my variables have crappy names, so I need to add type hints to > them, well, then, I'd rather fix them first. > > > Even good variable names can leave the type ambiguous. Try harder then. > And besides, if you assume that all code is perfect or can be made > perfect I think that you've already lost the discussion. Reality > disagrees with you. ;-) Not sure where I said this. > You can't just wave a magic wand to get every programmer to > document their code and write unit tests. However, we know quite well > that programmers are perfectly capable of writing type annotations, > and tools can even enforce that they are present (witness all the Java > code in existence). You can't just wave a magic wand to get every programmer to add type annotations to their code. However, we know quite well that programmers are perfectly capable of writing unit tests, and tools can even enforce that they are present (witness coverage tools and hooks in SCM systems preventing it from dropping). [ Interesting, that it was that easy to exchange the parts you've given me ;) ] Btw., have you heard of code review? > Tools can't verify that you have good variable names or useful > docstrings, and people are too inconsistent or lazy to be relied on. The same can be said for type annotations.
> In a cost/benefit analysis it may be optimal to spend half the > available time on annotating parts of the code base to get some (but > necessarily limited) static checking coverage and spend the remaining > half on writing tests for selected parts of the code base, for > example. It's not all or nothing. I would like to peer-review that cost/benefit analysis you've made to see whether your numbers are sane. > >> You get extra credit if your tests are slow to run and flaky, > > We are problem solvers. So, I would tell my team: "make them > faster and more reliable". > > > But you'd probably also ask them to implement new features (or *your* > manager might be unhappy), and they have to find the right balance, as > they only have 40 hours a week (or maybe 80 hours if you work at an > early-stage startup :-). Having more tools gives you more options for > spending your time efficiently. Yes, I am going to tell him: "Hey, it doesn't work but we got all/most of the types right." > > Granted. But you still don't know if your code runs correctly. You > are better off with tests. And I agree type checking is 1 test to > perform (out of 10K). > > > Actually a type checker can verify multiple properties of a typical > line of code. So for 10k lines of code, complete type checking > coverage would give you the equivalent of maybe 30,000 (simple) tests. > :-P I think you should be more specific on this. Using hypothesis, e.g., you can easily increase the number of simple tests as well. What I can tell is that most of the time, a variable carries the same type. It is really convenient that it doesn't have to, but most of the time it does. Thus, one test run can probably reveal a dangerous type mistake. I've seen code where that is not the case indeed and one variable is either re-used or accidentally has different types. But, well, you better stay away from it anyway because most of the time it's very old code.
Moreover, in order to add *reasonable* type annotations you would probably invest an equal amount of time to what you would invest to write some tests for it. The majority of time is about *understanding* the code. And there, better variable names help a lot.

> It's often not cost effective to have good test coverage (and even
> 100% line coverage doesn't give you full coverage of all
> interactions). Testing can't prove that your code doesn't have defects
> -- it just proves that for a tiny subset of possible inputs your code
> works as expected. A type checker may be able to prove that for *all*
> possible inputs your code doesn't do certain bad things, but it can't
> prove that it does the good things. Neither subsumes the other, and
> both of these approaches are useful and complementary (but
> incomplete).

I fully agree on this. Yet I don't need type annotations. ;)

A simple test run with a typechecker working at 40%-60% efficiency (depending on whom you ask) suffices at least for me. I would love to see better typecheckers rather than cluttering our code with some questionable annotations; btw., I don't know whether those annotations are necessary at all. Don't be fooled by the possibility of dynamic typing in Python. Just because it's possible doesn't necessarily mean it's the usual thing.

> I think that there was a good talk basically about this at PyCon this
> year, by the way, but I can't remember the title.

It'll be great to have it. :)

Best, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Wed Sep 16 22:57:29 2015 From: srkunze at mail.de (Sven R.
Kunze) Date: Wed, 16 Sep 2015 22:57:29 +0200 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <05084F79-C505-4A27-9F08-DA98D4B19963@yahoo.com> References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de> <05084F79-C505-4A27-9F08-DA98D4B19963@yahoo.com> Message-ID: <55F9D7B9.8060109@mail.de> On 11.09.2015 00:22, Andrew Barnert wrote: > On Sep 10, 2015, at 09:42, Sven R. Kunze wrote: >> I mean when I am really going to touch that file to improve documentation (which annotations are a piece of), I am going to add more information for the reader of my API and that mostly will be describing the behavior of the API. > As a bit of useless anecdotal evidence: > > After starting to play with MyPy when Guido first announced the idea, I haven't actually started using static type checking seriously, but I have started writing annotations for some of my functions. It feels like a concise and natural way to say "this function wants two integers", and it reads as well as it writes. Of course there's no reason I couldn't have been doing this since 3.0, but I wasn't, and now I am. > > Try playing around with it and see if you get the same feeling. Since everyone is thinking about the random module right now, and it makes a great example of what I'm talking about, specify which functions take/return int vs. float, which need a real int vs. anything Integral, etc., and how much more easily you absorb the information than if it's in the middle of a sentence in the docstring. > > Anyway, I don't actually annotate every function (or every function except the ones that are so simple that any checker or reader that couldn't infer the types is useless, the way I would in Haskell), just the ones where the types seem like an important part of the semantics. So I haven't missed the more complex features the way I expected to. But I've still got no problem with them being added as we go along, of course. :) Thanks for the anecdote. 
It's good to hear you don't do it for every function and I am glad it helps you a lot. :)

Do you know what makes me sad? If you do that for this function but don't do it for another, what is the guideline then? The Zen of Python tells us to have one obvious way to do something. At least for me, it's not obvious anymore when to annotate and when not to annotate. Just a random guess depending on the moon phase? :( Sometimes this and sometimes that. That can't be right for something so basic as types.

Couldn't these problems be solved by further research on typecheckers?

Btw. I can tell the same anecdote when switching from C/C++/C#/Java to Python. It was like a liberation---no explicit type declarations anymore. I was baffled and frightened the first week using it. But I love it now and I don't want to give that freedom up. Maybe that's why I am reluctant to use it in production. But as said, I like the theoretical discussion around it. :)

Best, Sven From asweigart at gmail.com Thu Sep 17 00:28:11 2015 From: asweigart at gmail.com (Al Sweigart) Date: Wed, 16 Sep 2015 15:28:11 -0700 Subject: [Python-ideas] Non-English names in the turtle module (with Spanish example) Message-ID: I've created a prototype for how we could add foreign language names to the turtle.py module and erase the language barrier for non-English schoolkids. The Tortuga module has the same functionality as turtle.py. You can test it out by running "pip install tortuga" https://pypi.python.org/pypi/Tortuga Since Python 2 doesn't have simpledialog, I used the PyMsgBox pure-python module for the input boxes. This code is small enough that it could be added into turtle.py. (It's just used for simple tkinter dialog boxes.) Check out the diff between Tortuga and turtle.py here: https://www.diffchecker.com/2xmbrkhk This file can be easily adapted to support multiple programming languages. Thoughts? Suggestions? -Al -------------- next part -------------- An HTML attachment was scrubbed...
URL: From stephen at xemacs.org Thu Sep 17 03:04:53 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 17 Sep 2015 10:04:53 +0900 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <55F9D7B9.8060109@mail.de> References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de> <05084F79-C505-4A27-9F08-DA98D4B19963@yahoo.com> <55F9D7B9.8060109@mail.de> Message-ID: <87wpvpx3d6.fsf@uwakimon.sk.tsukuba.ac.jp> Sven R. Kunze writes: > Do you know what makes me sad? If you do that for this function but > don't do it for another what is the guideline then? Python Zen tells us > to have one obvious way to do sth. At least for me, it's not obvious > anymore when to annotate and when not to annote. Just a random guess > depending on the moon phase? :( No. There's a simple rule: if it's obvious to you that type annotation is useful, do it. If it's not obvious you want it, you don't, and you don't do it. You obviously are unlikely to do it for some time, if ever. Me too. But some shops want to use automated tools to analyze these things, and I don't see why there's a problem in providing a feature that makes it easier for them to do that. > Btw. I can tell the same anecdote when switching from C/C++/C#/Java to > Python. It was like a liberation---no explicit type declarations > anymore. I was baffled and frightened the first week using it. But I > love it now and I don't want to give that freedom up. Maybe, that's why > I am reluctant to use it in production. So don't, nothing else in the language depends on type annotation or on running a type checker for that matter. What's your point? That you'll have to read them in the stdlib? Nope; the stdlib will use stubfiles where it uses type annotations at all for the foreseeable future. That your employer might make you use them? That's the nature of employment. 
And if you can't convince your boss that annotations have no useful role in a program written in good style, why would you expect to convince us? From stephen at xemacs.org Thu Sep 17 03:36:37 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 17 Sep 2015 10:36:37 +0900 Subject: [Python-ideas] Non-English names in the turtle module (with Spanish example) In-Reply-To: References: Message-ID: <87vbb9x1wa.fsf@uwakimon.sk.tsukuba.ac.jp> Al Sweigart writes: > I've created a prototype for how we could add foreign language > names to the turtle.py module and erase the language barrier for > non-English schoolkids. I noticed that "ayudarme" ("help me") isn't implemented. Whether that's a barrier or not, I don't know. I suspect it needs testing, because I can imagine that kids might not use help if it were available, preferring to ask the kid next to them. The error messages aren't translated. That definitely needs to be done. > This file can be easily adapted to support multiple programming > languages. Maybe it's just me, but putting multiple languages in a single file might be confusing to the user if you just add all the language bindings to locals(). I really would like to see (but won't do it myself) a structure where "tortuga" imports "turtle", defines some variables which contain the dictionaries, and calls an "install language" utility (new in turtle) that does what's necessary to hook up the dictionaries to the turtle commands. From steve at pearwood.info Thu Sep 17 05:59:15 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 17 Sep 2015 13:59:15 +1000 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <55F9D7B9.8060109@mail.de> References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de> <05084F79-C505-4A27-9F08-DA98D4B19963@yahoo.com> <55F9D7B9.8060109@mail.de> Message-ID: <20150917035915.GL31152@ando.pearwood.info> On Wed, Sep 16, 2015 at 10:57:29PM +0200, Sven R. 
Kunze wrote:
> Do you know what makes me sad? If you do that for this function but
> don't do it for another, what is the guideline then? The Zen of Python
> tells us to have one obvious way to do something. At least for me, it's
> not obvious anymore when to annotate and when not to annotate. Just a
> random guess depending on the moon phase? :(

This is no different from when to document and when to write tests. In a perfect world, every function is fully documented and fully tested. But in reality we have only a limited amount of time to spend writing code, and only a portion of that is spent writing documentation and tests, so we have to prioritise. Some functions are less than fully documented and less than fully tested. How do you decide which ones get your attention? People will use the same sort of heuristic for deciding which functions get annotated:

- does the function need annotations/documentation/tests?
- do I have time to write annotations/documentation/tests?
- is my manager telling me to add annotations/documentation/tests?
- if I don't, will bad things happen?
- is it easy or interesting to add them?
- or difficult and boring?

Don't expect to hold annotations up to a higher standard than we already hold other aspects of programming.

-- Steve From gokoproject at gmail.com Thu Sep 17 06:07:50 2015 From: gokoproject at gmail.com (John Wong) Date: Thu, 17 Sep 2015 00:07:50 -0400 Subject: [Python-ideas] Bring line continuation to multi-level dictionary lookup Message-ID: Hi everyone. I work with APIs which return deeply nested dictionary responses. Imagine a simplified case:

foo = {1: {2: {3: {4: {5: 6}}}}}

Now imagine I need to get to 6:

foo[1][2][3][4][5]

This looks manageable, but if the key names are long, then I certainly will end up doing this to respect my style guide.
To make it concrete, let's use something realistic, a response from the AWS API:

response = {'DescribeDBSnapshotsResponse':
                {'ResponseMetadata': {'RequestId': '123456'},
                 'DescribeDBSnapshotsResult':
                     {'Marker': None,
                      'DBSnapshots': [{'Engine': 'postgres'}]}}}

If I had to get to the Engine I'd do this:

detail_response = response["DescribeDBSnapshotsResponse"]
result = detail_response["DescribeDBSnapshotsResult"]

This is only a few levels deep, but imagine something slightly longer (I stripped out so much from this response). Obviously I am picking a real example with really long key names to sell my request.

Can we do it differently? How about

print(response.get(
    "DescribeDBSnapshotsResponse").get(
    "DescribeDBSnapshotsResult").get(
    "DBSnapshots")[0].get(
    "Engine"))

Okay. Not bad, almost like writing in JavaScript, except Python doesn't allow you to do line continuation before the dot at all, so you are stuck with (.

But the problem with the alternative is that if DescribeDBSnapshotsResult is a non-existent key, you will just get None, because that's the beauty of the .get method for a dictionary object. So while this allows you to write it in a slightly different way, the cost is a silenced KeyError exception. I wouldn't know which key raised the exception. Whereas with [key1][key2] I know if key1 doesn't exist, the exception will explain to me that key1 does not exist.

So here I am, thinking, what if we can do this?

response(
    ["DescribeDBSnapshotsResponse"]
    ["DescribeDBSnapshotsResult"]
)

You get the point. This looks kinda ugly, but it doesn't require so many assignments. I think this is doable, after all [ ] is still a method call with the key name passed in. I am not familiar with the grammar, so I don't know how hard and how much the implementation has to change to adopt this.

Let me know if this is a +1 or -10000000 bad crazy idea.

Thanks.

John -------------- next part -------------- An HTML attachment was scrubbed...
URL: From 4kir4.1i at gmail.com Thu Sep 17 06:16:02 2015 From: 4kir4.1i at gmail.com (Akira Li) Date: Thu, 17 Sep 2015 07:16:02 +0300 Subject: [Python-ideas] Bring line continuation to multi-level dictionary lookup References: Message-ID: <87eghx4r5p.fsf@gmail.com> John Wong writes:
> Hi everyone.
>
> I work with APIs which return deeply nested dictionary responses.
> Imagine a simplified case:
>
> foo = {1: {2: {3: {4: {5: 6}}}}}
>
> Now imagine I need to get to 6:
>
> foo[1][2][3][4][5]
>
> This looks manageable, but if the key names are long, then I certainly will
> end up doing this to respect my style guide. To make it concrete, let's use
> something realistic, a response from the AWS API:
>
> response = {'DescribeDBSnapshotsResponse': {'ResponseMetadata':
> {'RequestId': '123456'}, 'DescribeDBSnapshotsResult': {'Marker': None,
> 'DBSnapshots': [{'Engine': 'postgres'}]}}}
>
> If I had to get to the Engine I'd do this:
>
> detail_response = response["DescribeDBSnapshotsResponse"]
> result = detail_response["DescribeDBSnapshotsResult"]
>
> This is only a few levels deep, but imagine something slightly longer (I
> stripped out so much from this response). Obviously I am picking a real
> example with really long key names to sell my request.
>
> Can we do it differently? How about
>
> print(response.get(
>     "DescribeDBSnapshotsResponse").get(
>     "DescribeDBSnapshotsResult").get(
>     "DBSnapshots")[0].get(
>     "Engine"))
>
> Okay. Not bad, almost like writing in JavaScript, except Python doesn't
> allow you to do line continuation before the dot at all, so you are stuck
> with (.
>
> But the problem with the alternative is that
> if DescribeDBSnapshotsResult is a non-existent key, you will just get None,
> because that's the beauty of the .get method for a dictionary object. So
> while this allows you to write it in a slightly different way, the cost is a
> silenced KeyError exception. I wouldn't know which key raised the exception.
> Whereas with [key1][key2] I know if key1 doesn't exist, the exception will
> explain to me that key1 does not exist.

import functools
import operator

functools.reduce(operator.getitem, [
    "DescribeDBSnapshotsResponse",
    "DescribeDBSnapshotsResult",
    "DBSnapshots", 0, "Engine"], response)

> So here I am, thinking, what if we can do this?
>
> response(
>     ["DescribeDBSnapshotsResponse"]
>     ["DescribeDBSnapshotsResult"]
> )
>
> You get the point. This looks kinda ugly, but it doesn't require so many
> assignments. I think this is doable, after all [ ] is still a method call
> with the key name passed in. I am not familiar with the grammar, so I don't
> know how hard and how much the implementation has to change to adopt this.
>
> Let me know if this is a +1 or -10000000 bad crazy idea.
>
> Thanks.
>
> John

From zachary.ware+pyideas at gmail.com Thu Sep 17 06:24:03 2015 From: zachary.ware+pyideas at gmail.com (Zachary Ware) Date: Wed, 16 Sep 2015 23:24:03 -0500 Subject: [Python-ideas] Bring line continuation to multi-level dictionary lookup In-Reply-To: References: Message-ID: On Wed, Sep 16, 2015 at 11:07 PM, John Wong wrote:
> So here I am, thinking, what if we can do this?
>
> response(
>     ["DescribeDBSnapshotsResponse"]
>     ["DescribeDBSnapshotsResult"]
> )
>
> You get the point. This looks kinda ugly, but it doesn't require so many
> assignments. I think this is doable, after all [ ] is still a method call
> with the key name passed in. I am not familiar with the grammar, so I don't
> know how hard and how much the implementation has to change to adopt this.
>
> Let me know if this is a +1 or -10000000 bad crazy idea.
I think a much better idea is to create a utility function:

def dig(container, *path):
    obj = container
    for p in path:
        obj = obj[p]
    return obj

Then you can do your long lookup like so:

engine = dig(response,
             'DescribeDBSnapshotsResponse',
             'DescribeDBSnapshotsResult',
             'DBSnapshots', 0, 'Engine')

This came up on python-list a month or two ago; perhaps there's some merit in finding a place to stick this utility function.

-- Zach

From guido at python.org Thu Sep 17 06:45:14 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 16 Sep 2015 21:45:14 -0700 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <20150916155412.GJ31152@ando.pearwood.info> Message-ID: On Wed, Sep 16, 2015 at 12:13 PM, Tim Peters wrote:
> [Steven D'Aprano , on "secrets"]
> > +1 on everything. Glad _that's_ finally over ;-)

Yes. Thanks all! I'm looking forward to the new PEP.

-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From brenbarn at brenbarn.net Thu Sep 17 06:47:16 2015 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Wed, 16 Sep 2015 21:47:16 -0700 Subject: [Python-ideas] Bring line continuation to multi-level dictionary lookup In-Reply-To: References: Message-ID: <55FA45D4.6060607@brenbarn.net> On 2015-09-16 21:07, John Wong wrote:
> Hi everyone.
>
> I work with APIs which return deeply nested dictionary responses.
> Imagine a simplified case:
>
> foo = {1: {2: {3: {4: {5: 6}}}}}
>
> Now imagine I need to get to 6:
>
> foo[1][2][3][4][5]

You can just use the existing parentheses-based line continuation for this:

(foo[1]
    [2]
    [3]
    [4]
    [5]
 )

-- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail."
--author unknown From abarnert at yahoo.com Thu Sep 17 07:56:42 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 16 Sep 2015 22:56:42 -0700 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <55F9D7B9.8060109@mail.de> References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de> <05084F79-C505-4A27-9F08-DA98D4B19963@yahoo.com> <55F9D7B9.8060109@mail.de> Message-ID: <058F4ABF-CCBC-455B-9A37-41742AA35C5E@yahoo.com> On Sep 16, 2015, at 13:57, Sven R. Kunze wrote: > Sometimes and sometimes that. That can't be right for something to basic like types. Types aren't as basic as you think, and assuming they are leads you to design languages like Java, that restrict you to working within unnecessary constraints. For an obvious example, what's the return type of json.loads (or, worse, eval)? Haskell, Dependent ML, and other languages have made great strides in working out how to get most of the power of a language like Python (and some things Python can't do, too) in a type-driven paradigm, but there's still plenty of research to go. And, even if that were a solved problem, nobody wants to rewrite Python as an ML dialect, and nobody would use it if you did. Python solves the json.loads problem by saying its runtime type is defined lazily and implicitly by the data. And there's no way any static type checker can possibly infer that type. A good statically-typed language can make it a lot easier to handle than a bad one like Java, but it will be very different from Python. > Couldn't these problems not be solved by further research on typecheckers? I'm not sure which problems you want solved. If you want every type to be inferable, for a language with a sufficiently powerful type system, that's provably equivalent to the halting problem, so it's not going to happen. 
More importantly, we already have languages with a powerful static type system and a great inference engine, and experience with those languages shows that it's often useful to annotate some types for readability that the inference engine could have figured out. If a particular function is more understandable to the reader when it declares its parameter types, I can't imagine what research anyone would do that would cause me to stop wanting to declare those types. Also, even when you want to rely on inference, you still want the types to have meaningful names that you can read, and could have figured out how to construct on your own, for things like error messages, debuggers, and reflective code. So, the work that Jukka is proposing would still be worth doing even if we had perfect inference.

> Btw. I can tell the same anecdote when switching from C/C++/C#/Java to Python. It was like a liberation---no explicit type declarations anymore. I was baffled and frightened the first week using it. But I love it now and I don't want to give that freedom up. Maybe, that's why I am reluctant to use it in production.

The problem here is that you're coming from C++/C#/Java, which are terrible examples of static typing. Disliking static typing because of Java is like disliking dynamic typing because of Tcl. I won't get into details of why they're so bad, but: if you don't have the time to learn you a Haskell for great good, you can probably at least pick up Boo in an hour or so, to at least see what static typing is like with inference by default and annotations only when needed in a very pythonesque language, and that will give you half the answer.
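The structural checks Jukka is proposing in this thread can be sketched with what later shipped as typing.Protocol (PEP 544, Python 3.8). This is only an illustrative sketch; the names SupportsClose, Resource, and shutdown are made up for the example and are not part of any proposal discussed above:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsClose(Protocol):
    """Any object with a close() method matches -- no inheritance needed."""

    def close(self) -> None: ...


class Resource:
    # Never mentions SupportsClose, but is structurally compatible.
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def shutdown(resource: SupportsClose) -> None:
    # A static checker verifies the argument has close(); at runtime
    # this is plain duck typing.
    resource.close()


res = Resource()
shutdown(res)
print(res.closed)                       # True
print(isinstance(res, SupportsClose))   # True: runtime structural check
```

A static checker accepts `shutdown(res)` even though `Resource` never inherits from `SupportsClose`; that implicit, structural compatibility is the difference from the nominal subtyping of Java-style interfaces.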
From abarnert at yahoo.com Thu Sep 17 08:06:23 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 16 Sep 2015 23:06:23 -0700 Subject: [Python-ideas] Bring line continuation to multi-level dictionary lookup In-Reply-To: References: Message-ID: On Sep 16, 2015, at 21:07, John Wong wrote:
>
> So here I am, thinking, what if we can do this?
>
> response(
>     ["DescribeDBSnapshotsResponse"]
>     ["DescribeDBSnapshotsResult"]
> )

This already has a perfectly valid meaning: you have a list of one string, you're indexing it with another string, and passing the result to a function. If this isn't obvious, try this example:

frobulate(['a', 'e', 'i', 'o', 'u'][vowel])

So, giving it a second meaning would be ambiguous. Also, there's already a perfectly good way to write what you want. (Actually two, because square brackets continue the exact same way parens do, but I wouldn't recommend that here.)

(response
    ["DescribeDBSnapshotsResponse"]
    ["DescribeDBSnapshotsResult"]
)

That looks no uglier than your suggestion, and a lot less ugly when buried inside a larger expression. (I think it might look nicer to indent the bracketed keys, but I think that technically violates PEP 8.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Sep 17 14:35:18 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 17 Sep 2015 22:35:18 +1000 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 17 September 2015 at 04:55, Tim Peters wrote:
> [Brett Cannon ]
> > And if yes to a PEP, who's writing it? And then who is writing the
> > implementation in the end?
>
> Did you just volunteer? Great! Thanks ;-) OK, Steven already
> volunteered to write a PEP for his proposal.
As far as implementation goes, based on a separate discussion at https://github.com/pyca/cryptography/issues/2347, I believe the essential cases can all be covered by:

def random_bits(bits):
    return os.urandom(bits//8)

def random_int(bits):
    return int.from_bytes(random_bits(bits), byteorder="big")

def random_token(bits):
    return base64.urlsafe_b64encode(random_bits(bits)).decode("ascii")

def random_hex_digits(bits):
    return binascii.hexlify(random_bits(bits)).decode("ascii")

So if you want a 128 bit (16 bytes) IV, you can just write "secrets.random_bits(128)". Examples of all four in action:

>>> random_bits(256)
b'\xacc\xa6I[\x9c\xca\x86=B$\xd0\xbc\xee\x8a\xe3i\xe9\xb2\xf4w\xd4@\xc2{U\xb5\xb0\xac\x82\x8a='
>>> random_int(bits=256)
44147786895503064021838366541869866305141442570318401936078951782072369110412
>>> random_token(bits=256)
'-woFuniDCsApOFMtRP5vtjfPfFkmvVhdaPoh9eqAuSs='
>>> random_hex_digits(bits=256)
'e5b09c74bda516ca8464f38dc45428004b6bd81d4e4031fdf9f164e567fbed82'

Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tim.peters at gmail.com Thu Sep 17 17:11:44 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 17 Sep 2015 10:11:44 -0500 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: [Nick Coghlan ]
> As far as implementation goes, based on a separate discussion at
> https://github.com/pyca/cryptography/issues/2347, I believe the
> essential cases can all be covered by:
>
> def random_bits(bits):
>     return os.urandom(bits//8)
>
> def random_int(bits):
>     return int.from_bytes(random_bits(bits), byteorder="big")
>
> def random_token(bits):
>     return base64.urlsafe_b64encode(random_bits(bits)).decode("ascii")
>
> def random_hex_digits(bits):
>     return binascii.hexlify(random_bits(bits)).decode("ascii")
>
> So if you want a 128 bit (16 bytes) IV, you can just write
> "secrets.random_bits(128)".
Examples of all four in action: > > ... Probably better to wait until Steven starts a new thread about his PEP (nobody is ever gonna look at _this_ thread again ;-) ). Just two things to note: 1. Whatever task-appropriate higher-level functions people want, as you've shown "secure" implementations are easy to write for someone who knows what's available to build on. It will take 10000 times longer for people to bikeshed what "secrets" should offer than to implement it ;-) 2. I'd personally be surprised if a function taking a "number of bits" argument silently replaced argument `bits` with `bits - bits % 8`. If the app-level programmers at issue can't think in terms of bytes instead (and use functions with a `bytes` argument), then, e.g., better to raise an exception if `bits % 8 != 0` to begin with. Or to round up, taking "bits" as meaning "a number of bytes covering _at least_ the number of bits asked for". From ncoghlan at gmail.com Thu Sep 17 18:36:15 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 18 Sep 2015 02:36:15 +1000 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 18 September 2015 at 01:11, Tim Peters wrote: > Just two things to note: > > 1. Whatever task-appropriate higher-level functions people want, as > you've shown "secure" implementations are easy to write for someone > who knows what's available to build on. It will take 10000 times > longer for people to bikeshed what "secrets" should offer than to > implement it ;-) Agreed, although the 4 I listed are fairly well-credentialed - the implementations of the first two (raw bytes and integers) are the patterns cryptography.io uses, the token generator is comparable to the Django one (with a couple of extra punctuation characters in the alphabet), and the hex digit generator is the Pyramid one. 
You can get more exotic with full arbitrary alphabet password and passphrase generators, but I think we're getting beyond stdlib level functionality at that point - it's getting into the realm of password managers and attack software.

> 2. I'd personally be surprised if a function taking a "number of bits"
> argument silently replaced argument `bits` with `bits - bits % 8`. If
> the app-level programmers at issue can't think in terms of bytes
> instead (and use functions with a `bytes` argument), then, e.g.,
> better to raise an exception if `bits % 8 != 0` to begin with. Or to
> round up, taking "bits" as meaning "a number of bytes covering _at
> least_ the number of bits asked for".

Yeah, I took a shortcut to keep them all as pretty one liners. A proper rand_bits with that API would look something like:

def rand_bits(bits):
    num_bytes, add_byte = divmod(bits, 8)
    if add_byte:
        num_bytes += 1
    return os.urandom(bits)

Compared to the os.urandom() call itself, the bits -> bytes calculation should disappear into the noise from a speed perspective (and a JIT compiled runtime like PyPy could likely optimise it away entirely).

Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From random832 at fastmail.com Thu Sep 17 19:08:37 2015 From: random832 at fastmail.com (Random832) Date: Thu, 17 Sep 2015 13:08:37 -0400 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1442509717.2145449.386496081.2D5AF5B7@webmail.messagingengine.com> On Thu, Sep 17, 2015, at 12:36, Nick Coghlan wrote:
> You can get more exotic with full arbitrary alphabet password and
> passphrase generators, but I think we're getting beyond stdlib level
> functionality at that point - it's getting into the realm of password
> managers and attack software.

I think it's important to at least have a way to get a random number in a range that isn't a power of two, since that's so easy to get wrong.
Even the libc arc4random API has that in arc4random_uniform. At that point people can build their own arbitrary alphabet password generators as one-liners. From tim.peters at gmail.com Thu Sep 17 20:07:28 2015 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 17 Sep 2015 13:07:28 -0500 Subject: [Python-ideas] PEP 504: Using the system RNG by default In-Reply-To: References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: [Tim] >> Just two things to note: >> >> 1. Whatever task-appropriate higher-level functions people want, as >> you've shown "secure" implementations are easy to write for someone >> who knows what's available to build on. It will take 10000 times >> longer for people to bikeshed what "secrets" should offer than to >> implement it ;-) [Nick Coghlan ] > Agreed, although the 4 I listed are fairly well-credentialed - the > implementations of the first two (raw bytes and integers) are the > patterns cryptography.io uses, the token generator is comparable to > the Django one (with a couple of extra punctuation characters in the > alphabet), and the hex digit generator is the Pyramid one. I will immodestly claim that nobody needs to be a crypto-wonk to see that these implementations are exactly as secure (or insecure) as the platform urandom(): in each case, it's trivial to invert the output to recover the exact bytes urandom() returned. So if there's any attack against the outputs, that's also an attack against what urandom() returned. The outputs just spell what urandom returned using a different alphabet. For the same reason, e.g., it would be fine to replace each 0 bit in urandom's result with the string "egg", and each 1 bit with the string "turtle". An attack on the output of that is exactly as hard (or easy) as an attack on the output of urandom. Obvious, right? It's only a little harder to see that the same is true of even the fanciest of your 4 functions. Where you _may_ get in trouble is creating a non-invertible output. 
Like:

def secure_int(nbytes):
    n = int.from_bytes(os.urandom(nbytes), "big")
    return n - n

That's not likely to be useful ;-)

> You can get more exotic with full arbitrary alphabet password and
> passphrase generators, but I think we're getting beyond stdlib level
> functionality at that point - it's getting into the realm of password
> managers and attack software.

I'll leave that for the discussion of Steven's PEP. I think he was on the right track to, e.g., suggest a secure choice() as one of his few base building blocks. It _does_ take some expertise to implement a secure choice() correctly, but not so much from the crypto view as from the free-from-statistical-bias view. SystemRandom.choice() already gets both right.

>> 2. I'd personally be surprised if a function taking a "number of bits"
>> argument silently replaced argument `bits` with `bits - bits % 8`. If
>> the app-level programmers at issue can't think in terms of bytes
>> instead (and use functions with a `bytes` argument), then, e.g.,
>> better to raise an exception if `bits % 8 != 0` to begin with. Or to
>> round up, taking "bits" as meaning "a number of bytes covering _at
>> least_ the number of bits asked for".

> Yeah, I took a shortcut to keep them all as pretty one liners. A
> proper rand_bits with that API would look something like:
>
> def rand_bits(bits):
>     num_bytes, add_byte = divmod(bits, 8)
>     if add_byte:
>         num_bytes += 1
>     return os.urandom(bits)

You should really be calling that with "num_bytes" now ;-)

> Compared to the os.urandom() call itself, the bits -> bytes
> calculation should disappear into the noise from a speed perspective
> (and a JIT compiled runtime like PyPy could likely optimise it away
> entirely).

Goodness - "premature optimization" already?! ;-) Fastest in pure Python is likely

num_bytes = (bits + 7) >> 3

But if I were bikeshedding I'd question why the function wasn't:

def rand_bytes(nbytes):
    return os.urandom(nbytes)

instead.
A rand_bits(nbits) that meant what it said would likely also be useful: From srkunze at mail.de Thu Sep 17 23:13:48 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 17 Sep 2015 23:13:48 +0200 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <87wpvpx3d6.fsf@uwakimon.sk.tsukuba.ac.jp> References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de> <05084F79-C505-4A27-9F08-DA98D4B19963@yahoo.com> <55F9D7B9.8060109@mail.de> <87wpvpx3d6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <55FB2D0C.2020008@mail.de> On 17.09.2015 03:04, Stephen J. Turnbull wrote: > Sven R. Kunze writes: > > > Do you know what makes me sad? If you do that for this function but > > don't do it for another what is the guideline then? Python Zen tells us > > to have one obvious way to do sth. At least for me, it's not obvious > > anymore when to annotate and when not to annote. Just a random guess > > depending on the moon phase? :( > > No. There's a simple rule: if it's obvious to you that type > annotation is useful, do it. If it's not obvious you want it, you > don't, and you don't do it. You obviously are unlikely to do it for > some time, if ever. Me too. I was talking about specific examples (functions and methods). You were talking about the concept as a whole if I am not completely mistaken. From rymg19 at gmail.com Thu Sep 17 23:19:30 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Thu, 17 Sep 2015 16:19:30 -0500 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <55F9D42C.1080208@mail.de> References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de> <55F9D42C.1080208@mail.de> Message-ID: <881AE78F-DCDB-4085-A298-28400E880A50@gmail.com> On September 16, 2015 3:42:20 PM CDT, "Sven R. Kunze" wrote: >On 11.09.2015 08:24, Jukka Lehtosalo wrote: >> On Thu, Sep 10, 2015 at 9:42 AM, Sven R. Kunze > > wrote: >> >> If my variables have crappy names, so I need to add type hints to >> them, well, then, I rather fix them first. 
>> Even good variable names can leave the type ambiguous.
>
> Try harder then.

def process_integer_coordinate_tuples(integer_tuple_1, integer_tuple_2, is_fast):
    ...

vs

def process_coords(t1: Tuple[int, int], t2: Tuple[int, int], fast: bool):
    ...

Java's fatal mistake.

>> And besides, if you assume that all code is perfect or can be made
>> perfect I think that you've already lost the discussion. Reality
>> disagrees with you. ;-)
>
> Not sure where I said this.
>
>> You can't just wave a magic wand and get every programmer to
>> document their code and write unit tests. However, we know quite well
>> that programmers are perfectly capable of writing type annotations,
>> and tools can even enforce that they are present (witness all the
>> Java code in existence).
>
> You can't just wave a magic wand and get every programmer to add type
> annotations to their code. However, we know quite well that programmers
> are perfectly capable of writing unit tests, and tools can even enforce
> that they are present (witness coverage tools and hooks in SCM systems
> preventing it from dropping).
>
> [ Interesting, that it was that easy to exchange the parts you've given
> me ;) ]
>
> Btw. have you heard of code review?
>
>> Tools can't verify that you have good variable names or useful
>> docstrings, and people are too inconsistent or lazy to be relied on.
>
> Same can be said for type annotations.
>
>> In a cost/benefit analysis it may be optimal to spend half the
>> available time on annotating parts of the code base to get some (but
>> necessarily limited) static checking coverage and spend the remaining
>> half on writing tests for selected parts of the code base, for
>> example. It's not all or nothing.
>
> I would like to peer-review that cost/benefit analysis you've made to
> see whether your numbers are sane.
>
>>> You get extra credit if your tests are slow to run and flaky,
>>
>> We are problem solvers.
So, I would tell my team: "make them >> faster and more reliable". >> >> >> But you'd probably also ask them to implement new features (or *your* > >> manager might be unhappy), and they have to find the right balance, >as >> they only have 40 hours a week (or maybe 80 hours if you work at an >> early-stage startup :-). Having more tools gives you more options for > >> spending your time efficiently. > >Yes, I am going to tell him: "Hey, it doesn't work but we got all/most >of the types right." > >> >> Granted. But you still don't know if your code runs correctly. >You >> are better off with tests. And I agree type checking is 1 test to >> perform (out of 10K). >> >> >> Actually a type checker can verify multiple properties of a typical >> line of code. So for 10k lines of code, complete type checking >> coverage would give you the equivalent of maybe 30,000 (simple) >tests. >> :-P > >I think you should be more specific on this. > >Using hypothesis, e.g., you can easily increase the number of simple >tests as well. > >What I can tell is that most of the time, a variable carries the same >type. It is really convenient that it doesn't have to but most of the >time it does. Thus, one test run can probably reveal a dangerous type >mistake. I've seen code where that is not the case indeed and one >variable is either re-used or accidentally have different types. But, >well, you better stay away from it anyway because most of the time it's > >very old code. > >Moreover, in order to add *reasonable* type annotations you would >probably invest equal amount of time that you would invest to write >some >tests for it. The majority of time is about *understanding* the code. >And there, better variable names help a lot. > >> It's often not cost effective to have good test coverage (and even >> 100% line coverage doesn't give you full coverage of all >> interactions). 
>> Testing can't prove that your code doesn't have defects -- it just
>> proves that for a tiny subset of possible inputs your code works as
>> expected. A type checker may be able to prove that for *all* possible
>> inputs your code doesn't do certain bad things, but it can't prove
>> that it does the good things. Neither subsumes the other, and both of
>> these approaches are useful and complementary (but incomplete).
>
> I fully agree on this. Yet I don't need type annotations. ;) A simple
> test running a typechecker working at 40%-60% (depending on whom you
> ask) efficiency suffices at least for me.
>
> I would love to see better typecheckers rather than cluttering our code
> with some questionable annotations; btw. of which I don't know of are
> necessary at all.
>
> Don't be fooled by the possibility of dynamic typing in Python. Just
> because it's possible doesn't necessarily mean it's the usual thing.
>
>> I think that there was a good talk basically about this at PyCon this
>> year, by the way, but I can't remember the title.
>
> It'll be great to have it. :)
>
> Best,
> Sven
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-- 
Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity.

From srkunze at mail.de  Thu Sep 17 23:24:53 2015
From: srkunze at mail.de (Sven R.
Kunze) Date: Thu, 17 Sep 2015 23:24:53 +0200 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <20150917035915.GL31152@ando.pearwood.info> References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de> <05084F79-C505-4A27-9F08-DA98D4B19963@yahoo.com> <55F9D7B9.8060109@mail.de> <20150917035915.GL31152@ando.pearwood.info> Message-ID: <55FB2FA5.6050501@mail.de> On 17.09.2015 05:59, Steven D'Aprano wrote: > People will use the same sort of heuristic for deciding which functions > get annotated: > > - does the function need annotations/documentation/tests? > - do I have time to write annotations/documentation/tests? > - is my manager telling me to add annotations/documentation/tests? > - if I don't, will bad things happen? > - if it easy or interesting to add them? > - or difficult and boring? I fear I am not convinced of that analogy. Tests and documentation is all or nothing. Either you have them or you don't and one is not worthier than another. Type annotations (as far as I understand them) are basically completing a picture of 40%-of-already-inferred types. So, I have difficulties to infer which parameters actually would benefit from annotating. I am either doing redundant work (because the typechecker is already very well aware of the type) or I actually insert explicit knowledge (which might become redundant in case typecheckers actually become better). From srkunze at mail.de Thu Sep 17 23:42:52 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 17 Sep 2015 23:42:52 +0200 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <058F4ABF-CCBC-455B-9A37-41742AA35C5E@yahoo.com> References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de> <05084F79-C505-4A27-9F08-DA98D4B19963@yahoo.com> <55F9D7B9.8060109@mail.de> <058F4ABF-CCBC-455B-9A37-41742AA35C5E@yahoo.com> Message-ID: <55FB33DC.6080203@mail.de> On 17.09.2015 07:56, Andrew Barnert wrote: > I'm not sure which problems you want solved. 
> If you want every type to be inferable, for a language with a
> sufficiently powerful type system, that's provably equivalent to the
> halting problem, so it's not going to happen.

Nobody said it must be perfect. It just needs to be good enough.

> More importantly, we already have languages with a powerful static
> type system and a great inference engine, and experience with those
> languages shows that it's often useful to annotate some types for
> readability that the inference engine could have figured out. If a
> particular function is more understandable to the reader when it
> declares its parameter types, I can't imagine what research anyone
> would do that would cause me to stop wanting to declare those types.

Because it's more code, redundant, needs to be maintained and so on and
so forth.

> Also, even when you want to rely on inference, you still want the
> types to have meaningful names that you can read, and could have
> figured out how to construct on your own, for things like error
> messages, debuggers, and reflective code. So, the work that Jukka is
> proposing would still be worth doing even if we had perfect inference.

I totally agree (and I said this before). Speaking of meaningful names,
which name(s) are debuggers supposed to show when there is a multitude
of protocols that would fit?

>> Btw. I can tell the same anecdote when switching from C/C++/C#/Java
>> to Python. It was like a liberation---no explicit type declarations
>> anymore. I was baffled and frightened the first week using it. But I
>> love it now and I don't want to give that freedom up. Maybe, that's
>> why I am reluctant to use it in production.
>
> The problem here is that you're coming from C++/C#/Java, which are
> terrible examples of static typing. Disliking static typing because of
> Java is like disliking dynamic typing because of Tcl.
I won't get into details of why they're so bad, but: if you don't have the time to learn you a Haskell for great good, you can probably at least pick up Boo in an hour or so, to at least see what static typing is like with inference by default and annotations only when needed in a very pythonesque languages, and that will give you half the answer. I came across Haskell quite some time ago and I have to admit it feels not natural but for other reasons than its typing system and inference. From srkunze at mail.de Thu Sep 17 23:45:59 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 17 Sep 2015 23:45:59 +0200 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <881AE78F-DCDB-4085-A298-28400E880A50@gmail.com> References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de> <55F9D42C.1080208@mail.de> <881AE78F-DCDB-4085-A298-28400E880A50@gmail.com> Message-ID: <55FB3497.2040103@mail.de> On 17.09.2015 23:19, Ryan Gonzalez wrote: > def process_integer_coordinate_tuples(integer_tuple_1, integer_tuple_2, is_fast): ... > > vs > > def process_coords(t1: Tuple[int, int], t2: Tuple[int, int], fast: bool): ... > > Java's fatal mistake. Care to elaborate? From rymg19 at gmail.com Thu Sep 17 23:56:33 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Thu, 17 Sep 2015 16:56:33 -0500 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <55FB3497.2040103@mail.de> References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de> <55F9D42C.1080208@mail.de> <881AE78F-DCDB-4085-A298-28400E880A50@gmail.com> <55FB3497.2040103@mail.de> Message-ID: Embedding type names in arguments and method names. On Thu, Sep 17, 2015 at 4:45 PM, Sven R. Kunze wrote: > On 17.09.2015 23:19, Ryan Gonzalez wrote: > >> def process_integer_coordinate_tuples(integer_tuple_1, integer_tuple_2, >> is_fast): ... >> >> vs >> >> def process_coords(t1: Tuple[int, int], t2: Tuple[int, int], fast: bool): >> ... >> >> Java's fatal mistake. >> > > Care to elaborate? 
> You said:
>
>> Even good variable names can leave the type ambiguous.

These are names that don't leave anything ambiguous! :D

Really, though: relying on naming to make types explicit fails badly
whenever you start refactoring and makes hell for the users of the API
you made.

-- 
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your
program. Something's wrong.
http://kirbyfan64.github.io/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From srkunze at mail.de  Fri Sep 18 00:21:46 2015
From: srkunze at mail.de (Sven R. Kunze)
Date: Fri, 18 Sep 2015 00:21:46 +0200
Subject: [Python-ideas] Structural type checking for PEP 484
In-Reply-To: 
References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de>
 <55F9D42C.1080208@mail.de> <881AE78F-DCDB-4085-A298-28400E880A50@gmail.com>
 <55FB3497.2040103@mail.de>
Message-ID: <55FB3CFA.40707@mail.de>

On 17.09.2015 23:56, Ryan Gonzalez wrote:
> Embedding type names in arguments and method names.
>
> On Thu, Sep 17, 2015 at 4:45 PM, Sven R. Kunze wrote:
>
>     On 17.09.2015 23:19, Ryan Gonzalez wrote:
>
>         def process_integer_coordinate_tuples(integer_tuple_1,
>         integer_tuple_2, is_fast): ...
>
>         vs
>
>         def process_coords(t1: Tuple[int, int], t2: Tuple[int, int],
>         fast: bool): ...
>
>         Java's fatal mistake.
>
>     Care to elaborate?

I was actually confused by 'Java' in your reply.

> You said:
>
>> Even good variable names can leave the type ambiguous.
>
> These are names that don't leave anything ambiguous! :D

They just do. Because they don't tell me why I would want to call that
function and with what. If any of these versions is supposed to
represent good style, you still need to learn a lot.

> Really, though: relying on naming to make types explicit fails badly
> whenever you start refactoring and makes hell for the users of the API
> you made.

Professional refactoring would not change venerable APIs. It would
provide another version of it and slowly deprecate the old one.
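The provide-another-version-and-deprecate approach just described is commonly implemented with a warning shim that forwards old calls to the new entry point. A minimal sketch — the names `process_coords` / `process_coords_v2` are hypothetical, borrowed from the earlier example in this thread:

```python
import warnings

def process_coords_v2(t1, t2, fast=False):
    # The new API; in real code this would do the actual work.
    return (t1, t2, fast)

def process_coords(t1, t2, fast=False):
    # Old entry point kept alive for existing callers: it forwards to
    # the new API and flags itself as deprecated on every call.
    warnings.warn(
        "process_coords is deprecated; use process_coords_v2",
        DeprecationWarning,
        stacklevel=2,
    )
    return process_coords_v2(t1, t2, fast)
```

Existing callers keep working unchanged, while the `DeprecationWarning` gives them a release cycle or two to migrate before the old name is removed.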
Not sure where you're heading here but do you say t1 and t2 are good
names? Not sure how big the applications you work with are but those I
know of are very large. So, I am glad when 2000 lines and 10 files
later a variable somehow tells me something about it. And no,
"*Tuple[int, int]*" doesn't tell me anything (even when an IDE could
tell that).

Most of the time when discussing typecheckers and so forth, I get the
feeling people think most applications are using data structures like
*tuples of tuples of tuples of ints*. That is definitely not the case
(anymore). Most of the time the data types are instances, lists of
instances and dicts of instances.

That's one reason I somehow like Jukka's structural proposal because I
actually can see some real-world benefit which goes beyond the tuples
of tuples and that is: *inferring proper names*.

Best,
Sven
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From steve at pearwood.info  Fri Sep 18 04:13:39 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 18 Sep 2015 12:13:39 +1000
Subject: [Python-ideas] Structural type checking for PEP 484
In-Reply-To: 
References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de>
 <55F9D42C.1080208@mail.de> <881AE78F-DCDB-4085-A298-28400E880A50@gmail.com>
 <55FB3497.2040103@mail.de>
Message-ID: <20150918021339.GM31152@ando.pearwood.info>

On Thu, Sep 17, 2015 at 04:56:33PM -0500, Ryan Gonzalez wrote:

> Embedding type names in arguments and method names.

supposedly being "Java's fatal mistake". I'm not sure that Java
developers commonly make a practice of doing that. It would be strange,
since Java requires type declarations. I'm not really a Java guy, but I
think this would be more like what you would expect:

    public class Example {
        public void processCoords(Point t1, Point t2, boolean fast) {
            ...
        }
    }

where Point is equivalent to an (int, int) tuple.

You seem to be describing a verbose version of "Apps Hungarian
Notation".
I don't think Hungarian Notation was ever standard practice in the Java world, although I did find at least one tutorial (from 1999) recommending it: http://www.developer.com/java/ent/article.php/615891/Applying-Hungarian-Notation-to-Java-programs-Part-1.htm In any case, I *think* that your intended lesson is that type annotations can increase the quality of code even without a type checker, as they act as type documentation to the reader. I agree with that. -- Steve From steve at pearwood.info Fri Sep 18 05:00:26 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 18 Sep 2015 13:00:26 +1000 Subject: [Python-ideas] Structural type checking for PEP 484 In-Reply-To: <55FB2FA5.6050501@mail.de> References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de> <05084F79-C505-4A27-9F08-DA98D4B19963@yahoo.com> <55F9D7B9.8060109@mail.de> <20150917035915.GL31152@ando.pearwood.info> <55FB2FA5.6050501@mail.de> Message-ID: <20150918030024.GN31152@ando.pearwood.info> On Thu, Sep 17, 2015 at 11:24:53PM +0200, Sven R. Kunze wrote: > On 17.09.2015 05:59, Steven D'Aprano wrote: > >People will use the same sort of heuristic for deciding which functions > >get annotated: > > > >- does the function need annotations/documentation/tests? > >- do I have time to write annotations/documentation/tests? > >- is my manager telling me to add annotations/documentation/tests? > >- if I don't, will bad things happen? > >- if it easy or interesting to add them? > >- or difficult and boring? > > I fear I am not convinced of that analogy. > > > Tests and documentation is all or nothing. Either you have them or you > don't and one is not worthier than another. I don't think they are all or nothing. I think it is possible to have incomplete documentation and partial test coverage -- it isn't like you go from "no documentation at all and zero tests" to "fully documented and 100% test coverage" in a single step. 
Unless you are religiously following something like Test Driven
Development, where code is always written to follow a failed test,
there will be times where you have to decide between writing new code
or improving test coverage. Other choices may include:

- improve documentation;
- fix bugs;
- run a linter and fix the warnings it generates;

Adding "fix type errors found by the type checker" doesn't
fundamentally change the nature of the work. You are still deciding
what your priorities are, according to the needs of the project, your
own personal preferences, and the instructions of your project manager
(if you have one).

> Type annotations (as far as I understand them) are basically completing
> a picture of 40%-of-already-inferred types.

That's one use-case for them. Another use-case is as documentation:

    def agm(x:float, y:float)->float:
        """Return the arithmetic-geometric mean of x and y."""

versus

    def agm(x, y):
        """Return the arithmetic-geometric mean of x and y.

        Args:
            x (float): A number.
            y (float): A number.

        Returns:
            float: The agm of the two numbers.
        """

> So, I have difficulties to
> infer which parameters actually would benefit from annotating.

The simplest process may be something like this:

- run the type-checker in a mode where it warns about variables
  with unknown types;
- add just enough annotations so that the warnings go away.

This is, in part, a matter of the quality of your tools. A good type
checker should be able to tell you where it can, or can't, infer a
type.

> I am
> either doing redundant work (because the typechecker is already very
> well aware of the type) or I actually insert explicit knowledge (which
> might become redundant in case typecheckers actually become better).

You make it sound like, alone out of everything else in Python
programming, once a type annotation is added to a function it is carved
in stone forever, never to be removed or changed :-)

If you add redundant type annotations, no harm is done.
For example:

    def spam(n=3):
        return "spam"*n

A decent type-checker should be able to infer that n is an int. What if
you add a type annotation?

    def spam(n:int=3):
        return "spam"*n

Is that really such a big problem that you need to worry about this? I
don't think so.

The choice whether to rigorously stamp out all redundant type
annotations, or leave them in, is a decision for your project. There is
no universal right or wrong answer.

-- 
Steve

From stephen at xemacs.org  Fri Sep 18 10:15:11 2015
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 18 Sep 2015 17:15:11 +0900
Subject: [Python-ideas] Structural type checking for PEP 484
In-Reply-To: <55FB2D0C.2020008@mail.de>
References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de>
 <05084F79-C505-4A27-9F08-DA98D4B19963@yahoo.com> <55F9D7B9.8060109@mail.de>
 <87wpvpx3d6.fsf@uwakimon.sk.tsukuba.ac.jp> <55FB2D0C.2020008@mail.de>
Message-ID: <87fv2cw3cg.fsf@uwakimon.sk.tsukuba.ac.jp>

Sven R. Kunze writes:
> On 17.09.2015 03:04, Stephen J. Turnbull wrote:
>> Sven R. Kunze writes:
>>> At least for me, it's not obvious anymore when to annotate
>>> and when not to annotate. Just a random guess depending on the
>>> moon phase? :(
>>
>> No. There's a simple rule: if it's obvious to you that type
>> annotation is useful, do it. If it's not obvious you want it, you
>> don't, and you don't do it. You obviously are unlikely to do it for
>> some time, if ever. Me too.
>
> I was talking about specific examples (functions and methods). You were
> talking about the concept as a whole if I am not completely mistaken.

Nope. I was talking about each time you write a function.
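To make the spam() example above concrete: even an annotation a checker could have inferred becomes machine-readable documentation that runtime tooling can inspect. A minimal sketch (not from the original posts):

```python
from typing import get_type_hints

def spam(n: int = 3) -> str:
    # The annotations restate what a checker would infer from the
    # default value, but they are now introspectable at runtime.
    return "spam" * n

# Tools (IDEs, doc generators, validators) can read the hints directly.
hints = get_type_hints(spam)
assert hints == {"n": int, "return": str}
assert spam(2) == "spamspam"
```

Whether that redundancy is worth the extra characters is exactly the project-level judgment call discussed in this thread.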
From gokoproject at gmail.com  Fri Sep 18 16:44:27 2015
From: gokoproject at gmail.com (John Wong)
Date: Fri, 18 Sep 2015 10:44:27 -0400
Subject: [Python-ideas] Bring line continuation to multi-level dictionary lookup
In-Reply-To: 
References: 
Message-ID: 

On Thu, Sep 17, 2015 at 2:06 AM, Andrew Barnert wrote:

> On Sep 16, 2015, at 21:07, John Wong wrote:
>
> So here I am, thinking, what if we can do this?
>
>     response(
>         ["DescribeDBSnapshotsResponse"]
>         ["DescribeDBSnapshotsResult"]
>     )
>
> This already has a perfectly valid meaning: you have a list of one string,
> you're indexing it with another string, and passing the result to a
> function. If this isn't obvious, try this example:
>
>     frobulate(['a', 'e', 'i', 'o', 'u'][vowel])
>
> So, giving it a second meaning would be ambiguous.

Great catch... I did not even consider this. You are right.

> Also, there's already a perfectly good way to write what you want.
> (Actually two, because square brackets continue the exact same way parens
> do, but I wouldn't recommend that here.)
>
>     (response
>         ["DescribeDBSnapshotsResponse"]
>         ["DescribeDBSnapshotsResult"]
>     )

Thank you all. I think I should have noticed (response[..]) would work...

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From antoine at python.org  Fri Sep 18 17:50:42 2015
From: antoine at python.org (Antoine Pitrou)
Date: Fri, 18 Sep 2015 15:50:42 +0000 (UTC)
Subject: [Python-ideas] PEP 504: Using the system RNG by default
References: <87pp1jxiwk.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

Nick Coghlan writes:
>
> On 17 September 2015 at 04:55, Tim Peters wrote:
>> [Brett Cannon ]
>>> And if yes to a PEP, who's writing it? And then who is writing the
>>> implementation in the end?
>>
>> Did you just volunteer? Great! Thanks. OK, Steven already
>> volunteered to write a PEP for his proposal.
> As far as implementation goes, based on a separate discussion at
> https://github.com/pyca/cryptography/issues/2347, I believe the
> essential cases can all be covered by:
>
>     def random_bits(bits):
>         return os.urandom(bits//8)
>
>     def random_int(bits):
>         return int.from_bytes(random_bits(bits), byteorder="big")
>
>     def random_token(bits):
>         return base64.urlsafe_b64encode(random_bits(bits)).decode("ascii")
>
>     def random_hex_digits(bits):
>         return binascii.hexlify(random_bits(bits)).decode("ascii")

I think you want a little bit more flexibility than that, because the
allowed characters may depend on the specific protocol (of course,
people can use the hex digits version, but the output is longer).

(quite a good idea, that "secrets" library - I wonder why nobody
proposed it before ;-))

Regards

Antoine.

From srkunze at mail.de  Fri Sep 18 19:35:17 2015
From: srkunze at mail.de (Sven R. Kunze)
Date: Fri, 18 Sep 2015 19:35:17 +0200
Subject: [Python-ideas] Structural type checking for PEP 484
In-Reply-To: <20150918030024.GN31152@ando.pearwood.info>
References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de>
 <05084F79-C505-4A27-9F08-DA98D4B19963@yahoo.com> <55F9D7B9.8060109@mail.de>
 <20150917035915.GL31152@ando.pearwood.info> <55FB2FA5.6050501@mail.de>
 <20150918030024.GN31152@ando.pearwood.info>
Message-ID: <55FC4B55.1050908@mail.de>

On 18.09.2015 05:00, Steven D'Aprano wrote:
> I don't think they are all or nothing. I think it is possible to have
> incomplete documentation and partial test coverage -- it isn't like you
> go from "no documentation at all and zero tests" to "fully documented
> and 100% test coverage" in a single step.

This was a misunderstanding. The "all or nothing" wasn't about "test
everything or don't do it at all". It was about the robustness of future
benefits you gain from it. Either you have a test or you don't. With
type annotations you have 40% or 60% *depending* on the quality of the
tool you use. It's fuzzy.
I don't like to build stuff on jello. Just my personal feeling here.

> That's one use-case for them. Another use-case is as documentation:
>
>     def agm(x:float, y:float)->float:
>         """Return the arithmetic-geometric mean of x and y."""
>
> versus
>
>     def agm(x, y):
>         """Return the arithmetic-geometric mean of x and y.
>
>         Args:
>             x (float): A number.
>             y (float): A number.
>
>         Returns:
>             float: The agm of the two numbers.
>         """

The type annotation explains nothing. The short doc-string
"arithmetic-geometric mean" explains everything (or prepares you to
google it). So, I would prefer this one:

    def agm(x, y):
        """Return the arithmetic-geometric mean of x and y."""

>> So, I have difficulties to
>> infer which parameters actually would benefit from annotating.
>
> The simplest process may be something like this:
>
> - run the type-checker in a mode where it warns about variables
>   with unknown types;
> - add just enough annotations so that the warnings go away.
>
> This is, in part, a matter of the quality of your tools. A good type
> checker should be able to tell you where it can, or can't, infer a type.

You see? Depending on who runs which tools, type annotations need to be
added which are redundant for one tool and not for another and vice
versa. (Yes, we allow that because we grant the liberty to our devs to
use the tools they perform best with.)

Coverage, on the other hand, is strict. Either you traverse that line
of code or you don't (assuming no bugs in the coverage tools).

>> I am
>> either doing redundant work (because the typechecker is already very
>> well aware of the type) or I actually insert explicit knowledge (which
>> might become redundant in case typecheckers actually become better).
>
> You make it sound like, alone out of everything else in Python
> programming, once a type annotation is added to a function it is carved
> in stone forever, never to be removed or changed :-)

Let me reformulate my point: it's not about setting things in stone.
It's about having more to read/process mentally. You might think, 'nah, he's exaggerating; it's just one tiny little ": int" more here and there', but these things build up slowly over time, due to missing clear guidelines (see the fuzziness I described above). Devs will simply add them just everywhere just to make sure OR ignore the whole concept completely. It's simply not good enough. :( Nevertheless, I like the protocol idea more as it introduces actual names to be exposed by IDEs without any work from the devs. That's great! You might further think, 'you're so lazy, Sven. First, you don't want to help the type checker but you still want to use it?' Yes, I am lazy! And I already benefit from it when using PyCharm. It might not be perfect but it still amazes me again and again what it can infer without any type annotations present. > def spam(n=3): > return "spam"*n > > A decent type-checker should be able to infer that n is an int. What if > you add a type annotation? > > def spam(n:int=3): > return "spam"*n It's nothing seriously wrong with it (except what I described above). However, these examples (this one in particular) are/should not be real-world code. The function name is not helpful, the parameter name is not helpful, the functionality is a toy. My observation so far: 1) Type checking illustrates its point well when using academic examples, such as the tuples-of-tuples-of-tuples-of-ints I described somewhere else on this thread or unreasonably short toy examples. (This might be domain specific; I can witness it for business applications and web applications none of which actually need to solve hard problems admittedly.) 
2) Just using constant and sane types like a class, lists of
single-class instances and dicts of single-class instances for a single
variable enables you to assign a proper name to it and forces you to
design a reasonable architecture of your functionality by keeping the
level of nesting at 0 or 1 and splitting out pieces into separate code
blocks.

Best,
Sven
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mehaase at gmail.com  Fri Sep 18 19:42:59 2015
From: mehaase at gmail.com (Mark Haase)
Date: Fri, 18 Sep 2015 10:42:59 -0700 (PDT)
Subject: [Python-ideas] Null coalescing operators
Message-ID: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com>

StackOverflow has many questions on the topic of null coalescing
operators in Python, but I can't find any discussions of them on this
list or in any of the PEPs. Has the addition of null coalescing
operators into Python ever been discussed publicly?

Python has an "or" operator that can be used to coalesce false-y
values, but it does not have an operator to coalesce "None"
exclusively. C# has nice operators for handling null: "??" (null
coalesce), "?." (null-aware member access), and "?[]" (null-aware index
access). They are concise and easy to reason about. I think these would
be a great addition to Python.

As a motivating example: when writing web services, I often want to
change the representation of a non-None value but also need to handle
None gracefully. I write code like this frequently:

    response = json.dumps({
        'created': created.isoformat() if created is not None else None,
        'updated': updated.isoformat() if updated is not None else None,
        ...
    })

With a null-aware member access operator, I could write this instead:

    response = json.dumps({
        'created': created?.isoformat(),
        'updated': updated?.isoformat(),
        ...
    })

I can implement this behavior myself in pure Python, but it would be (a) nice to have it in the standard library, and (b) even nicer to have an operator in the language, since terseness is the goal.

I assume that this has never been brought up in the past because it's so heinously un-Pythonic that you'd have to be a fool to risk the public mockery and shunning associated with asking this question. Well, I guess I'm that fool: flame away...

Thanks,
Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From trent at snakebite.org  Fri Sep 18 20:21:39 2015
From: trent at snakebite.org (Trent Nelson)
Date: Fri, 18 Sep 2015 14:21:39 -0400
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com>
References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com>
Message-ID: <20150918182138.GA64237@trent.me>

On Fri, Sep 18, 2015 at 10:42:59AM -0700, Mark Haase wrote:
> StackOverflow has many questions on the
> topic of null coalescing operators in Python, but I can't find any
> discussions of them on this list or in any of the PEPs. Has the
> addition of null coalescing operators into Python ever been discussed
> publicly?
>
> Python has an "or" operator that can be used to coalesce false-y
> values, but it does not have an operator to coalesce "None"
> exclusively.

Hmmm, I use this NullObject class when I want to do stuff similar to what you've described:

class NullObject(object):
    """
    This is a helper class that does its best to pretend to be
    forgivingly null-like.
>>> n = NullObject() >>> n None >>> n.foo None >>> n.foo.bar.moo None >>> n.foo().bar.moo(True).cat().hello(False, abc=123) None >>> n.hornet(afterburner=True).shotdown(by=n().tomcat) None >>> n or 1 1 >>> str(n) '' >>> int(n) 0 >>> len(n) 0 """ def __getattr__(self, name): return self def __getitem__(self, item): return self def __call__(self, *args, **kwds): return self def __nonzero__(self): return False def __repr__(self): return repr(None) def __str__(self): return '' def __int__(self): return 0 def __len__(self): return 0 Source: https://github.com/tpn/tpn/blob/master/lib/tpn/util.py#L1031 Sample use: https://github.com/enversion/enversion/blob/master/lib/evn/change.py#L1300 class ChangeSet(AbstractChangeSet): @property def top(self): """ Iff one child change is present, return it. Otherwise, return an instance of a NullObject. """ if self.child_count != 1: return NullObject() else: top = None for child in self: top = child break return top @property def is_tag_create(self): return self.top.is_tag_create @property def is_tag_remove(self): return self.top.is_tag_remove @property def is_branch_create(self): return self.top.is_branch_create @property def is_branch_remove(self): return self.top.is_branch_remove Having self.top potentially return a NullObject simplifies the code for the four following properties. > I can implement this behavior myself in pure Python, but it would be > (a) nice to have it the in the standard library, and (b) even nicer to > have an operator in the language, since terseness is the goal. > > As a motivating example: when writing web services, I often want to > change the representation of a non-None value but also need to handle > None gracefully. I write code like this frequently: > > response = json.dumps({ 'created': created.isoformat() if created > is not None else None, 'updated': updated.isoformat() if updated > is not None else None, ... 
> })
>
> With a null-aware member access operator, I could write this instead:
>
>     response = json.dumps({
>         'created': created?.isoformat(),
>         'updated': updated?.isoformat(),
>         ...
>     })

If you can alter the part that creates `created` or `updated` to return a NullObject() instead of None when applicable, you could call `created.isoformat()` without the conditional clause.

> Thanks,
> Mark

    Trent.

From abarnert at yahoo.com  Fri Sep 18 20:57:24 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 18 Sep 2015 11:57:24 -0700
Subject: [Python-ideas] Structural type checking for PEP 484
In-Reply-To: <55FC4B55.1050908@mail.de>
References: <55F0AC83.3050505@mail.de> <55F1B306.5070705@mail.de> <05084F79-C505-4A27-9F08-DA98D4B19963@yahoo.com> <55F9D7B9.8060109@mail.de> <20150917035915.GL31152@ando.pearwood.info> <55FB2FA5.6050501@mail.de> <20150918030024.GN31152@ando.pearwood.info> <55FC4B55.1050908@mail.de>
Message-ID: 

On Sep 18, 2015, at 10:35, Sven R. Kunze wrote:
>
>> On 18.09.2015 05:00, Steven D'Aprano wrote:
>> I don't think they are all or nothing. I think it is possible to have
>> incomplete documentation and partial test coverage -- it isn't like you
>> go from "no documentation at all and zero tests" to "fully documented
>> and 100% test coverage" in a single step.
>
> This was a misunderstanding. The "all or nothing" wasn't about "test everything or don't do it at all". It was about the robustness of future benefits you gain from it. Either you have a test or you don't.
>
> With type annotations you have 40% or 60% depending on the quality of the tool you use. It's fuzzy. I don't like to build stuff on jello. Just my personal feeling here.

Surely gaining 40% or gaining 60% is better than gaining 0%?

At any rate, if you're really concerned with this, there is research you might be interested in.
The first static typer that I'm aware of that used a "fallback to any" rule like MyPy was for an ML language, and it used unsafety marking: any time it falls back to any, it marks the code unsafe, and that propagates in the obvious way. At the end of the typer run, it can tell you which parts of your program are type safe and which aren't. (It can also refactor the type safe parts into separate modules, which are then reusable in other programs, with well-defined type-safe APIs.) This sounds really nifty, and is fun to play with, but I don't think people found it useful in practice. (This is not the same as the explicit Unsafe type found in most SML descendants, where it's used explicitly, to mark FFIs and access to internal structures, which definitely is useful--although of course it's not completely unrelated.)

I think someone could pretty easily write something similar around PEP 484, and then display the results in a way similar to a code coverage map. If people found it useful, that would become a quality of implementation issue for static typers, IDEs, etc. to compete on, and might be worth adding as a required feature to some future update to the standard; if not, it would just be a checklist on some typer's feature list that would eventually stop being worth maintaining.

Would that solve your "40% problem" to your satisfaction?

>> That's one use-case for them. Another use-case is as documentation:
>>
>> def agm(x:float, y:float)->float:
>>     """Return the arithmetic-geometric mean of x and y."""
>>
>> versus
>>
>> def agm(x, y):
>>     """Return the arithmetic-geometric mean of x and y.
>>
>>     Args:
>>         x (float): A number.
>>         y (float): A number.
>>
>>     Returns:
>>         float: The agm of the two numbers.
>>     """

> The type annotation explains nothing. The short doc-string "arithmetic-geometric mean" explains everything (or prepares you to google it).
So, I would prefer this one:

> def agm(x, y):
>     """Return the arithmetic-geometric mean of x and y."""

What happens if I call your version with complex numbers? High-precision Decimal objects? NumPy arrays of floats? I know that Steven wasn't expecting any of those, and will probably do the wrong thing (including silently doing something bad like silently throwing away Decimal precision or improperly extending to the complex plane). With yours, I don't know that. I may not even notice that there's a problem and just call it and get a bug months later. Even if I do notice the question, I have to read through your implementation and/or your test suite to find out if you'd considered the case, or write my own tests to find out empirically. And that's exactly what I meant earlier by annotations sometimes being useful for human readers whether or not they're useful to the checker.

>>> So, I have difficulties to
>>> infer which parameters actually would benefit from annotating.
>>
>> The simplest process may be something like this:
>>
>> - run the type-checker in a mode where it warns about variables
>>   with unknown types;
>> - add just enough annotations so that the warnings go away.
>>
>> This is, in part, a matter of the quality of your tools. A good type
>> checker should be able to tell you where it can, or can't, infer a type.
>
> You see? Depending on who runs which tools, type annotations need to be added which are redundant for one tool and not for another and vice versa. (Yes, we allow that because we grant the liberty to our devs to use the tools they perform best with.)

The only way to avoid that is to define the type system completely and then define the inference engine as part of the language spec.
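As an aside, the agm example from earlier in the thread is concrete enough to pin down in a few lines. This is only a sketch for positive floats; the convergence tolerance is an arbitrary choice, not anything from the quoted posts:

```python
import math

def agm(x: float, y: float) -> float:
    """Return the arithmetic-geometric mean of x and y (both > 0)."""
    a, g = x, y
    # Replace the pair by its arithmetic and geometric means until
    # the two agree to within a small relative tolerance.
    while abs(a - g) > 1e-15 * abs(a):
        a, g = (a + g) / 2.0, math.sqrt(a * g)
    return a
```

Annotated this way, passing a complex number or a Decimal can at least be flagged by a checker instead of silently misbehaving.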
The static type system is inherently an approximation of the much more powerful partly-implicit dynamic type system; not allowing it to act as an approximation would mean severely weakening Python's dynamic type system, which would mean severely weakening what you can write in Python. That's a terrible idea. Something like PEP 484 and an ecosystem of competing checkers is the only possibly useful thing that could be added to Python. If you disagree, nothing that could be feasibly added to Python will ever be useful to you, so you should resign yourself to never using static type checking (which you're allowed to do, of course). > Coverage, on the other hand, is strict. Either you traverse that line of code or you don't (assuming no bugs in the coverage tools). > >>> I am >>> either doing redundant work (because the typechecker is already very >>> well aware of the type) or I actually insert explicit knowledge (which >>> might become redundant in case typecheckers actually become better). >> You make it sound like, alone out of everything else in Python >> programming, once a type annotation is added to a function it is carved >> in stone forever, never to be removed or changed :-) > > Let me reformulate my point: it's not about setting things in stone. It's about having more to read/process mentally. You might think, 'nah, he's exaggerating; it's just one tiny little ": int" more here and there', but these things build up slowly over time, due to missing clear guidelines (see the fuzziness I described above). Devs will simply add them just everywhere just to make sure OR ignore the whole concept completely. > > It's simply not good enough. :( > > > Nevertheless, I like the protocol idea more as it introduces actual names to be exposed by IDEs without any work from the devs. That's great! > > > You might further think, 'you're so lazy, Sven. First, you don't want to help the type checker but you still want to use it?' Yes, I am lazy! 
> And I already benefit from it when using PyCharm. It might not be perfect but it still amazes me again and again what it can infer without any type annotations present.
>
>> def spam(n=3):
>>     return "spam"*n
>>
>> A decent type-checker should be able to infer that n is an int. What if
>> you add a type annotation?
>>
>> def spam(n:int=3):
>>     return "spam"*n
>
> There's nothing seriously wrong with it (except what I described above). However, these examples (this one in particular) are/should not be real-world code. The function name is not helpful, the parameter name is not helpful, the functionality is a toy.
>
> My observation so far:
>
> 1) Type checking illustrates its point well when using academic examples, such as the tuples-of-tuples-of-tuples-of-ints I described somewhere else on this thread or unreasonably short toy examples.
>
> (This might be domain specific; I can witness it for business applications and web applications none of which actually need to solve hard problems admittedly.)
>
> 2) Just using constant and sane types like a class, lists of single-class instances and dicts of single-class instances for a single variable enables you to assign a proper name to it and forces you to design a reasonable architecture of your functionality by keeping the level of nesting at 0 or 1 and splitting out pieces into separate code blocks.

What you're essentially arguing is that if nobody ever used dynamic types (e.g., types with __getattr__, types constructed at runtime by PyObjC or similar bridges, etc.), or dynamically-typed values (like the result of json.loads), or static types that are hard to express manually (like ADTs or dependent types), we could easily build a static type checker that worked near-perfectly, and then we could define exactly where you do and don't need to annotate types. That's true, but it effectively means restricting yourself to the Java type system. Which sucks.
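The json.loads case is easy to make concrete (the payload below is made up for illustration):

```python
import json

# A checker restricted to Java-style nominal types has no good static
# type for `payload`: it is a dict whose values may be strings, numbers,
# bools, None, lists, or further dicts, recursively.
payload = json.loads('{"user": {"name": "guido", "ids": [1, 2, 3]}}')

# Idiomatic Python just indexes in; expressing this statically needs
# something richer than classes (e.g. recursive union types), and a
# Java-style system would force a cast at every step instead.
name = payload["user"]["name"]
ids = payload["user"]["ids"]
```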
There are many things that are easy to write readably in Python (or in Haskell) that require ugliness in Java simply because its type system is too weak. Restricting Python (or even idiomatic Python) to the things that could Java-typed would seriously weaken the language, to the point where I'd rather go find a language that got duck typing right than stick with it. You could argue that Swift actually does a pretty good job of making 90% of your code just work and making it as non-ugly as possible to force the rest of the 10% through escapes in the type system (at least for many kinds of programs). But this actually required a more complicated type system than the one you're suggesting--and, more importantly, it involved explicitly designing the language and the stdlib around that goal. Even the first few public betas didn't work for real programs without a lot of ugliness, requiring drastic changes to the language and stdlib to make it usable. Imagine how much would have to change about a language that was designed for duck typing and grew organically over two and a half decades. Also, there are many corners of Swift that have inconsistently ad-hoc rules that make it much harder to fit the entire language into your brain than Python, despite the language being about the same size. A language that you developed out of performing a similar process on Python might be a good language, maybe even better than Swift, but it would be not be Python, and would not be useful for the same kinds of projects where a language-agnostic programmer would choose Python over other alternatives. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From abarnert at yahoo.com  Fri Sep 18 21:28:05 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 18 Sep 2015 12:28:05 -0700
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: <20150918182138.GA64237@trent.me>
References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <20150918182138.GA64237@trent.me>
Message-ID: 

On Sep 18, 2015, at 11:21, Trent Nelson wrote:
>
>> On Fri, Sep 18, 2015 at 10:42:59AM -0700, Mark Haase wrote:
>> StackOverflow has many questions on the
>> topic of null coalescing operators in Python, but I can't find any
>> discussions of them on this list or in any of the PEPs. Has the
>> addition of null coalescing operators into Python ever been discussed
>> publicly?

I believe it was raised as a side issue during other discussions (conditional expressions, exception-handling expressions, one of the pattern-matching discussions), but I personally can't remember anyone ever writing a serious proposal. I think Armin from PyPy also has a blog post mentioning the idea somewhere, as a spinoff of his arguments against PEP 484 (which turned into a more general "what's wrong with Python's type system and what could be done to fix it"). One last place to look, although it'll be harder to search for, is every time people discuss whether things like dict.get are a wart on the language (because there should be a fully general way to do the equivalent) or a feature (because it's actually only useful in a handful of cases, and it's better to mark them explicitly than to try to generalize).

But my guess is that the discussion hasn't actually been had in sufficient depth to avoid having it here. (Although even if I'm right, that doesn't mean more searching isn't worth doing--to find arguments and counter arguments you may have missed, draw parallels to successes and failures in other languages, etc.)
And, even if Guido hates the idea out of hand, or someone comes up with a slam-dunk argument against it, this could turn into one of those cases where it's worth someone gathering all the info and shepherding the discussion just to write a PEP for Guido to reject explicitly. Personally, for whatever my opinion is worth (not that much), I don't have a good opinion on how it would work in Python without seeing lots of serious examples or trying it out. But I think this would be relatively easy to hack in at the tokenizer level with a quick&dirty import hook. I'll attempt it some time this weekend, in hopes that people can play with the feature. Also, it might be possible to do it less hackily with MacroPy (or it might already be part of MacroPy--often Haoyi's time machine is as good as Guido's). >> Python has an "or" operator that can be used to coalesce false-y >> values, but it does not have an operator to coalesce "None" >> exclusively. > > Hmmm, I use this NullObject class when I want to do stuff similar to what > you've described: This is a very Smalltalk-y solution, which isn't a bad thing. I think having a singleton instance of NullObject (like None is a singleton instance of NoneType) so you can use is-tests, etc. might make it better, but that's arguable. The biggest problem is that you have to write (or wrap) every API to return NullObjects instead of None, and likewise to take NullObjects. (And, if you use a PEP 484 checker, it won't understand that an optional int can hold a NullObject.) Also, there's no way for NullObject to ensure that spam(NullObject) returns NullObject for any function spam (or, more realistically, for any function except special cases, where it's hard to define what counts as a special case but easy to understand intuitively). And finally, there's no obvious way to make NullObject raise when you want it to raise. With syntax for nil coalescing, this is easy: ?. returns None for None, while . raises AttributeError. 
With separate types instead, you're putting the distinction at the point (possibly far away) where the value is produced, rather than the point where it's used.

As a side note, my experience in both Smalltalk and C# is that at some point in a large program, I'm going to end up hackily using a distinction between [nil] and nil somewhere because I needed to distinguish between an optional optional spam that "failed" at the top level vs. one that did so at the bottom level. I like the fact that in Haskell or Swift I can actually distinguish "just nil" from "nil" when I need to but usually don't have to (and the code is briefer when I don't have to), but I don't know whether that's actually essential (the [nil] hack almost always works, and isn't that hard to read if it's used sparsely, which it almost always is).

From guido at python.org  Fri Sep 18 21:45:24 2015
From: guido at python.org (Guido van Rossum)
Date: Fri, 18 Sep 2015 12:45:24 -0700
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: 
References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <20150918182138.GA64237@trent.me>
Message-ID: 

FWIW, I generally hate odd punctuation like this (@ notwithstanding) but I'm not against the idea itself -- maybe a different syntax can be invented, or maybe I could be persuaded that it's okay.

On Fri, Sep 18, 2015 at 12:28 PM, Andrew Barnert via Python-ideas <
python-ideas at python.org> wrote:

> On Sep 18, 2015, at 11:21, Trent Nelson wrote:
> >
> >> On Fri, Sep 18, 2015 at 10:42:59AM -0700, Mark Haase wrote:
> >> StackOverflow has many questions on the
> >> topic of null coalescing operators in Python, but I can't find any
> >> discussions of them on this list or in any of the PEPs. Has the
> >> addition of null coalescing operators into Python ever been discussed
> >> publicly?
> [rest of quoted message trimmed]

--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From abarnert at yahoo.com  Fri Sep 18 22:59:24 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 18 Sep 2015 13:59:24 -0700
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: 
References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <20150918182138.GA64237@trent.me>
Message-ID: 

On Sep 18, 2015, at 12:28, Andrew Barnert via Python-ideas wrote:
>
> Personally, for whatever my opinion is worth (not that much), I don't have a good opinion on how it would work in Python without seeing lots of serious examples or trying it out. But I think this would be relatively easy to hack in at the tokenizer level with a quick&dirty import hook. I'll attempt it some time this weekend, in hopes that people can play with the feature. Also, it might be possible to do it less hackily with MacroPy (or it might already be part of MacroPy--often Haoyi's time machine is as good as Guido's).

You can download a quick&dirty hack at https://github.com/abarnert/nonehack

This only handles the simple case of identifier?.attribute; using an arbitrary target on the left side of the . doesn't work, and there are no other none-coalescing forms like ?(...) or ?[...]. (The latter would be easy to add; the former, I don't think so.)
But that's enough to handle the examples in the initial email. So, feel free to experiment with it, and show off code that proves the usefulness of the feature.

Also, if you can think of a better syntax that will make Guido less sad, but don't know how to implement it as a hack, let me know and I'll try to do it for you.

From mehaase at gmail.com  Sat Sep 19 00:28:33 2015
From: mehaase at gmail.com (Mark E. Haase)
Date: Fri, 18 Sep 2015 18:28:33 -0400
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: 
References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <20150918182138.GA64237@trent.me>
Message-ID: 

Andrew, thanks for putting together that hack. I will check it out.

Guido, you lost me at "hate odd punctuation ... maybe a different syntax can be invented". Do you mean introducing a new keyword or implementing this as a function? Or do you mean that some other punctuation might be less odd? I'm willing to write a PEP, even if its only purpose is to get shot down.

On Fri, Sep 18, 2015 at 3:45 PM, Guido van Rossum wrote:

> FWIW, I generally hate odd punctuation like this (@ notwithstanding) but
> I'm not against the idea itself -- maybe a different syntax can be
> invented, or maybe I could be persuaded that it's okay.
>
> On Fri, Sep 18, 2015 at 12:28 PM, Andrew Barnert via Python-ideas <
> python-ideas at python.org> wrote:
>
>> On Sep 18, 2015, at 11:21, Trent Nelson wrote:
>> >
>> >> On Fri, Sep 18, 2015 at 10:42:59AM -0700, Mark Haase wrote:
>> >> StackOverflow has many questions on the
>> >> topic of null coalescing operators in Python, but I can't find any
>> >> discussions of them on this list or in any of the PEPs. Has the
>> >> addition of null coalescing operators into Python ever been discussed
>> >> publicly?
>> [rest of quoted message trimmed]

--
Mark E. Haase
202-815-0201
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rosuav at gmail.com  Sat Sep 19 00:37:17 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 19 Sep 2015 08:37:17 +1000
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com>
References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com>
Message-ID: 

On Sat, Sep 19, 2015 at 3:42 AM, Mark Haase wrote:
> StackOverflow has many questions on the topic of null coalescing operators
> in Python, but I can't find any discussions of them on this list or in any
> of the PEPs. Has the addition of null coalescing operators into Python ever
> been discussed publicly?
>
> Python has an "or" operator that can be used to coalesce false-y values, but
> it does not have an operator to coalesce "None" exclusively.

Python generally doesn't special-case None, so having a bit of magic that works only on that one object seems a little odd.
For comparison purposes, Pike has something very similar to what you're describing, but Pike *does* treat the integer 0 as special, so it makes good sense there. Pike code that wants to return "a thing or NULL" will return an object or the integer 0, where Python code will usually return an object or None. I can't think of any situation in Python where the language itself gives special support to None, other than it being a keyword. You're breaking new ground. But in my opinion, the practicality is worth it. The use of None to represent the SQL NULL value [1], the absence of useful return value, or other "non-values", is pretty standard. I would define the operator pretty much the way you did above, with one exception. You say: created?.isoformat() # is equivalent to created.isoformat() if created is not None else None but this means there needs to be some magic, because it should be equally possible to write: created?.year # equivalent to created.year if created is not None else None which means that sometimes it has to return None, and sometimes (lambda *a,**ka: None). Three possible solutions: 1) Make None callable. None.__call__(*a, **ka) always returns None. 2) Special-case the immediate call in the syntax, so the equivalencies are a bit different. 3) Add another case: func?(args) evaluates func, and if it's None, evaluates to None without calling anything. Option 1 would potentially mask bugs in a lot of unrelated code. I don't think it's a good idea, but maybe others disagree. Option 2 adds a grammatical distinction that currently doesn't exist. When you see a nullable attribute lookup, you have to check to see if it's a method call, and if it is, do things differently. That means there's a difference between these: func = obj?.attr; func() obj?.attr() Option 3 requires a bit more protection, but is completely explicit. It would also have use in other situations. 
Personally, I support that option; it maintains all the identities, is explicit that calling None will yield None, and doesn't need any magic special cases. It does add another marker, though: created?.isoformat?() # is equivalent to created.isoformat() if created is not None and created.isoformat is not None else None As to the syntax... IMO this needs to be compact, so ?. has my support. With subscripting, should it be "obj?[idx]" or "obj[?idx]" ? FWIW Pike uses the latter, but if C# uses the former, there's no one obvious choice. ChrisA [1] Or non-value, depending on context From python at lucidity.plus.com Sat Sep 19 00:56:40 2015 From: python at lucidity.plus.com (Erik) Date: Fri, 18 Sep 2015 23:56:40 +0100 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> Message-ID: <55FC96A8.605@lucidity.plus.com> On 18/09/15 23:37, Chris Angelico wrote: > Python generally doesn't special-case None, so having a bit of magic > that works only on that one object seems a little odd. So the answer here is to introduce a "magic" hook that None can make use of (but also other classes). I can't think of an appropriate word, so I'll use "foo" to keep it suitably abstract. If the foo operator uses the magic method "__foo__" to mean "return an object to be used in place of the operand should it be considered ... false? [or some other definition - I'm not sure]" then any class can implement that method to return an appropriate proxy object. If that was a postfix operator which has a high precedence, then: bar = foo? bar.isoformat() and the original syntax suggestion: bar = foo?.isoformat() ... are equivalent. "?." is not a new operator. "?" is. This is essentially a slight refinement of Chris's case 3 - > 3) Add another case: func?(args) evaluates func, and if it's None, > evaluates to None without calling anything. [...] > Option 3 requires a bit more protection, but is completely explicit. 
> It would also have use in other situations. Personally, I support that > option; it maintains all the identities, is explicit that calling None > will yield None, and doesn't need any magic special cases. It does add > another marker, though: E. From python at mrabarnett.plus.com Sat Sep 19 01:02:42 2015 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 19 Sep 2015 00:02:42 +0100 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> Message-ID: <55FC9812.2090503@mrabarnett.plus.com> On 2015-09-18 23:37, Chris Angelico wrote: > On Sat, Sep 19, 2015 at 3:42 AM, Mark Haase wrote: >> StackOverflow has many questions on the topic of null coalescing operators >> in Python, but I can't find any discussions of them on this list or in any >> of the PEPs. Has the addition of null coalescing operators into Python ever >> been discussed publicly? >> >> Python has an "or" operator that can be used to coalesce false-y values, but >> it does not have an operator to coalesce "None" exclusively. > [snip] > > created?.isoformat() # is equivalent to > created.isoformat() if created is not None else None > > but this means there needs to be some magic, because it should be > equally possible to write: > > created?.year # equivalent to > created.year if created is not None else None > > which means that sometimes it has to return None, and sometimes > (lambda *a,**ka: None). Three possible solutions: > > 1) Make None callable. None.__call__(*a, **ka) always returns None. > 2) Special-case the immediate call in the syntax, so the equivalencies > are a bit different. > 3) Add another case: func?(args) evaluates func, and if it's None, > evaluates to None without calling anything. > > Option 1 would potentially mask bugs in a lot of unrelated code. I > don't think it's a good idea, but maybe others disagree. > > Option 2 adds a grammatical distinction that currently doesn't exist. 
> When you see a nullable attribute lookup, you have to check to see if > it's a method call, and if it is, do things differently. That means > there's a difference between these: > > func = obj?.attr; func() > obj?.attr() > > Option 3 requires a bit more protection, but is completely explicit. > It would also have use in other situations. Personally, I support that > option; it maintains all the identities, is explicit that calling None > will yield None, and doesn't need any magic special cases. It does add > another marker, though: > > created?.isoformat?() # is equivalent to > created.isoformat() if created is not None and created.isoformat is > not None else None > > As to the syntax... IMO this needs to be compact, so ?. has my > support. With subscripting, should it be "obj?[idx]" or "obj[?idx]" ? > FWIW Pike uses the latter, but if C# uses the former, there's no one > obvious choice. > To me, the choice _is_ obvious: "obj?[idx]". After all, that's more in keeping with "obj?.attr" and "func?()". If you had "obj[?idx]", then shouldn't it also be "obj.?attr" and "func(?)"? From python at lucidity.plus.com Sat Sep 19 01:18:38 2015 From: python at lucidity.plus.com (Erik) Date: Sat, 19 Sep 2015 00:18:38 +0100 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <55FC96A8.605@lucidity.plus.com> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> Message-ID: <55FC9BCE.7020703@lucidity.plus.com> Apologies for the self-reply. I just wanted to clarify a couple of things. On 18/09/15 23:56, Erik wrote: > If the foo operator uses the magic method "__foo__" to mean "return an > object to be used in place of the operand should it be considered ... > false? [or some other definition - I'm not sure]" Not "false", I think.
The "foo" operator is meant to mean "I will go on to use the resulting object in any way imaginable and it must cope with that and return a value from any attempts to use it that will generally mean 'no'" (*). > If that was a postfix operator which has a high precedence, then: > > bar = foo? > bar.isoformat() > > and the original syntax suggestion: > > bar = foo?.isoformat() Which is clearly wrong - the first part should be: baz = foo? bar = baz.isoformat() E. (*) Should we call the operator "shrug"? From srkunze at mail.de Sat Sep 19 01:19:23 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Sat, 19 Sep 2015 01:19:23 +0200 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <55FC9812.2090503@mrabarnett.plus.com> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> Message-ID: <55FC9BFB.2080006@mail.de> On 19.09.2015 01:02, MRAB wrote: > To me, the choice _is_ obvious: "obj?[idx]". After all, that's more in > keeping with "obj?.attr" and "func?()". > > If you had "obj[?idx]", then shouldn't it also be "obj.?attr" and > "func(?)"? I agree with that. From srkunze at mail.de Sat Sep 19 01:44:31 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Sat, 19 Sep 2015 01:44:31 +0200 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <55FC9BCE.7020703@lucidity.plus.com> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <55FC9BCE.7020703@lucidity.plus.com> Message-ID: <55FCA1DF.2030004@mail.de> On 19.09.2015 01:18, Erik wrote: > Apologies for the self-reply. I just wanted to clarify a couple of > things. > > On 18/09/15 23:56, Erik wrote: >> If the foo operator uses the magic method "__foo__" to mean "return an >> object to be used in place of the operand should it be considered ... >> false? [or some other definition - I'm not sure]" > > Not "false", I think.
The "foo" operator is meant to mean "I will go > on to use the resulting object in any way imaginable and it must cope > with that and return a value from any attempts to use it that will > generally mean 'no'" (*). > >> If that was a postfix operator which has a high precedence, then: >> >> bar = foo? >> bar.isoformat() >> >> and the original syntax suggestion: >> >> bar = foo?.isoformat() > > Which is clearly wrong - the first part should be: > > baz = foo? > bar = baz.isoformat() > > E. > > (*) Should we call the operator "shrug"? Maybe monad? From rymg19 at gmail.com Sat Sep 19 01:47:31 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Fri, 18 Sep 2015 18:47:31 -0500 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <55FCA1DF.2030004@mail.de> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <55FC9BCE.7020703@lucidity.plus.com> <55FCA1DF.2030004@mail.de> Message-ID: <550D45BB-1103-4B70-9D1A-E17AEE519DBF@gmail.com> What about "apply"? It's the closest thing to "fmap" I can think of that won't confuse people... On September 18, 2015 6:44:31 PM CDT, "Sven R. Kunze" wrote: > > >On 19.09.2015 01:18, Erik wrote: >> Apologies for the self-reply. I just wanted to clarify a couple of >> things. >> >> On 18/09/15 23:56, Erik wrote: >>> If the foo operator uses the magic method "__foo__" to mean "return >an >>> object to be used in place of the operand should it be considered >... >>> false? [or some other definition - I'm not sure]" >> >> Not "false", I think. The "foo" operator is meant to mean "I will go >> on to use the resulting object in any way imaginable and it must cope > >> with that and return a value from any attempts to use it that will >> generally mean 'no'" (*). >> >>> If that was a postfix operator which has a high precedence, then: >>> >>> bar = foo?
>>> bar.isoformat() >>> >>> and the original syntax suggestion: >>> >>> bar = foo?.isoformat() >> >> Which is clearly wrong - the first part should be: >> >> baz = foo? >> bar = baz.isoformat() >> >> E. >> >> (*) Should we call the operator "shrug"? > >Maybe monad? >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Sat Sep 19 01:58:23 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Sat, 19 Sep 2015 01:58:23 +0200 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <550D45BB-1103-4B70-9D1A-E17AEE519DBF@gmail.com> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <55FC9BCE.7020703@lucidity.plus.com> <55FCA1DF.2030004@mail.de> <550D45BB-1103-4B70-9D1A-E17AEE519DBF@gmail.com> Message-ID: <55FCA51F.5060608@mail.de> On 19.09.2015 01:47, Ryan Gonzalez wrote: > What about "apply"? It's the closest thing to "fmap" I can think of > that won't confuse people... Are you sure? I think "maybe" better reflects the purpose of "?". Nevertheless, I would love to see support for the maybe monad in Python. Best, Sven From joejev at gmail.com Sat Sep 19 02:00:40 2015 From: joejev at gmail.com (Joseph Jevnik) Date: Fri, 18 Sep 2015 20:00:40 -0400 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <55FCA51F.5060608@mail.de> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <55FC9BCE.7020703@lucidity.plus.com> <55FCA1DF.2030004@mail.de> <550D45BB-1103-4B70-9D1A-E17AEE519DBF@gmail.com> <55FCA51F.5060608@mail.de> Message-ID: Is there a reason that this needs explicit support? It is trivial to implement maybe in pure Python.
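[Editor's note: the claim that a maybe can be written in pure Python today checks out for the attribute/call chaining discussed in this thread. A rough sketch — the class name, `_value` slot, and `get()` unwrapper are inventions for illustration, and attributes that collide with the wrapper's own names would need more care.]

```python
import datetime


class Maybe:
    """Minimal pure-Python Maybe: attribute access and calls on a wrapped
    None short-circuit to another Maybe(None) instead of raising."""

    def __init__(self, value):
        if isinstance(value, Maybe):  # make wrapping idempotent
            value = value._value
        self._value = value

    def __getattr__(self, name):
        # Only called for attributes not found normally, so '_value' is safe.
        if self._value is None:
            return Maybe(None)
        return Maybe(getattr(self._value, name))

    def __call__(self, *args, **kwargs):
        if self._value is None:
            return Maybe(None)
        return Maybe(self._value(*args, **kwargs))

    def get(self, default=None):
        """Unwrap, substituting `default` when nothing is inside."""
        return default if self._value is None else self._value


print(Maybe(None).isoformat().get())                        # prints None
print(Maybe(datetime.date(2015, 9, 19)).isoformat().get())  # prints 2015-09-19
```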
On Fri, Sep 18, 2015 at 7:58 PM, Sven R. Kunze wrote: > On 19.09.2015 01:47, Ryan Gonzalez wrote: > >> What about "apply"? It's the closest thing to "fmap" I can think of that >> won't coblnfuse people... >> > > Are you sure? I think "maybe" better reflects the purpose of "?". > > > Nevertheless, I would love to see support for the maybe monad in Python. > > Best, > Sven > _______________________________________________ > > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Sat Sep 19 02:01:50 2015 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 19 Sep 2015 01:01:50 +0100 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <55FCA1DF.2030004@mail.de> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <55FC9BCE.7020703@lucidity.plus.com> <55FCA1DF.2030004@mail.de> Message-ID: <55FCA5EE.5050607@mrabarnett.plus.com> On 2015-09-19 00:44, Sven R. Kunze wrote: > > > On 19.09.2015 01:18, Erik wrote: >> Apologies for the self-reply. I just wanted to clarify a couple of >> things. >> >> On 18/09/15 23:56, Erik wrote: >>> If the foo operator uses the magic method "__foo__" to mean "return an >>> object to be used in place of the operand should it be considered ... >>> false? [or some other definition - I'm not sure]" >> >> Not "false", I think. The "foo" operator is meant to mean "I will go >> on to use the resulting object in any way imaginable and it must cope >> with that and return a value from any attempts to use it that will >> generally mean 'no'" (*). >> >>> If that was a postfix operator which has a high precedence, then: >>> >>> bar = foo? 
>>> bar.isoformat() >>> >>> and the original syntax suggestion: >>> >>> bar = foo?.isoformat() >> >> Which is clearly wrong - the first part should be: >> >> baz = foo? >> bar = baz.isoformat() >> >> E. >> >> (*) Should we call the operator "shrug"? > > Maybe monad? > Too fancy. How about "ni"? :-) From rymg19 at gmail.com Sat Sep 19 02:07:56 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Fri, 18 Sep 2015 19:07:56 -0500 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <55FCA51F.5060608@mail.de> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <55FC9BCE.7020703@lucidity.plus.com> <55FCA1DF.2030004@mail.de> <550D45BB-1103-4B70-9D1A-E17AEE519DBF@gmail.com> <55FCA51F.5060608@mail.de> Message-ID: <0D0C3730-4FC7-4EA9-9262-E997644C1677@gmail.com> On September 18, 2015 6:58:23 PM CDT, "Sven R. Kunze" wrote: >On 19.09.2015 01:47, Ryan Gonzalez wrote: >> What about "apply"? It's the closest thing to "fmap" I can think of >> that won't confuse people... > >Are you sure? I think "maybe" better reflects the purpose of "?". > That's better. Or "optional". > >Nevertheless, I would love to see support for the maybe monad in >Python. > >Best, >Sven -- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. From 4kir4.1i at gmail.com Sat Sep 19 02:08:29 2015 From: 4kir4.1i at gmail.com (Akira Li) Date: Sat, 19 Sep 2015 03:08:29 +0300 Subject: [Python-ideas] Null coalescing operators References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <55FC9BCE.7020703@lucidity.plus.com> <55FCA1DF.2030004@mail.de> <550D45BB-1103-4B70-9D1A-E17AEE519DBF@gmail.com> Message-ID: <871tdv46f6.fsf@gmail.com> Ryan Gonzalez writes: >>On 19.09.2015 01:18, Erik wrote: ... >>> >>> baz = foo? >>> bar = baz.isoformat() >>> >>> E. >>> >>> (*) Should we call the operator "shrug"? >> >>Maybe monad?
http://stackoverflow.com/questions/8507200/maybe-kind-of-monad-in-python From abarnert at yahoo.com Sat Sep 19 02:49:36 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 18 Sep 2015 17:49:36 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <55FC96A8.605@lucidity.plus.com> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> Message-ID: On Sep 18, 2015, at 15:56, Erik wrote: > >> On 18/09/15 23:37, Chris Angelico wrote: >> Python generally doesn't special-case None, so having a bit of magic >> that works only on that one object seems a little odd. > > So the answer here is to introduce a "magic" hook that None can make use of (but also other classes). I can't think of an appropriate word, so I'll use "foo" to keep it suitably abstract. > > If the foo operator uses the magic method "__foo__" to mean "return an object to be used in place of the operand should it be considered ... false? [or some other definition - I'm not sure]" then any class can implement that method to return an appropriate proxy object. > > If that was a postfix operator which has a high precedence, then: > > bar = foo? > bar.isoformat() > > and the original syntax suggestion: > > bar = foo?.isoformat() > > ... are equivalent. "?." is not a new operator. "?" is. This is essentially a slight refinement of Chris's case 3 - I like this (modulo the corrections later in the thread). It's simpler and more flexible than the other options, and also comes closer to resolving the "spam?.eggs" vs. "spam?.cheese()" issue, by requiring "spam?.cheese?()". Obviously "spam?" returns something with a __getattr__ method that just passes through to spam.__getattr__, except that on NoneType it returns something with a __getattr__ that always returns None. That solves the eggs case. Next, "spam?.cheese?" 
returns something with a __call__ method that just passes through to spam?.cheese.__call__, except that on NoneType it returns something with a __call__ that always returns None. That solves the cheese case. If you make None? return something whose other dunder methods also return None (except for special cases like __repr__), this also gives you "spam ?+ 3". (I'm not sure if that's a good thing or a bad thing...) Of course there's no way to do "spam ?= 3" (but I'm pretty sure that's a good thing). So, do we need a dunder method for the "?" operator? What else would you use it for besides None? From random832 at fastmail.com Sat Sep 19 02:58:50 2015 From: random832 at fastmail.com (Random832) Date: Fri, 18 Sep 2015 20:58:50 -0400 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> Message-ID: <1442624330.2155926.387780601.0BDF9956@webmail.messagingengine.com> On Fri, Sep 18, 2015, at 18:37, Chris Angelico wrote: > created?.isoformat?() # is equivalent to > created.isoformat() if created is not None and created.isoformat is > not None else None More or less - it'd only look up the attribute once. > As to the syntax... IMO this needs to be compact, so ?. has my > support. With subscripting, should it be "obj?[idx]" or "obj[?idx]" ? > FWIW Pike uses the latter, but if C# uses the former, there's no one > obvious choice. ?[ has the benefit of being consistent with ?. - and ?(, for that matter. It actually suggests a whole range of null-coalescing operators. ?* for multiply? A lot of these things are done already by the normal operators for statically-typed nullable operands in C#. That could get hairy fast - I just thought of a radical alternative that I'm not even sure if I support: ?(expr) as a lexical context that changes the meaning of all operators.
From rosuav at gmail.com Sat Sep 19 03:00:53 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 19 Sep 2015 11:00:53 +1000 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> Message-ID: On Sat, Sep 19, 2015 at 10:49 AM, Andrew Barnert wrote: > Obviously "spam?" returns something with a __getattr__ method that just passes through to spam.__getattr__, except that on NoneType it returns something with a __getattr__ that always returns None. That solves the eggs case. > > Next, "spam?.cheese?" returns something with a __call__ method that just passed through to spam?.cheese.__call__, except that on NoneType it returns something with a __call__ that always returns None. That solves the cheese case. > Hang on, how do you do this? How does the operator know the difference between "spam?", which for None has to have __getattr__ return None, and "spam?.cheese?" that returns (lambda: None)? ChrisA From abarnert at yahoo.com Sat Sep 19 03:03:15 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 18 Sep 2015 18:03:15 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <55FCA51F.5060608@mail.de> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <55FC9BCE.7020703@lucidity.plus.com> <55FCA1DF.2030004@mail.de> <550D45BB-1103-4B70-9D1A-E17AEE519DBF@gmail.com> <55FCA51F.5060608@mail.de> Message-ID: <2D28867B-6048-490A-9E60-4BC680FFF557@yahoo.com> On Sep 18, 2015, at 16:58, Sven R. Kunze wrote: > >> On 19.09.2015 01:47, Ryan Gonzalez wrote: >> What about "apply"? It's the closest thing to "fmap" I can think of that won't coblnfuse people... > > Are you sure? I think "maybe" better reflects the purpose of "?". > > > Nevertheless, I would love to see support for the maybe monad in Python. I think this, and the whole discussion of maybe and fmap, is off the mark here. 
It's trivial to create a maybe type in Python. What's missing is the two things that make it useful: (1) pattern matching, and (2) a calling syntax and a general focus on HOFs that make fmap natural. Without at least one of those, maybe isn't useful. And adding either of those to Python is a huge proposal, much larger than null coalescing, and a lot less likely to gain support. Also, the monadic style of failure propagation directly competes with the exception-raising style, and they're both contagious. A well-designed language and library can have both side by side if it, e.g., rigorously restricts exceptions to only truly exceptional cases, but the boat for that sailed decades ago in Python. So just having them side by side would lead to the exact same problems as C++ code that mixes exception-based and status-code-based APIs, or JavaScript code that mixes exceptions and errbacks or promise.fail handlers. Personally, whenever I think to myself "I could really use maybe here" in some Python code, that's a sign that I'm not thinking Pythonically, and either need to switch gears in my brain or switch languages. Just like when I start thinking about how I could get rid of that with statement with an RAII class, and maybe an implicit conversion operator.... From abarnert at yahoo.com Sat Sep 19 03:10:17 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 18 Sep 2015 18:10:17 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> Message-ID: On Sep 18, 2015, at 18:00, Chris Angelico wrote: > >> On Sat, Sep 19, 2015 at 10:49 AM, Andrew Barnert wrote: >> Obviously "spam?" returns something with a __getattr__ method that just passes through to spam.__getattr__, except that on NoneType it returns something with a __getattr__ that always returns None. That solves the eggs case. >> >> Next, "spam?.cheese?" 
returns something with a __call__ method that just passed through to spam?.cheese.__call__, except that on NoneType it returns something with a __call__ that always returns None. That solves the cheese case. > > Hang on, how do you do this? How does the operator know the difference > between "spam?", which for None has to have __getattr__ return None, > and "spam?.cheese?" that returns (lambda: None)? >>> spam None >>> spam? NoneQuestion >>> spam?.cheese None >>> spam?.cheese? NoneQuestion >>> spam?.cheese?() None All you need to make this work is: * "spam?" returns NoneQuestion if spam is None else spam * NoneQuestion.__getattr__(self, *args, **kw) returns None. * NoneQuestion.__call__(self, *args, **kw) returns None. Optionally, you can add more None-returning methods to NoneQuestion. Also, whether NoneQuestion is a singleton, has an accessible name, etc. are all bikesheddable. I think it's obvious what happens is "spam" is not None and "spam.cheese" is, or of both are None, but if not, I can work them through as well. > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From python at mrabarnett.plus.com Sat Sep 19 03:39:22 2015 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 19 Sep 2015 02:39:22 +0100 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> Message-ID: <55FCBCCA.5000001@mrabarnett.plus.com> On 2015-09-19 02:10, Andrew Barnert via Python-ideas wrote: > On Sep 18, 2015, at 18:00, Chris Angelico wrote: >> >>> On Sat, Sep 19, 2015 at 10:49 AM, Andrew Barnert wrote: >>> Obviously "spam?" returns something with a __getattr__ method that just passes through to spam.__getattr__, except that on NoneType it returns something with a __getattr__ that always returns None. 
That solves the eggs case. >>> >>> Next, "spam?.cheese?" returns something with a __call__ method that just passed through to spam?.cheese.__call__, except that on NoneType it returns something with a __call__ that always returns None. That solves the cheese case. >> >> Hang on, how do you do this? How does the operator know the difference >> between "spam?", which for None has to have __getattr__ return None, >> and "spam?.cheese?" that returns (lambda: None)? > >>>> spam > None >>>> spam? > NoneQuestion >>>> spam?.cheese > None >>>> spam?.cheese? > NoneQuestion >>>> spam?.cheese?() > None > > All you need to make this work is: > > * "spam?" returns NoneQuestion if spam is None else spam > * NoneQuestion.__getattr__(self, *args, **kw) returns None. > * NoneQuestion.__call__(self, *args, **kw) returns None. > > Optionally, you can add more None-returning methods to NoneQuestion. Also, whether NoneQuestion is a singleton, has an accessible name, etc. are all bikesheddable. > > I think it's obvious what happens is "spam" is not None and "spam.cheese" is, or of both are None, but if not, I can work them through as well. > I see it as "spam? doing "Maybe(spam)" and then attribute access checking returning None if the wrapped object is None and getting the attribute from it if not. I think that the optimiser could probably avoid the use of Maybe in cases like "spam?.cheese". 
From python at mrabarnett.plus.com Sat Sep 19 03:52:08 2015 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 19 Sep 2015 02:52:08 +0100 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <55FCBCCA.5000001@mrabarnett.plus.com> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <55FCBCCA.5000001@mrabarnett.plus.com> Message-ID: <55FCBFC8.5060305@mrabarnett.plus.com> On 2015-09-19 02:39, MRAB wrote: > On 2015-09-19 02:10, Andrew Barnert via Python-ideas wrote: >> On Sep 18, 2015, at 18:00, Chris Angelico wrote: >>> >>>> On Sat, Sep 19, 2015 at 10:49 AM, Andrew Barnert wrote: >>>> Obviously "spam?" returns something with a __getattr__ method that just passes through to spam.__getattr__, except that on NoneType it returns something with a __getattr__ that always returns None. That solves the eggs case. >>>> >>>> Next, "spam?.cheese?" returns something with a __call__ method that just passed through to spam?.cheese.__call__, except that on NoneType it returns something with a __call__ that always returns None. That solves the cheese case. >>> >>> Hang on, how do you do this? How does the operator know the difference >>> between "spam?", which for None has to have __getattr__ return None, >>> and "spam?.cheese?" that returns (lambda: None)? >> >>>>> spam >> None >>>>> spam? >> NoneQuestion >>>>> spam?.cheese >> None >>>>> spam?.cheese? >> NoneQuestion >>>>> spam?.cheese?() >> None >> >> All you need to make this work is: >> >> * "spam?" returns NoneQuestion if spam is None else spam >> * NoneQuestion.__getattr__(self, *args, **kw) returns None. >> * NoneQuestion.__call__(self, *args, **kw) returns None. >> >> Optionally, you can add more None-returning methods to NoneQuestion. Also, whether NoneQuestion is a singleton, has an accessible name, etc. are all bikesheddable. 
>> >> I think it's obvious what happens is "spam" is not None and "spam.cheese" is, or of both are None, but if not, I can work them through as well. >> > I see it as "spam? doing "Maybe(spam)" and then attribute access > checking returning None if the wrapped object is None and getting the > attribute from it if not. > > I think that the optimiser could probably avoid the use of Maybe in > cases like "spam?.cheese". > I've thought of another issue: If you write "spam?(sing_lumberjack_song())", won't it still call sing_lumberjack_song even if spam is None? After all, Python evaluates the arguments before looking up the call, so it won't know that "spam" is None until it tries to call "spam?". That isn't a problem with "spam.sing_lumberjack_song() if spam is not None else None" or if it's optimised to that, but "m = spam?; m(sing_lumberjack_song())" is a different matter. perhaps a "Maybe" object should also support "?" so you could write "m = spam?; m?(sing_lumberjack_song())". "Maybe" could be idempotent, so "Maybe(Maybe(x))" returns the same result as "Maybe(x)". From mehaase at gmail.com Sat Sep 19 04:06:30 2015 From: mehaase at gmail.com (Mark E. Haase) Date: Fri, 18 Sep 2015 22:06:30 -0400 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> Message-ID: Andrew, I really like that idea. Turning back to the null coalescing operator (spelled ?? in other languages), how do you think that fits in? Consider this syntax: >>> None? or 1 1 This works if NoneQuestion overrides __nonzero__ to return False. >>> 0? or 1 0 This doesn't work, because 0? returns 0, and "0 or 1" is 1. We could try this instead, if NoneQuestion overrides __or__: >>> 0? | 1 0 >>> 0 ?| 1 0 This looks a little ugly, and it would be nice (as MRAB pointed out) if null coalescing short circuited. >>> None? or None? This also doesn't work quite right. 
If both operands are None, we want the expression to evaluate to None, not NoneQuestion. *Should null coalescing be a separate operator? And if so, are "?" and "??" too similar?* Can anybody think of realistic use cases for overriding a magic method for the "?" operator? I would like to include such use cases in a PEP. One possible use case: being able to coalesce empty strings. >>> s1 = MyString('') >>> s2 = MyString('foobar') >>> s1? or s2 MyString('foobar') On Fri, Sep 18, 2015 at 9:10 PM, Andrew Barnert via Python-ideas < python-ideas at python.org> wrote: > On Sep 18, 2015, at 18:00, Chris Angelico wrote: > > > >> On Sat, Sep 19, 2015 at 10:49 AM, Andrew Barnert > wrote: > >> Obviously "spam?" returns something with a __getattr__ method that just > passes through to spam.__getattr__, except that on NoneType it returns > something with a __getattr__ that always returns None. That solves the eggs > case. > >> > >> Next, "spam?.cheese?" returns something with a __call__ method that > just passed through to spam?.cheese.__call__, except that on NoneType it > returns something with a __call__ that always returns None. That solves the > cheese case. > > > > Hang on, how do you do this? How does the operator know the difference > > between "spam?", which for None has to have __getattr__ return None, > > and "spam?.cheese?" that returns (lambda: None)? > > >>> spam > None > >>> spam? > NoneQuestion > >>> spam?.cheese > None > >>> spam?.cheese? > NoneQuestion > >>> spam?.cheese?() > None > > All you need to make this work is: > > * "spam?" returns NoneQuestion if spam is None else spam > * NoneQuestion.__getattr__(self, *args, **kw) returns None. > * NoneQuestion.__call__(self, *args, **kw) returns None. > > Optionally, you can add more None-returning methods to NoneQuestion. Also, > whether NoneQuestion is a singleton, has an accessible name, etc. are all > bikesheddable. 
> > I think it's obvious what happens is "spam" is not None and "spam.cheese" > is, or of both are None, but if not, I can work them through as well. > > > > ChrisA > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Mark E. Haase 202-815-0201 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Sep 19 04:26:11 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 19 Sep 2015 12:26:11 +1000 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> Message-ID: On Sat, Sep 19, 2015 at 12:06 PM, Mark E. Haase wrote: > Can anybody think of realistic use cases for overriding a magic method for > the "?" operator? I would like to include such use cases in a PEP. One > possible use case: being able to coalesce empty strings. > >>>> s1 = MyString('') >>>> s2 = MyString('foobar') >>>> s1? or s2 > MyString('foobar') Frankly, I think this is a bad idea. You're potentially coalescing multiple things with the same expression, and we already have a way of spelling that: the "or" operator. If you don't want a generic "if it's false, use this", and don't want a super-specific "if it's None, use this", then how are you going to define what it is? And more importantly, how do you reason about the expression "s1? or s2" without knowing exactly what types coalesce to what? Let's keep the rules simple. 
Make this a special feature of the None singleton, and all other objects simply return themselves - for the same reason that a class isn't allowed to override the "is" operator. ChrisA From abarnert at yahoo.com Sat Sep 19 05:20:49 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 18 Sep 2015 20:20:49 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <55FCBFC8.5060305@mrabarnett.plus.com> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <55FCBCCA.5000001@mrabarnett.plus.com> <55FCBFC8.5060305@mrabarnett.plus.com> Message-ID: <059B044A-173C-488C-9AAD-EF32CF03D6AC@yahoo.com> On Sep 18, 2015, at 18:52, MRAB wrote: > >> On 2015-09-19 02:39, MRAB wrote: >>> On 2015-09-19 02:10, Andrew Barnert via Python-ideas wrote: >>>> On Sep 18, 2015, at 18:00, Chris Angelico wrote: >>>> >>>>> On Sat, Sep 19, 2015 at 10:49 AM, Andrew Barnert wrote: >>>>> Obviously "spam?" returns something with a __getattr__ method that just passes through to spam.__getattr__, except that on NoneType it returns something with a __getattr__ that always returns None. That solves the eggs case. >>>>> >>>>> Next, "spam?.cheese?" returns something with a __call__ method that just passed through to spam?.cheese.__call__, except that on NoneType it returns something with a __call__ that always returns None. That solves the cheese case. >>>> >>>> Hang on, how do you do this? How does the operator know the difference >>>> between "spam?", which for None has to have __getattr__ return None, >>>> and "spam?.cheese?" that returns (lambda: None)? >>> >>>>>> spam >>> None >>>>>> spam? >>> NoneQuestion >>>>>> spam?.cheese >>> None >>>>>> spam?.cheese? >>> NoneQuestion >>>>>> spam?.cheese?() >>> None >>> >>> All you need to make this work is: >>> >>> * "spam?" returns NoneQuestion if spam is None else spam >>> * NoneQuestion.__getattr__(self, *args, **kw) returns None. >>> * NoneQuestion.__call__(self, *args, **kw) returns None. 
>>> >>> Optionally, you can add more None-returning methods to NoneQuestion. Also, whether NoneQuestion is a singleton, has an accessible name, etc. are all bikesheddable. >>> >>> I think it's obvious what happens is "spam" is not None and "spam.cheese" is, or of both are None, but if not, I can work them through as well. >> I see it as "spam? doing "Maybe(spam)" and then attribute access >> checking returning None if the wrapped object is None and getting the >> attribute from it if not. >> >> I think that the optimiser could probably avoid the use of Maybe in >> cases like "spam?.cheese". > I've thought of another issue: > > If you write "spam?(sing_lumberjack_song())", won't it still call > sing_lumberjack_song even if spam is None? You're right; I didn't think about that. But I don't think that's a problem. I believe C#, Swift, etc. all evaluate the arguments in their equivalent. And languages like ObjC that do automatic nil coalescing for all method calls definitely evaluate them. If you really want to switch on spam and not call sing_lumberjack_song, you can always do that manually, right? > perhaps a "Maybe" object should also support "?" so you could write "m > = spam?; m?(sing_lumberjack_song())". "Maybe" could be idempotent, so > "Maybe(Maybe(x))" returns the same result as "Maybe(x)". That actually makes sense just for its own reasons. Actually, now that I think about it, the way I defined it above already gives you this: if spam? is spam if it's anything but None, then spam?? is always spam?, right?
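The rules spelled out above -- "spam?" evaluates to NoneQuestion if spam is None else spam, with NoneQuestion returning None from attribute access, calls, and indexing -- can be sketched in today's Python. Neither NoneQuestion nor the helper q() below is a real API; q() merely stands in for the proposed postfix "?", which is not valid syntax:

```python
class NoneQuestion:
    """Sketch of the hypothetical object returned by "None?"."""
    _instance = None

    def __new__(cls):
        # One of the "bikesheddable" details above: make it a singleton.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __getattr__(self, name):
        return None   # None?.cheese   -> None

    def __call__(self, *args, **kwargs):
        return None   # None?.cheese?() -> None

    def __getitem__(self, key):
        return None   # None?[idx]     -> None


def q(obj):
    """Model the postfix "?": write q(obj) where the proposal writes obj?."""
    if obj is None or isinstance(obj, NoneQuestion):
        return NoneQuestion()   # also makes "?" idempotent: spam?? is spam?
    return obj
```

With this sketch, q(q(spam)) behaves exactly like q(spam), matching the observation that "spam??" would always be "spam?".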
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From abarnert at yahoo.com Sat Sep 19 05:30:48 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 18 Sep 2015 20:30:48 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> Message-ID: <8EA8F9B6-5A00-43C6-A4AA-12DDB69125C6@yahoo.com> On Sep 18, 2015, at 19:06, Mark E. Haase wrote: > > Andrew, I really like that idea. Turning back to the null coalescing operator (spelled ?? in other languages), how do you think that fits in? > > Consider this syntax: > > >>> None? or 1 I don't think there's any easy way to make "spam? or 1" work any better than "spam or 1" already does, partly for the reasons you give below, but also because it doesn't seem to fit the design in any obvious way. I guess that means postfix ? doesn't quite magically solve everything... > This also doesn't work quite right. If both operands are None, we want the expression to evaluate to None, not NoneQuestion. Should null coalescing be a separate operator? And if so, are "?" and "??" too similar? As MRAB pointed out, there seem to be good reasons to let spam?? mean the same thing as spam? (and that follows automatically from the simplest possible definition, the one I gave above). So I think "spam ?? eggs" is ambiguous between the postfix operator and the infix operator without lookahead, at least to a human, and possibly to the compiler as well. I suppose ?: as in ColdFusion might work, but (a) ewwww, (b) it regularly confuses novices to CF, and (c) it's impossible to search for, because ?: no matter how you quote it gets you the C ternary operator.... > Can anybody think of realistic use cases for overriding a magic method for the "?" operator?
I would like to include such use cases in a PEP. One possible use case: being able to coalesce empty strings. > > >>> s1 = MyString('') > >>> s2 = MyString('foobar') > >>> s1? or s2 > MyString('foobar') This seems like a bad idea. Empty strings are already falsey. If you want this behavior, why not just use "s1 or s2", which already works, and for obvious reasons? >> On Fri, Sep 18, 2015 at 9:10 PM, Andrew Barnert via Python-ideas wrote: >> On Sep 18, 2015, at 18:00, Chris Angelico wrote: >> > >> >> On Sat, Sep 19, 2015 at 10:49 AM, Andrew Barnert wrote: >> >> Obviously "spam?" returns something with a __getattr__ method that just passes through to spam.__getattr__, except that on NoneType it returns something with a __getattr__ that always returns None. That solves the eggs case. >> >> >> >> Next, "spam?.cheese?" returns something with a __call__ method that just passed through to spam?.cheese.__call__, except that on NoneType it returns something with a __call__ that always returns None. That solves the cheese case. >> > >> > Hang on, how do you do this? How does the operator know the difference >> > between "spam?", which for None has to have __getattr__ return None, >> > and "spam?.cheese?" that returns (lambda: None)? >> >> >>> spam >> None >> >>> spam? >> NoneQuestion >> >>> spam?.cheese >> None >> >>> spam?.cheese? >> NoneQuestion >> >>> spam?.cheese?() >> None >> >> All you need to make this work is: >> >> * "spam?" returns NoneQuestion if spam is None else spam >> * NoneQuestion.__getattr__(self, *args, **kw) returns None. >> * NoneQuestion.__call__(self, *args, **kw) returns None. >> >> Optionally, you can add more None-returning methods to NoneQuestion. Also, whether NoneQuestion is a singleton, has an accessible name, etc. are all bikesheddable. >> >> I think it's obvious what happens is "spam" is not None and "spam.cheese" is, or of both are None, but if not, I can work them through as well. 
>> >> >> > ChrisA >> > _______________________________________________ >> > Python-ideas mailing list >> > Python-ideas at python.org >> > https://mail.python.org/mailman/listinfo/python-ideas >> > Code of Conduct: http://python.org/psf/codeofconduct/ >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > -- > Mark E. Haase > 202-815-0201 -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Sep 19 05:41:12 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 19 Sep 2015 13:41:12 +1000 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <55FC9812.2090503@mrabarnett.plus.com> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> Message-ID: <20150919034112.GQ31152@ando.pearwood.info> On Sat, Sep 19, 2015 at 12:02:42AM +0100, MRAB wrote: > To me, the choice _is_ obvious: "obj?[idx]". After all, that's more in > keeping with "obj?.attr" and "func?()". > > If you had "obj?[idx]", then shouldn't it also be "obj.?attr" and > "func(?)"? No. If I understand the idea, obj?.attr returns None if obj is None, otherwise returns obj.attr. The question mark (shrug operator?) applies to `obj` *before* the attribute lookup, so it should appear *before* the dot (since we read from left-to-right). The heuristic for remembering the order is that the "shrug" (question mark) operator applies to obj, so it is attached to obj, before any subsequent operation. 
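The desugaring rule just stated -- obj?.attr returns None if obj is None, otherwise obj.attr, with the "?" applying to obj before the lookup -- can be emulated in current Python with small helper functions. The names maybe_attr, maybe_item and maybe_call below are purely illustrative, not a proposed API, and note one difference MRAB raised earlier in the thread: a real operator could skip evaluating arguments entirely, while a helper's arguments are always evaluated first:

```python
def maybe_attr(obj, name):
    """obj?.name  ->  None if obj is None else obj.name"""
    return None if obj is None else getattr(obj, name)


def maybe_item(obj, key):
    """obj?[key]  ->  None if obj is None else obj[key]"""
    return None if obj is None else obj[key]


def maybe_call(obj, *args, **kwargs):
    """obj?(...)  ->  None if obj is None else obj(...)

    Unlike the proposed operator, the arguments here are evaluated
    even when obj is None.
    """
    return None if obj is None else obj(*args, **kwargs)
```

For example, maybe_attr(customer, "trousers") plays the role of customer?.trousers in the discussion that follows.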
For the sake of brevity, using @ as a placeholder for one of attribute access, item/key lookup, or function call, then we have: obj?@ as syntactic sugar for: None if obj is None else obj@ Furthermore, we should be able to chain a sequence of such @s: paperboy.receive(customer?.trousers.backpocket.wallet.extract(2.99)) being equivalent to: paperboy.receive(None if customer is None else customer.trousers.backpocket.wallet.extract(2.99) ) Let's just assume we have a good reason for chaining lookups that isn't an egregious violation of the Law of Demeter, and not get into a debate over OOP best practices, okay? :-) Suppose that wallet itself may also be None. Then we can easily deal with that situation too: paperboy.receive(customer?.trousers.backpocket.wallet?.extract(2.99)) which I think is a big win over either of these two alternatives: # 1 paperboy.receive(None if customer is None else None if customer.trousers.backpocket.wallet is None else customer.trousers.backpocket.wallet.extract(2.99) ) # 2 if customer is not None: wallet = customer.trousers.backpocket.wallet if wallet is not None: paperboy.receive(wallet.extract(2.99)) It's a funny thing, I'm usually not a huge fan of symbols outside of maths operators, and I strongly dislike the C ? ternary operator, but this one feels really natural to me. I didn't have even the most momentary "if you want Perl, you know where to find it" thought. -- Steve From rymg19 at gmail.com Sat Sep 19 05:43:16 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Fri, 18 Sep 2015 22:43:16 -0500 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <8EA8F9B6-5A00-43C6-A4AA-12DDB69125C6@yahoo.com> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <8EA8F9B6-5A00-43C6-A4AA-12DDB69125C6@yahoo.com> Message-ID: <15397B59-288D-478B-8DC1-00F5CB16B30D@gmail.com> This is likely going to get shot down quickly... 
I know CoffeeScript is not regarded too well in this community (well, at least based on Guido's remarks on parsing it), but what if x? was shorthand for x is None? In CS, it's called the existential operator. On September 18, 2015 10:30:48 PM CDT, Andrew Barnert via Python-ideas wrote: >On Sep 18, 2015, at 19:06, Mark E. Haase wrote: >> >> Andrew, I really like that idea. Turning back to the null coalescing >operator (spelled ?? in other languages), how do you think that fits >in? >> >> Consider this syntax: >> >> >>> None? or 1 > >I don't think there's any easy way to make "spam? or 1" work any better >than "spam or 1" already does, partly for the reasons you give below, >but also because it doesn't seem to fit the design in any obvious way. > >I guess that means postix ? doesn't quite magically solve everything... > >> This also doesn't work quite right. If both operands are None, we >want the expression to evaluate to None, not NoneQuestion. Should null >coalescing be a separate operator? And if so, are "?" and "??" too >similar? > >As MRAB pointed out, there seem to be good reasons to let spam?? mean >the same thing as spam? (and that follows automatically from the >simplest possible definition, the one I gave above). So I think "spam >?? eggs" is ambiguous between the postfix operator and the infix >operator without lookahead, at least to a human, and possibly to the >compiler as well. > >I suppose ?: as in ColdFusion might work, but (a) ewwww, (b) it >regularly confuses novices to CF, and (c) it's impossible to search >for, because ?: no matter how you quote it gets you the C ternary >operator.... > >> Can anybody think of realistic use cases for overriding a magic >method for the "?" operator? I would like to include such use cases in >a PEP. One possible use case: being able to coalesce empty strings. >> >> >>> s1 = MyString('') >> >>> s2 = MyString('foobar') >> >>> s1? or s2 >> MyString('foobar') > >This seems like a bad idea. 
Empty strings are already falsey. If you >want this behavior, why not just use "s1 or s2", which already works, >and for obvious reasons? > >>> On Fri, Sep 18, 2015 at 9:10 PM, Andrew Barnert via Python-ideas > wrote: >>> On Sep 18, 2015, at 18:00, Chris Angelico wrote: >>> > >>> >> On Sat, Sep 19, 2015 at 10:49 AM, Andrew Barnert > wrote: >>> >> Obviously "spam?" returns something with a __getattr__ method >that just passes through to spam.__getattr__, except that on NoneType >it returns something with a __getattr__ that always returns None. That >solves the eggs case. >>> >> >>> >> Next, "spam?.cheese?" returns something with a __call__ method >that just passed through to spam?.cheese.__call__, except that on >NoneType it returns something with a __call__ that always returns None. >That solves the cheese case. >>> > >>> > Hang on, how do you do this? How does the operator know the >difference >>> > between "spam?", which for None has to have __getattr__ return >None, >>> > and "spam?.cheese?" that returns (lambda: None)? >>> >>> >>> spam >>> None >>> >>> spam? >>> NoneQuestion >>> >>> spam?.cheese >>> None >>> >>> spam?.cheese? >>> NoneQuestion >>> >>> spam?.cheese?() >>> None >>> >>> All you need to make this work is: >>> >>> * "spam?" returns NoneQuestion if spam is None else spam >>> * NoneQuestion.__getattr__(self, *args, **kw) returns None. >>> * NoneQuestion.__call__(self, *args, **kw) returns None. >>> >>> Optionally, you can add more None-returning methods to NoneQuestion. >Also, whether NoneQuestion is a singleton, has an accessible name, etc. >are all bikesheddable. >>> >>> I think it's obvious what happens is "spam" is not None and >"spam.cheese" is, or of both are None, but if not, I can work them >through as well. 
>>> >>> >>> > ChrisA >>> > _______________________________________________ >>> > Python-ideas mailing list >>> > Python-ideas at python.org >>> > https://mail.python.org/mailman/listinfo/python-ideas >>> > Code of Conduct: http://python.org/psf/codeofconduct/ >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> -- >> Mark E. Haase >> 202-815-0201 > > >------------------------------------------------------------------------ > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Sep 19 06:21:56 2015 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Sep 2015 21:21:56 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <20150919034112.GQ31152@ando.pearwood.info> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> Message-ID: On Fri, Sep 18, 2015 at 8:41 PM, Steven D'Aprano wrote: > It's a funny thing, I'm usually not a huge fan of symbols outside of > maths operators, and I strongly dislike the C ? ternary operator, but > this one feels really natural to me. I didn't have even the most > momentary "if you want Perl, you know where to find it" thought. > I do, but at least the '?' is part of an operator, not part of the name (as it is in Ruby?). 
I really, really, really don't like how it looks, but here's one thing: the discussion can be cut short and focus almost entirely on whether this is worth making Python uglier (and whether it's even ugly :-). The semantics are crystal clear and it's obvious that the way it should work is by making "?.", ?(" and "?[" new operators or operator pairs -- the "?" should not be a unary postfix operator but a symbol that combines with certain other symbols. Let me propose a (hyper?)generalization: it could be combined with any binary operation, e.g. "a?+b" would mean "None if a is None else a+b". Sadly (as hypergeneralizations tend to do?) this also leads to a negative observation: what if I wanted to write "None if b is None else a+b"? (And don't be funny and say I should swap a and b -- they could be strings.) Similar for what if you wanted to do this with a unary operator, e.g. None if x is None else -x. Maybe we could write "a+?b" and "-?x"? But I don't think the use cases warrant these much. Finally, let's give it a proper name -- let's call it the uptalk operator. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Sep 19 07:06:48 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 19 Sep 2015 15:06:48 +1000 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> Message-ID: <20150919050647.GR31152@ando.pearwood.info> On Fri, Sep 18, 2015 at 05:49:36PM -0700, Andrew Barnert via Python-ideas wrote: > Obviously "spam?" returns something with a __getattr__ method that > just passes through to spam.__getattr__, except that on NoneType it > returns something with a __getattr__ that always returns None. That > solves the eggs case. Ah, and now my enthusiasm for the whole idea is gone... 
In my previous response, I imagined spam?.attr to be syntactic sugar for `None if spam is None else spam.attr`. But having ? be an ordinary operator that returns a special Null object feels too "Design Pattern-y" to me. I think the Null design pattern is actually harmful, and I would not like to see this proposal implemented this way. (In another email, Andrew called the special object something like NoneMaybe or NoneQuestion, I forget which. I'm going to call the object Null, since that's less typing.) The Null object pattern sounds like a great idea at first, but I find it to be a code smell at best and outright harmful at worst. If you are passing around an object which is conceptually None, but unlike None reliably does nothing without raising an exception no matter what you do with it, that suggests to me that something about your code is not right. If your functions already accept None, then you should just use None. If they don't accept None, then why are you trying to smuggle None into them using a quasi-None that unconditionally hides errors? Here are some problems with the Null pattern as I see it: (1) Suppose that spam? returns a special Null object, and Null.attr itself returns Null. (As do Null[item] and Null(arg), of course.) This matches the classic Null object design pattern, and gives us chaining for free: value = obj?.spam.eggs.cheese But now `value` is Null, which may not be what we expect and may in fact be a problem if we're expecting it to be "an actual value, or None" rather than our quasi-None Null object. Because `value` is now a Null, every time we pass it to a function, we risk getting new Nulls in places that shouldn't get them. If a function isn't expecting None, we should get an exception, but Null is designed to not raise exceptions no matter what you do with it. So we risk contaminating our data with Nulls in unexpected places. Eventually, of course, there comes a time where we need to deal with the actual value. 
With the Null pattern in place, we have to deal with two special cases, not one: # I assume Null is a singleton, otherwise use isinstance if filename is not None and filename is not Null: os.unlink(filename) A small nuisance, to be sure, but part of the reason why I really don't think much of the Null object pattern. It sounds good on paper, but I think it's actually more dangerous and inconvenient than the problem it tries to solve. (2) We can avoid the worst of the Null design (anti-)pattern by having Null.attr return None instead of Null. Unfortunately, that means we've lost automatic chaining. If you have an object that might be None, we have to explicitly use the ? operator after each lookup except the last: value = obj?.spam?.eggs?.cheese which is (a) messy, (b) potentially inefficient, and (c) potentially hides subtle bugs. Here is a scenario where it hides bugs. Suppose obj may be None, but if it is not, then obj.spam *must* be a object with an eggs attribute. If obj.spam is None, that's a bug that needs fixing. Suppose we start off by writing the obvious thing: obj?.spam.eggs but that fails because obj=None raises an exception: obj? returns Null Null.spam returns None None.eggs raises So to protect against that, we might write: obj?.spam?.eggs but that protects against too much, and hides the fact that obj.spam exists but is None. As far as I am concerned, any use of a Null object has serious downsides. If people want to explicitly use it in their own code, well, good luck with that. I don't think Python should be making it a built-in. I think the first case, the classic Null design pattern, is actually *better* because the downsides are anything but subtle, and people will soon learn not to touch it with a 10ft pole *wink*, while the second case, the "Null.attr gives None" case, is actually worse because it isn't *obviously* wrong and can subtly hide bugs. How does my earlier idea of ? as syntactic sugar compare with those? 
In that case, there is no special Null object, there's only None. So we avoid the risk of Null infection, and avoid needing to check specially for Null. It also avoids the bug-hiding scenario: obj?.spam.eggs.cheese is equivalent to: None if obj is None else obj.spam.eggs.cheese If obj is None, we get None, as we expect. If it is not None, we get obj.spam.eggs.cheese as we expect. If obj.spam is wrongly None, then we get an exception, as we should. -- Steve From stephen at xemacs.org Sat Sep 19 07:14:40 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 19 Sep 2015 14:14:40 +0900 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> Message-ID: <87d1xfyoqn.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert via Python-ideas writes: > So, do we need a dunder method for the "?" operator? What else > would you use it for besides None? NaNs in a pure-Python implementation of float or Decimal. (This is not a practical suggestion.) A true SQL NULL type. It's always bothered me that most ORMs map NULL to None but there are plenty of other ways to inject None into a Python computation. (This probably isn't a practical suggestion either unless Random832's suggestion of ?() establishing a lexical context were adopted.) The point is that Maybe behavior is at least theoretically useful in subcategories, with special objects other than None. Sven's suggestion of calling this the "monad" operator triggers a worry in me, however. In Haskell, the Monad type doesn't enforce the monad laws, only the property of being an endofunctor. That apparently turns out to be enough in practice to make the Monad type very useful. However, in Python we have no way to enforce that property. I don't have the imagination to come up with a truly attractive nuisance here, and this operator doesn't enable general functorial behavior, so maybe it's not a problem.
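Steven's "Null infection" worry a few messages up can be made concrete in a few lines. With a classic Null object -- where every attribute access and call on Null silently returns Null again -- a buggy None travels arbitrarily far down a chain before anyone notices, and downstream code ends up needing two special cases instead of one. NullType, Customer and is_missing below are purely illustrative names, not anything proposed for the language:

```python
class NullType:
    """Classic Null object pattern: every lookup silently returns Null."""

    def __getattr__(self, name):
        return self           # Null.anything -> Null, never an exception

    def __call__(self, *args, **kwargs):
        return self           # Null(...) -> Null

    def __repr__(self):
        return "Null"


Null = NullType()


class Customer:
    # The bug from the email above: trousers should be a real object,
    # but was wrongly left as None.
    trousers = None


customer = Customer()

# "customer?.trousers.backpocket.wallet" under the classic pattern:
# once Null appears, the rest of the chain "works" no matter how wrong it is.
wallet = (Null if customer.trousers is None
          else customer.trousers.backpocket.wallet)
money = wallet.extract(2.99)   # still Null -- the bug is hidden, not reported


# Downstream code now has to deal with *two* special cases instead of one:
def is_missing(value):
    return value is None or value is Null
```

This is exactly the failure mode the syntactic-sugar reading avoids: with "None if obj is None else ..." there is no second sentinel to check for, and a wrongly-None intermediate attribute still raises.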
From random832 at fastmail.com Sat Sep 19 08:55:13 2015 From: random832 at fastmail.com (Random832) Date: Sat, 19 Sep 2015 02:55:13 -0400 Subject: [Python-ideas] Null coalescing operators References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> Message-ID: Guido van Rossum writes: > Let me propose a (hyper?)generalization: it could be combined with any > binary operation, e.g. "a?+b" would mean "None if a is None else a+b". I'd have read it as "None if a is None or b is None else a+b". If you want to only do it for one of the operands you should be explicit. I'm not sure if I have a coherent argument for why this shouldn't apply to ?[, though. From anthony at xtfx.me Sat Sep 19 10:17:07 2015 From: anthony at xtfx.me (C Anthony Risinger) Date: Sat, 19 Sep 2015 03:17:07 -0500 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> Message-ID: On Fri, Sep 18, 2015 at 11:21 PM, Guido van Rossum wrote: > On Fri, Sep 18, 2015 at 8:41 PM, Steven D'Aprano > wrote: > >> It's a funny thing, I'm usually not a huge fan of symbols outside of >> maths operators, and I strongly dislike the C ? ternary operator, but >> this one feels really natural to me. I didn't have even the most >> momentary "if you want Perl, you know where to find it" thought. >> > > I do, but at least the '?' is part of an operator, not part of the name > (as it is in Ruby?). > > I really, really, really don't like how it looks, but here's one thing: > the discussion can be cut short and focus almost entirely on whether this > is worth making Python uglier (and whether it's even ugly :-). The > semantics are crystal clear and it's obvious that the way it should work is > by making "?.", ?(" and "?[" new operators or operator pairs -- the "?" 
> should not be a unary postfix operator but a symbol that combines with > certain other symbols. > I really liked this whole thread, and I largely still do -- I think -- but I'm not sure I like how `?` suddenly prevents whole blocks of code from being evaluated. Anything within the (...) or [...] is now skipped (IIUC) just because a `?` was added, which seems like it could have side effects on the surrounding state, especially since I expect people will use it for squashing/silencing or as a convenient trick after the fact, possibly in code they did not originally write. If the original example included a `?` like so: response = json.dumps?({ 'created': created?.isoformat(), 'updated': updated?.isoformat(), ... }) should "dumps" be None, the additional `?` (although you can barely see it) prevents *everything else* from executing. This may cause confusion about what is being executed, and when, especially once nesting (to any degree really) and/or chaining comes into play! Usually when I want to use this pattern, I find I just need to write things out more. The concept itself vaguely reminds me of PHP's use of `@` for squashing errors. In my opinion, it has some utility but has too much potential impact on program flow without being very noticeable. If I saw more than 1 per line, or a couple within a few lines, I think my ability to quickly identify -> analyze -> comprehend possible routes in program control flow decreases. I feel like I'll fault more, double back, and/or make sure I forevermore look harder for sneaky `?`s. I probably need to research more examples of how such a thing is used in real code, today. This will help me get a feel for how people might want to integrate the new `?` capability into their libraries and APIs, maybe that will ease my readability reservations. Thanks, -- C Anthony -------------- next part -------------- An HTML attachment was scrubbed...
URL: From greg.ewing at canterbury.ac.nz Sat Sep 19 09:03:27 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 19 Sep 2015 19:03:27 +1200 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> Message-ID: <55FD08BF.9090800@canterbury.ac.nz> Guido van Rossum wrote: > Finally, let's give it a proper name -- let's call it the uptalk operator. Um... why? Is this a Monty reference I'm missing? -- Greg From srkunze at mail.de Sat Sep 19 11:48:01 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Sat, 19 Sep 2015 11:48:01 +0200 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <87d1xfyoqn.fsf@uwakimon.sk.tsukuba.ac.jp> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <87d1xfyoqn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <55FD2F51.6020303@mail.de> On 19.09.2015 07:14, Stephen J. Turnbull wrote: > A true SQL NULL type. It's always bothered me that most ORMs map NULL > to None but there are plenty of other ways to inject None into a > Python computation. (This probably isn't a practical suggestion > either unless Random832's suggestion of ?() establishing a lexical > context were adopted.) I definitely agree here. Internally, we have a guideline telling us to avoid None or NULL whenever possible. Andrew's remark about 'code smell' is definitely appropriate. There was a great discussion some years ago on one of the RDF semantics mailing lists about the semantics of NULL (in RDF). It turned out to have 6 or 7 semantics WITHOUT any domain-specific focus (don't know, doesn't exist, is missing, etc. -- can't remember all of them). I feel that is one reason why Python programs should avoid None: we don't guess. > The point is that Maybe behavior is at least theoretically useful in > subcategories, with special objects other than None. 
> > Sven's suggestion of calling this the "monad" operator triggers a > worry in me, however. In Haskell, the Monad type doesn't enforce the > monad laws, only the property of being an endofunctor. That > apparently turns out to be enough in practice to make the Monad type > very useful. However, in Python we have no way to enforce that > property. I don't have the imagination to come up with a truly > attractive nuisance here, and this operator doesn't enable general > functorial behavior, so maybe it's not a problem. Sleeping one night over it, I now tend to change my mind regarding this. Maybe it's *better* to DEAL with None -- as in, *remove them* from the code, from the database, from the YAML files and so forth -- *instead of* making it easier to work with them. Restricting oneself would eventually lead to more predictable designs. Does this make sense somehow? Issue is, None is so convenient to work with. You only find out the code smell when you discover a "NoneType object does not have attribute X" exception some months later and start looking where the heck the None could come from. What can we do here? -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Sep 19 14:06:24 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 19 Sep 2015 22:06:24 +1000 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> Message-ID: <20150919120624.GS31152@ando.pearwood.info> On Sat, Sep 19, 2015 at 03:17:07AM -0500, C Anthony Risinger wrote: > I really liked this whole thread, and I largely still do -- I think -- but > I'm not sure I like how `?` suddenly prevents whole blocks of code from > being evaluated. Anything within the (...) or [...] 
is now skipped (IIUC) > just because a `?` was added, which seems like it could have side effects > on the surrounding state, especially since I expect people will use it for > squashing/silencing or as a convenient trick after the fact, possibly in > code they did not originally write. I don't think this is any different from other short-circuiting operators, particularly `and` and the ternary `if` operator: result = obj and obj.method(expression) result = obj.method(expression) if obj else default In both cases, `expression` is not evaluated if obj is falsey. That's the whole point. > If the original example included a `?` like so: > > response = json.dumps?({ > 'created': created?.isoformat(), > 'updated': updated?.isoformat(), > ... > }) > > should "dumps" be None, the additional `?` (although you can barely > see it) prevents *everything else* from executing. We're still discussing the syntax and semantics of this, so I could be wrong, but my understanding of this is that the *first* question mark prevents the expressions in the parens from being executed: json.dumps?( ... ) evaluates as None if json.dumps is None, otherwise it evaluates the arguments and calls the dumps object. In other words, rather like this: _temp = json.dumps # temporary value if _temp is None: response = None else: response = _temp({ 'created': None if created is None else created.isoformat(), 'updated': None if updated is None else updated.isoformat(), ... }) del _temp except the _temp name isn't actually used. The whole point is to avoid evaluating an expression (attribute lookup, index/key lookup, function call) which will fail if the object is None, and if you're not going to call the function, why evaluate the arguments to the function? > This may cause confusion > about what is being executed, and when, especially once nesting (to any > degree really) and/or chaining comes into play! Well, yes, people can abuse most any syntax. 
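For what it's worth, the expansion above can already be captured as a plain function today, with the laziness made explicit (a sketch only; `maybe_call` and its calling convention are made up for illustration, not part of any proposal):

```python
import json

def maybe_call(func, make_args):
    """Return None if func is None; otherwise call func with lazily built args.

    make_args is a zero-argument callable returning the argument tuple, so
    the arguments are only evaluated when func is not None -- mirroring the
    short-circuiting described above for "func?(...)".
    """
    if func is None:
        return None
    return func(*make_args())

created = None  # e.g. a timestamp that happens to be missing
response = maybe_call(
    json.dumps,
    lambda: ({"created": None if created is None else created.isoformat()},),
)
# json.dumps is not None, so the arguments were built and dumps was called
assert response == '{"created": null}'
# with a None callable, the argument expression is never evaluated
assert maybe_call(None, lambda: ({"never": "built"},)) is None
```

The lambda plays the role that the proposed syntax would handle implicitly: it defers evaluation of the argument expressions until we know the callable is not None.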
> Usually when I want to use this pattern, I find I just need to write things > out more. The concept itself vaguely reminds me of PHP's use of `@` for > squashing errors. I had to look up PHP's @ and I must say I'm rather horrified. According to the docs, all it does is suppress the error reporting; it does nothing to prevent or recover from errors. There's not really an equivalent in Python, but I suppose this is the closest: # similar to PHP's $result = @(expression); try: result = expression except: result = None This is nothing like this proposal. It doesn't suppress arbitrary errors. It's more like a conditional: # result = obj?(expression) if obj is None: result = None else: result = obj(expression) If `expression` raises an exception, it will still be raised, but only if it is actually evaluated, just like anything else protected by an if...else or short-circuit operator. -- Steve From stephen at xemacs.org Sat Sep 19 14:48:52 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 19 Sep 2015 21:48:52 +0900 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <55FD2F51.6020303@mail.de> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <87d1xfyoqn.fsf@uwakimon.sk.tsukuba.ac.jp> <55FD2F51.6020303@mail.de> Message-ID: <8761368thn.fsf@uwakimon.sk.tsukuba.ac.jp> Sven R. Kunze writes: > Issue is, None is so convenient to work with. You only find out the > code smell when you discover a "NoneType object does not have > attribute X" That's exactly what should happen (analogous to a "signalling NaN"). The problem is if you are using None as a proxy for a NULL in another subsystem that has "NULL contagion" (I prefer that to "coalescing"). At this point the thread ends for me because I'm not going to try to tell the many libraries that have chosen to translate NULL to None and vice versa that they are wrong. 
From guido at python.org Sat Sep 19 18:21:04 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 19 Sep 2015 09:21:04 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <8761368thn.fsf@uwakimon.sk.tsukuba.ac.jp> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <87d1xfyoqn.fsf@uwakimon.sk.tsukuba.ac.jp> <55FD2F51.6020303@mail.de> <8761368thn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: "Uptalk" is an interesting speech pattern where every sentence sounds like a question. Google it, there's some interesting research. The "null pattern" is terrible. Uptalk should not be considered a unary operator that returns a magical value. It's a modifier on other operators (somewhat similar to the way "+=" and friends are formed). In case someone missed it, uptalk should test for None, not for a falsey value. I forgot to think about the scope of the uptalk operator (i.e. what is skipped when it finds a None). There are some clear cases (the actual implementation should avoid double evaluation of the tested expression, of course): a.b?.c.d[x, y](p, q) === None if a.b is None else a.b.c.d[x, y](p, q) a.b?[x, y].c.d(p, q) === None if a.b is None else a.b[x, y].c.d(p, q) a.b?(p, q).c.d[x, y] === None if a.b is None else a.b(p, q).c.d[x, y] But what about its effect on other operators in the same expression? I think this is reasonable: a?.b + c.d === None if a is None else a.b + c.d OTOH I don't think it should affect shortcut boolean operators (and, or): a?.b or x === (None if a is None else a.b) or x It also shouldn't escape out of comma-separated lists, argument lists, etc.: (a?.b, x) === ((None if a is None else a.b), x) f(a?.b) === f((None if a is None else a.b)) Should it escape from plain parentheses? Which of these is better? (a?.b) + c === (None if a is None else a.b) + c # Fails unless c overloads None+c (a?.b) + c === None if a is None else (a.b) + c # Could be surprising if ? 
is deeply nested Here are some more edge cases / hypergeneralizations: {k1?: v1, k2: v2} === {k2: v2} if k1 is None else {k1: v1, k2: v2} # ?: skips if key is None # But what to do to skip None values? Could we give ?= a meaning in assignment, e.g. x ?= y could mean: if y is not None: x = y More fun: x ?+= y could mean: if x is None: x = y elif y is not None: x += y You see where this is going. Downhill fast. :-) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Sep 19 18:27:09 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 20 Sep 2015 02:27:09 +1000 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <87d1xfyoqn.fsf@uwakimon.sk.tsukuba.ac.jp> <55FD2F51.6020303@mail.de> <8761368thn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sun, Sep 20, 2015 at 2:21 AM, Guido van Rossum wrote: > Should it escape from plain parentheses? Which of these is better? > > (a?.b) + c === (None if a is None else a.b) + c # Fails unless c > overloads None+c > (a?.b) + c === None if a is None else (a.b) + c # Could be surprising > if ? is deeply nested My recommendation: It should _not_ escape. That way, you get control over how far out the Noneness goes - you can bracket it in as tight as you like. ChrisA From chris.barker at noaa.gov Sat Sep 19 19:50:04 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Sat, 19 Sep 2015 10:50:04 -0700 Subject: [Python-ideas] add a single __future__ for py3? 
Message-ID: Hi all, the common advice, these days, if you want to write py2/3 compatible code, is to do: from __future__ import absolute_import from __future__ import division from __future__ import print_function from __future__ import unicode_literals https://docs.python.org/2/howto/pyporting.html#prevent-compatibility-regressions I'm trying to do this in my code, and teaching my students to do it too. but that's actually a lot of code to write. It would be nice to have a: from __future__ import py3 or something like that, that would do all of those in one swipe. IIUC, I can't make a little module that does that, because the __future__ imports only affect the module in which they are imported Sure, it's not a huge deal, but it would make it easier for folks wanting to keep up this best practice. Of course, this wouldn't happen until 2.7.11, if and when there even is one, but it would be nice to get it on the list.... -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Sep 19 20:16:12 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 20 Sep 2015 04:16:12 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library Message-ID: <20150919181612.GT31152@ando.pearwood.info> Following on to the discussions about changing the default random number generator, I would like to propose an alternative: adding a secrets module to the standard library. Attached is a draft PEP. Feedback is requested. (I'm going to only be intermittently at the keyboard for the next day or so, so my responses may be rather slow.) 
-- Steve -------------- next part -------------- PEP: xxx Title: Adding A Secrets Module To The Standard Library Version: $Revision$ Last-Modified: $Date$ Author: Steven D'Aprano Status: Draft Type: Standards Track Content-Type: text/plain Created: 19-Sep-2015 Python-Version: 3.6 Post-History: Abstract This PEP proposes the addition of a module for common security-related functions such as generating tokens to the Python standard library. Definitions Some common abbreviations used in this proposal: PRNG: Pseudo Random Number Generator. A deterministic algorithm used to produce random-looking numbers with certain desirable statistical properties. CSPRNG: Cryptographically Strong Pseudo Random Number Generator. An algorithm used to produce random-looking numbers which are resistant to prediction. MT: Mersenne Twister. An extensively studied PRNG which is currently used by the ``random`` module as the default. Rationale This proposal is motivated by concerns that Python's standard library makes it too easy for developers to inadvertently make serious security errors. Theo de Raadt, the founder of OpenBSD, contacted Guido van Rossum and expressed some concern[1] about the use of MT for generating sensitive information such as passwords, secure tokens, session keys and similar. Although the documentation for the random module explicitly states that the default is not suitable for security purposes[2], it is strongly believed that this warning may be missed, ignored or misunderstood by many Python developers. In particular: - developers may not have read the documentation and consequently not seen the warning; - they may not realise that their specific use of it has security implications; or - not realising that there could be a problem, they have copied code (or learned techniques) from websites which don't offer best practices. 
The first[3] hit when searching for "python how to generate passwords" on Google is a tutorial that uses the default functions from the ``random`` module[4]. Although it is not intended for use in web applications, it is likely that similar techniques find themselves used in that situation. The second hit is to a StackOverflow question about generating passwords[5]. Most of the answers given, including the accepted one, use the default functions. When one user warned that the default could be easily compromised, they were told "I think you worry too much."[6] This strongly suggests that the existing ``random`` module is an attractive nuisance when it comes to generating (for example) passwords or secure tokens. Additional motivation (of a more philosophical bent) can be found in the post which first proposed this idea[7]. Proposal Alternative proposals have focused on the default PRNG in the ``random`` module, with the aim of providing "secure by default" cryptographically strong primitives that developers can build upon without thinking about security. (See Alternatives below.) This PEP proposes a different approach: * The standard library already provides cryptographically strong primitives, but many users don't know they exist or when to use them. * Instead of requiring crypto-naive users to write secure code, the standard library should include a set of ready-to-use "batteries" for the most common needs, such as generating secure tokens. This code will both directly satisfy a need ("How do I generate a password reset token?"), and act as an example of acceptable practices which developers can learn from[8]. To do this, this PEP proposes that we add a new module to the standard library, with the suggested name ``secrets``. This module will contain a set of ready-to-use functions for common activities with security implications, together with some lower-level primitives. 
The suggestion is that ``secrets`` becomes the go-to module for dealing with anything which should remain secret (passwords, tokens, etc.) while the ``random`` module remains backward-compatible. API and Implementation The contents of the ``secrets`` module are expected to evolve over time, and likely will evolve between the time of writing this PEP and actual release in the standard library[9]. At the time of writing, the following functions have been suggested: * A high-level function for generating secure tokens suitable for use in (e.g.) password recovery, as session keys, etc. * A limited interface to the system CSPRNG, using either ``os.urandom`` directly or ``random.SystemRandom``. Unlike the ``random`` module, this does not need to provide methods for seeding, getting or setting the state, or any non-uniform distributions. It should provide the following: - A function for choosing items from a sequence, ``secrets.choice``. - A function for generating an integer within some range, such as ``secrets.randrange`` or ``secrets.randint``. - A function for generating a given number of random bits and/or bytes as an integer. - A similar function which returns the value as a hex digit string. * ``hmac.compare_digest`` under the name ``equal``. The consensus appears to be that there is no need to add a new CSPRNG to the ``random`` module to support these uses; ``SystemRandom`` will be sufficient. Some illustrative implementations have been given by Nick Coghlan[10]. This idea has also been discussed on the issue tracker for the "cryptography" module[11]. The ``secrets`` module itself will be pure Python, and other Python implementations can easily make use of it unchanged, or adapt it as necessary. Alternatives One alternative is to change the default PRNG provided by the ``random`` module[12]. 
This received considerable scepticism and outright opposition: * There is fear that a CSPRNG may be slower than the current PRNG (which in the case of MT is already quite slow). * Some applications (such as scientific simulations, and replaying gameplay) require the ability to seed the PRNG into a known state, which a CSPRNG lacks by design. * Another major use of the ``random`` module is for simple "guess a number" games written by beginners, and many people are loath to make any change to the ``random`` module which may make that harder. * Although there is no proposal to remove MT from the ``random`` module, there was considerable hostility to the idea of having to opt-in to a non-CSPRNG or any backwards-incompatible changes. * Demonstrated attacks against MT are typically against PHP applications. It is believed that PHP's version of MT is a significantly softer target than Python's version, due to a poor seeding technique[13]. Consequently, without a proven attack against Python applications, many people object to a backwards-incompatible change. Nick Coghlan made an earlier suggestion for a globally configurable PRNG which uses the system CSPRNG by default[14], but has since hinted that he may withdraw it in favour of this proposal[15]. Comparison To Other Languages PHP PHP includes a function ``uniqid``[16] which by default returns a thirteen character string based on the current time in microseconds. Translated into Python syntax, it has the following signature: def uniqid(prefix='', more_entropy=False)->str The PHP documentation warns that this function is not suitable for security purposes. Nevertheless, various mature, well-known PHP applications use it for that purpose (citation needed). PHP 5.3 and better also includes a function ``openssl_random_pseudo_bytes``[17]. 
Translated into Python syntax, it has roughly the following signature: def openssl_random_pseudo_bytes(length:int)->Tuple[str, bool] This function returns a pseudo-random string of bytes of the given length, and a boolean flag giving whether the string is considered cryptographically strong. The PHP manual suggests that returning anything but True should be rare except for old or broken platforms. Javascript Based on a rather cursory search[18], there don't appear to be any well-known standard functions for producing strong random values in Javascript, although there may be good quality third-party libraries. Standard Javascript doesn't seem to include an interface to the system CSPRNG either, and people have extensively written about the weaknesses of Javascript's Math.random[19]. Ruby The Ruby standard library includes a module ``SecureRandom``[20] which includes the following methods: * base64 - returns a Base64 encoded random string. * hex - returns a random hexadecimal string. * random_bytes - returns a random byte string. * random_number - depending on the argument, returns either a random integer in the range(0, n), or a random float between 0.0 and 1.0. * urlsafe_base64 - returns a random URL-safe Base64 encoded string. * uuid - returns a version 4 random Universally Unique IDentifier. What Should Be The Name Of The Module? There was a proposal to add a "random.safe" submodule, quoting the Zen of Python "Namespaces are one honking great idea" koan. However, the author of the Zen, Tim Peters, has come out against this idea[21], and recommends a top-level module. In discussion on the python-ideas mailing list so far, the name "secrets" has received some approval, and no strong opposition. Frequently Asked Questions Q: Is this a real problem? Surely MT is random enough that nobody can predict its output. A: The consensus among security professionals is that MT is not safe in security contexts. 
It is not difficult to reconstruct the internal state of MT[22][23] and so predict all past and future values. There are a number of known, practical attacks on systems using MT for randomness[24]. While there are currently no known direct attacks on applications written in Python due to the use of MT, there is widespread agreement that such usage is unsafe. Q: Is this an alternative to specialist cryptographic software such as SSL? A: No. This is a "batteries included" solution, not a full-featured "nuclear reactor". It is intended to mitigate against some basic security errors, not be a solution to all security-related issues. To quote Nick Coghlan referring to his earlier proposal: "...folks really are better off learning to use things like cryptography.io for security sensitive software, so this change is just about harm mitigation given that it's inevitable that a non-trivial proportion of the millions of current and future Python developers won't do that."[25] References [1] https://mail.python.org/pipermail/python-ideas/2015-September/035820.html [2] https://docs.python.org/3/library/random.html [3] As of the date of writing. Also, as Google search terms may be automatically customised for the user without their knowledge, some readers may see different results. [4] http://interactivepython.org/runestone/static/everyday/2013/01/3_password.html [5] http://stackoverflow.com/questions/3854692/generate-password-in-python [6] http://stackoverflow.com/questions/3854692/generate-password-in-python/3854766#3854766 [7] https://mail.python.org/pipermail/python-ideas/2015-September/036238.html [8] At least those who are motivated to read the source code and documentation. [9] Tim Peters suggests that bike-shedding the contents of the module will be 10000 times more time consuming than actually implementing the module. Words do not begin to express how much I am looking forward to this. 
[10] https://mail.python.org/pipermail/python-ideas/2015-September/036271.html [11] https://github.com/pyca/cryptography/issues/2347 [12] Link needed. [13] By default PHP seeds the MT PRNG with the time (citation needed), which is exploitable by attackers, while Python seeds the PRNG with output from the system CSPRNG, which is believed to be much harder to exploit. [14] http://legacy.python.org/dev/peps/pep-0504/ [15] https://mail.python.org/pipermail/python-ideas/2015-September/036243.html [16] http://php.net/manual/en/function.uniqid.php [17] http://php.net/manual/en/function.openssl-random-pseudo-bytes.php [18] Volunteers and patches are welcome. [19] http://ifsec.blogspot.fr/2012/05/cross-domain-mathrandom-prediction.html [20] http://ruby-doc.org/stdlib-2.1.2/libdoc/securerandom/rdoc/SecureRandom.html [21] https://mail.python.org/pipermail/python-ideas/2015-September/036254.html [22] https://jazzy.id.au/2010/09/22/cracking_random_number_generators_part_3.html [23] https://mail.python.org/pipermail/python-ideas/2015-September/036077.html [24] https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG_WP.pdf [25] https://mail.python.org/pipermail/python-ideas/2015-September/036157.html Copyright This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From brett at python.org Sat Sep 19 20:21:48 2015 From: brett at python.org (Brett Cannon) Date: Sat, 19 Sep 2015 18:21:48 +0000 Subject: [Python-ideas] add a single __future__ for py3? 
In-Reply-To: References: Message-ID: On Sat, 19 Sep 2015 at 10:51 Chris Barker wrote: > Hi all, > > the common advise, these days, if you want to write py2/3 compatible code, > is to do: > > from __future__ import absolute_import > from __future__ import division > from __future__ import print_function > from __future__ import unicode_literals > > > https://docs.python.org/2/howto/pyporting.html#prevent-compatibility-regressions > > I'm trying to do this in my code, and teaching my students to do it to. > > but that's actually a lot of code to write. > > It would be nice to have a: > > from __future__ import py3 > > or something like that, that would do all of those in one swipe. > > IIIC, l can't make a little module that does that, because the __future__ > imports only effect the module in which they are imported > > Sure, it's not a huge deal, but it would make it easier for folks wanting > to keep up this best practice. > > Of course, this wouldn't happen until 2.7.11, if an when there even is > one, but it would be nice to get it on the list.... > > While in hindsight having a python3 __future__ statement that just turned on everything would be handy, this runs the risk of breaking code by introducing something that only works in a bugfix release and we went down that route with booleans in 2.2.1 and came to regret it. -Brett > -Chris > > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Sat Sep 19 20:41:56 2015 From: srkunze at mail.de (Sven R. 
Kunze) Date: Sat, 19 Sep 2015 20:41:56 +0200 Subject: [Python-ideas] add a single __future__ for py3? In-Reply-To: References: Message-ID: <55FDAC74.7050001@mail.de> I totally agree here. On 19.09.2015 19:50, Chris Barker wrote: > Hi all, > > the common advise, these days, if you want to write py2/3 compatible > code, is to do: > > from __future__ import absolute_import > from __future__ import division > from __future__ import print_function > from __future__ import unicode_literals > > https://docs.python.org/2/howto/pyporting.html#prevent-compatibility-regressions > > I'm trying to do this in my code, and teaching my students to do it to. > > but that's actually a lot of code to write. > > It would be nice to have a: > > from __future__ import py3 > > or something like that, that would do all of those in one swipe. > > IIIC, l can't make a little module that does that, because the > __future__ imports only effect the module in which they are imported > > Sure, it's not a huge deal, but it would make it easier for folks > wanting to keep up this best practice. > > Of course, this wouldn't happen until 2.7.11, if an when there even is > one, but it would be nice to get it on the list.... > > -Chris > > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From python at mrabarnett.plus.com Sat Sep 19 20:45:59 2015 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 19 Sep 2015 19:45:59 +0100 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <87d1xfyoqn.fsf@uwakimon.sk.tsukuba.ac.jp> <55FD2F51.6020303@mail.de> <8761368thn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <55FDAD67.40303@mrabarnett.plus.com> On 2015-09-19 17:21, Guido van Rossum wrote: > "Uptalk" is an interesting speech pattern where every sentence sounds > like a question. Google it, there's some interesting research. > > The "null pattern" is terrible. Uptalk should not be considered a unary > operator that returns a magical value. It's a modifier on other > operators (somewhat similar to the way "+=" and friends are formed). > > In case someone missed it, uptalk should test for None, not for a falsey > value. > > I forgot to think about the scope of the uptalk operator (i.e. what is > skipped when it finds a None). There are some clear cases (the actual > implementation should avoid double evaluation of the tested expression, > of course): > > a.b?.c.d[x, y](p, q) === None if a.b is None else a.b.c.d[x, y](p, q) > a.b?[x, y].c.d(p, q) === None if a.b is None else a.b[x, y].c.d(p, q) > a.b?(p, q).c.d[x, y] === None if a.b is None else a.b(p, q).c.d[x, y] > > But what about its effect on other operators in the same expression? I > think this is reasonable: > > a?.b + c.d === None if a is None else a.b + c.d > > OTOH I don't think it should affect shortcut boolean operators (and, or): > > a?.b or x === (None if a is None else a.b) or x > > It also shouldn't escape out of comma-separated lists, argument lists, etc.: > > (a?.b, x) === ((None if a is None else a.b), x) > f(a?.b) === f((None if a is None else a.b)) > > Should it escape from plain parentheses? Which of these is better? 
> > (a?.b) + c === (None if a is None else a.b) + c # Fails unless c > overloads None+c > (a?.b) + c === None if a is None else (a.b) + c # Could be > surprising if ? is deeply nested It shouldn't escape beyond anything having a lower precedence. > Here are some more edge cases / hypergeneralizations: > > {k1?: v1, k2: v2} === {k2: v2} if k1 is None else {k1: v1, k2: v2} > # ?: skips if key is None > # But what to do to skip None values? > > Could we give ?= a meaning in assignment, e.g. x ?= y could mean: > > if y is not None: > x = y > Shouldn't that be: if x is not None: x = y ? It's the value before the '?' that's tested. > More fun: x ?+= y could mean: > > if x is None: > x = y > elif y is not None: > x += y > Or: if x is None: pass else: x += y > You see where this is going. Downhill fast. :-) > Could it be used postfix: a +? b === None if b is None else a + b -?a === None if a is None else -a or both prefix and postfix: a ?+? b === None if a is None or b is None else a + b ? From srkunze at mail.de Sat Sep 19 21:09:48 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Sat, 19 Sep 2015 21:09:48 +0200 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <8761368thn.fsf@uwakimon.sk.tsukuba.ac.jp> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <87d1xfyoqn.fsf@uwakimon.sk.tsukuba.ac.jp> <55FD2F51.6020303@mail.de> <8761368thn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <55FDB2FC.3080104@mail.de> On 19.09.2015 14:48, Stephen J. Turnbull wrote: > Sven R. Kunze writes: > > > Issue is, None is so convenient to work with. You only find out the > > code smell when you discover a "NoneType object does not have > > attribute X" > > That's exactly what should happen (analogous to a "signalling NaN"). Not my point, Stephen. My point is, you better avoid None (despite its convenience) because you are going to have a hard time finding its origin later in the control flow. 
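In the spirit of the "signalling NaN" analogy quoted above, one way to make that origin easier to find is to replace None with a placeholder that remembers where it came from (a sketch only; the `Missing` class and every name in it are hypothetical, not anything proposed in this thread):

```python
class Missing:
    """A 'signalling' placeholder: unlike a bare None, any attribute
    access fails immediately and names the origin of the missing value."""

    def __init__(self, origin):
        self._origin = origin

    def __getattr__(self, name):
        # __getattr__ only runs for attributes that don't already exist,
        # so self._origin above is still reachable normally.
        raise AttributeError(
            "value from %r is missing (tried to access %r)"
            % (self._origin, name))


updated = Missing("row 42 of the YAML import")
try:
    updated.isoformat()
except AttributeError as err:
    message = str(err)
# message: "value from 'row 42 of the YAML import' is missing
#           (tried to access 'isoformat')"
```

The failure still happens months later, but the traceback now says where the heck the value went missing, instead of just "NoneType object has no attribute ...".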
Question still stands: is None really necessary to justify the introduction of convenience operators like "?." etc.? > The problem is if you are using None as a proxy for a NULL in another > subsystem that has "NULL contagion" (I prefer that to "coalescing"). How would you solve instead? Best, Sven From rymg19 at gmail.com Sat Sep 19 21:24:19 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sat, 19 Sep 2015 14:24:19 -0500 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <55FDB2FC.3080104@mail.de> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <87d1xfyoqn.fsf@uwakimon.sk.tsukuba.ac.jp> <55FD2F51.6020303@mail.de> <8761368thn.fsf@uwakimon.sk.tsukuba.ac.jp> <55FDB2FC.3080104@mail.de> Message-ID: <5FE190D3-B6E1-4CBE-9082-366A672D9D0A@gmail.com> I think the core issue is that, whether or not it should be used, APIs already return None values, so a convenience operator might as well be added. On September 19, 2015 2:09:48 PM CDT, "Sven R. Kunze" wrote: >On 19.09.2015 14:48, Stephen J. Turnbull wrote: >> Sven R. Kunze writes: >> >> > Issue is, None is so convenient to work with. You only find out >the >> > code smell when you discover a "NoneType object does not have >> > attribute X" >> >> That's exactly what should happen (analogous to a "signalling NaN"). > >Not my point, Stephen. My point is, you better avoid None (despite its >convenience) because you are going to have a hard time finding its >origin later in the control flow. > >Question still stands: is None really necessary to justify the >introduction of convenience operators like "?." etc.? > >> The problem is if you are using None as a proxy for a NULL in another >> subsystem that has "NULL contagion" (I prefer that to "coalescing"). > >How would you solve instead? 
> > >Best, >Sven >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity.
-------------- next part --------------
An HTML attachment was scrubbed... URL:
From xavier.combelle at gmail.com Sat Sep 19 22:03:10 2015 From: xavier.combelle at gmail.com (Xavier Combelle) Date: Sat, 19 Sep 2015 22:03:10 +0200 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <5FE190D3-B6E1-4CBE-9082-366A672D9D0A@gmail.com> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <87d1xfyoqn.fsf@uwakimon.sk.tsukuba.ac.jp> <55FD2F51.6020303@mail.de> <8761368thn.fsf@uwakimon.sk.tsukuba.ac.jp> <55FDB2FC.3080104@mail.de> <5FE190D3-B6E1-4CBE-9082-366A672D9D0A@gmail.com> Message-ID: 2015-09-19 21:24 GMT+02:00 Ryan Gonzalez : > I think the core issue is that, whether or not it should be used, APIs > already return None values, so a convenience operator might as well be > added. > > I'm curious which APIs return None; a major bonus of using Python is that I have pretty much never stumbled upon the equivalent of a NullPointerException. Moreover, I wonder whether this convenience operator will do anything more than hide bugs.
-------------- next part --------------
An HTML attachment was scrubbed... URL:
From guido at python.org Sat Sep 19 22:57:29 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 19 Sep 2015 13:57:29 -0700 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: <20150919181612.GT31152@ando.pearwood.info> References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: Thanks! I'd accept this (and I'd reject 504 at the same time). I like the secrets name. I wonder though, should the PEP propose a specific set of functions?
(With the understanding that we might add more later.) Hopefully someone on the peps team can commit your PEP in the repo. It's probably going to be PEP 506. On Sat, Sep 19, 2015 at 11:16 AM, Steven D'Aprano wrote: > Following on to the discussions about changing the default random number > generator, I would like to propose an alternative: adding a secrets > module to the standard library. > > Attached is a draft PEP. Feedback is requested. > > (I'm going to only be intermittently at the keyboard for the next day or > so, so my responses may be rather slow.) > > > -- > Steve > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Sat Sep 19 23:39:41 2015 From: random832 at fastmail.com (Random832) Date: Sat, 19 Sep 2015 17:39:41 -0400 Subject: [Python-ideas] Null coalescing operators References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <87d1xfyoqn.fsf@uwakimon.sk.tsukuba.ac.jp> <55FD2F51.6020303@mail.de> <8761368thn.fsf@uwakimon.sk.tsukuba.ac.jp> <55FDB2FC.3080104@mail.de> <5FE190D3-B6E1-4CBE-9082-366A672D9D0A@gmail.com> Message-ID: Xavier Combelle writes: > I'm curious on which API returning None, a major bonus on using python > is that I pretty never stumbled upon the equivalent of > NullPointerException. It doesn't strictly have one; None is an object and you get the usual TypeError, AttributeError, etc, upon using it in a place it's not expected. 
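[Editorial aside, not part of the original thread: a minimal illustration of the point just made — using None where an object is expected raises the ordinary exceptions, most often AttributeError, sometimes TypeError. The names here are made up for the example.]

```python
spam = None  # e.g. a function that forgot its return statement

# Attribute access on None -> AttributeError (the common case in large systems)
try:
    spam.attr
except AttributeError as e:
    print(e)

# Using None as an operand -> TypeError
try:
    spam + "\n"
except TypeError as e:
    print(e)
```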
From guido at python.org Sat Sep 19 23:47:52 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 19 Sep 2015 14:47:52 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <87d1xfyoqn.fsf@uwakimon.sk.tsukuba.ac.jp> <55FD2F51.6020303@mail.de> <8761368thn.fsf@uwakimon.sk.tsukuba.ac.jp> <55FDB2FC.3080104@mail.de> <5FE190D3-B6E1-4CBE-9082-366A672D9D0A@gmail.com> Message-ID: On Sat, Sep 19, 2015 at 2:39 PM, Random832 wrote: > Xavier Combelle > writes: > > > I'm curious on which API returning None, a major bonus on using python > > is that I pretty never stumbled upon the equivalent of > > NullPointerException. > > It doesn't strictly have one; None is an object and you get the usual > TypeError, AttributeError, etc, upon using it in a place it's not expected. > Most often AttributeError. It's pretty common in large Python systems. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Sun Sep 20 00:00:47 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 19 Sep 2015 23:00:47 +0100 Subject: [Python-ideas] new format spec for iterable types In-Reply-To: <55F03BF3.50106@trueblade.com> References: <55F03BF3.50106@trueblade.com> Message-ID: On 09/09/2015 15:02, Eric V. Smith wrote: > At some point, instead of complicating how format works internally, you > should just write a function that does what you want. I realize there's > a continuum between '{}'.format(iterable) and > '{ to draw the line. But when the solution is to bake knowledge of > iterables into .format(), I think we've passed the point where we should > switch to a function: '{}'.format(some_function(iterable)). > > In any event, If you want to play with this, I suggest you write > some_function(iterable) that does what you want, first. > > Eric. 
> Something like this from Nick Coghlan https://code.activestate.com/recipes/577845-format_iter-easy-formatting-of-arbitrary-iterables ??? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From storchaka at gmail.com Sun Sep 20 00:59:01 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 20 Sep 2015 01:59:01 +0300 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: <20150919181612.GT31152@ando.pearwood.info> References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: On 19.09.15 21:16, Steven D'Aprano wrote: > Following on to the discussions about changing the default random number > generator, I would like to propose an alternative: adding a secrets > module to the standard library. Python already has three secret modules: this and antigravity. From rosuav at gmail.com Sun Sep 20 01:00:25 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 20 Sep 2015 09:00:25 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: On Sun, Sep 20, 2015 at 6:57 AM, Guido van Rossum wrote: > [in response to Steven D'Aprano's proto-PEP] > Hopefully someone on the peps team can commit your PEP in the repo. It's > probably going to be PEP 506. I think that's my cue! PEP 506 created and pushed. I've manually converted the original text to RST, but if that was a bad idea, I can revert to text. 
ChrisA From rosuav at gmail.com Sun Sep 20 01:02:26 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 20 Sep 2015 09:02:26 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: On Sun, Sep 20, 2015 at 8:59 AM, Serhiy Storchaka wrote: > On 19.09.15 21:16, Steven D'Aprano wrote: >> >> Following on to the discussions about changing the default random number >> generator, I would like to propose an alternative: adding a secrets >> module to the standard library. > > > Python already has three secret modules: this and antigravity. *scratches head* Is this Pythonesque counting, or is there some way to "import and" that has escaped me? ChrisA From phd at phdru.name Sun Sep 20 01:07:01 2015 From: phd at phdru.name (Oleg Broytman) Date: Sun, 20 Sep 2015 01:07:01 +0200 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: <20150919230701.GA19380@phdru.name> On Sun, Sep 20, 2015 at 09:02:26AM +1000, Chris Angelico wrote: > On Sun, Sep 20, 2015 at 8:59 AM, Serhiy Storchaka wrote: > > On 19.09.15 21:16, Steven D'Aprano wrote: > >> > >> Following on to the discussions about changing the default random number > >> generator, I would like to propose an alternative: adding a secrets > >> module to the standard library. > > > > Python already has three secret modules: this and antigravity. > > *scratches head* Is this Pythonesque counting, or is there some way to > "import and" that has escaped me? Or, BTW, I always wanted "import this or that"! > ChrisA Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. 
From storchaka at gmail.com Sun Sep 20 01:07:05 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 20 Sep 2015 02:07:05 +0300 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: On 20.09.15 02:02, Chris Angelico wrote: > On Sun, Sep 20, 2015 at 8:59 AM, Serhiy Storchaka wrote: >> On 19.09.15 21:16, Steven D'Aprano wrote: >>> Following on to the discussions about changing the default random number >>> generator, I would like to propose an alternative: adding a secrets >>> module to the standard library. >> Python already has three secret modules: this and antigravity. > > *scratches head* Is this Pythonesque counting, or is there some way to > "import and" that has escaped me? The name of the third module is secret. From steve at pearwood.info Sun Sep 20 01:13:09 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 20 Sep 2015 09:13:09 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: <20150919231309.GU31152@ando.pearwood.info> On Sun, Sep 20, 2015 at 09:02:26AM +1000, Chris Angelico wrote: > On Sun, Sep 20, 2015 at 8:59 AM, Serhiy Storchaka wrote: > > On 19.09.15 21:16, Steven D'Aprano wrote: > >> > >> Following on to the discussions about changing the default random number > >> generator, I would like to propose an alternative: adding a secrets > >> module to the standard library. > > > > > > Python already has three secret modules: this and antigravity. > > *scratches head* Is this Pythonesque counting, or is there some way to > "import and" that has escaped me? We could tell you what the third secret module is, but then we'd have to kill you. 
-- Steve From tim.peters at gmail.com Sun Sep 20 01:40:32 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 19 Sep 2015 18:40:32 -0500 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: [Guido] > Thanks! I'd accept this (and I'd reject 504 at the same time). I like the > secrets name. I wonder though, should the PEP propose a specific set of > functions? (With the understanding that we might add more later.) The bikeshedding on that will be far more tedious than the implementation. I'll get it started :-) No attempt to be minimal here. More-than-less "obvious" is more important: Bound methods of a SystemRandom instance .randrange() .randint() .randbits() renamed from .getrandbits() .randbelow(exclusive_upper_bound) renamed from private ._randbelow() .choice() Token functions .token_bytes(nbytes) another name for os.urandom() .token_hex(nbytes) same, but return string of ASCII hex digits .token_url(nbytes) same, but return URL-safe base64-encoded ASCII .token_alpha(alphabet, nchars) string of `nchars` characters drawn uniformly from `alphabet` From gokoproject at gmail.com Sun Sep 20 01:50:55 2015 From: gokoproject at gmail.com (John Wong) Date: Sat, 19 Sep 2015 19:50:55 -0400 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: On Sat, Sep 19, 2015 at 7:07 PM, Serhiy Storchaka wrote: > On 20.09.15 02:02, Chris Angelico wrote: > >> On Sun, Sep 20, 2015 at 8:59 AM, Serhiy Storchaka >> wrote: >> >>> On 19.09.15 21:16, Steven D'Aprano wrote: >>> >>>> Following on to the discussions about changing the default random number >>>> generator, I would like to propose an alternative: adding a secrets >>>> module to the standard library. >>>> >>> Python already has three secret modules: this and antigravity. 
>>> >> >> *scratches head* Is this Pythonesque counting, or is there some way to >> "import and" that has escaped me? >> > > The name of the third module is secret. > > is "secret" or is a secret? Why not "secure"? -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sun Sep 20 02:11:31 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 19 Sep 2015 17:11:31 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <87d1xfyoqn.fsf@uwakimon.sk.tsukuba.ac.jp> <55FD2F51.6020303@mail.de> <8761368thn.fsf@uwakimon.sk.tsukuba.ac.jp> <55FDB2FC.3080104@mail.de> <5FE190D3-B6E1-4CBE-9082-366A672D9D0A@gmail.com> Message-ID: <6F3E8475-5D26-4D9A-9AD9-7F463E91CBFF@yahoo.com> On Sep 19, 2015, at 14:47, Guido van Rossum wrote: > >> On Sat, Sep 19, 2015 at 2:39 PM, Random832 wrote: >> Xavier Combelle >> writes: >> >> > I'm curious on which API returning None, a major bonus on using python >> > is that I pretty never stumbled upon the equivalent of >> > NullPointerException. >> >> It doesn't strictly have one; None is an object and you get the usual >> TypeError, AttributeError, etc, upon using it in a place it's not expected. > > Most often AttributeError. It's pretty common in large Python systems. The TypeErrors usually come from novices. There are many of StackOverflow questions asking why they can't add spam.get_text() + "\n" where they don't show you the implementation of get_text, or the exception they got, but you just know they forgot a return statement at the end and the exception was a TypeError about adding NoneType and str. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From random832 at fastmail.com Sun Sep 20 02:13:08 2015 From: random832 at fastmail.com (Random832) Date: Sat, 19 Sep 2015 20:13:08 -0400 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: (Serhiy Storchaka's message of "Sun, 20 Sep 2015 01:59:01 +0300") References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: Serhiy Storchaka writes: > Python already has three secret modules: this and antigravity. Even its secrets have secrets. Show of hands, who here knew about antigravity.geohash? From rosuav at gmail.com Sun Sep 20 02:14:02 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 20 Sep 2015 10:14:02 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: On Sun, Sep 20, 2015 at 9:40 AM, Tim Peters wrote: > Token functions > .token_bytes(nbytes) > another name for os.urandom() > .token_hex(nbytes) > same, but return string of ASCII hex digits > .token_url(nbytes) > same, but return URL-safe base64-encoded ASCII > .token_alpha(alphabet, nchars) > string of `nchars` characters drawn uniformly > from `alphabet` token_bytes "obviously" should return a bytes, and token_alpha equally obviously should be returning a str. (Or maybe it should return the same type as alphabet, which could be either?) What about the other two? Also, if you ask for 4 bytes from token_hex, do you get 4 hex digits or 8 (four bytes of entropy)? ChrisA From rosuav at gmail.com Sun Sep 20 02:15:09 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 20 Sep 2015 10:15:09 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: On Sun, Sep 20, 2015 at 10:13 AM, Random832 wrote: > Serhiy Storchaka writes: >> Python already has three secret modules: this and antigravity. > > Even its secrets have secrets. 
Show of hands, who here knew about > antigravity.geohash? I did, but I'm an XKCD wonk. ChrisA From bussonniermatthias at gmail.com Sun Sep 20 02:16:39 2015 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Sat, 19 Sep 2015 17:16:39 -0700 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: You forgot : from __future__ import braces -- M On Sat, Sep 19, 2015 at 4:50 PM, John Wong wrote: > > > On Sat, Sep 19, 2015 at 7:07 PM, Serhiy Storchaka > wrote: >> >> On 20.09.15 02:02, Chris Angelico wrote: >>> >>> On Sun, Sep 20, 2015 at 8:59 AM, Serhiy Storchaka >>> wrote: >>>> >>>> On 19.09.15 21:16, Steven D'Aprano wrote: >>>>> >>>>> Following on to the discussions about changing the default random >>>>> number >>>>> generator, I would like to propose an alternative: adding a secrets >>>>> module to the standard library. >>>> >>>> Python already has three secret modules: this and antigravity. >>> >>> >>> *scratches head* Is this Pythonesque counting, or is there some way to >>> "import and" that has escaped me? >> >> >> The name of the third module is secret. >> > > is "secret" or is a secret? Why not "secure"? 
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From tim.peters at gmail.com Sun Sep 20 02:19:27 2015 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 19 Sep 2015 19:19:27 -0500 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: [Tim Peters] >> Token functions >> .token_bytes(nbytes) >> another name for os.urandom() >> .token_hex(nbytes) >> same, but return string of ASCII hex digits >> .token_url(nbytes) >> same, but return URL-safe base64-encoded ASCII >> .token_alpha(alphabet, nchars) >> string of `nchars` characters drawn uniformly >> from `alphabet` [Chris Angelico ] > token_bytes "obviously" should return a bytes, Which os.urandom() does in Python 3. I'm not writing docs, just suggesting the functions. > and token_alpha equally obviously should be returning a str. Which part of "string" doesn't suggest "str"? > (Or maybe it should return the same type as alphabet, which > could be either?) > >: What about the other two? Which part of "ASCII" is ambiguous? > Also, if you ask for 4 bytes from token_hex, do you get 4 hex > digits or 8 (four bytes of entropy)? And which part of "same"? ;-) Bikeshed away.; I'm outta this now ;-) From rosuav at gmail.com Sun Sep 20 02:27:42 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 20 Sep 2015 10:27:42 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: On Sun, Sep 20, 2015 at 10:19 AM, Tim Peters wrote: > [Chris Angelico ] >> token_bytes "obviously" should return a bytes, > > Which os.urandom() does in Python 3. I'm not writing docs, just > suggesting the functions. 
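[Editorial aside, not part of the original thread: a rough sketch of how the token helpers under discussion might look, following Tim's proposed names. It assumes os.urandom as the entropy source and reads `nbytes` as bytes of entropy, so `token_hex(4)` yields 8 hex digits — one possible answer to Chris's question, not a settled decision.]

```python
import base64
import binascii
import os

def token_bytes(nbytes):
    # raw entropy, as bytes
    return os.urandom(nbytes)

def token_hex(nbytes):
    # nbytes of entropy rendered as 2*nbytes ASCII hex digits (a str)
    return binascii.hexlify(os.urandom(nbytes)).decode('ascii')

def token_url(nbytes):
    # nbytes of entropy as URL-safe base64 text, '=' padding stripped
    return base64.urlsafe_b64encode(os.urandom(nbytes)).rstrip(b'=').decode('ascii')
```

A bytes-returning variant of token_url, as Stephen suggests, would simply skip the final decode.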
> >> and token_alpha equally obviously should be returning a str. > > Which part of "string" doesn't suggest "str"? > >> (Or maybe it should return the same type as alphabet, which >> could be either?) >> >>: What about the other two? > > Which part of "ASCII" is ambiguous? > >> Also, if you ask for 4 bytes from token_hex, do you get 4 hex >> digits or 8 (four bytes of entropy)? > > And which part of "same"? ;-) > > Bikeshed away.; I'm outta this now ;-) Heh :) My personal preference for shed colour: token_bytes returns a bytestring, its length being the number provided. All the others return Unicode strings, their lengths again being the number provided. So they're all text bar the one that explicitly says it's in bytes. But I'm aware others may disagree, and while "ASCII" might not be ambiguous, Py3 does still distinguish between b"asdf" and u"asdf" :) ChrisA From stephen at xemacs.org Sun Sep 20 06:45:36 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 20 Sep 2015 13:45:36 +0900 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: <871tdtvgun.fsf@uwakimon.sk.tsukuba.ac.jp> Chris Angelico writes: > My personal preference for shed colour: token_bytes returns a > bytestring, its length being the number provided. All the others > return Unicode strings, their lengths again being the number provided. > So they're all text bar the one that explicitly says it's in bytes. I think that token_url may need a bytes mode, for the same reasons that bytes needs __mod__: such tokens will often be created and parsed by programs that never leave the "ASCII-compatible bytes" world. 
From storchaka at gmail.com Sun Sep 20 08:00:08 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 20 Sep 2015 09:00:08 +0300 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: On 20.09.15 02:40, Tim Peters wrote: > No attempt to be minimal here. More-than-less "obvious" is more important: > > Bound methods of a SystemRandom instance > .randrange() > .randint() > .randbits() > renamed from .getrandbits() > .randbelow(exclusive_upper_bound) > renamed from private ._randbelow() > .choice() randbelow() is just an alias for randrange() with single argument. randint(a, b) == randrange(a, b+1). These functions are redundant and they have non-zero cost. Would not renaming getrandbits be confused? > Token functions > .token_bytes(nbytes) > another name for os.urandom() > .token_hex(nbytes) > same, but return string of ASCII hex digits > .token_url(nbytes) > same, but return URL-safe base64-encoded ASCII > .token_alpha(alphabet, nchars) > string of `nchars` characters drawn uniformly > from `alphabet` token_hex(nbytes) == token_alpha('0123456789abcdef', nchars) ? token_url(nbytes) == token_alpha( 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_', nchars) ? From storchaka at gmail.com Sun Sep 20 08:10:32 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 20 Sep 2015 09:10:32 +0300 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> Message-ID: On 19.09.15 07:21, Guido van Rossum wrote: > I do, but at least the '?' is part of an operator, not part of the name > (as it is in Ruby?). What to do with the "in" operator? 
From random832 at fastmail.com Sun Sep 20 08:45:31 2015 From: random832 at fastmail.com (Random832) Date: Sun, 20 Sep 2015 02:45:31 -0400 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library References: <20150919181612.GT31152@ando.pearwood.info> <871tdtvgun.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: "Stephen J. Turnbull" writes: > Chris Angelico writes: > > > My personal preference for shed colour: token_bytes returns a > > bytestring, its length being the number provided. All the others > > return Unicode strings, their lengths again being the number provided. > > So they're all text bar the one that explicitly says it's in bytes. > > I think that token_url may need a bytes mode, for the same reasons > that bytes needs __mod__: such tokens will often be created and parsed > by programs that never leave the "ASCII-compatible bytes" world. For token_alpha the obvious answer is to return the same type as alphabet, which there's no reason not to allow be either. From random832 at fastmail.com Sun Sep 20 08:46:33 2015 From: random832 at fastmail.com (Random832) Date: Sun, 20 Sep 2015 02:46:33 -0400 Subject: [Python-ideas] Null coalescing operators References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> Message-ID: Serhiy Storchaka writes: > On 19.09.15 07:21, Guido van Rossum wrote: >> I do, but at least the '?' is part of an operator, not part of the name >> (as it is in Ruby?). > > What to do with the "in" operator? This is one of those things where we've got to decide which side it applies to. None can be in a list, but not a string. And nothing can be in None. 
From storchaka at gmail.com Sun Sep 20 09:28:03 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 20 Sep 2015 10:28:03 +0300 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> Message-ID: On 20.09.15 09:46, Random832 wrote: > Serhiy Storchaka > writes: >> On 19.09.15 07:21, Guido van Rossum wrote: >>> I do, but at least the '?' is part of an operator, not part of the name >>> (as it is in Ruby?). >> What to do with the "in" operator? > This is one of those things where we've got to decide which side it > applies to. None can be in a list, but not a string. And nothing can be > in None. All operators are either identifiers ("in", "is", "not"), or nonalphabetic. Not mixes. There is no the "in=" operator and I guess shouldn't be "?in". From steve at pearwood.info Sun Sep 20 09:31:58 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 20 Sep 2015 17:31:58 +1000 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> Message-ID: <20150920073157.GV31152@ando.pearwood.info> On Sun, Sep 20, 2015 at 09:10:32AM +0300, Serhiy Storchaka wrote: > On 19.09.15 07:21, Guido van Rossum wrote: > >I do, but at least the '?' is part of an operator, not part of the name > >(as it is in Ruby?). > > What to do with the "in" operator? Absolutely nothing. I'm not convinced that we should generalise this beyond the three original examples of attribute access, item lookup and function call. I think that applying ? to arbitrary operators is a case of "YAGNI". Or perhaps, "You Shouldn't Need It". Mark's original motivating use-case strikes me as both common and unexceptional. 
We might write:

    # spam may be None, or some object
    result = spam and spam.attr
    # better, as it doesn't misbehave when spam is falsey
    result = None if spam is None else spam.attr

and it seems reasonable to me to want a short-cut for that use-case. But the generalisations to arbitrary operators suggested by Guido strike me as too much, too far. As he says, going downhill, and quickly. Consider these two hypotheticals:

    spam ?+ eggs         # None if spam is None or eggs is None else spam + eggs
    needle ?in haystack  # None if needle is None or haystack is None else needle in haystack

Briefer (more concise) is not necessarily better. At the point you have *two* objects in the one term that both need to be checked for None, that is in my opinion a code smell and we shouldn't provide a short-cut disguising that. Technically, x.y x[y] and x(y) aren't operators, but for the sake of convenience I'll call them such. Even though these are binary operators, the ? only shortcuts according to the x, not the y. So we can call these ?. ?[] ?() operators "pseudo-unary" operators rather than binary operators. Are there any actual unary operators we might want to apply this uptalk/shrug operator to? There are (if I remember correctly) only three unary operators: + - and ~. I don't think there are any reasonable use-cases for writing (say): value = ?-x that justifies making this short-cut available. So as far as I am concerned, the following conditions should apply:

- the uptalk/shrug ? "operator" should not apply to actual binary operators where both operands need to be checked for None-ness (e.g. arithmetic operators, comparison operators)
- it should not apply to arithmetic unary operators + - and ~
- it might apply to pseudo-operators where only the lefthand argument is checked for None-ness, that is, x.y x[y] and x(y), written as x?.y x?[y] and x?(y).

If I had to choose between generalising this to all operators, or not having it at all, I'd rather not have it at all.
A little bit of uptalk goes a long way, once we have ? appearing all over the place in all sorts of expressions, I think it's too much. -- Steve From steve at pearwood.info Sun Sep 20 09:34:23 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 20 Sep 2015 17:34:23 +1000 Subject: [Python-ideas] [Python-Dev] Make stacklevel=2 by default in warnings.warn() In-Reply-To: References: Message-ID: <20150920073423.GW31152@ando.pearwood.info> On Sat, Sep 19, 2015 at 11:55:44PM -0700, Nathaniel Smith wrote: > I don't have enough fingers to count how many times I've had to > explain how stacklevel= works to maintainers of widely-used packages > -- they had no idea that this was even a thing they were getting > wrong. Count me in that. I had no idea it was even a thing. -- Steve From bruce at leban.us Sun Sep 20 09:50:09 2015 From: bruce at leban.us (Bruce Leban) Date: Sun, 20 Sep 2015 00:50:09 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC96A8.605@lucidity.plus.com> <87d1xfyoqn.fsf@uwakimon.sk.tsukuba.ac.jp> <55FD2F51.6020303@mail.de> <8761368thn.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sat, Sep 19, 2015 at 9:21 AM, Guido van Rossum wrote: > I forgot to think about the scope of the uptalk operator (i.e. what is > skipped when it finds a None). There are some clear cases (the actual > implementation should avoid double evaluation of the tested expression, of > course): > > a.b?.c.d[x, y](p, q) === None if a.b is None else a.b.c.d[x, y](p, q) > a.b?[x, y].c.d(p, q) === None if a.b is None else a.b[x, y].c.d(p, q) > a.b?(p, q).c.d[x, y] === None if a.b is None else a.b(p, q).c.d[x, y] > > This makes sense to me. > But what about its effect on other operators in the same expression? I > think this is reasonable: > > a?.b + c.d === None if a is None else a.b + c.d > This is a bit weird to me. Essentially ?. takes precedence over a following +. 
But would you also expect it to take precedence over a preceding one as
well? That's inconsistent:

    c.d + a?.b === None if a is None else c.d + a.b
or
    c.d + a?.b === c.d + None if a is None else c.d + a.b

I think that ?. ?[] and ?() should affect other operators at the same
precedence level only, i.e., each other and . [] and (). This seems the
most logical to me. And I just looked up the C# documentation on MSDN
and it does the same thing:
https://msdn.microsoft.com/en-us/library/dn986595.aspx

> It also shouldn't escape out of comma-separated lists, argument lists,
> etc.:
>
>     (a?.b, x) === ((None if a is None else a.b), x)
>     f(a?.b) === f((None if a is None else a.b))

Agree. It also should not escape grouping parentheses, even though that
might not be useful. It would be very weird if a parenthesized
expression did something other than evaluate the expression inside it,
period.

    (a?.b).c === None.c if a is None else (a.b).c === temp = a?.b; temp.c
    (x or a?.b).c === (x or (None if a is None else a.b)).c

Yes, None.c is going to raise an exception. That's better than just
getting None IMHO.

--- Bruce
Check out my new puzzle book: http://J.mp/ingToConclusions
Get it free here: http://J.mp/ingToConclusionsFree (available on iOS)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From abarnert at yahoo.com Sun Sep 20 11:28:53 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 20 Sep 2015 02:28:53 -0700
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: <20150920073157.GV31152@ando.pearwood.info>
References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info>
Message-ID: <6CD00D68-2B91-4318-9301-740800A6EC0B@yahoo.com>

On Sep 20, 2015, at 00:31, Steven D'Aprano wrote:
>
>> On Sun, Sep 20, 2015 at 09:10:32AM +0300, Serhiy Storchaka wrote:
>>> On 19.09.15 07:21, Guido van Rossum wrote:
>>> I do, but at least the '?' is part of an operator, not part of the name
>>> (as it is in Ruby?).
>>
>> What to do with the "in" operator?
>
> Absolutely nothing.
>
> I'm not convinced that we should generalise this beyond the three
> original examples of attribute access, item lookup and function call. I
> think that applying ? to arbitrary operators is a case of "YAGNI". Or
> perhaps, "You Shouldn't Need It".

I agree. Seeing how far you can generalize something and whether you
can come up with a simple rule that makes all of your use cases follow
naturally can be fun, but it isn't necessarily the best design.

Also, by not trying to generalize uptalk-combined operators (or uptalk
as a postfix unary operator of its own, which I was earlier arguing
for...), the question of how we deal with ?? or ?= (if we want them)
can be "the same way every other language does", rather than seeing
what follows from the general rule and then convincing ourselves that's
what we wanted.
Also, I think trying to generalize to all operators is a false
generalization, since the things we're generalizing from aren't
actually operators (and not just syntactically--e.g., stylistically,
they're never surrounded by spaces--which makes a pretty big difference
in the readability impact of a character as heavy as "?") in the first
place.

Personally, I think ?? is the second most obviously useful after ?.
(there's a reason it's the one with the oldest and widest pedigree); we
need ?() because Python, unlike C# and friends, unifies member and
method access; ?[] doesn't seem as necessary but it's such an obvious
parallel to ?() that I think people will expect it; ?= is potentially
as confusing as it is helpful. So, my suggestion would be just the
first four. And keeping them simple, and consistent with other
languages: no trying to extend the protection to other
operators/accesses, no extra short-circuiting, nothing. So:

    spam ?? eggs === spam if spam is not None else eggs
    spam?.eggs   === spam.eggs if spam is not None else None
    spam?(eggs)  === spam(eggs) if spam is not None else None
    spam?[eggs]  === spam[eggs] if spam is not None else None

That's easy to define, easy to learn and remember, and pretty
consistent with other languages. The one big difference is that what
you write as "spam?.eggs(cheese)" in C# has to be "spam?.eggs?(cheese)"
in Python, but I don't think that's a big problem. After all, in
Python, spam.eggs is a first-class object, and one that's commonly
passed or stored, so the obvious way to look at "spam.eggs(cheese)" is
as explicitly chaining two separate things together (a __getattr__ with
a descriptor __get__, and a __call__), so why shouldn't uptalking both
operations be explicit?
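In current Python the four proposed forms can be spelled as plain functions; a rough sketch (coalesce, opt_attr, opt_call and opt_item are illustrative names only, and unlike real syntax these functions cannot short-circuit evaluation of their argument expressions):

```python
def coalesce(spam, eggs):
    """spam ?? eggs"""
    return spam if spam is not None else eggs

def opt_attr(spam, name):
    """spam?.<name>"""
    return getattr(spam, name) if spam is not None else None

def opt_call(spam, *args, **kw):
    """spam?(*args, **kw)"""
    return spam(*args, **kw) if spam is not None else None

def opt_item(spam, key):
    """spam?[key]"""
    return spam[key] if spam is not None else None

assert coalesce(None, "eggs") == "eggs"
assert coalesce(0, "eggs") == 0       # unlike "or", falsey values survive
assert opt_attr(None, "upper") is None
assert opt_call(opt_attr("spam", "upper")) == "SPAM"
assert opt_item(None, 0) is None
assert opt_item([10, 20], 1) == 20
```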
From rosuav at gmail.com Sun Sep 20 11:38:18 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Sun, 20 Sep 2015 19:38:18 +1000
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: <20150920073157.GV31152@ando.pearwood.info>
References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info>
Message-ID:

On Sun, Sep 20, 2015 at 5:31 PM, Steven D'Aprano wrote:
> Technically, x.y x[y] and x(y) aren't operators, but for the sake of
> convenience I'll call them such. Even though these are binary operators,
> the ? only shortcuts according to the x, not the y. So we can call
> these ?. ?[] ?() operators "pseudo-unary" operators rather than binary
> operators.

That's how all Python's short-circuiting works - based on the value of
what's on the left, decide whether or not to evaluate what's on the
right. (Well, nearly all - if/else evaluates the middle first, but same
difference.) This is another form of short-circuiting; "x?[y]"
evaluates x, then, if that's None, doesn't bother evaluating y because
it can't affect the result.

ChrisA

From p.f.moore at gmail.com Sun Sep 20 12:56:06 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 20 Sep 2015 11:56:06 +0100
Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library
In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info>
Message-ID:

On 20 September 2015 at 00:40, Tim Peters wrote:
> [Guido]
>> Thanks! I'd accept this (and I'd reject 504 at the same time). I like the
>> secrets name. I wonder though, should the PEP propose a specific set of
>> functions? (With the understanding that we might add more later.)
>
> The bikeshedding on that will be far more tedious than the
> implementation. I'll get it started :-)
>
> No attempt to be minimal here.
> More-than-less "obvious" is more important:
>
>     Bound methods of a SystemRandom instance
>     .randrange()
>     .randint()
>     .randbits()
>         renamed from .getrandbits()
>     .randbelow(exclusive_upper_bound)
>         renamed from private ._randbelow()
>     .choice()
>
>     Token functions
>     .token_bytes(nbytes)
>         another name for os.urandom()
>     .token_hex(nbytes)
>         same, but return string of ASCII hex digits
>     .token_url(nbytes)
>         same, but return URL-safe base64-encoded ASCII
>     .token_alpha(alphabet, nchars)
>         string of `nchars` characters drawn uniformly
>         from `alphabet`

Given where this started, I'd suggest renaming token_alpha as
"password". Beginners wouldn't necessarily associate the term "token"
with the problem "I want to generate a random password" [1]. Maybe add
a short recipe showing how to meet constraints like "at least 2 digits"
by simply generating repeatedly until a valid password is found.

For a bit of extra bikeshedding, I'd make alphabet the second,
optional, parameter and default it to
string.ascii_letters+string.digits+string.punctuation, as that's often
what password constraints require.

Or at the very least, document how to use the module functions for the
common tasks we see people getting wrong. But I thought the idea here
was to make doing things the right way obvious, for people who don't
read documentation, so I'd prefer to see the functions exposed by the
module named based on the problems they solve, not on the features
they provide. (Even if that involves a little duplication, and/or a
split between "high level" and "low level" APIs).

Paul.
[1] I'd written a spec for password() before I spotted that it was the
same as token_alpha :-(

From p.f.moore at gmail.com Sun Sep 20 13:05:52 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 20 Sep 2015 12:05:52 +0100
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: <20150920073157.GV31152@ando.pearwood.info>
References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info>
Message-ID:

On 20 September 2015 at 08:31, Steven D'Aprano wrote:
> I'm not convinced that we should generalise this beyond the three
> original examples of attribute access, item lookup and function call. I
> think that applying ? to arbitrary operators is a case of "YAGNI". Or
> perhaps, "You Shouldn't Need It".

Agreed.

Does this need to be an operator? How about the following:

    class Maybe:
        def __getattr__(self, attr): return None
        def __getitem__(self, idx): return None
        def __call__(self, *args, **kw): return None

    def maybe(obj):
        return Maybe() if obj is None else obj

    attr = maybe(obj).spam
    elt = maybe(obj)[n]
    result = maybe(callback)(args)

The Maybe class could be hidden, and the Maybe() object a singleton
(making my poor naming a non-issue :-)) and if it's felt sufficiently
useful, the maybe() function could be a builtin.

Usage of the result of maybe() outside of the above 3 contexts should
simply be "not supported" - don't worry about trying to stop people
doing weird things, just make it clear that the intent is only to
support the 3 given idiomatic usages.

Paul.
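For what it's worth, the sketch above runs as-is; a quick check of the three supported idioms, plus the method-call case, which under these semantics needs two wraps:

```python
class Maybe:
    def __getattr__(self, attr): return None
    def __getitem__(self, idx): return None
    def __call__(self, *args, **kw): return None

def maybe(obj):
    return Maybe() if obj is None else obj

obj = None
assert maybe(obj).spam is None        # attribute access
assert maybe(obj)[3] is None          # item lookup
assert maybe(obj)("x") is None        # call
assert maybe("ham").upper() == "HAM"  # non-None objects pass through untouched

# a method call on a possibly-None object takes two wraps, since
# maybe(obj).upper is plain None when obj is None:
assert maybe(maybe(obj).upper)() is None
```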
From phd at phdru.name Sun Sep 20 13:29:26 2015
From: phd at phdru.name (Oleg Broytman)
Date: Sun, 20 Sep 2015 13:29:26 +0200
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info>
Message-ID: <20150920112926.GA5178@phdru.name>

On Sun, Sep 20, 2015 at 12:05:52PM +0100, Paul Moore wrote:
> On 20 September 2015 at 08:31, Steven D'Aprano wrote:
> > I'm not convinced that we should generalise this beyond the three
> > original examples of attribute access, item lookup and function call. I
> > think that applying ? to arbitrary operators is a case of "YAGNI". Or
> > perhaps, "You Shouldn't Need It".
>
> Agreed.
>
> Does this need to be an operator? How about the following:
>
>     class Maybe:
>         def __getattr__(self, attr): return None
>         def __getitem__(self, idx): return None
>         def __call__(self, *args, **kw): return None
>
>     def maybe(obj):
>         return Maybe() if obj is None else obj
>
>     attr = maybe(obj).spam
>     elt = maybe(obj)[n]
>     result = maybe(callback)(args)
>
> The Maybe class could be hidden, and the Maybe() object a singleton
> (making my poor naming a non-issue :-)) and if it's felt sufficiently
> useful, the maybe() function could be a builtin.
>
> Usage of the result of maybe() outside of the above 3 contexts should
> simply be "not supported" - don't worry about trying to stop people
> doing weird things, just make it clear that the intent is only to
> support the 3 given idiomatic usages.

PyMaybe - a Python implementation of the Maybe pattern. Seems to be
quite elaborated.
https://github.com/ekampf/pymaybe

> Paul.

Oleg.
--
Oleg Broytman http://phdru.name/ phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.
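The pattern PyMaybe implements can be sketched in a few stdlib-only lines (illustrative only, not PyMaybe's actual code or API): a null object that absorbs any further attribute access, call or indexing is what lets chained calls survive a None. The price is that what comes out of a None chain is this Nothing object rather than a real None, so it needs an explicit unwrap at the end:

```python
class Nothing:
    """Absorbs attribute access, calls and indexing, returning itself."""
    def __getattr__(self, name): return self
    def __call__(self, *args, **kw): return self
    def __getitem__(self, key): return self
    def get(self):
        return None  # unwrap back to a plain None

NOTHING = Nothing()

def maybe(obj):
    return NOTHING if obj is None else obj

assert maybe(" ham ").upper().strip() == "HAM"
assert maybe(None).upper().strip() is NOTHING   # the chain never blows up
assert maybe(None).upper().strip().get() is None
```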
From ncoghlan at gmail.com Sun Sep 20 13:54:48 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 20 Sep 2015 21:54:48 +1000
Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library
In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info>
Message-ID:

On 20 September 2015 at 09:00, Chris Angelico wrote:
> On Sun, Sep 20, 2015 at 6:57 AM, Guido van Rossum wrote:
>> [in response to Steven D'Aprano's proto-PEP]
>> Hopefully someone on the peps team can commit your PEP in the repo. It's
>> probably going to be PEP 506.
>
> I think that's my cue!
>
> PEP 506 created and pushed. I've manually converted the original text
> to RST, but if that was a bad idea, I can revert to text.

And I've now withdrawn PEP 504 in favour of Steven's approach in this
PEP.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Sun Sep 20 14:26:42 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 20 Sep 2015 22:26:42 +1000
Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library
In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info>
Message-ID:

On 20 September 2015 at 20:56, Paul Moore wrote:
> Given where this started, I'd suggest renaming token_alpha as
> "password". Beginners wouldn't necessarily associate the term "token"
> with the problem "I want to generate a random password" [1]. Maybe add
> a short recipe showing how to meet constraints like "at least 2
> digits" by simply generating repeatedly until a valid password is
> found.
>
> For a bit of extra bikeshedding, I'd make alphabet the second,
> optional, parameter and default it to
> string.ascii_letters+string.digits+string.punctuation, as that's often
> what password constraints require.
>
> Or at the very least, document how to use the module functions for the
> common tasks we see people getting wrong.
> But I thought the idea here
> was to make doing things the right way obvious, for people who don't
> read documentation, so I'd prefer to see the functions exposed by the
> module named based on the problems they solve, not on the features
> they provide. (Even if that involves a little duplication, and/or a
> split between "high level" and "low level" APIs).

Right, I'd suggest the following breakdown.

* Arbitrary password generation (also covers passphrase generation
  from a word list):

    secrets.password(result_len: int,
        alphabet=string.ascii_letters+string.digits+string.punctuation: T) -> T

* Binary token generation ("num_random_bytes" is the arg to
  os.urandom, not the length of result):

    secrets.token(num_random_bytes: int) -> bytes
    secrets.token_hex(num_random_bytes: int) -> bytes
    secrets.token_urlsafe_base64(num_random_bytes: int) -> bytes

* Serial number generation ("num_random_bytes" is the arg to
  os.urandom, not the length of result):

    secrets.serial_number(num_random_bytes: int) -> int

* Constant time secret comparison (aka hmac.compare_digest):

    secrets.equal(a: T, b: T) -> bool

* Lower level building blocks:

    secrets.choice(container)
    # Hold off on other SystemRandom methods?

(I don't have a strong opinion on that last point, as it's the higher
level APIs that I think are the important aspect of this proposal)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From luciano at ramalho.org Sun Sep 20 15:59:54 2015
From: luciano at ramalho.org (Luciano Ramalho)
Date: Sun, 20 Sep 2015 10:59:54 -0300
Subject: [Python-ideas] add a single __future__ for py3?
In-Reply-To: <55FDAC74.7050001@mail.de>
References: <55FDAC74.7050001@mail.de>
Message-ID:

Chris, I don't think students should be worrying about writing code
that is Python 2 and Python 3 compatible. That's a concern only for
people who write libraries, tools and frameworks for others to use, and
I do not think these are the kinds of programs students usually do.
Even if they are doing something along those lines, they should be
focusing on other more important features of the programs rather than
whether they run on Python 2 and on Python 3.

Having said that, I'd also like to add that I don't think ``from
__future__ import unicode_literals`` is a great idea for making code
2/3 compatible nowadays. It was necessary before the u'' prefix was
reinstated in Python 3.3, but since u'' is back it's much better to be
explicit in your literals rather than dealing with runtime errors
because of the blanket effect of the unicode_literals import.

Anyone who cares about 2/3 compatibility should mark every single
literal with a u'' or a b'' prefix. But students should not be
distracted by this. They should be using Python 3 only ;-).

Cheers,

Luciano

On Sat, Sep 19, 2015 at 3:41 PM, Sven R. Kunze wrote:
> I totally agree here.
>
> On 19.09.2015 19:50, Chris Barker wrote:
>
> Hi all,
>
> the common advice, these days, if you want to write py2/3 compatible
> code, is to do:
>
>     from __future__ import absolute_import
>     from __future__ import division
>     from __future__ import print_function
>     from __future__ import unicode_literals
>
> https://docs.python.org/2/howto/pyporting.html#prevent-compatibility-regressions
>
> I'm trying to do this in my code, and teaching my students to do it too.
>
> but that's actually a lot of code to write.
>
> It would be nice to have a:
>
>     from __future__ import py3
>
> or something like that, that would do all of those in one swipe.
>
> IIUC, I can't make a little module that does that, because the __future__
> imports only affect the module in which they are imported
>
> Sure, it's not a huge deal, but it would make it easier for folks wanting to
> keep up this best practice.
>
> Of course, this wouldn't happen until 2.7.11, if and when there even is one,
> but it would be nice to get it on the list....
>
> -Chris
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception
>
> Chris.Barker at noaa.gov
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

--
Luciano Ramalho
| Author of Fluent Python (O'Reilly, 2015)
| http://shop.oreilly.com/product/0636920032519.do
| Professor em: http://python.pro.br
| Twitter: @ramalhoorg

From abarnert at yahoo.com Sun Sep 20 23:34:34 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 20 Sep 2015 14:34:34 -0700
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info>
Message-ID: <5DE97370-B0FD-49B1-A22F-08407951E68B@yahoo.com>

On Sep 20, 2015, at 04:05, Paul Moore wrote:
>
> Does this need to be an operator?
> How about the following:
>
>     class Maybe:
>         def __getattr__(self, attr): return None
>         def __getitem__(self, idx): return None
>         def __call__(self, *args, **kw): return None
>
>     def maybe(obj):
>         return Maybe() if obj is None else obj
>
>     attr = maybe(obj).spam
>     elt = maybe(obj)[n]
>     result = maybe(callback)(args)

But try this for calling a method on a possibly-null object:

    result = maybe(maybe(spam).eggs)(cheese)

From mertz at gnosis.cx Sun Sep 20 23:47:16 2015
From: mertz at gnosis.cx (David Mertz)
Date: Sun, 20 Sep 2015 14:47:16 -0700
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info>
Message-ID:

Paul Moore's idea is WAAYY better than the ugly ? pseudo-operator.
`maybe()` reads just like a regular function (because it is), and we
don't need to go looking for Perl (nor Haskell) in some weird extra
syntax that will confuse beginners.

On Sun, Sep 20, 2015 at 4:05 AM, Paul Moore wrote:
> On 20 September 2015 at 08:31, Steven D'Aprano wrote:
> > I'm not convinced that we should generalise this beyond the three
> > original examples of attribute access, item lookup and function call. I
> > think that applying ? to arbitrary operators is a case of "YAGNI". Or
> > perhaps, "You Shouldn't Need It".
>
> Agreed.
>
> Does this need to be an operator? How about the following:
>
>     class Maybe:
>         def __getattr__(self, attr): return None
>         def __getitem__(self, idx): return None
>         def __call__(self, *args, **kw): return None
>
>     def maybe(obj):
>         return Maybe() if obj is None else obj
>
>     attr = maybe(obj).spam
>     elt = maybe(obj)[n]
>     result = maybe(callback)(args)
>
> The Maybe class could be hidden, and the Maybe() object a singleton
> (making my poor naming a non-issue :-)) and if it's felt sufficiently
> useful, the maybe() function could be a builtin.
>
> Usage of the result of maybe() outside of the above 3 contexts should
> simply be "not supported" - don't worry about trying to stop people
> doing weird things, just make it clear that the intent is only to
> support the 3 given idiomatic usages.
>
> Paul.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons. Intellectual property is
to the 21st century what the slave trade was to the 16th.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From abarnert at yahoo.com Mon Sep 21 00:07:43 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 20 Sep 2015 15:07:43 -0700
Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library
In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info>
Message-ID:

On Sep 20, 2015, at 05:26, Nick Coghlan wrote:
>
>> On 20 September 2015 at 20:56, Paul Moore wrote:
>> Given where this started, I'd suggest renaming token_alpha as
>> "password". Beginners wouldn't necessarily associate the term "token"
>> with the problem "I want to generate a random password" [1]. Maybe add
>> a short recipe showing how to meet constraints like "at least 2
>> digits" by simply generating repeatedly until a valid password is
>> found.
>>
>> For a bit of extra bikeshedding, I'd make alphabet the second,
>> optional, parameter and default it to
>> string.ascii_letters+string.digits+string.punctuation, as that's often
>> what password constraints require.
>>
>> Or at the very least, document how to use the module functions for the
>> common tasks we see people getting wrong.
>> But I thought the idea here
>> was to make doing things the right way obvious, for people who don't
>> read documentation, so I'd prefer to see the functions exposed by the
>> module named based on the problems they solve, not on the features
>> they provide. (Even if that involves a little duplication, and/or a
>> split between "high level" and "low level" APIs).
>
> Right, I'd suggest the following breakdown.
>
> * Arbitrary password generation (also covers passphrase generation
>   from a word list):
>
>     secrets.password(result_len: int,
>         alphabet=string.ascii_letters+string.digits+string.punctuation: T) -> T

If T is a word list--that is, an Iterable of str or bytes--you want to
return a str or a bytes, not a T. Also, making it work that generically
will make the code much more complicated, to the point where it no
longer serves as useful sample code to rank novices. You have to
extract the first element of T, then do your choosing off
chain([first], T) instead of off T, then type(first).join; all of that
is more complicated than the actual logic, and will obscure the
important part we want novices to learn if they read the source.

Also, for word lists, I think you'd want a way to specify actual
passphrases vs. the xkcd 936 idea of using passphrases as passwords
even for sites that don't accept spaces, like
"correcthorsebatterystaple". Maybe via a sep=' ' parameter? That would
be very confusing if it's ignored when T is string-like but used when T
is a non-string-like iterable of string-likes.

I think it's better to require T to be string-like than to try to
generalize it, and maybe add a separate passphrase function that takes
(words: Sequence[T], sep: T) -> T. (Although I'm not sure how to
default to ' ' vs b' ' based on the type of T... But maybe this does
need to handle bytes, so Sequence[str] is fine?)
> * Binary token generation ("num_random_bytes" is the arg to
>   os.urandom, not the length of result):
>
>     secrets.token(num_random_bytes: int) -> bytes
>     secrets.token_hex(num_random_bytes: int) -> bytes
>     secrets.token_urlsafe_base64(num_random_bytes: int) -> bytes
>
> * Serial number generation ("num_random_bytes" is the arg to
>   os.urandom, not the length of result):
>
>     secrets.serial_number(num_random_bytes: int) -> int
>
> * Constant time secret comparison (aka hmac.compare_digest):
>
>     secrets.equal(a: T, b: T) -> bool
>
> * Lower level building blocks:
>
>     secrets.choice(container)
>     # Hold off on other SystemRandom methods?
>
> (I don't have a strong opinion on that last point, as it's the higher
> level APIs that I think are the important aspect of this proposal)

I think randrange is definitely worth having. Even the OpenSSL and
arc4random APIs provide something equivalent. If you're a novice, and
following a blog post that says to use your language's equivalent of
randbelow(1000000), are you going to think of choice(range(1000000))?
And, if you do, are you going to convince yourself that this is
reasonable and not going to create a slew of million-element lists?

From guido at python.org Mon Sep 21 00:50:10 2015
From: guido at python.org (Guido van Rossum)
Date: Sun, 20 Sep 2015 15:50:10 -0700
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info>
Message-ID:

Actually if anything reminds me of Haskell it's a 'Maybe' type. :-(

But I do side with those who find '?' too ugly to consider.

On Sun, Sep 20, 2015 at 2:47 PM, David Mertz wrote:
> Paul Moore's idea is WAAYY better than the ugly ? pseudo-operator.
> `maybe()` reads just like a regular function (because it is), and we
> don't need to go looking for Perl (nor Haskell) in some weird extra
> syntax that will confuse beginners.
>
> On Sun, Sep 20, 2015 at 4:05 AM, Paul Moore wrote:
>
>> On 20 September 2015 at 08:31, Steven D'Aprano wrote:
>> > I'm not convinced that we should generalise this beyond the three
>> > original examples of attribute access, item lookup and function call. I
>> > think that applying ? to arbitrary operators is a case of "YAGNI". Or
>> > perhaps, "You Shouldn't Need It".
>>
>> Agreed.
>>
>> Does this need to be an operator? How about the following:
>>
>>     class Maybe:
>>         def __getattr__(self, attr): return None
>>         def __getitem__(self, idx): return None
>>         def __call__(self, *args, **kw): return None
>>
>>     def maybe(obj):
>>         return Maybe() if obj is None else obj
>>
>>     attr = maybe(obj).spam
>>     elt = maybe(obj)[n]
>>     result = maybe(callback)(args)
>>
>> The Maybe class could be hidden, and the Maybe() object a singleton
>> (making my poor naming a non-issue :-)) and if it's felt sufficiently
>> useful, the maybe() function could be a builtin.
>>
>> Usage of the result of maybe() outside of the above 3 contexts should
>> simply be "not supported" - don't worry about trying to stop people
>> doing weird things, just make it clear that the intent is only to
>> support the 3 given idiomatic usages.
>>
>> Paul.
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons. Intellectual property is
> to the 21st century what the slave trade was to the 16th.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mehaase at gmail.com Mon Sep 21 04:35:03 2015
From: mehaase at gmail.com (Mark E. Haase)
Date: Sun, 20 Sep 2015 22:35:03 -0400
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info>
Message-ID:

On the day I started this thread, I wrote a Python module that does
what maybe() does. I hadn't seen PyMaybe yet, and I couldn't think of
any good names for my module's functions, so my module was
disappointingly ugly. PyMaybe is exactly what I *wish* I had written
that day.

For comparison, here's the code from my first post in this thread and
its maybe-ized version.

    response = json.dumps({
        'created': created?.isoformat(),
        'updated': updated?.isoformat(),
        ...
    })

    response = json.dumps({
        'created': maybe(created).isoformat(),
        'updated': maybe(updated).isoformat(),
        ...
    })

Pros:

1. No modification to Python grammar.
2. More readable: it's easy to overlook ? when skimming quickly, but
   "maybe()" is easy to spot.
3. More intuitive: the name "maybe" gives a hint at what it might do,
   whereas if you've never seen "?." you would need to google it.
   (Googling punctuation is obnoxious.)

Cons:

1. Doesn't short circuit: "maybe(family_name).upper().strip()" will
   fail if family_name is None.[1] You might try
   "maybe(maybe(family_name).upper()).strip()", but that is tricky to
   read and still isn't quite right: if family_name is not None, then
   it *should* be an error if "upper" is not an attribute of it. The
   2-maybes form covers up that error.
I'm sure there will be differing opinions on whether this type of
operation should short circuit. Some will say that we shouldn't be
writing code that way: if you need to chain calls, then use some other
syntax. But I think the example of upper case & strip is a good example
of a perfectly reasonable thing to do.

These kinds of operations are pretty common when you're interfacing
with some external system or external data that has a concept of null
(databases, JSON, YAML, argparse, any thin wrapper around a C library,
etc.).

This conversation has really focused on the null-aware attribute
access, but the easier and more defensible use case is the null
coalesce operator, spelled "??" in C# and Dart. It's easy to find
popular packages that use something like "retries = default if default
is not None else cls.DEFAULT" to supply default instances.[2] Other
packages do something like "retries = default or cls.DEFAULT"[3],
which is worse because it is easy to overlook the implicit coalescing
of the left operand. In fact, the top hit for "python null coalesce" is
StackOverflow, and the top-voted answer says to use "or".[4] (The
answer goes on to explain the nuance of using "or" to coalesce, but how
many developers read that far?)

*In the interest of finding some common ground, I'd like to get some
feedback on the coalesce operator.* Maybe that conversation will yield
some insight into the other "None aware" operators.

A) Is coalesce a useful feature? (And what are the use cases?)
B) If it is useful, is it important that it short circuits? (Put
   another way, could a function suffice?)
C) If it should be an operator, is "??" an ugly spelling?

    >>> retries = default ?? cls.DEFAULT

D) If it should be an operator, are any keywords more aesthetically
   pleasing? (I expect zero support for adding a new keyword.)
>>> retries = default else cls.DEFAULT >>> retries = try default or cls.DEFAULT >>> retries = try default else cls.DEFAULT >>> retries = try default, cls.DEFAULT >>> retries = from default or cls.DEFAULT >>> retries = from default else cls.DEFAULT >>> retries = from default, cls.DEFAULT My answers: A) It's useful: supplying default instances for optional values is an obvious and common use case. B) It should short circuit, because the patterns it replaces (using ternary operator or "or") also do. C) It's too restrictive to cobble a new operator out of existing keywords; "??" isn't hard to read when it is separated by whitespace, as Pythonistas typically do between a binary operator and its operands. D) I don't find any of these easier to read or write than "??". [1] I say "should", but actually PyMaybe does something underhanded so that this expression does not fail: "maybe(foo).upper()" returns a "Nothing" instance, not "None". But Nothing has "def __repr__(self): return repr(None)". So if you try to print it out, you'll think you have a None instance, but it won't behave like one. If you try to JSON serialize it, you get a hideously confusing error: "TypeError: None is not JSON serializable". For those not familiar: the JSON encoder can definitely serialize None: it becomes a JSON "null". A standard implementation of maybe() should _not_ work this way. [2] https://github.com/shazow/urllib3/blob/master/urllib3/util/retry.py#L148 [3] https://github.com/kennethreitz/requests/blob/46ff1a9a543cc4d33541aa64c94f50f0a698736e/requests/hooks.py#L25 [4] http://stackoverflow.com/a/4978745/122763 On Sun, Sep 20, 2015 at 6:50 PM, Guido van Rossum wrote: > Actually if anything reminds me of Haskell it's a 'Maybe' type. :-( > > But I do side with those who find '?' too ugly to consider. > > On Sun, Sep 20, 2015 at 2:47 PM, David Mertz wrote: > >> Paul Moore's idea is WAAYY better than the ugly ? pseudo-operator. 
>> `maybe()` reads just like a regular function (because it is), and we don't >> need to go looking for Perl (nor Haskell) in some weird extra syntax that >> will confuse beginners. >> >> On Sun, Sep 20, 2015 at 4:05 AM, Paul Moore wrote: >> >>> On 20 September 2015 at 08:31, Steven D'Aprano >>> wrote: >>> > I'm not convinced that we should generalise this beyond the three >>> > original examples of attribute access, item lookup and function call. I >>> > think that applying ? to arbitrary operators is a case of "YAGNI". Or >>> > perhaps, "You Shouldn't Need It". >>> >>> Agreed. >>> >>> Does this need to be an operator? How about the following: >>> >>> class Maybe: >>> def __getattr__(self, attr): return None >>> def __getitem__(self, idx): return None >>> def __call__(self, *args, **kw): return None >>> >>> def maybe(obj): >>> return Maybe() if obj is None else obj >>> >>> attr = maybe(obj).spam >>> elt = maybe(obj)[n] >>> result = maybe(callback)(args) >>> >>> The Maybe class could be hidden, and the Maybe() object a singleton >>> (making my poor naming a non-issue :-)) and if it's felt sufficiently >>> useful, the maybe() function could be a builtin. >>> >>> Usage of the result of maybe() outside of the above 3 contexts should >>> simply be "not supported" - don't worry about trying to stop people >>> doing weird things, just make it clear that the intent is only to >>> support the 3 given idiomatic usages. >>> >>> Paul. >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >> >> >> >> -- >> Keeping medicines from the bloodstreams of the sick; food >> from the bellies of the hungry; books from the hands of the >> uneducated; technology from the underdeveloped; and putting >> advocates of freedom in prisons. 
Intellectual property is >> to the 21st century what the slave trade was to the 16th. >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > > -- > --Guido van Rossum (python.org/~guido) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Mark E. Haase 202-815-0201 -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Sep 21 05:50:16 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 21 Sep 2015 13:50:16 +1000 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> Message-ID: <20150921035016.GX31152@ando.pearwood.info> On Sun, Sep 20, 2015 at 07:38:18PM +1000, Chris Angelico wrote: > On Sun, Sep 20, 2015 at 5:31 PM, Steven D'Aprano wrote: > > Technically, x.y x[y] and x(y) aren't operators, but for the sake of > > convenience I'll call them such. Even though these are binary operators, > > the ? only shortcuts according to the x, not the y. So we can call > > these ?. ?[] ?() operators "pseudo-unary" operators rather than binary > > operators. > > That's how all Python's short-circuiting works - based on the value of > what's on the left, decide whether or not to evaluate what's on the > right. (Well, nearly all - if/else evaluates the middle first, but > same difference.) This is another form of short-circuiting; "x[y]" > evaluates x, then if that's None, doesn't bother evaluating y because > it can't affect the result. 
I think you are mistaken about x[y]: py> None[print("side effect")] side effect Traceback (most recent call last): File "", line 1, in TypeError: 'NoneType' object is not subscriptable That's why x?[y] is a proposal. -- Steve From rosuav at gmail.com Mon Sep 21 06:05:15 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 21 Sep 2015 14:05:15 +1000 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <20150921035016.GX31152@ando.pearwood.info> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <20150921035016.GX31152@ando.pearwood.info> Message-ID: On Mon, Sep 21, 2015 at 1:50 PM, Steven D'Aprano wrote: > On Sun, Sep 20, 2015 at 07:38:18PM +1000, Chris Angelico wrote: >> On Sun, Sep 20, 2015 at 5:31 PM, Steven D'Aprano wrote: >> > Technically, x.y x[y] and x(y) aren't operators, but for the sake of >> > convenience I'll call them such. Even though these are binary operators, >> > the ? only shortcuts according to the x, not the y. So we can call >> > these ?. ?[] ?() operators "pseudo-unary" operators rather than binary >> > operators. >> >> That's how all Python's short-circuiting works - based on the value of >> what's on the left, decide whether or not to evaluate what's on the >> right. (Well, nearly all - if/else evaluates the middle first, but >> same difference.) This is another form of short-circuiting; "x[y]" >> evaluates x, then if that's None, doesn't bother evaluating y because >> it can't affect the result. > > I think you are mistaken about x[y]: > > py> None[print("side effect")] > side effect > Traceback (most recent call last): > File "", line 1, in > TypeError: 'NoneType' object is not subscriptable > > That's why x?[y] is a proposal. Oops, that was a typo in my statement. 
I meant "x?[y]" should behave that way - once it's discovered that x is None, the evaluation of y can't affect the result, and so it doesn't get evaluated (as per the normal short-circuiting rules). Yes, x[y] has to evaluate both x and y (after all, the value of y is passed to __getitem__). Sorry for the confusion. ChrisA From steve at pearwood.info Mon Sep 21 06:06:19 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 21 Sep 2015 14:06:19 +1000 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> Message-ID: <20150921040619.GY31152@ando.pearwood.info> On Sun, Sep 20, 2015 at 12:05:52PM +0100, Paul Moore wrote: > On 20 September 2015 at 08:31, Steven D'Aprano wrote: > > I'm not convinced that we should generalise this beyond the three > > original examples of attribute access, item lookup and function call. I > > think that applying ? to arbitrary operators is a case of "YAGNI". Or > > perhaps, "You Shouldn't Need It". > > Agreed. > > Does this need to be an operator? How about the following: Sadly, I think it does. Guido has (I think) ruled out the Null object design pattern, which makes me glad because I think it is horrid. But your Maybe class below is a watered down, weak version that (in my opinion) isn't worth bothering with. See below. class Maybe: def __getattr__(self, attr): return None def __getitem__(self, idx): return None def __call__(self, *args, **kw): return None def maybe(obj): return Maybe() if obj is None else obj And in action: py> maybe("spam").upper() # Works fine. 'SPAM' py> maybe(None).upper() Traceback (most recent call last): File "", line 1, in TypeError: 'NoneType' object is not callable It also fails for chained lookups: maybe(obj).spam['id'].ham will fail for the same reason. 
You could write this: maybe(maybe(obj).upper)() maybe(maybe(maybe(obj).spam)['id']).ham but that's simply awful. Avoiding that problem is why the Null object returns itself, but we've rightly ruled that out. This is why I think that if this is worth doing, it has to be some sort of short-circuiting operator or pseudo-operator: expression ? .spam.eggs.cheese can short-circuit the entire chain .spam.eggs.cheese, not just the first component. Otherwise, I don't think it's worth doing. -- Steve From anthony at xtfx.me Mon Sep 21 06:17:00 2015 From: anthony at xtfx.me (C Anthony Risinger) Date: Sun, 20 Sep 2015 23:17:00 -0500 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <20150919120624.GS31152@ando.pearwood.info> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150919120624.GS31152@ando.pearwood.info> Message-ID: On Sat, Sep 19, 2015 at 7:06 AM, Steven D'Aprano wrote: > On Sat, Sep 19, 2015 at 03:17:07AM -0500, C Anthony Risinger wrote: > > > I really liked this whole thread, and I largely still do -- I?think -- > but > > I'm not sure I like how `?` suddenly prevents whole blocks of code from > > being evaluated. Anything within the (...) or [...] is now skipped (IIUC) > > just because a `?` was added, which seems like it could have side effects > > on the surrounding state, especially since I expect people will use it > for > > squashing/silencing or as a convenient trick after the fact, possibly in > > code they did not originally write. > > I don't think this is any different from other short-circuiting > operators, particularly `and` and the ternary `if` operator: > > result = obj and obj.method(expression) > > result = obj.method(expression) if obj else default > > In both cases, `expression` is not evaluated if obj is falsey. That's > the whole point. Sure, but those all have white space and I can read what's happening. 
The `?` could appear anywhere without break. I don't like that, but, opinion. > > If the original example included a `?` like so: > > > > response = json.dumps?({ > > 'created': created?.isoformat(), > > 'updated': updated?.isoformat(), > > ... > > }) > > > > should "dumps" be None, the additional `?` (although though you can > barely > > see it) prevents *everything else* from executing. > > We're still discussing the syntax and semantics of this, so I could be > wrong, but my understanding of this is that the *first* question mark > prevents the expressions in the parens from being executed: > > json.dumps?( ... ) > > evaluates as None if json.dumps is None, otherwise it evaluates the > arguments and calls the dumps object. In other words, rather like this: > > _temp = json.dumps # temporary value > if _temp is None: > response = None > else: > response = _temp({ > 'created': None if created is None else created.isoformat(), > 'updated': None if updated is None else updated.isoformat(), > ... > }) > del _temp > > > except the _temp name isn't actually used. The whole point is to avoid > evaluating an expression (attribute looking, index/key lookup, function > call) which will fail if the object is None, and if you're not going to > call the function, why evaluate the arguments to the function? > Yes that is how I understand it as well. I'm suggesting it's hard to see. I understand the concept as "None cancellation", because if the left is None, the right is cancelled. This lead me here: * This is great, want to use all the time! * First-level language support, shouldn't I use? Does feels useful/natural * How can I make my APIs cancellation-friendly? 
* I can write None-centric APIs, that often collapse to None * Now maybe user code does stuff like `patient.visits?.september?.records` to get all records in September (if any, else None) * Since both `?` points would *prefer* None, if the result is None, I now have to jump around looking for who done it * If I don't have debugger ATM, I'm breaking it up a lot for good 'ol print(...), only way * I don't think I like this any more :( I especially don't like the idea of seeing it multiple times quickly, and the potential impact to debugging. The truth is I want to like this but I feel like it opens a can of worms (as seen by all the wild operators this proposal "naturally" suggests). > > Usually when I want to use this pattern, I find I just need to write > things > > out more. The concept itself vaguely reminds me of PHP's use of `@` for > > squashing errors. > > I had to look up PHP's @ and I must say I'm rather horrified. According > to the docs, all it does is suppress the error reporting, it does > nothing to prevent or recover from errors. There's not really an > equivalent in Python, but I suppose this is the closest: > > # similar to PHP's $result = @(expression); > try: > result = expression > except: > result = None > > > This is nothing like this proposal. It doesn't suppress arbitrary > errors. It's more like a conditional: > > # result = obj?(expression) > if obj is None: > result = None > else: > result = obj(expression) > > > If `expression` raises an exception, it will still be raised, but only > if it is actually evaluated, just like anything else protected by an > if...else or short-circuit operator. I did say vaguely :) but it is extremely hideous I agree. The part that made me think of this is the would be desire for things to become None (so, or example, wanting to avoid throwing typed/informative exceptions if possible) so they'd then be more useful with `?`. -- C Anthony -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From srkunze at mail.de Mon Sep 21 06:20:53 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 21 Sep 2015 06:20:53 +0200 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150919120624.GS31152@ando.pearwood.info> Message-ID: <55FF85A5.8000909@mail.de> On 21.09.2015 06:17, C Anthony Risinger wrote: > Yes that is how I understand it as well. I'm suggesting it's hard to > see. I understand the concept as "None cancellation", because if the > left is None, the right is cancelled. This lead me here: > > * This is great, want to use all the time! > * First-level language support, shouldn't I use? Does feels useful/natural > * How can I make my APIs cancellation-friendly? > * I can write None-centric APIs, that often collapse to None > * Now maybe user code does stuff like > `patient.visits?.september?.records` to get all records in September > (if any, else None) > * Since both `?` points would *prefer* None, if the result is None, I > now have to jump around looking for who done it > * If I don't have debugger ATM, I'm breaking it up a lot for good 'ol > print(...), only way > * I don't think I like this any more :( > > I especially don't like the idea of seeing it multiple times quickly, > and the potential impact to debugging. The truth is I want to like > this but I feel like it opens a can of worms (as seen by all the wild > operators this proposal "naturally" suggests). > It's interesting to see that everybody who ponders more than a minute about it, really fast comes to the same conclusion. Best, Sven From stephen at xemacs.org Mon Sep 21 06:22:24 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 21 Sep 2015 13:22:24 +0900 Subject: [Python-ideas] add a single __future__ for py3? 
In-Reply-To: References: <55FDAC74.7050001@mail.de> Message-ID: <878u80xuyn.fsf@uwakimon.sk.tsukuba.ac.jp> Luciano Ramalho writes: > I don't think students should be worrying about writing code that is > Python 2 and Python 3 compatible. I suppose Chris's students, as for many of those who post RFEs to aid in teaching Python programming (vs. using Python to teach programming), are professional programmers, not full-time students. I suspect it's their job to write such code. One thing that I've learned in over a decade on this list is that the "consenting adults" attitude is very practical in focusing discussions here. If someone posts "I have this use case , that I'm addressing with this code: ", it's perfectly reasonable and often useful to reply, "Don't use that code: in Python the TOOWTDI is ." But most of the time "that use case is invalid" isn't any help. The use case may even be "stupid", but mandated by employer or by contract with client, or by existing code that nobody knows how to maintain. YMMV, but I've been embarrassed every time I've written something to the effect of "you should make your use case go away." The OP usually cannot make it go away. The most that usually should be said is "it's very difficult to serve that use case elegantly in Python, and here's why." From greg.ewing at canterbury.ac.nz Mon Sep 21 06:28:08 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 21 Sep 2015 16:28:08 +1200 Subject: [Python-ideas] add a single __future__ for py3? In-Reply-To: References: Message-ID: <55FF8758.70406@canterbury.ac.nz> Chris Barker wrote: > It would be nice to have a: > > from __future__ import py3 Or maybe from __future__ import * should work? -- Greg From alexander.belopolsky at gmail.com Mon Sep 21 06:32:14 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 21 Sep 2015 00:32:14 -0400 Subject: [Python-ideas] add a single __future__ for py3?
In-Reply-To: <55FF8758.70406@canterbury.ac.nz> References: <55FF8758.70406@canterbury.ac.nz> Message-ID: On Mon, Sep 21, 2015 at 12:28 AM, Greg Ewing wrote: > Chris Barker wrote: > >> It would be nice to have a: >> >> from __future__ import py3 >> > > Or maybe > > from __future__ import * > > should work? +1 (with all the admonitions against the "from whatever import *" construct being still applicable) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Mon Sep 21 06:35:37 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 21 Sep 2015 14:35:37 +1000 Subject: [Python-ideas] add a single __future__ for py3? In-Reply-To: <55FF8758.70406@canterbury.ac.nz> References: <55FF8758.70406@canterbury.ac.nz> Message-ID: On Mon, Sep 21, 2015 at 2:28 PM, Greg Ewing wrote: > Chris Barker wrote: >> >> It would be nice to have a: >> >> from __future__ import py3 > > > Or maybe > > from __future__ import * > > should work? Hah! Even if it were made to work, though, it'd mean you suddenly and unexpectedly get backward-incompatible changes when you run your code on a new version. Effectively, that directive would say "hey, you know that __future__ feature, well, I'd rather just not bother - get the breakage right away". Kinda defeats the purpose :) ChrisA From srkunze at mail.de Mon Sep 21 06:52:13 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 21 Sep 2015 06:52:13 +0200 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> Message-ID: <55FF8CFD.9070906@mail.de> On 21.09.2015 04:35, Mark E. Haase wrote: > A) Is coalesce a useful feature? (And what are the use cases?) I limit myself to materializing default arguments as in: def a(b=None): b = b or {} ... 
Because it's a well-known theme (and issue): the mutability of default arguments in Python. > B) If it is useful, is it important that it short circuits? (Put > another way, could a function suffice?) > C) If it should be an operator, is "??" an ugly spelling? > > >>> retries = default ?? cls.DEFAULT > The only difference between "or" and "??" is that "??" is None only, right? At least to me, the given use case above does not justify the introduction of "??". > D) If it should be an operator, are any keywords more aesthetically > pleasing? (I expect zero support for adding a new keyword.) > > >>> retries = default else cls.DEFAULT > >>> retries = try default or cls.DEFAULT > >>> retries = try default else cls.DEFAULT > >>> retries = try default, cls.DEFAULT > >>> retries = from default or cls.DEFAULT > >>> retries = from default else cls.DEFAULT > >>> retries = from default, cls.DEFAULT > > > My answers: > > A) It's useful: supplying default instances for optional values is an > obvious and common use case. Yes, "or" suffices in that case. > B) It should short circuit, because the patterns it replaces (using > ternary operator or "or") also do. They look ugly and unpleasant because they remind you to reduce the usage of None; not to make dealing with it more pleasant. > C) It's too restrictive to cobble a new operator out of existing > keywords; "??" isn't hard to read when it is separated by whitespace, > as Pythonistas typically do between a binary operator and its operands. > D) I don't find any of these easier to read or write than "??". "or" is easier to type (no special characters), I don't need to explain it to new staff, and it's more pleasant to the eye. I remember my missis telling me, after I showed her some C# code, that programmers tend to like weird special characters. Well, that might certainly be true. Special characters increase the visual noise and the mental strain when reading. They make the lines they are in special.
I don't see anything special with "or" and with the single use case I have for it. :) Best, Sven From rosuav at gmail.com Mon Sep 21 06:57:14 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 21 Sep 2015 14:57:14 +1000 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <55FF8CFD.9070906@mail.de> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> Message-ID: On Mon, Sep 21, 2015 at 2:52 PM, Sven R. Kunze wrote: > I limit myself to materializing default arguments as in: > > def a(b=None): > b = b or {} > ... As long as you never need to pass in a specific empty dictionary, that's fine. That's the trouble with using 'or' - it's not checking for None, it's checking for falsiness. ChrisA From srkunze at mail.de Mon Sep 21 07:11:05 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 21 Sep 2015 07:11:05 +0200 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> Message-ID: <55FF9169.1080905@mail.de> On 21.09.2015 06:57, Chris Angelico wrote: > On Mon, Sep 21, 2015 at 2:52 PM, Sven R. Kunze wrote: >> I limit myself to materializing default arguments as in: >> >> def a(b=None): >> b = b or {} >> ... > As long as you never need to pass in a specific empty dictionary, > that's fine. That's the trouble with using 'or' - it's not checking > for None, it's checking for falsiness. True. Although I rarely pass a dynamic value to parameters with default arguments. But you are right, so what does this mean for "??" ? 
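To make that trouble concrete (the function and its fallback value are invented for the example):

```python
def merge_tags(tags=None):
    # Falsiness-based default: an explicitly passed empty dict is
    # silently replaced by the fallback.
    via_or = tags or {"env": "prod"}
    # Identity-based default: only an actual None triggers the fallback.
    via_is = tags if tags is not None else {"env": "prod"}
    return via_or, via_is

own = {}                         # caller deliberately passes an empty dict
via_or, via_is = merge_tags(own)
print(via_or)                    # {'env': 'prod'} -- caller's dict discarded
print(via_is is own)             # True -- the None check preserves it
```

A None-only "??" would have to behave like the `is not None` branch, not like "or".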
From random832 at fastmail.com Mon Sep 21 08:21:21 2015 From: random832 at fastmail.com (Random832) Date: Mon, 21 Sep 2015 02:21:21 -0400 Subject: [Python-ideas] add a single __future__ for py3? References: <55FF8758.70406@canterbury.ac.nz> Message-ID: Chris Angelico writes: > Even if it were made to work, though, it'd mean you suddenly and > unexpectedly get backward-incompatible changes when you run your code > on a new version. Effectively, that directive would say "hey, you know > that __future__ feature, well, I'd rather just not bother - get the > breakage right away". Kinda defeats the purpose :) Yeah, well, that won't be a problem for this use case until Python 2.8 comes out. Or do we expect *new* __future__ features to be added to maintenance releases of Python 2.7? From rosuav at gmail.com Mon Sep 21 10:02:34 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 21 Sep 2015 18:02:34 +1000 Subject: [Python-ideas] add a single __future__ for py3? In-Reply-To: References: <55FF8758.70406@canterbury.ac.nz> Message-ID: On Mon, Sep 21, 2015 at 4:21 PM, Random832 wrote: > Chris Angelico writes: >> Even if it were made to work, though, it'd mean you suddenly and >> unexpectedly get backward-incompatible changes when you run your code >> on a new version. Effectively, that directive would say "hey, you know >> that __future__ feature, well, I'd rather just not bother - get the >> breakage right away". Kinda defeats the purpose :) > > Yeah, well, that won't be a problem for this use case until Python 2.8 > comes out. Or do we expect *new* __future__ features to be added to > maintenance releases of Python 2.7? The whole point of this is to be compatible also with Python 3, and new future directives can be added there. So your from __future__ import * would trigger generator_stop on 3.5, for instance. 
ChrisA From p.f.moore at gmail.com Mon Sep 21 10:10:03 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 21 Sep 2015 09:10:03 +0100 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> Message-ID: On 20 September 2015 at 23:50, Guido van Rossum wrote: > Actually if anything reminds me of Haskell it's a 'Maybe' type. :-( I warned you my choice of names was poor :-) Paul From greg.ewing at canterbury.ac.nz Mon Sep 21 07:15:53 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 21 Sep 2015 17:15:53 +1200 Subject: [Python-ideas] add a single __future__ for py3? In-Reply-To: References: <55FF8758.70406@canterbury.ac.nz> Message-ID: <55FF9289.6050202@canterbury.ac.nz> Chris Angelico wrote: > On Mon, Sep 21, 2015 at 2:28 PM, Greg Ewing wrote: > >> from __future__ import * > > Even if it were made to work, though, it'd mean you suddenly and > unexpectedly get backward-incompatible changes when you run your code > on a new version. Properly implemented, it would use the time machine module to find every feature that will ever be implemented in Python. So once you had updated your code to be compatible with all of them, it would *never* break again! The neat thing is that it would take just one use of the time machine to backport this feature, and it would then bootstrap itself into existence. -- Greg From stephen at xemacs.org Mon Sep 21 10:48:28 2015 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Mon, 21 Sep 2015 17:48:28 +0900 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> Message-ID: <877fnkxin7.fsf@uwakimon.sk.tsukuba.ac.jp> Mark E. Haase writes: > This conversation has really focused on the null aware attribute access, > but the easier and more defensible use case is the null coalesce operator, > spelled "??" in C# and Dart. It's easy to find popular packages that use > something like "retries = default if default is not None else cls.DEFAULT" To me, it's less defensible. Eg, currently TOOWTDI for "??" is the idiom quoted above. I sorta like the attribute access, attribute fetch, and function call versions, though I probably won't use them. Also some functions need to accept None as an actual argument, and the module defines a module-specific sentinel. The inability to handle such sentinels is a lack of generality that the "x if x is not sentinel else y" idiom doesn't suffer from, so "??" itself can't become TOOWTDI. I don't write "def foo(default):" (ever that I can recall), so using "default" in retries = default if default is not None else cls.DEFAULT confuses me. Realistically, I would be writing retries = retries if retries is not None else cls.RETRIES (or perhaps the RHS would be "self.retries"). That doesn't look that bad to me (perhaps from frequent repetition). It's verbose, but I don't see a need to chain it, unlike "?.". For "?.", some Pythonistas would say "just don't", but I agree that often it's natural to chain. > to supply default instances.[2] Other packages do something like > "retries = default or cls.DEFAULT"[3], which is worse because it > easy to overlook the implicit coalescing of the left operand. Worse? 
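The module-specific sentinel idiom referred to above looks roughly like this (the function and values are invented for illustration); because None is a legitimate argument here, a None-only "??" cannot express it:

```python
_MISSING = object()   # module-private sentinel; never leaks to callers

def connect(timeout=_MISSING):
    """None means 'block forever'; omitting timeout means 'use default'."""
    if timeout is _MISSING:
        timeout = 30.0
    return timeout

print(connect())       # 30.0 -- default applied
print(connect(None))   # None -- passed through as a real value
print(connect(5.0))    # 5.0
```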
It's true that it's more risky because it's all falsies, not just the None sentinel, but I think "consenting adults" applies here. I don't know about the packages you looked at, but I often use "x = s or y" where I really want to trap the falsey value of the expected type, perhaps as well as None, and I use the "x if s is not sentinel else y" idiom to substitute default values. I also use "or" in scripty applications and unit test setup functions where I want compact expression and I don't expect long-lived objects to be passed so I can easily figure out where the non-None falsey came from anyway. > A) Is coalesce a useful feature? (And what are the use cases?) Yes, for the whole group of operators. Several use cases for the other operators have already been proposed, but I wouldn't use them myself in current or past projects, and don't really foresee that changing. -0 for the group on the IAGNI principle. But for "??" specifically, it's just more compact AFAICS. I don't see where I would use x ?? y ?? z, so the compactness doesn't seem like that great a benefit. In practice, I think the use cases for "??" would be a strict subset of the use cases for the ternary operator, so you have to argue that "this special case *is* special enough" to have its own way to do it. I don't think it is. -1 > C) If it should be an operator, is "??" an ugly spelling? > > >>> retries = default ?? cls.DEFAULT Looks like metasyntax from pseudo-code that didn't get fixed to me. That would probably change if other ?x operators were added though. I have no comment on short-circuiting (no relevant experience), or keyword vs. punctuation spellings. On second thought: > D) If it should be an operator, are any keywords more aesthetically > pleasing? (I expect zero support for adding a new keyword.) > > >>> retries = default else cls.DEFAULT I kinda like this if-less else syntax for the symmetry with else-less if. 
But on second thought I think it would persistently confuse me when reading, because it would be extremely natural to expect it to be another way of spelling "default or cls.DEFAULT". "try ... else ..." also has its attraction, but I suppose that would fail for the same reasons that the ternary operator is spelled "x if y else z" rather than "if y then x else z". From abarnert at yahoo.com Mon Sep 21 11:05:33 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 21 Sep 2015 02:05:33 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <877fnkxin7.fsf@uwakimon.sk.tsukuba.ac.jp> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <877fnkxin7.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <6E2CAA1D-F35C-4BD5-B314-B2E0291AE019@yahoo.com> On Sep 21, 2015, at 01:48, Stephen J. Turnbull wrote: >>>>> retries = default else cls.DEFAULT > > I kinda like this if-less else syntax for the symmetry with else-less > if. How do you parse this: a if b else c else d Feel free to answer either as a human reader or as CPython's LL(1) parser. From ncoghlan at gmail.com Mon Sep 21 11:25:00 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 21 Sep 2015 19:25:00 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: On 21 September 2015 at 08:07, Andrew Barnert wrote: > If T is a word list--that is, an Iterable of str or bytes--you want to return a str or a bytes, not a T. > > Also, making it work that generically will make the code much more complicated, to the point where it no longer serves as useful sample code to rank novices. 
You have to extract the first element of T, then do your choosing off chain([first], T) instead of off T, then type(first).join; all of that is more complicated than the actual logic, and will obscure the important part we want novices to learn if they read the source.
>
> Also, I think for word lists, I think you'd want a way to specify actual passphrases vs. the xkcd 936 idea of using passphrases as passwords even for sites that don't accept spaces, like "correcthorsebatterystaple". Maybe via a sep=' ' parameter? That would be very confusing if it's ignored when T is string-like but used when T is a non-string-like iterable of string-likes.
>
> I think it's better to require T to be string-like than to try to generalize it, and maybe add a separate passphrase function that takes (words: Sequence[T], sep: T) -> T. (Although I'm not sure how to default to ' ' vs b' ' based on the type of T... But maybe this does need to handle bytes, so Sequence[str] is fine?)

Simpler is better here, so I'll revise the text-based suggestions to:

    secrets.password(result_len: int, alphabet: str = string.ascii_letters + string.digits + string.punctuation) -> str
    secrets.passphrase(result_len: int, words: Sequence[str], sep: str = ' ') -> str

>> * Lower level building blocks:
>>
>> secrets.choice(container)
>> # Hold off on other SystemRandom methods?
>>
>> (I don't have a strong opinion on that last point, as it's the higher
>> level APIs that I think are the important aspect of this proposal)

> I think randrange is definitely worth having. Even the OpenSSL and arc4random APIs provide something equivalent. If you're a novice, and following a blog post that says to use your language's equivalent of randbelow(1000000), are you going to think of choice(range(1000000))? And, if you do, are you going to convince yourself that this is reasonable and not going to create a slew of million-element lists?

Sure, that makes sense, while still keeping the secrets module focused on integers.
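As a rough sketch of the two text helpers proposed above (secrets.password() and secrets.passphrase() are proposed names at this point, not a shipped API), both can be built directly on the existing random.SystemRandom:

```python
import string
from random import SystemRandom

_sysrand = SystemRandom()  # OS-level entropy (os.urandom under the hood)

def password(result_len, alphabet=string.ascii_letters + string.digits + string.punctuation):
    # result_len independently chosen characters from alphabet
    return ''.join(_sysrand.choice(alphabet) for _ in range(result_len))

def passphrase(result_len, words, sep=' '):
    # result_len independently chosen words from the word list, joined with sep
    return sep.join(_sysrand.choice(words) for _ in range(result_len))
```

Passing sep='' would give the "correcthorsebatterystaple" form discussed above, so the same function covers both the spaced and unspaced variants.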
getrandbits() is an interesting one, as it opens up the option of "secrets.getrandbits(128).to_bytes()" as a pointlessly slower alternative to "secrets.token(128 // 8)", while "secrets.getrandbits(128)" itself would be directly equivalent to the proposed "secrets.serial_number(128 // 8)" So perhaps it makes sense to just drop the serial_number() idea and have getrandbits() instead. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Sep 21 11:29:24 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 21 Sep 2015 19:29:24 +1000 Subject: [Python-ideas] add a single __future__ for py3? In-Reply-To: References: Message-ID: On 20 September 2015 at 03:50, Chris Barker wrote: > Hi all, > > the common advise, these days, if you want to write py2/3 compatible code, > is to do: > > from __future__ import absolute_import > from __future__ import division > from __future__ import print_function > from __future__ import unicode_literals > > https://docs.python.org/2/howto/pyporting.html#prevent-compatibility-regressions > > I'm trying to do this in my code, and teaching my students to do it to. > > but that's actually a lot of code to write. For folks using IPython Notebook, I've been suggesting to various folks that a "Python 2/3 compatible" kernel that enables these features by default may be desirable. Ed Schofield of python-future.org was the last person I suggested that to, and he was interested in taking a look at the idea, but wasn't sure when he'd be able to find the time. So, if anyone's interested in exploring the creation of new Project Jupyter kernels... :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Mon Sep 21 11:56:56 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 21 Sep 2015 02:56:56 -0700 Subject: [Python-ideas] add a single __future__ for py3? 
In-Reply-To: <55FF9289.6050202@canterbury.ac.nz> References: <55FF8758.70406@canterbury.ac.nz> <55FF9289.6050202@canterbury.ac.nz> Message-ID: On Sep 20, 2015, at 22:15, Greg Ewing wrote: > > Chris Angelico wrote: >>> On Mon, Sep 21, 2015 at 2:28 PM, Greg Ewing wrote: >>> from __future__ import * >> >> Even if it were made to work, though, it'd mean you suddenly and >> unexpectedly get backward-incompatible changes when you run your code >> on a new version. > > Properly implemented, it would use the time > machine module to find every feature that will > ever be implemented in Python. So once you had > updated your code to be compatible with all of > them, it would *never* break again! > > The neat thing is that it would take just one > use of the time machine to backport this feature, > and it would then bootstrap itself into existence. Well, I just tested it with 2.7.0, and it doesn't give me any future flags at all. Which proves that Guido is going to reject the feature (because otherwise he will would have useding the time machine, and he hasn't doinged), so there's no point discussing it any further. I thought maybe many-worlds could help, so I tried "from __alternate_timeline__ import *" first, but then I got "parse error on input\nFailed, modules loaded: none", and then my kernel panicked with a type error (needs more monads)". From p.f.moore at gmail.com Mon Sep 21 12:55:55 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 21 Sep 2015 11:55:55 +0100 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> Message-ID: On 21 September 2015 at 03:35, Mark E. Haase wrote: > A) Is coalesce a useful feature? (And what are the use cases?) There seem to be a few main use cases: 1. Dealing with functions that return a useful value or None to signal "no value". 
I suspect the right answer here is actually to rewrite the function to not do that in the first place. "Useful value or None" seems like a reasonable example of an anti-pattern in Python. 2. People forgetting a return at the end of the function. In that case, the error, while obscure, is reasonable, and should be fixed by fixing the function, not by working around it in the caller. 3. Using a library (or other function outside your control) that uses the "useful value or None" idiom. You have to make the best of a bad job here, but writing an adapter function that hides the complexity doesn't seem completely unreasonable. Nor does just putting the test inline and accepting that you're dealing with a less than ideal API. 4. Any others? I can't think of anything. Overall, I don't think coalesce is *that* useful, given that it seems like it'd mainly be used in situations where I'd recommend a more strategic fix to the code. > B) If it is useful, is it important that it short circuits? (Put another > way, could a function suffice?) Short circuiting is important, but to me that simply implies that the "useful value or None" approach is flawed *because* it needs short-circuiting to manage. In lazy languages like Haskell, the Maybe type is reasonable because short-circuiting is a natural consequence of laziness, and so not a problem. In languages like C#, the use of null as a sentinel probably goes back to C usage of NULL (i.e., it may not be a good approach there either, but history and common practice make it common enough that a fix is needed). > C) If it should be an operator, is "??" an ugly spelling? > > >>> retries = default ?? cls.DEFAULT Arbitrary punctuation as operators is not natural in Python, something like this should be a keyword IMO. > D) If it should be an operator, are any keywords more aesthetically > pleasing? (I expect zero support for adding a new keyword.) 
> > >>> retries = default else cls.DEFAULT
> >>> retries = try default or cls.DEFAULT
> >>> retries = try default else cls.DEFAULT
> >>> retries = try default, cls.DEFAULT
> >>> retries = from default or cls.DEFAULT
> >>> retries = from default else cls.DEFAULT
> >>> retries = from default, cls.DEFAULT

Reusing existing keywords (specifically, all of the above) looks clumsy and forced to me. I agree that proposals to add a new keyword will probably never get off the ground, but none of the above suggestions look reasonable to me, and I can't think of anything else that does (particularly if you add "must be parseable" as a restriction!)

Overall, I'm -0.5 on a "coalesce" operator. I can't see it having sufficient value, and I can't think of a syntax I'd consider justifying it. But if someone were to come up with a Guido-like blindingly obvious way to spell the operation, I would be fine with that (and may even start using it more often than I think).

Paul

From rosuav at gmail.com Mon Sep 21 15:02:27 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 21 Sep 2015 23:02:27 +1000
Subject: [Python-ideas] add a single __future__ for py3?
In-Reply-To: References: <55FF8758.70406@canterbury.ac.nz> <55FF9289.6050202@canterbury.ac.nz>
Message-ID:

On Mon, Sep 21, 2015 at 7:56 PM, Andrew Barnert via Python-ideas wrote:
> I thought maybe many-worlds could help, so I tried "from __alternate_timeline__ import *" first, but then I got "parse error on input\nFailed, modules loaded: none", and then my kernel panicked with a type error (needs more monads)".
>

Your kernel isn't multitimeline compliant. Try recompiling it with the --with-polyads option, and make sure you don't use an ad-blocker.

Dragging this thread back to some semblance of serious discussion... An alias like "py3" could be well-defined, but I'd still rather not - and definitely not the star-import. Even adding an alias would be a problem for compatibility, because there would be Python versions that suddenly fail.
Currently, future features monotonically increase as Python versions increase, so if "from __future__ import barry_as_FLUFL" works on 3.3.6, I would expect it to work on 3.4.1 as well. Adding an alias to 2.7.11 would mean adding it also to bugfix releases in the 3.x line, so "from __future__ import py3" would break on certain bugfix releases of all versions of Python until 3.6, at which point it would be available. Do you really want your code to run fine on 3.5.1 and 3.4.3, but not on 3.5.0? That would be a nightmare to deal with, unless you're writing code for 2.7.11+/3.6+. ChrisA From steve at pearwood.info Mon Sep 21 15:08:15 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 21 Sep 2015 23:08:15 +1000 Subject: [Python-ideas] add a single __future__ for py3? In-Reply-To: References: <55FF8758.70406@canterbury.ac.nz> Message-ID: <20150921130813.GZ31152@ando.pearwood.info> On Mon, Sep 21, 2015 at 02:21:21AM -0400, Random832 wrote: > Chris Angelico writes: > > Even if it were made to work, though, it'd mean you suddenly and > > unexpectedly get backward-incompatible changes when you run your code > > on a new version. Effectively, that directive would say "hey, you know > > that __future__ feature, well, I'd rather just not bother - get the > > breakage right away". Kinda defeats the purpose :) > > Yeah, well, that won't be a problem for this use case until Python 2.8 > comes out. There will not be an official Python 2.8. https://www.python.org/dev/peps/pep-0404/ > Or do we expect *new* __future__ features to be added to > maintenance releases of Python 2.7? No. 
-- Steve From rosuav at gmail.com Mon Sep 21 15:27:24 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 21 Sep 2015 23:27:24 +1000 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> Message-ID: On Mon, Sep 21, 2015 at 8:55 PM, Paul Moore wrote: > There seem to be a few main use cases: > > 1. Dealing with functions that return a useful value or None to signal > "no value". I suspect the right answer here is actually to rewrite the > function to not do that in the first place. "Useful value or None" > seems like a reasonable example of an anti-pattern in Python. The alternative being to raise an exception? It's generally easier, when you can know in advance what kind of object you're expecting, to have a None return when there isn't one. For example, SQLAlchemy has .get(id) to return the object for a given primary key value, and it returns None if there's no such row in the database table - having to wrap that with try/except would be a pain. This isn't an error condition, and it's not like the special case of iteration (since an iterator could yield any value, it's critical to have a non-value way of signalling "end of iteration"). I don't want to see everything forced to "return or raise" just because someone calls this an anti-pattern. ChrisA From random832 at fastmail.com Mon Sep 21 15:45:38 2015 From: random832 at fastmail.com (Random832) Date: Mon, 21 Sep 2015 09:45:38 -0400 Subject: [Python-ideas] add a single __future__ for py3? 
In-Reply-To: <20150921130813.GZ31152@ando.pearwood.info> References: <55FF8758.70406@canterbury.ac.nz> <20150921130813.GZ31152@ando.pearwood.info> Message-ID: <1442843138.3321719.389368849.52DEB79B@webmail.messagingengine.com> On Mon, Sep 21, 2015, at 09:08, Steven D'Aprano wrote: > On Mon, Sep 21, 2015 at 02:21:21AM -0400, Random832 wrote: > > Yeah, well, that won't be a problem for this use case until Python 2.8 > > comes out. > > There will not be an official Python 2.8. Well, yes, "until Python 2.8 comes out" was meant to be a synonym for "never". But Chris Angelico since pointed out that it would be a problem for Python 3 since it's intended to be used for 2/3 compatible scripts. I think I'd originally read the use case as "make Python 2 as similar to Python 3 as possible so that people learning on Python 2 won't learn as many bad habits", without anything about explicitly running the same scripts on Python 3. From p.f.moore at gmail.com Mon Sep 21 16:27:01 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 21 Sep 2015 15:27:01 +0100 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> Message-ID: On 21 September 2015 at 14:27, Chris Angelico wrote: > On Mon, Sep 21, 2015 at 8:55 PM, Paul Moore wrote: >> There seem to be a few main use cases: >> >> 1. Dealing with functions that return a useful value or None to signal >> "no value". I suspect the right answer here is actually to rewrite the >> function to not do that in the first place. "Useful value or None" >> seems like a reasonable example of an anti-pattern in Python. > > The alternative being to raise an exception? It's generally easier, > when you can know in advance what kind of object you're expecting, to > have a None return when there isn't one. 
For example, SQLAlchemy has > .get(id) to return the object for a given primary key value, and it > returns None if there's no such row in the database table - having to > wrap that with try/except would be a pain. This isn't an error > condition, and it's not like the special case of iteration (since an > iterator could yield any value, it's critical to have a non-value way > of signalling "end of iteration"). I don't want to see everything > forced to "return or raise" just because someone calls this an > anti-pattern. Agreed, that's not what should happen. It's hard to give examples without going into specific cases, but as an example, look at dict.get. The user can supply a "what to return if the key doesn't exist" argument. OK, many people leave it returning the default None, but they don't *have* to - dict.get itself doesn't do "useful value or None", it does "useful value or user-supplied default". All I'm saying is that people should look at *why* their functions return None instead of a useful result, and see if they can do better. My contention is that (given free rein) many times they can. Of course not all code has free rein, not all developers have the time to look for perfect APIs, etc. But in that case, returning a placeholder None (and accepting a little ugliness at the call site) isn't an impossible price to pay. Nothing more than "I don't think the benefit justifies adding a new operator to Python". Paul From steve at pearwood.info Mon Sep 21 16:41:31 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 22 Sep 2015 00:41:31 +1000 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> Message-ID: <20150921144130.GB31152@ando.pearwood.info> On Mon, Sep 21, 2015 at 11:55:55AM +0100, Paul Moore wrote: > On 21 September 2015 at 03:35, Mark E. Haase wrote: > > A) Is coalesce a useful feature? 
(And what are the use cases?)
>
> There seem to be a few main use cases:
>
> 1. Dealing with functions that return a useful value or None to signal
> "no value". I suspect the right answer here is actually to rewrite the
> function to not do that in the first place. "Useful value or None"
> seems like a reasonable example of an anti-pattern in Python.

I think that's a bit strong. Or perhaps much too strong.

There are times where you can avoid the "None or value" pattern, since there is a perfectly decent empty value you can use instead of None. E.g. if somebody doesn't have a name, you can use "" instead of None, and avoid special treatment. But that doesn't always work. Suppose you want an optional (say) Dog object. There isn't such a thing as an empty Dog, so you have to use some other value to represent the lack of Dog. One could, I suppose, subclass Dog and build a (singleton? borg?) NoDog object, but that's overkill and besides it doesn't scale well if you have multiple types that need the same treatment.

So I don't think it is correct, or helpful, to say that we should avoid the "None versus value" pattern. Sometimes we can naturally avoid it, but it also has perfectly reasonable uses.

> Overall, I don't think coalesce is *that* useful, given that it seems
> like it'd mainly be used in situations where I'd recommend a more
> strategic fix to the code.

Go back to the original use-case given, which, paraphrasing, looks something like this:

    result = None if value is None else value['id'].method()

I don't think we can reject code like the above out of hand as un-Pythonic or an anti-pattern. It's also very common, and a little verbose. It's not bad when the value is a name, but sometimes it's an expression, in which case it's both verbose and inefficient:

    result = None if spam.eggs(cheese) is None else spam.eggs(cheese)['id'].method()

Contrast:

    result = spam.eggs(cheese)?['id'].method()

which only calculates the expression to the left of the ?[ once.
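The double-evaluation cost is easy to see in runnable form; eggs() and lookup() below are made-up stand-ins for the spam.eggs(cheese) expression above, with the temporary name ensuring the sub-expression is evaluated only once:

```python
def eggs(cheese):
    # made-up stand-in for the spam.eggs(cheese) call above:
    # returns a dict, or None to signal "no value"
    if cheese is None:
        return None
    return {'id': 'spam-' + cheese}

def lookup(cheese):
    # bind the sub-expression to a temporary name so it is evaluated once,
    # then guard against None by hand -- the pattern ?[ would abbreviate
    value = eggs(cheese)
    if value is None:
        return None
    return value['id'].upper()
```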
An actual real-life example where we work around this by using a temporary name that otherwise isn't actually used for anything:

    mo = re.match(needle, haystack)
    if mo:
        substr = mo.group()
    else:
        substr = None

I think it is perfectly reasonable to ask for syntactic sugar to avoid having to write code like the above:

    substr = re.match(needle, haystack)?.group()

That's not to say we necessarily should add sugar for this, since there is no doubt there are disadvantages as well (mostly that many people dislike the ? syntax), but in principle at least it would certainly be nice to have and useful.

> > B) If it is useful, is it important that it short circuits? (Put another
> way, could a function suffice?)
>
> Short circuiting is important, but to me that simply implies that the
> "useful value or None" approach is flawed *because* it needs
> short-circuiting to manage.

Nothing needs short-circuiting, at least in a language with imperative assignment statements. You can always avoid the need for short-circuits with temporary variables, and sometimes that's the right answer: not everything needs to be a one-liner, or an expression. But sometimes it is better if it could be.

> > C) If it should be an operator, is "??" an ugly spelling?
> >
> > >>> retries = default ?? cls.DEFAULT

I assume the ?? operator is meant as sugar for:

    retries = cls.DEFAULT if default is None else default

I prefer to skip the "default" variable and use the standard idiom:

    if retries is None:
        retries = cls.DEFAULT

I also worry about confusion caused by the asymmetry between ?? and the other three ? cases:

    # if the left side is None, return None, else evaluate the right side
    spam?.attr
    spam?['id']
    spam?(arg)

    # if the left side is None, return the right side, else return the left
    spam ?? eggs

but perhaps I'm worried over nothing.
--
Steve

From p.f.moore at gmail.com Mon Sep 21 16:56:44 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 21 Sep 2015 15:56:44 +0100
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: <20150921144130.GB31152@ando.pearwood.info>
References: <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <20150921144130.GB31152@ando.pearwood.info>
Message-ID:

On 21 September 2015 at 15:41, Steven D'Aprano wrote:
> An actual real-life example where we work around this by using a
> temporary name that otherwise isn't actually used for anything:
>
> mo = re.match(needle, haystack)
> if mo:
>     substr = mo.group()
> else:
>     substr = None
>
>
> I think it is perfectly reasonable to ask for syntactic sugar to avoid
> having to write code like the above:
>
> substr = re.match(needle, haystack)?.group()

Well, (1) Mark had focused on the "coalesce" operator ??, not the ?. variant, and that is less obviously useful here, and (2) I find the former version more readable. YMMV on readability of course - which is why I added the proviso that if someone comes up with an "obviously right" syntax, I may well change my mind. But the options suggested so far are all far less readable than a simple multi-line if (maybe with a temporary variable) to me, at least.

By the way, in your example you're passing on the "none or useful" property by making substr be either the matched value or None. In real life, I'd probably do something more like

    mo = re.match(needle, haystack)
    if mo:
        process(mo.group())
    else:
        no_needle()

possibly with inline code if process or no_needle were simple. But it is of course easy to pick apart examples - real code isn't always that tractable. For example we get (what seems to me like) a *lot* of bug reports about "None has no attribute foo" style errors in pip.
My comments here are based on my inclinations about how I would fix them in pip - I'd always go back to *why* we got a None, and try to avoid getting the None in the first place. But that's not always easy to do. Again, of course, YMMV.

Paul

From jsbueno at python.org.br Mon Sep 21 17:10:58 2015
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Mon, 21 Sep 2015 12:10:58 -0300
Subject: [Python-ideas] add a single __future__ for py3?
In-Reply-To: References: <55FDAC74.7050001@mail.de>
Message-ID:

On 20 September 2015 at 10:59, Luciano Ramalho wrote:
> Chris,
>
> I don't think students should be worrying about writing code that is
> Python 2 and Python 3 compatible.
>
> That's a concern only for people who write libraries, tools and
> frameworks for others to use, and I do not think these are the kinds
> of programs students usually do. Even if they are doing something
> along those lines, they should be focusing on other more important
> features of the programs rather than whether they run on Python 2 and
> on Python 3.
>
> Having said that, I'd also like to add that I don't think ``from
> __future__ import unicode_literals`` is a great idea for making code
> 2/3 compatible nowadays. It was necessary before the u'' prefix was
> reinstated in Python 3.3, but since u'' is back it's much better to be
> explicit in your literals rather than dealing with runtime errors
> because of the blanket effect of the unicode_literals import.
>
> Anyone who cares about 2/3 compatibility should mark every single
> literal with a u'' or a b'' prefix.
>
> But students should not be distracted by this. They should be using
> Python 3 only ;-).

I beg to disagree. Anyone writing maintainable, future-proof code that should still run on Python 2 should add the "from __future__ import unicode_literals" - and write the code fully aware that each string is text, and not bytes.
The "u" prefix is nice for quickly porting projects, without rethinking the flow of every single string found inside. > > Cheers, > > Luciano > > > > > On Sat, Sep 19, 2015 at 3:41 PM, Sven R. Kunze wrote: >> I totally agree here. >> >> >> On 19.09.2015 19:50, Chris Barker wrote: >> >> Hi all, >> >> the common advise, these days, if you want to write py2/3 compatible code, >> is to do: >> >> from __future__ import absolute_import >> from __future__ import division >> from __future__ import print_function >> from __future__ import unicode_literals >> >> https://docs.python.org/2/howto/pyporting.html#prevent-compatibility-regressions >> >> I'm trying to do this in my code, and teaching my students to do it to. >> >> but that's actually a lot of code to write. >> >> It would be nice to have a: >> >> from __future__ import py3 >> >> or something like that, that would do all of those in one swipe. >> >> IIIC, l can't make a little module that does that, because the __future__ >> imports only effect the module in which they are imported >> >> Sure, it's not a huge deal, but it would make it easier for folks wanting to >> keep up this best practice. >> >> Of course, this wouldn't happen until 2.7.11, if an when there even is one, >> but it would be nice to get it on the list.... >> >> -Chris >> >> >> >> >> -- >> >> Christopher Barker, Ph.D. 
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R (206) 526-6959 voice
>> 7600 Sand Point Way NE (206) 526-6329 fax
>> Seattle, WA 98115 (206) 526-6317 main reception
>>
>> Chris.Barker at noaa.gov
>>
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>
> --
> Luciano Ramalho
> | Author of Fluent Python (O'Reilly, 2015)
> | http://shop.oreilly.com/product/0636920032519.do
> | Professor em: http://python.pro.br
> | Twitter: @ramalhoorg
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From guido at python.org Mon Sep 21 17:18:13 2015
From: guido at python.org (Guido van Rossum)
Date: Mon, 21 Sep 2015 08:18:13 -0700
Subject: [Python-ideas] add a single __future__ for py3?
In-Reply-To: <1442843138.3321719.389368849.52DEB79B@webmail.messagingengine.com>
References: <55FF8758.70406@canterbury.ac.nz> <20150921130813.GZ31152@ando.pearwood.info> <1442843138.3321719.389368849.52DEB79B@webmail.messagingengine.com>
Message-ID:

It's just about these four imports, right?

    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function
    from __future__ import unicode_literals

I think the case is overblown.

- absolute_import is rarely an issue; the only thing it does (despite the name) is give an error message when you attempt a relative import without using a "." in the import.
A linter can find this easily for you, and a little discipline plus the right example can do a lot of good here.

- division is important.

- print_function is important.

- unicode_literals is useless IMO. It breaks some things (yes there are still APIs that don't take unicode in 2.7) and it doesn't do nearly as much as would be useful -- e.g. repr() and .readline() still return 8-bit strings. I recommend just using u-literals and abandoning Python 3.2.

--
--Guido van Rossum (python.org/~guido)

From ron3200 at gmail.com Mon Sep 21 17:28:22 2015
From: ron3200 at gmail.com (Ron Adam)
Date: Mon, 21 Sep 2015 10:28:22 -0500
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de>
Message-ID:

On 09/20/2015 11:57 PM, Chris Angelico wrote:
> On Mon, Sep 21, 2015 at 2:52 PM, Sven R. Kunze wrote:
>> >I limit myself to materializing default arguments as in:
>> >
>> >def a(b=None):
>> >    b = b or {}
>> >    ...
> As long as you never need to pass in a specific empty dictionary,
> that's fine. That's the trouble with using 'or' - it's not checking
> for None, it's checking for falseness.

From reading these, I think the lowest-level/purest change would be to accommodate testing for "not None". Something I've always thought Python should be able to do in a nicer more direct way.

We could add "not None"-specific boolean operators just by appending ! to them.

    while! x:    <-->    while x != None:
    if! x:       <-->    if x != None:
    a or! b      <-->    a if a != None else b
    a and! b     <-->    b if a != None else a
    not! x       <-->    x if x != None else None

Those expressions on the right are very common and are needed because None, False, and 0 are all false values.
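The None-aware variants of `or` and `and` can be emulated today with small helper functions (the names below are made up for illustration): like `or`, or_not_none keeps its left operand unless it is None, and like `and`, and_not_none only moves on to its right operand when the left one is not None.

```python
def or_not_none(a, b):
    # like "a or b", except that only None (not 0, '', or False) selects b
    return a if a is not None else b

def and_not_none(a, b):
    # like "a and b", except that any non-None left operand selects b
    return b if a is not None else a
```

Unlike real operators, these helpers evaluate both arguments eagerly, so they cannot short-circuit - the same limitation raised earlier in the thread for a coalesce() function.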
It would make for much simpler expressions and statements where they are used and be more efficient as these are likely to be in loops going over *many* objects. So it may also result in a fairly nice speed improvement for many routines. While the consistency argument says "if!" should be equivalent to "if not", I feel the practicality argument leans towards it being specific to "if obj != None". I believe testing for "not None" is a lot more common than testing for "None". Usually the only difference is how the code is arranged. I like how it simplifies/clarifies the common cases above. It would be especially nice in comprehensions. Cheers, Ron From guido at python.org Mon Sep 21 17:40:17 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 21 Sep 2015 08:40:17 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> Message-ID: Just to cut this thread short, I'm going to reject PEP 505, because ? is just too ugly to add to Python IMO. Sorry. I commend Mark for his clean write-up, without being distracted, giving some good use cases. I also like that he focused on a minimal addition to the language and didn't get distracted by hyper-generalizations. I also like that he left out f?(...) -- the use case is much weaker; usually it's the object whose method you're calling that might be None, as in title?.upper(). Some nits for the PEP: - I don't think it ever gives the priority for the ?? operator. What would "a ?? b or c" mean? - You don't explain why it's x ?? y but x ?= y. I would have expected either x ? y or x ??= y. - You don't explain or show how far ?. reaches; I assume x?y.z is equivalent to None if x is None else x.y.z, so you don't have to write x?.y?.z just to handle x.y.z if x is None. 
- The specification section is empty. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mehaase at gmail.com Mon Sep 21 17:58:22 2015 From: mehaase at gmail.com (Mark E. Haase) Date: Mon, 21 Sep 2015 11:58:22 -0400 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> Message-ID: PEP-505 isn't anywhere close to being finished. I only submitted the draft because somebody off list asked me to send a draft so I could get a PEP number assigned. So I literally sent him what I had open in my text editor, which was just a few minutes of brain dumping and had several mistakes (grammatical and technical). If there's absolutely no point in continuing to work on it, I'll drop it. But from the outset, I thought the plan was to present this in its best light (and similar to the ternary operator PEP, offer several alternatives) if for no other reason than to have a good record of the reasoning for rejecting it. I'm sorry if I misunderstood the PEP process; I would have kept it to myself longer if I knew the first submission was going to be reviewed critically. I thought this e-mail chain was more of an open discussion on the general idea, not specifically a referendum on the PEP itself. On Mon, Sep 21, 2015 at 11:40 AM, Guido van Rossum wrote: > Just to cut this thread short, I'm going to reject PEP 505, because ? is > just too ugly to add to Python IMO. Sorry. > > I commend Mark for his clean write-up, without being distracted, giving > some good use cases. I also like that he focused on a minimal addition to > the language and didn't get distracted by hyper-generalizations. > > I also like that he left out f?(...) 
-- the use case is much weaker; > usually it's the object whose method you're calling that might be None, as > in title?.upper(). > > Some nits for the PEP: > > - I don't think it ever gives the priority for the ?? operator. What would > "a ?? b or c" mean? > - You don't explain why it's x ?? y but x ?= y. I would have expected > either x ? y or x ??= y. > - You don't explain or show how far ?. reaches; I assume x?y.z is > equivalent to None if x is None else x.y.z, so you don't have to write > x?.y?.z just to handle x.y.z if x is None. > - The specification section is empty. > > -- > --Guido van Rossum (python.org/~guido) > -- Mark E. Haase 202-815-0201 -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Sep 21 18:10:59 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 22 Sep 2015 02:10:59 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: <20150921161059.GC31152@ando.pearwood.info> On Sat, Sep 19, 2015 at 06:40:32PM -0500, Tim Peters wrote: > [Guido] > > Thanks! I'd accept this (and I'd reject 504 at the same time). I like the > > secrets name. I wonder though, should the PEP propose a specific set of > > functions? (With the understanding that we might add more later.) > > The bikeshedding on that will be far more tedious than the > implementation. I'll get it started :-) > > No attempt to be minimal here. More-than-less "obvious" is more important: > > Bound methods of a SystemRandom instance > .randrange() > .randint() > .randbits() > renamed from .getrandbits() > .randbelow(exclusive_upper_bound) > renamed from private ._randbelow() > .choice() While we're bike-shedding, I don't know that I like the name randbits, since that always makes me expect a sequence of 0, 1 bits. But that's a minor point. When would somebody use randbelow(n) rather than randrange(n)? 
Apart from the possible redundancy between rand[below|range], all the above seem reasonable to me. Are there use-cases for a strong random float between 0 and 1? If so, is it sufficient to say secrets.randbelow(sys.maxsize)/sys.maxsize, or should we offer secrets.random() and/or secrets.uniform(a, b)? > Token functions > .token_bytes(nbytes) > another name for os.urandom() > .token_hex(nbytes) > same, but return string of ASCII hex digits > .token_url(nbytes) > same, but return URL-safe base64-encoded ASCII I suggest adding a default length, say nbytes=32, with a note that the default length is expected to increase in the future. Otherwise, how will the naive user know what counts as a good, hard-to-attack length? All of the above look good to me. > .token_alpha(alphabet, nchars) > string of `nchars` characters drawn uniformly > from `alphabet` What is the intention for this function? To use as passwords? Other than that, it's not obvious to me what that would be used for. -- Steve From steve at pearwood.info Mon Sep 21 18:16:45 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 22 Sep 2015 02:16:45 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: <871tdtvgun.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20150919181612.GT31152@ando.pearwood.info> <871tdtvgun.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20150921161645.GD31152@ando.pearwood.info> On Sun, Sep 20, 2015 at 01:45:36PM +0900, Stephen J. Turnbull wrote: > Chris Angelico writes: > > > My personal preference for shed colour: token_bytes returns a > > bytestring, its length being the number provided. All the others > > return Unicode strings, their lengths again being the number provided. > > So they're all text bar the one that explicitly says it's in bytes. 
>
> I think that token_url may need a bytes mode, for the same reasons
> that bytes needs __mod__: such tokens will often be created and parsed
> by programs that never leave the "ASCII-compatible bytes" world.

I expect that token_url would return a string (Unicode), but since it's
pure ASCII (being base64 encoded), if you want bytes, you can just call
token_url().encode('ascii').

Or maybe it should return bytes, and if you want a string, you just say
token_url().decode('ascii').

Out of the two, I'm very slightly leaning towards the first (Unicode by
default, encode to ASCII if you want bytes) rather than the second. I'm
very much not in favour of a "return_bytes=True" argument.

-- 
Steve

From srkunze at mail.de  Mon Sep 21 18:21:28 2015
From: srkunze at mail.de (Sven R. Kunze)
Date: Mon, 21 Sep 2015 18:21:28 +0200
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: <6E2CAA1D-F35C-4BD5-B314-B2E0291AE019@yahoo.com>
References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com>
 <55FC9812.2090503@mrabarnett.plus.com>
 <20150919034112.GQ31152@ando.pearwood.info>
 <20150920073157.GV31152@ando.pearwood.info>
 <877fnkxin7.fsf@uwakimon.sk.tsukuba.ac.jp>
 <6E2CAA1D-F35C-4BD5-B314-B2E0291AE019@yahoo.com>
Message-ID: <56002E88.80903@mail.de>

On 21.09.2015 11:05, Andrew Barnert via Python-ideas wrote:
> On Sep 21, 2015, at 01:48, Stephen J. Turnbull wrote:
>
>>>>>> retries = default else cls.DEFAULT
>> I kinda like this if-less else syntax for the symmetry with else-less
>> if.

That's cool. It reads nicely (at least for a non-native speaker). Also,
chaining else reads nicely:

    final_value = users_value else apps_value else systems_value

> How do you parse this:
>
> a if b else c else d
>
> Feel free to answer either as a human reader or as CPython's LL(1) parser.

Use parentheses if you mix if-else and else. ;)

Btw. the same applies for:

    a + b * c + d

If you don't know from your education that b*c would have been evaluated
first, then it's not obvious either.
Best,
Sven

From steve at pearwood.info  Mon Sep 21 18:22:26 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 22 Sep 2015 02:22:26 +1000
Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library
In-Reply-To: 
References: <20150919181612.GT31152@ando.pearwood.info>
Message-ID: <20150921162226.GE31152@ando.pearwood.info>

On Sun, Sep 20, 2015 at 09:00:08AM +0300, Serhiy Storchaka wrote:
> On 20.09.15 02:40, Tim Peters wrote:
> >No attempt to be minimal here. More-than-less "obvious" is more important:
> >
> >Bound methods of a SystemRandom instance
> >    .randrange()
> >    .randint()
> >    .randbits()
> >        renamed from .getrandbits()
> >    .randbelow(exclusive_upper_bound)
> >        renamed from private ._randbelow()
> >    .choice()
>
> randbelow() is just an alias for randrange() with a single argument.
> randint(a, b) == randrange(a, b+1).
>
> These functions are redundant and they have non-zero cost.

But they already exist in the random module, so adding them to secrets
doesn't cost anything extra. It's just a reference to the bound method
of the private SystemRandom() instance:

    # suggested implementation
    import random
    _systemrandom = random.SystemRandom()

    randint = _systemrandom.randint
    randrange = _systemrandom.randrange
    # etc.

> Would not renaming getrandbits be confusing?
>
> >Token functions
> >    .token_bytes(nbytes)
> >        another name for os.urandom()
> >    .token_hex(nbytes)
> >        same, but return string of ASCII hex digits
> >    .token_url(nbytes)
> >        same, but return URL-safe base64-encoded ASCII
> >    .token_alpha(alphabet, nchars)
> >        string of `nchars` characters drawn uniformly
> >        from `alphabet`
>
> token_hex(nbytes) == token_alpha('0123456789abcdef', nchars) ?
> token_url(nbytes) == token_alpha(
> 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_',
> nchars) ?
They may be reasonable implementations for the functions, but simple as
they are, I think we still want to provide them as named functions
rather than expect the user to write things like the above. If they're
doing it more than once, they'll want to write a helper function; we
might as well provide that for them.

-- 
Steve

From srkunze at mail.de  Mon Sep 21 18:27:48 2015
From: srkunze at mail.de (Sven R. Kunze)
Date: Mon, 21 Sep 2015 18:27:48 +0200
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: 
References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com>
 <55FC9812.2090503@mrabarnett.plus.com>
 <20150919034112.GQ31152@ando.pearwood.info>
 <20150920073157.GV31152@ando.pearwood.info>
Message-ID: <56003004.1070807@mail.de>

On 21.09.2015 15:27, Chris Angelico wrote:
> On Mon, Sep 21, 2015 at 8:55 PM, Paul Moore wrote:
>> There seem to be a few main use cases:
>>
>> 1. Dealing with functions that return a useful value or None to signal
>> "no value". I suspect the right answer here is actually to rewrite the
>> function to not do that in the first place. "Useful value or None"
>> seems like a reasonable example of an anti-pattern in Python.
> The alternative being to raise an exception? It's generally easier,
> when you can know in advance what kind of object you're expecting, to
> have a None return when there isn't one. For example, SQLAlchemy has
> .get(id) to return the object for a given primary key value, and it
> returns None if there's no such row in the database table - having to
> wrap that with try/except would be a pain. This isn't an error
> condition, and it's not like the special case of iteration (since an
> iterator could yield any value, it's critical to have a non-value way
> of signalling "end of iteration"). I don't want to see everything
> forced to "return or raise" just because someone calls this an
> anti-pattern.

I don't think the two approaches are mutually exclusive.
They can both exist and provide the right thing whenever I need it.
Depending on the use-case, one needs to decide:

If I know, the value definitely needs to be a dictionary, I use
dict[...].

If I know, the value is definitely optional and I can't do anything
about it, I use dict.get('key'[, default]).

If I definitely don't know, I use dict[...] to get my hands on a real
example without that key, if that ever happens, and don't waste time
special-handling a possible None return value.

Best,
Sven

From robert.kern at gmail.com  Mon Sep 21 18:29:08 2015
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 21 Sep 2015 17:29:08 +0100
Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library
In-Reply-To: <20150921162226.GE31152@ando.pearwood.info>
References: <20150919181612.GT31152@ando.pearwood.info>
 <20150921162226.GE31152@ando.pearwood.info>
Message-ID: 

On 2015-09-21 17:22, Steven D'Aprano wrote:
> On Sun, Sep 20, 2015 at 09:00:08AM +0300, Serhiy Storchaka wrote:
>> On 20.09.15 02:40, Tim Peters wrote:
>>> Token functions
>>>     .token_bytes(nbytes)
>>>         another name for os.urandom()
>>>     .token_hex(nbytes)
>>>         same, but return string of ASCII hex digits
>>>     .token_url(nbytes)
>>>         same, but return URL-safe base64-encoded ASCII
>>>     .token_alpha(alphabet, nchars)
>>>         string of `nchars` characters drawn uniformly
>>>         from `alphabet`
>>
>> token_hex(nbytes) == token_alpha('0123456789abcdef', nchars) ?
>> token_url(nbytes) == token_alpha(
>> 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_',
>> nchars) ?
>
> They may be reasonable implementations for the functions, but simple as
> they are, I think we still want to provide them as named functions
> rather than expect the user to write things like the above. If they're
> doing it more than once, they'll want to write a helper function, we
> might as well provide that for them.

Actually, I don't think those are the semantics that Tim intended.
Rather, token_hex(nbytes) would return a string twice as long as nbytes. The idea is that you want to get nbytes-worth of random bits, just encoded in a common "safe" format. Similarly, token_url(nbytes) would get nbytes of random bits then base64-encode it, not just pick nbytes characters from a URL-safe list of characters. This makes it easier to reason about how much entropy you are actually using. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From stephen at xemacs.org Mon Sep 21 18:29:43 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 22 Sep 2015 01:29:43 +0900 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <6E2CAA1D-F35C-4BD5-B314-B2E0291AE019@yahoo.com> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <877fnkxin7.fsf@uwakimon.sk.tsukuba.ac.jp> <6E2CAA1D-F35C-4BD5-B314-B2E0291AE019@yahoo.com> Message-ID: <87k2rjzqfc.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert writes: > On Sep 21, 2015, at 01:48, Stephen J. Turnbull wrote: > > >>>>> retries = default else cls.DEFAULT > > > > I kinda like this if-less else syntax for the symmetry with else-less > > if. > > How do you parse this: > > a if b else c else d > > Feel free to answer either as a human reader or as CPython's LL(1) > parser. I don't know what an LL(1) parser could do offhand. As a human, I would parse that greedily as (a if b else c) else d. But the point's actually moot, as I'm -1 on the "??" operator in any form in favor of the explicit "a if a is not None else b" existing syntax. And to be honest, the fact that a truly symmetric "if-less else" would have "or" semantics, not "??" 
semantics, bothers me more than the technical issue of whether anybody
could actually parse it.

From srkunze at mail.de  Mon Sep 21 18:35:04 2015
From: srkunze at mail.de (Sven R. Kunze)
Date: Mon, 21 Sep 2015 18:35:04 +0200
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: <56003004.1070807@mail.de>
References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com>
 <55FC9812.2090503@mrabarnett.plus.com>
 <20150919034112.GQ31152@ando.pearwood.info>
 <20150920073157.GV31152@ando.pearwood.info>
 <56003004.1070807@mail.de>
Message-ID: <560031B8.5020903@mail.de>

On 21.09.2015 18:27, Sven R. Kunze wrote:
> If I know, the value definitely needs to be *IN* the dictionary, I use
> dict[...].

typo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rosuav at gmail.com  Mon Sep 21 18:50:56 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 22 Sep 2015 02:50:56 +1000
Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library
In-Reply-To: <20150921161059.GC31152@ando.pearwood.info>
References: <20150919181612.GT31152@ando.pearwood.info>
 <20150921161059.GC31152@ando.pearwood.info>
Message-ID: 

On Tue, Sep 22, 2015 at 2:10 AM, Steven D'Aprano wrote:
> Are there use-cases for a strong random float between 0 and 1? If
> so, is it sufficient to say secrets.randbelow(sys.maxsize)/sys.maxsize,
> or should we offer secrets.random() and/or secrets.uniform(a, b)?

I would be leery of such a function, because it'd be hard to define it
perfectly. Tell me, crypto wonks: If I have a function randfloat()
that returns 0.0 <= x < 1.0, is it safe to use it like this:

    # Generate an integer 0 <= x < 12345, uniformly distributed
    uniform = int(randfloat() * 12345)
    # Ditto but on a logarithmic distribution
    log = math.exp(randfloat() * math.log(12345))
    # Double-logarithmic
    loglog = math.exp(math.exp(randfloat() * math.log(math.log(12345))))

If it's producing a random *real number* 0 <= x < 1, then these should
be valid.
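As an aside on the floats-versus-reals distinction: float granularity alone already rules out exact uniformity for large ranges, before any crypto concern enters. A quick check:

```python
# A Python float has a 53-bit mantissa, so adjacent large integers collide:
assert float(2**53) == float(2**53 + 1)

# Hence int(randfloat() * n) has at most 2**53 distinct inputs, and for most
# n those inputs don't split evenly across the n buckets, so a small bias is
# built in even if randfloat() itself were perfect:
assert 2**53 % 12345 != 0  # 12345 doesn't divide the input space evenly
```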
But given the differences between floats and reals, I would be worried that this kind of usage would introduce an unexpected bias. Obviously the first example is much better spelled randbelow or randrange, but for more complicated examples, grabbing a random float would look like the best way to do it. Will it? Always? Not being a crypto wonk myself, I can't know what's safe and what isn't. If Python is going to offer a new module with the (implicit or explicit) recommendation "use this for all your cryptographic entropy", it needs to be 100% reliable. ChrisA From random832 at fastmail.com Mon Sep 21 18:57:09 2015 From: random832 at fastmail.com (Random832) Date: Mon, 21 Sep 2015 12:57:09 -0400 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <87k2rjzqfc.fsf@uwakimon.sk.tsukuba.ac.jp> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <877fnkxin7.fsf@uwakimon.sk.tsukuba.ac.jp> <6E2CAA1D-F35C-4BD5-B314-B2E0291AE019@yahoo.com> <87k2rjzqfc.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1442854629.3367342.389575353.4BDB0154@webmail.messagingengine.com> On Mon, Sep 21, 2015, at 12:29, Stephen J. Turnbull wrote: > I don't know what an LL(1) parser could do offhand. As a human, I > would parse that greedily as (a if b else c) else d. That's not greedy. The greedy parsing is (a if b else (c else d)). From guido at python.org Mon Sep 21 19:07:16 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 21 Sep 2015 10:07:16 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> Message-ID: On Mon, Sep 21, 2015 at 8:58 AM, Mark E. 
Haase wrote: > PEP-505 isn't anywhere close to being finished. I only submitted the draft > because somebody off list asked me to send a draft so I could get a PEP > number assigned. So I literally sent him what I had open in my text editor, > which was just a few minutes of brain dumping and had several mistakes > (grammatical and technical). > > If there's absolutely no point in continuing to work on it, I'll drop it. > But from the outset, I thought the plan was to present this in its best > light (and similar to the ternary operator PEP, offer several alternatives) > if for no other reason than to have a good record of the reasoning for > rejecting it. > > I'm sorry if I misunderstood the PEP process; I would have kept it to > myself longer if I knew the first submission was going to be reviewed > critically. I thought this e-mail chain was more of an open discussion on > the general idea, not specifically a referendum on the PEP itself. > I apologize for having misunderstood the status of your PEP. I think it would be great if you finished the PEP. As you know the ? operator has its share of fans as well as detractors, and I will happily wait until more of a consensus appears. I hope you can also add a discussion to the PEP of ideas (like some of the hyper-generalizations) that were considered and rejected -- summarizing a discussion is often a very important goal of a PEP. I think you have made a great start already! --Guido > On Mon, Sep 21, 2015 at 11:40 AM, Guido van Rossum > wrote: > >> Just to cut this thread short, I'm going to reject PEP 505, because ? is >> just too ugly to add to Python IMO. Sorry. >> >> I commend Mark for his clean write-up, without being distracted, giving >> some good use cases. I also like that he focused on a minimal addition to >> the language and didn't get distracted by hyper-generalizations. >> >> I also like that he left out f?(...) 
-- the use case is much weaker; >> usually it's the object whose method you're calling that might be None, as >> in title?.upper(). >> >> Some nits for the PEP: >> >> - I don't think it ever gives the priority for the ?? operator. What >> would "a ?? b or c" mean? >> - You don't explain why it's x ?? y but x ?= y. I would have expected >> either x ? y or x ??= y. >> - You don't explain or show how far ?. reaches; I assume x?y.z is >> equivalent to None if x is None else x.y.z, so you don't have to write >> x?.y?.z just to handle x.y.z if x is None. >> - The specification section is empty. >> >> -- >> --Guido van Rossum (python.org/~guido) >> > > > > -- > Mark E. Haase > 202-815-0201 > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Sep 21 19:09:04 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 22 Sep 2015 03:09:04 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> <20150921161059.GC31152@ando.pearwood.info> Message-ID: On 22 September 2015 at 02:50, Chris Angelico wrote: > On Tue, Sep 22, 2015 at 2:10 AM, Steven D'Aprano wrote: >> Are there use-cases for a strong random float between 0 and 1? If >> so, is it sufficient to say secrets.randbelow(sys.maxsize)/sys.maxsize, >> or should we offer secrets.random() and/or secrets.uniform(a, b)? > > I would be leery of such a function, because it'd be hard to define it > perfectly. Tell me, crypto wonks: If I have a function randfloat() > that returns 0.0 <= x < 1.0, is it safe to use it like this: Floating point numbers and crypto don't go together - crypto is all about integers, bits, bytes, and text. Folks dealing with floating point numbers are presumably handling modelling and simulation tasks, and will want the random module, not secrets. Cheers, Nick. 
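To make "integer land" concrete: a getrandbits-style primitive can be sketched over os.urandom in a few lines (an illustration of the idea, not the actual SystemRandom implementation):

```python
import os

def randbits(k):
    """Return a nonnegative int with k random bits, drawn from os.urandom."""
    nbytes = (k + 7) // 8
    x = int.from_bytes(os.urandom(nbytes), 'big')
    return x >> (nbytes * 8 - k)  # drop the excess low bits

# Results always fall in [0, 2**k):
for _ in range(100):
    assert 0 <= randbits(20) < 2**20
```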
-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From rymg19 at gmail.com  Mon Sep 21 19:10:04 2015
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Mon, 21 Sep 2015 12:10:04 -0500
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: 
References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com>
 <55FC9812.2090503@mrabarnett.plus.com>
 <20150919034112.GQ31152@ando.pearwood.info>
 <20150920073157.GV31152@ando.pearwood.info>
 <55FF8CFD.9070906@mail.de>
Message-ID: 

What about re-using try? Crystal does this (
http://play.crystal-lang.org/#/r/gf5):

    v = "ABC"
    puts nil == v.try &.downcase # prints false
    v = nil
    puts nil == v.try &.downcase # prints true

Python could use something like:

    v = 'ABC'
    print(v try.downcase is None) # prints False
    v = None
    print(v try.downcase is None) # prints True

(Of course, the syntax would be a little less...weird!)

On Mon, Sep 21, 2015 at 10:40 AM, Guido van Rossum wrote:

> Just to cut this thread short, I'm going to reject PEP 505, because ? is
> just too ugly to add to Python IMO. Sorry.
>
> I commend Mark for his clean write-up, without being distracted, giving
> some good use cases. I also like that he focused on a minimal addition to
> the language and didn't get distracted by hyper-generalizations.
>
> I also like that he left out f?(...) -- the use case is much weaker;
> usually it's the object whose method you're calling that might be None, as
> in title?.upper().
>
> Some nits for the PEP:
>
> - I don't think it ever gives the priority for the ?? operator. What would
> "a ?? b or c" mean?
> - You don't explain why it's x ?? y but x ?= y. I would have expected
> either x ? y or x ??= y.
> - You don't explain or show how far ?. reaches; I assume x?.y.z is
> equivalent to None if x is None else x.y.z, so you don't have to write
> x?.y?.z just to handle x.y.z if x is None.
> > -- > --Guido van Rossum (python.org/~guido) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Mon Sep 21 19:13:19 2015 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 21 Sep 2015 18:13:19 +0100 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <87k2rjzqfc.fsf@uwakimon.sk.tsukuba.ac.jp> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <877fnkxin7.fsf@uwakimon.sk.tsukuba.ac.jp> <6E2CAA1D-F35C-4BD5-B314-B2E0291AE019@yahoo.com> <87k2rjzqfc.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <56003AAF.6030100@mrabarnett.plus.com> On 2015-09-21 17:29, Stephen J. Turnbull wrote: > Andrew Barnert writes: > > On Sep 21, 2015, at 01:48, Stephen J. Turnbull wrote: > > > > >>>>> retries = default else cls.DEFAULT > > > > > > I kinda like this if-less else syntax for the symmetry with else-less > > > if. > > > > How do you parse this: > > > > a if b else c else d > > > > Feel free to answer either as a human reader or as CPython's LL(1) > > parser. > > I don't know what an LL(1) parser could do offhand. As a human, I > would parse that greedily as (a if b else c) else d. > 'else' is being used like 'or', except when it belongs to 'if'. I can't see a way of handling that. It would result in a syntax error. > But the point's actually moot, as I'm -1 on the "??" operator in any > form in favor of the explicit "a if a is not None else b" existing > syntax. 
And to be honest, the fact that a truly symmetric "if-less > else" would have "or" semantics, not "??" semantics, bothers me more > than the technical issue of whether anybody could actually parse it. > From steve at pearwood.info Mon Sep 21 19:47:58 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 22 Sep 2015 03:47:58 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: <20150921174758.GF31152@ando.pearwood.info> On Sun, Sep 20, 2015 at 11:56:06AM +0100, Paul Moore wrote: > On 20 September 2015 at 00:40, Tim Peters wrote: > > .token_alpha(alphabet, nchars) > > string of `nchars` characters drawn uniformly > > from `alphabet` > > Given where this started, I'd suggest renaming token_alpha as > "password". Beginners wouldn't necessarily associate the term "token" > with the problem "I want to generate a random password" [1]. Maybe add > a short recipe showing how to meet constraints like "at least 2 > digits" by simply generating repeatedly until a valid password is > found. I'm not entirely sure about including password generators, since there are so many password schemes around: http://thedailywtf.com/articles/Security-by-PostIt > For a bit of extra bikeshedding, I'd make alphabet the second, > optional, parameter and default it to > string.ascii_letters+string.digits+string.punctuation, as that's often > what password constraints require. If we're going to offer a simple, no-brainer password generator, my vote goes for: def password(nchars=10, alphabet=string.ascii_letters+string.digits): I wouldn't include punctuation by default, as too many places still prohibit some, or all, punctuation characters. If both my understanding and calculations are correct, using ascii_letters+digits+punctuation gives us log(94, 2) = 6.6 bits of (Shannon) entropy per character, while just using letters+digits gives us log(62, 2) = 6.0 bits per character. 
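Those per-character figures, and the password-length totals that follow from them, are easy to verify with a quick back-of-the-envelope check:

```python
import math
import string

def entropy_bits(alphabet, length):
    """Shannon entropy, in bits, of a password of `length` characters
    drawn uniformly at random from `alphabet`."""
    return length * math.log2(len(alphabet))

alnum = string.ascii_letters + string.digits   # 62 characters
full = alnum + string.punctuation              # 94 characters

assert round(entropy_bits(alnum, 8), 1) == 47.6
assert round(entropy_bits(full, 8), 1) == 52.4
assert round(entropy_bits(alnum, 9), 1) == 53.6  # extra char beats extra symbols
```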
For short-ish passwords, up to 10 characters, the extra entropy from including punctuation is less than the extra from adding an extra character: password length of 8, without punctuation: 47.6 bits password length of 8, including punctuation: 52.4 bits password length of 9, without punctuation: 53.6 bits > Or at the very least, document how to use the module functions for the > common tasks we see people getting wrong. But I thought the idea here > was to make doing things the right way obvious, for people who don't > read documentation, so I'd prefer to see the functions exposed by the > module named based on the problems they solve, not on the features > they provide. (Even if that involves a little duplication, and/or a > split between "high level" and "low level" APIs). I agree that secrets should be providing ready-to-use functions, even if they don't solve all use-cases, not just primitive building blocks. -- Steve From tim.peters at gmail.com Mon Sep 21 19:51:13 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 21 Sep 2015 12:51:13 -0500 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: <20150921161059.GC31152@ando.pearwood.info> References: <20150919181612.GT31152@ando.pearwood.info> <20150921161059.GC31152@ando.pearwood.info> Message-ID: [Tim] >> ... >> No attempt to be minimal here. More-than-less "obvious" is more important: >> >> Bound methods of a SystemRandom instance >> .randrange() >> .randint() >> .randbits() >> renamed from .getrandbits() >> .randbelow(exclusive_upper_bound) >> renamed from private ._randbelow() >> .choice() [Steven D'Aprano ] > While we're bike-shedding, I refuse to bikeshed on this. I posted a concrete proposal just to enrage others into it ;-) So I'll just sketch my thinking: > I don't know that I like the name randbits, since that always > makes me expect a sequence of 0, 1 bits. But that's a minor > point. 
Had in mind multiple audiences, including those who know a lot about Python, and those who know little. The _lack_ of randbits() would surprise the former. > When would somebody use randbelow(n) rather than randrange(n)? For the same reason they'd use randbits(n) instead of randrange(1 << n) ;-) That is, familiarity and obviousness. randrange() has a complicated signature, with 1 to 3 arguments, and endlessly surprises newbies who _expect_, e.g., randrange(3) to return 3 at times. That's why randint() was created. "randbelow(n)" has a dirt-simple signature, and its name makes it hard to mistakenly believe `n` is a possible return value. It's exactly what's needed most often to avoid _statistical_ bias (as opposed to security weaknesses) in higher-level functions - that's why _randbelow() is a fundamental primitive in Random. So, yes, it's redundant, but I don't care. randrange(n) itself is just a needlessly expensive way to call _randbelow(n) today. > Apart from the possible redundancy between rand[below|range], all the > above seem reasonable to me. If people want minimal, just expose os.urandom() under a friendlier name, and call it done ;-) > Are there use-cases for a strong random float between 0 and 1? If > so, is it sufficient to say secrets.randbelow(sys.maxsize)/sys.maxsize, > or should we offer secrets.random() and/or secrets.uniform(a, b)? I don't know of any "security use" for random floats. But if you want to add a recipe to the docs, point them to SystemRandom.random instead. That gets it right. `sys.maxsize` doesn't really have anything to do with floats, and the snippet you gave would produce poor-quality floats on a 32-bit box (wouldn't get anywhere near randomizing all 53 bits of float precision). On a 64-bit box, it could, e.g., return 1.0 (which random() should never return). 
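A quick illustration of both points (a sanity check, not a security recipe):

```python
import random
import sys

sr = random.SystemRandom()
# SystemRandom.random() randomizes all 53 mantissa bits and stays in [0.0, 1.0):
for _ in range(1000):
    assert 0.0 <= sr.random() < 1.0

# The sys.maxsize recipe, by contrast, can produce exactly 1.0 (when the draw
# equals sys.maxsize), and on a 32-bit build draws far fewer than 53 bits:
assert sys.maxsize / sys.maxsize == 1.0
```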
>> Token functions >> .token_bytes(nbytes) >> another name for os.urandom() >> .token_hex(nbytes) >> same, but return string of ASCII hex digits >> .token_url(nbytes) >> same, but return URL-safe base64-encoded ASCII > > I suggest adding a default length, say nbytes=32, with a note that the > default length is expected to increase in the future. Otherwise, how > will the naive user know what counts as a good, hard-to-attack length? Fine by me! > All of the above look good to me. > > >> .token_alpha(alphabet, nchars) >> string of `nchars` characters drawn uniformly >> from `alphabet` > > What is the intention for this function? To use as passwords? Other than > that, it's not obvious to me what that would be used for. I just noted that several of the examples in the PHP paper appeared to want to use their own alphabet. But, since that paper was about exposing security holes in PHP apps, perhaps that wasn't such a good idea to begin with ;-) Fine by me if it's dropped. From steve at pearwood.info Mon Sep 21 19:55:40 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 22 Sep 2015 03:55:40 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> <20150921161059.GC31152@ando.pearwood.info> Message-ID: <20150921175540.GG31152@ando.pearwood.info> On Tue, Sep 22, 2015 at 02:50:56AM +1000, Chris Angelico wrote: > On Tue, Sep 22, 2015 at 2:10 AM, Steven D'Aprano wrote: > > Are there use-cases for a strong random float between 0 and 1? If > > so, is it sufficient to say secrets.randbelow(sys.maxsize)/sys.maxsize, > > or should we offer secrets.random() and/or secrets.uniform(a, b)? > > I would be leery of such a function, because it'd be hard to define it > perfectly. 
Tell me, crypto wonks: If I have a function randfloat() > that returns 0.0 <= x < 1.0, is it safe to use it like this: > > # Generate an integer 0 <= x < 12345, uniformly distributed > uniform = int(randfloat() * 12345) > # Ditto but on a logarithmic distribution > log = math.exp(randfloat() * math.log(12345)) > # Double-logarithmic > loglog = math.exp(math.exp(randfloat() * math.log(math.log(12345)))) I'm satisfied by Nick's response to you, which also implies an answer to my question: there is no good use-case for a strong random float and no need for secrets.random(). The main reason I asked is because Ruby's SecureRandom.random_number() optionally returns a float between 0 and 1. -- Steve From steve at pearwood.info Mon Sep 21 20:05:08 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 22 Sep 2015 04:05:08 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> <20150921161059.GC31152@ando.pearwood.info> Message-ID: <20150921180508.GH31152@ando.pearwood.info> On Mon, Sep 21, 2015 at 12:51:13PM -0500, Tim Peters wrote: > [Tim] > >> ... > >> No attempt to be minimal here. More-than-less "obvious" is more important: > >> > >> Bound methods of a SystemRandom instance > >> .randrange() > >> .randint() > >> .randbits() > >> renamed from .getrandbits() > >> .randbelow(exclusive_upper_bound) > >> renamed from private ._randbelow() > >> .choice() > > [Steven D'Aprano ] > > While we're bike-shedding, > > I refuse to bikeshed on this. I posted a concrete proposal just to > enrage others into it ;-) So I'll just sketch my thinking: Consider me enraged. Hulk smash puny humans! [...] > > When would somebody use randbelow(n) rather than randrange(n)? > > For the same reason they'd use randbits(n) instead of randrange(1 << > n) ;-) That is, familiarity and obviousness. Okay, that makes sense. > > Are there use-cases for a strong random float between 0 and 1? 
If > > so, is it sufficient to say secrets.randbelow(sys.maxsize)/sys.maxsize, > > or should we offer secrets.random() and/or secrets.uniform(a, b)? > > I don't know of any "security use" for random floats. But if you want > to add a recipe to the docs, point them to SystemRandom.random > instead. That gets it right. Good enough for me. -- Steve From greg at krypto.org Mon Sep 21 19:59:54 2015 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 21 Sep 2015 17:59:54 +0000 Subject: [Python-ideas] add a single __future__ for py3? In-Reply-To: References: <55FF8758.70406@canterbury.ac.nz> <20150921130813.GZ31152@ando.pearwood.info> <1442843138.3321719.389368849.52DEB79B@webmail.messagingengine.com> Message-ID: I think people should stick with *from __future__ import absolute_import* regardless of what code they are writing. They will eventually create a file innocuously called something like calendar.py (the same name as a standard library module) in the same directory as their main binary and their debugging of the mysterious failures they just started getting from the tarfile module will suddenly require leveling up to be able to figure it out. ;) -gps On Mon, Sep 21, 2015 at 8:18 AM Guido van Rossum wrote: > It's just about these four imports, right? > > > from __future__ import absolute_import > from __future__ import division > from __future__ import print_function > from __future__ import unicode_literals > > I think the case is overblown. > > - absolute_import is rarely an issue; the only thing it does (despite the > name) is give an error message when you attempt a relative import without > using a "." in the import. A linter can find this easily for you, and a > little discipline plus the right example can do a lot of good here. > > - division is important. > > - print_function is important. > > - unicode_literals is useless IMO. 
It breaks some things (yes there are > still APIs that don't take unicode in 2.7) and it doesn't nearly as much as > what would be useful -- e.g. repr() and .readline() still return > 8-bit strings. I recommend just using u-literals and abandoning Python 3.2. > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Mon Sep 21 21:21:49 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 21 Sep 2015 12:21:49 -0700 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> <20150921161059.GC31152@ando.pearwood.info> Message-ID: <59D14571-7711-46DF-8ADF-817DA543E6CE@yahoo.com> On Sep 21, 2015, at 10:51, Tim Peters wrote: >> When would somebody use randbelow(n) rather than randrange(n)? > > For the same reason they'd use randbits(n) instead of randrange(1 << > n) ;-) That is, familiarity and obviousness. randrange() has a > complicated signature, with 1 to 3 arguments, and endlessly surprises > newbies who _expect_, e.g., randrange(3) to return 3 at times. That's > why randint() was created. Anyone who gets confused by randrange(3) also gets confused by range(3), and they have to learn pretty quickly. Also, randint wasn't created to allow people to put off learning that fact. It was created before randrange, because Adrian Baddeley didn't realize that Python consistently used half-open ranges, and Guido didn't notice. After 1.5 was out and someone complained that choice(range(...)) was inefficient, Guido added randrange. 
See the commit comment (61464037da53) which says "This addresses the problem that randint() was accidentally defined as taking an inclusive range (how unpythonic)".Also, some guy named Tim Peters convinced Guido that randint(0, 2.5) was surprisingly broken, so if he wasn't going to remove it he should reimplement it as randrange(a, b+1), which would give a clear error message. Later still (3.0), there was another discussion on removing randint, but the decision was to keep it as a "legacy alias", and change the docs to reflect that. I suppose randbelow could be implemented as an alias to randrange(a), or it could copy and paste the same type checks as randrange, but honestly, I don't think anyone needs it. From storchaka at gmail.com Mon Sep 21 22:12:28 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 21 Sep 2015 23:12:28 +0300 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: <20150921162226.GE31152@ando.pearwood.info> References: <20150919181612.GT31152@ando.pearwood.info> <20150921162226.GE31152@ando.pearwood.info> Message-ID: On 21.09.15 19:22, Steven D'Aprano wrote: > On Sun, Sep 20, 2015 at 09:00:08AM +0300, Serhiy Storchaka wrote: >> On 20.09.15 02:40, Tim Peters wrote: >>> No attempt to be minimal here. More-than-less "obvious" is more important: >>> >>> Bound methods of a SystemRandom instance >>> .randrange() >>> .randint() >>> .randbits() >>> renamed from .getrandbits() >>> .randbelow(exclusive_upper_bound) >>> renamed from private ._randbelow() >>> .choice() >> >> randbelow() is just an alias for randrange() with single argument. >> randint(a, b) == randrange(a, b+1). >> >> These functions are redundant and they have non-zero cost. > > But they already exist in the random module, so adding them to secrets > doesn't cost anything extra. The main cost is learning and memorising cost. The fewer words you need to learn and keep in memory the better. >> Would not renaming getrandbits be confused? 
>> >>> Token functions >>> .token_bytes(nbytes) >>> another name for os.urandom() >>> .token_hex(nbytes) >>> same, but return string of ASCII hex digits >>> .token_url(nbytes) >>> same, but return URL-safe base64-encoded ASCII >>> .token_alpha(alphabet, nchars) >>> string of `nchars` characters drawn uniformly >>> from `alphabet` >> >> token_hex(nbytes) == token_alpha('0123456789abcdef', nchars) ? >> token_url(nbytes) == token_alpha( >> 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_', >> nchars) ? > > They may be reasonable implementations for the functions, but simple as > they are, I think we still want to provide them as named functions > rather than expect the user to write things like the above. If they're > doing it more than once, they'll want to write a helper function, we > might as well provide that for them. But why are these particular alphabets special? I expect that every application will use the alphabet that matches its needs. One needs decimal digits ('0123456789'), another needs English letters ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'), or letters and digits and underscore, or letters, digits and punctuation, or all safe ASCII characters, or all visually well-distinguished characters. Why token_hex and token_url, but not token_digits, token_letters, token_identifier, token_base32, token_base85, token_html_safe, etc?
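For reference, the token_alpha() primitive being debated is only a few lines over SystemRandom. The token_digits/token_identifier helpers below are hypothetical names, just to show that each specialized alphabet reduces to a one-liner on top of it:

```python
import random
import string

_sysrand = random.SystemRandom()  # OS entropy (os.urandom) underneath

def token_alpha(alphabet, nchars):
    # nchars characters drawn uniformly and independently from alphabet
    return ''.join(_sysrand.choice(alphabet) for _ in range(nchars))

# Hypothetical specializations, one line each:
def token_digits(n):
    return token_alpha(string.digits, n)

def token_identifier(n):
    return token_alpha(string.ascii_letters + string.digits + '_', n)

print(token_digits(6))       # e.g. '062518'
print(token_identifier(12))  # e.g. 'tZ3_fQ9xAb1c'
```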
From storchaka at gmail.com Mon Sep 21 22:16:52 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 21 Sep 2015 23:16:52 +0300 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> <20150921162226.GE31152@ando.pearwood.info> Message-ID: On 21.09.15 19:29, Robert Kern wrote: > On 2015-09-21 17:22, Steven D'Aprano wrote: >> On Sun, Sep 20, 2015 at 09:00:08AM +0300, Serhiy Storchaka wrote: >>> On 20.09.15 02:40, Tim Peters wrote: > >>>> Token functions >>>> .token_bytes(nbytes) >>>> another name for os.urandom() >>>> .token_hex(nbytes) >>>> same, but return string of ASCII hex digits >>>> .token_url(nbytes) >>>> same, but return URL-safe base64-encoded ASCII >>>> .token_alpha(alphabet, nchars) >>>> string of `nchars` characters drawn uniformly >>>> from `alphabet` >>> >>> token_hex(nbytes) == token_alpha('0123456789abcdef', nchars) ? >>> token_url(nbytes) == token_alpha( >>> 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_', >>> nchars) ? >> >> They may be reasonable implementations for the functions, but simple as >> they are, I think we still want to provide them as named functions >> rather than expect the user to write things like the above. If they're >> doing it more than once, they'll want to write a helper function, we >> might as well provide that for them. > > Actually, I don't think those are the semantics that Tim intended. > Rather, token_hex(nbytes) would return a string twice as long as nbytes. > The idea is that you want to get nbytes-worth of random bits, just > encoded in a common "safe" format. Similarly, token_url(nbytes) would > get nbytes of random bits then base64-encode it, not just pick nbytes > characters from a URL-safe list of characters. This makes it easier to > reason about how much entropy you are actually using. It looks like the semantics of these functions are not so obvious.
Maybe add a generic function that encodes a sequence of bytes with a specified alphabet? From tjreedy at udel.edu Mon Sep 21 22:23:03 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 21 Sep 2015 16:23:03 -0400 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <20150921144130.GB31152@ando.pearwood.info> Message-ID: On 9/21/2015 10:56 AM, Paul Moore wrote: > By the way, in your example you're passing on the "none or useful" > property by making substr be either the matched value or None. I agree that dealing with None immediately is better. In real > life, I'd probably do something more like > > mo = re.match(needle, haystack) > if mo: > process(mo.group()) > else: > no_needle()

try:
    process(re.match(needle, haystack).group())
except AttributeError:  # no match
    no_needle()

is equivalent unless process can also raise AttributeError. -- Terry Jan Reedy From random832 at fastmail.com Mon Sep 21 22:33:10 2015 From: random832 at fastmail.com (Random832) Date: Mon, 21 Sep 2015 16:33:10 -0400 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> <20150921162226.GE31152@ando.pearwood.info> Message-ID: <1442867590.3414435.389778129.2688BF5A@webmail.messagingengine.com> On Mon, Sep 21, 2015, at 16:12, Serhiy Storchaka wrote: > But why these particular alphabets are special? I expect that every > application will use the alphabet that matches its needs. One needs > decimal digits ('0123456789'), other needs English letters > ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'), or letters and digits and underscore, or > letters, digits and punctuation, or all safe ASCII characters, or all > well graphical distinguished characters.
Why token_hex and token_url, > but not token_digits, token_letters, token_identifier, token_base32, > token_base85, token_html_safe, etc? Well, for one thing, they're trivial encodings of random bits, which is why passing in nbytes (number of random bytes) makes sense. Someone else pointed out that this makes it easier to reason about the amount of entropy involved. Token_base64 could actually, in principle, return a string with padding at the end according to base64 rules, if you ask for a number of bytes that is not a multiple of three. Base85 could likewise, for that matter, but base85 is a less common encoding. From tjreedy at udel.edu Mon Sep 21 22:44:18 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 21 Sep 2015 16:44:18 -0400 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> Message-ID: On 9/21/2015 11:28 AM, Ron Adam wrote: > We could add a "not None" specific boolean operators just by appending ! > to them. > > while! x: <--> while x != None: > if! x: <--> if x != None: > > a or! b <--> b if a != None else a > a and! b <--> a if a != None else b > not! x <--> x if x != None else None '!= None' should be 'is not None' in all examples. 'is not None' is a property of the object, so I think any abbreviation should be applied to the object, not the operator.
"while x!", etcetera -- Terry Jan Reedy From tjreedy at udel.edu Mon Sep 21 23:23:42 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 21 Sep 2015 17:23:42 -0400 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> Message-ID: On 9/21/2015 1:07 PM, Guido van Rossum wrote: > I apologize for having misunderstood the status of your PEP. I think it > would be great if you finished the PEP. As you know the ? operator has > its share of fans as well as detractors, and I will happily wait until > more of a consensus appears. Add me to the detractors of what I have read so far ;-). In arithmetic, 1/0 and 0/0 both stop the calculation. My hand calculator literally freezes until I hit 'on' or 'all clear'. Early computers also stopped, maybe with an instruction address and core dump. Three orthogonal solutions are: test y before x/y, so one can do something else; introduce catchable exceptions, so one can do something else; introduce contagious special objects ('inf' and 'nan'), which at some point can be tested for, so one can do something else. Python introduced 'inf' and 'nan' but did not use them to replace ZeroDivisionError. Some languages lacking exceptions introduce a contagious null object. Call it Bottom. Any operation on Bottom yields Bottom. Python is not such a language. None is anti-contagious; most operations raise an exception. I agree with Paul Moore that propagating None is generally a bad idea. It merely avoids the inevitable exception. Or is it inevitable? Trying to avoid exceptions naturally leads to the hypergeneralization of allowing '?' everywhere. 
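A contagious Bottom of the kind described above can be sketched in a few lines. This is a toy illustration only -- a real version would have to define every special method, not just the handful shown here:

```python
class _Bottom:
    # Contagious null: any attribute access, call, indexing, or
    # arithmetic just yields Bottom again instead of raising.
    def __getattr__(self, name):
        return lambda *args, **kwargs: self
    def __getitem__(self, key):
        return self
    def __call__(self, *args, **kwargs):
        return self
    def __add__(self, other):
        return self
    __radd__ = __sub__ = __mul__ = __add__
    def __repr__(self):
        return 'Bottom'

Bottom = _Bottom()

# A chain that would raise immediately with None just keeps flowing:
print(Bottom.upper().strip()[0] + 1)  # Bottom
```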
Instead of trying to turn None into Bottom, I think a better solution would be a new, contagious, singleton Bottom object with every possible special method, all returning Bottom. Anyone could write such for their one use. Someone could put it on pypi to see how useful it would be. I agree with Ron Adam that the narrow issue is that bool(x) is False is sometimes too broad and people dislike spelling out 'x is not None'. So abbreviate that with a unary operator; 'is not None' is a property of objects, not operators. I think 'x!' or 'x?', either meaning 'x is not None', might be better than a new binary operator. The former, x!, re-uses ! in something close to its normal meaning: x really exists. -- Terry Jan Reedy From tjreedy at udel.edu Mon Sep 21 23:28:54 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 21 Sep 2015 17:28:54 -0400 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: <20150921162226.GE31152@ando.pearwood.info> References: <20150919181612.GT31152@ando.pearwood.info> <20150921162226.GE31152@ando.pearwood.info> Message-ID: On 9/21/2015 12:22 PM, Steven D'Aprano wrote: > On Sun, Sep 20, 2015 at 09:00:08AM +0300, Serhiy Storchaka wrote: >> On 20.09.15 02:40, Tim Peters wrote: >>> No attempt to be minimal here. More-than-less "obvious" is more important: >>> >>> Bound methods of a SystemRandom instance >>> .randrange() >>> .randint() >>> .randbits() >>> renamed from .getrandbits() >>> .randbelow(exclusive_upper_bound) >>> renamed from private ._randbelow() >>> .choice() >> >> randbelow() is just an alias for randrange() with single argument. >> randint(a, b) == randrange(a, b+1). >> >> These functions are redundant and they have non-zero cost. > > But they already exist in the random module, so adding them to secrets > doesn't cost anything extra.
It's just a reference to the bound method > of the private SystemRandom() instance: > > # suggested implementation > import random > _systemrandom = random.SystemRandom() > > randint= _systemrandom.randint > randrange = _systemrandom.randrange > > etc. > > >> Would not renaming getrandbits be confused? >> >>> Token functions >>> .token_bytes(nbytes) >>> another name for os.urandom() >>> .token_hex(nbytes) >>> same, but return string of ASCII hex digits >>> .token_url(nbytes) >>> same, but return URL-safe base64-encoded ASCII >>> .token_alpha(alphabet, nchars) >>> string of `nchars` characters drawn uniformly >>> from `alphabet` >> >> token_hex(nbytes) == token_alpha('0123456789abcdef', nchars) ? >> token_url(nbytes) == token_alpha( >> 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_', >> nchars) ? > > They may be reasonable implementations for the functions, but simple as > they are, I think we still want to provide them as named functions > rather than expect the user to write things like the above. If they're > doing it more than once, they'll want to write a helper function, we > might as well provide that for them. > > -- Terry Jan Reedy From tjreedy at udel.edu Mon Sep 21 23:32:44 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 21 Sep 2015 17:32:44 -0400 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: <20150921162226.GE31152@ando.pearwood.info> References: <20150919181612.GT31152@ando.pearwood.info> <20150921162226.GE31152@ando.pearwood.info> Message-ID: On 9/21/2015 12:22 PM, Steven D'Aprano wrote: > On Sun, Sep 20, 2015 at 09:00:08AM +0300, Serhiy Storchaka wrote: >> randbelow() is just an alias for randrange() with single argument. >> randint(a, b) == randrange(a, b+1). >> >> These functions are redundant and they have non-zero cost. > > But they already exist in the random module, so adding them to secrets > doesn't cost anything extra. I think the redundancy in random is a mistake. 
The cost is confusion and extra memory load, and the need to refer to the manual more often, for essentially zero gain. When I read two names, I expect them to do two different things. The question is whether to propagate the mistake to a new module. -- Terry Jan Reedy From guido at python.org Mon Sep 21 23:48:38 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 21 Sep 2015 14:48:38 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> Message-ID: On Mon, Sep 21, 2015 at 2:23 PM, Terry Reedy wrote: > Add me to the detractors of what I have read so far ;-). > > In arithmetic, 1/0 and 0/0 both stop the calculation. My hand calculator > literally freezes until I hit 'on' or 'all clear'. Early computers also > stopped, maybe with an instruction address and core dump. Three orthogonal > solutions are: test y before x/y, so one can do something else; introduce > catchable exceptions, so one can do something else; introduce contagious > special objects ('inf' and 'nan'), which at some point can be tested for, > so one can do something else. Python introduced 'inf' and 'nan' but did > not use them to replace ZeroDivisionError. > > Some languages lacking exceptions introduce a contagious null object. Call > it Bottom. Any operation on Bottom yields Bottom. Python is not such a > language. None is anti-contagious; most operations raise an exception. > > I agree with Paul Moore that propagating None is generally a bad idea. It > merely avoids the inevitable exception. Or is it inevitable? Trying to > avoid exceptions naturally leads to the hypergeneralization of allowing '?' > everywhere.
> > Instead of trying to turn None into Bottom, I think a better solution > would be a new, contagious, singleton Bottom object with every possible > special method, all returning Bottom. Anyone could write such for their one > use. Someone could put it on pypi to see if there how useful it would be. > I think this is the PyMaybe solution. What I don't like about it is that it is dynamic -- when used incorrectly (or even correctly?) Bottom could end up being passed into code that doesn't expect it. That's bad -- "if x is None" returns False when x is Bottom, so code that isn't prepared for Bottom may well misbehave. In contrast, PEP 505 only affects code that is lexically near the ? operator. (You may see a trend here. PEP 498 is also carefully designed to be locally-scoped.) > I agree with Ron Adam that the narrow issue is that bool(x) is False is > sometimes too broad and people dislike of spelling out 'x is not None'. So > abbreviate that with a unary operator; 'is not None', is a property of > objects, not operators. I think 'x!' or 'x?', either meaning 'x is not > None', might be better than a new binary operator. The former, x!, re-uses > ! in something close to its normal meaning: x really exists. I don't think the big issue is bool(x) being too broad. That's what the binary ?? operator is trying to fix, but to me the more useful operators are x?.y and x?[y], both of which would still require repetition of the part on the left when spelled using ??. This is important when x is a more complex expression that is either expensive or has a side-effect. E.g. d.get(key)?.upper() would currently have to be spelled as (some variant of) "None if d.get(key) is None else d.get(key).upper()" and the ?? operator doesn't really help for the repetition -- it would still be "d.get(key) ?? d.get(key).upper()". In general to avoid this repetition you have to introduce a local variable, but that's often awkward and interrupts the programmer's "flow". The ? solves that nicely. 
The key issue with this proposal to me is how it affects readability of code that uses it, given that there isn't much uniformity across languages in what ? means -- it could be part of a method name indicating a Boolean return value (Ruby) or a conditional operator (C and most of its descendents) or some kind of shortcut. So this is the issue I have to deal with (and thought I had dealt with by prematurely rejecting the PEP, but I've had a change of heart and am now waiting for the PEP to be finished). -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Sep 22 00:03:41 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 21 Sep 2015 15:03:41 -0700 Subject: [Python-ideas] add a single __future__ for py3? In-Reply-To: References: Message-ID: On Sat, Sep 19, 2015 at 11:21 AM, Brett Cannon wrote: > It would be nice to have a: >> >> from __future__ import py3 >> >> > While in hindsight having a python3 __future__ statement that just turned >> on everything would be handy, this runs the risk of breaking code by >> introducing something that only works in a bugfix release and we went down >> that route with booleans in 2.2.1 and came to regret it. >> > That may well kill the idea then, yes. Guido wrote: > It's just about these four imports, right? > from __future__ import absolute_import > from __future__ import division > from __future__ import print_function > from __future__ import unicode_literals yup, but that is enough to be a able to remember and type... and will there be more? probably not, but ..... But you are right, if we can redude that to a couple, maybe a smaller deal. > I think the case is overblown. > - absolute_import is rarely an issue; the only thing it does (despite the > name) is give an error message when you attempt a relative import without > using a "." in the import. 
A linter can find this easily for you, and a > little discipline plus the right example can do a lot of good here. > sure -- but this one is more for the learners -- these things are confusing -- and getting something that works in one version of python but not another will be more confusing, still. And much as I wish everyone would use a good linter.... > - division is important. > - print_function is important. > so maybe those two are enough.... > - unicode_literals is useless IMO. It breaks some things (yes there are > still APIs that don't take unicode in 2.7) and it doesn't nearly as much as > what would be useful -- e.g. repr() and .readline() still return > 8-bit strings. I recommend just using u-literals and abandoning Python 3.2. hmm -- I find myself doing an unholy mess of u"" and "". And I tried teaching an intro class where I used u"" everywhere -- the students were pretty confused about why they were typing all those u-s, particularly since it didn't seem to make any difference. Sure there is breakage, but there is breakage on some of these between py2 and py3 anyway -- APIs that return py2 strings on py2... So unicode_literals is still useful to me. But Brett was probably right -- something minor but useful but only works on the very latest bug-fix release is probably an attractive nuisance more than anything else. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed...
URL: From tim.peters at gmail.com Tue Sep 22 00:08:42 2015 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 21 Sep 2015 17:08:42 -0500 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: <59D14571-7711-46DF-8ADF-817DA543E6CE@yahoo.com> References: <20150919181612.GT31152@ando.pearwood.info> <20150921161059.GC31152@ando.pearwood.info> <59D14571-7711-46DF-8ADF-817DA543E6CE@yahoo.com> Message-ID: [Steven] >>> When would somebody use randbelow(n) rather than randrange(n)? [Tim] >> For the same reason they'd use randbits(n) instead of randrange(1 << >> n) ;-) That is, familiarity and obviousness. randrange() has a >> complicated signature, with 1 to 3 arguments, and endlessly surprises >> newbies who _expect_, e.g., randrange(3) to return 3 at times. That's >> why randint() was created. [Andrew Barnert ] > Anyone who gets confused by randrange(3) also gets > confused by range(3), True! > and they have to learn pretty quickly. And they do. And then, in a rush, they slip up. > Also, randint wasn't created to allow people to put off > learning that fact. It was created before randrange, > because Adrian Baddeley didn't realize that Python consistently > used half-open ranges, and Guido didn't notice. After 1.5 was > out and someone complained that choice(range(...)) was > inefficient, Guido added randrange. See the commit comment > (61464037da53) which says "This addresses the problem that > randint() was accidentally defined as taking an inclusive range > (how unpythonic)".Also, some guy named Tim Peters > convinced Guido that randint(0, 2.5) was surprisingly broken, > so if he wasn't going to remove it he should reimplement it as > randrange(a, b+1), which would give a clear error message. Goodness - you seem to believe there's virtue in remembering things in the order they actually happened. Hmm. I'll try that sometime, but I'm dubious ;-) > ... 
> I suppose randbelow could be implemented as an alias to > randrange(a), or it could copy and paste the same type > checks as randrange, randbelow() is already implemented, in current Pythons, although as a class-private method (Random._randbelow()). It's randrange() that's implemented by calling ._randbelow() now. To expose it on its own, it should grow a check that its argument is an integer > 0 (as a private method, it currently assumes it won't be called with an insane argument). > but honestly, I don't think anyone needs it. Of the four {randbelow, randint, randrange, randbits}, any can be implemented via any of the other three. You chopped what I considered to be "the real" point: >> "randbelow(n)" has a dirt-simple signature, and its name makes >> it hard to mistakenly believe `n` is a possible return value. That's what gives it value. Indeed, if minimality crusaders are determined to root out redundancy, randbelow is the only one of the four I'd keep. From tjreedy at udel.edu Tue Sep 22 00:45:04 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 21 Sep 2015 18:45:04 -0400 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> Message-ID: On 9/21/2015 5:48 PM, Guido van Rossum wrote: > On Mon, Sep 21, 2015 at 2:23 PM, Terry Reedy > > wrote: > I agree with Paul Moore that propagating None is generally a bad > idea. It merely avoids the inevitable exception. To me, this is the key idea in opposition to proposals that make propagating None easier. > I don't think the big issue is bool(x) being too broad. That's what the > binary ?? operator is trying to fix, but to me the more useful operators > are x?.y and x?[y], both of which would still require repetition of the > part on the left when spelled using ??. 
> > This is important when x is a more complex expression that is either > expensive or has a side-effect. E.g. d.get(key)?.upper() would currently > have to be spelled as (some variant of) > "None if d.get(key) is None else d.get(key).upper()" > and the ?? operator doesn't really help for the > repetition -- it would still be "d.get(key) ?? d.get(key).upper()". > > In general to avoid this repetition you have to introduce a local > variable, but that's often awkward and interrupts the programmer's > "flow".

try:
    x = d.get(key).upper()
except AttributeError:
    x = None

is also a no-repeat equivalent when d.values are all strings. I agree that "x = d.get(key)?.upper()" is a plausible abbreviation. But I am much more likely to want "x = ''" or another exception as the alternative. I guess some other pythonistas like keeping None around more than I do ;-). -- Terry Jan Reedy From guido at python.org Tue Sep 22 00:54:32 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 21 Sep 2015 15:54:32 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> Message-ID: On Mon, Sep 21, 2015 at 3:45 PM, Terry Reedy wrote: > On 9/21/2015 5:48 PM, Guido van Rossum wrote: > >> On Mon, Sep 21, 2015 at 2:23 PM, Terry Reedy >> > > wrote: >> > > I agree with Paul Moore that propagating None is generally a bad >> idea. It merely avoids the inevitable exception. >> > > To me, this is the key idea in opposition to proposals that make > propagating None easier. > (I didn't write that, you [Terry] did. It looks like our mailers don't understand each other's quoting conventions. :-( ) > I don't think the big issue is bool(x) being too broad. That's what the >> binary ??
operator is trying to fix, but to me the more useful operators >> are x?.y and x?[y], both of which would still require repetition of the >> part on the left when spelled using ??. >> >> This is important when x is a more complex expression that is either >> expensive or has a side-effect. E.g. d.get(key)?.upper() would currently >> have to be spelled as (some variant of) >> > > "None if d.get(key) is None else d.get(key).upper()" > > and the ?? operator doesn't really help for the > >> repetition -- it would still be "d.get(key) ?? d.get(key).upper()". >> >> In general to avoid this repetition you have to introduce a local >> variable, but that's often awkward and interrupts the programmer's >> "flow". >> > > try: > x = d.get(key).upper() > except AttributeError: > x = None > > is also a no-repeat equivalent when d.values are all strings. I agree > than "x = d.get(key)?.upper()" is a plausible abbreviation. But I am much > more likely to want "x = ''" or another exception as the alternative. I > guess some other pythonistas like keeping None around more than I do ;-). > Eew. That try/except is not only very distracting and interrupts the flow of both the writer and the reader, it may also catch errors, e.g. what if the method being called raises an exception (not a problem with upper(), but definitely with user-defined methods). -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carl at oddbird.net Tue Sep 22 00:56:03 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 21 Sep 2015 16:56:03 -0600 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> Message-ID: <56008B03.6020704@oddbird.net> On 09/21/2015 04:45 PM, Terry Reedy wrote: > On 9/21/2015 5:48 PM, Guido van Rossum wrote: >> On Mon, Sep 21, 2015 at 2:23 PM, Terry Reedy >> > > wrote: > >> I agree with Paul Moore that propagating None is generally a bad >> idea. It merely avoids the inevitable exception. > > To me, this is the key idea in opposition to proposals that make > propagating None easier. [...] > I guess some other pythonistas like keeping None around > more than I do ;-). I think it's one of those things that depends on what you're doing. From a web-development perspective, you rarely keep _anything_ around for very long, so there's rarely an issue of `None` sneaking in somewhere unexpectedly and then causing a surprise exception way down the line. Typical use cases are things like: "If this database query returns a User, I want to get their name and return that in the JSON dict from my API, otherwise I want None, which will be serialized to a JSON null, clearly indicating that there is no user here." My jaw dropped a bit when I saw it asserted in this thread that functions returning "useful value or None" is an anti-pattern. I write functions like that all the time, and I consider it a useful and necessary Python idiom. I would hate to rewrite all that code to either deal with exceptions or add default-value-argument boilerplate to all of them; when "no result" is an expected and normal possibility from a function, letting the calling code deal with None however it chooses is much nicer than either of those options. I don't love the ? 
syntax, but I would certainly use the feature discussed here happily and frequently. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From chris.barker at noaa.gov Mon Sep 21 23:55:24 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 21 Sep 2015 14:55:24 -0700 Subject: [Python-ideas] add a single __future__ for py3? In-Reply-To: References: Message-ID: On Mon, Sep 21, 2015 at 2:29 AM, Nick Coghlan wrote: > For folks using IPython Notebook, I've been suggesting to various > folks that a "Python 2/3 compatible" kernel that enables these > features by default may be desirable. That would be nice, yes. And not hard to do. But in a way, I struggle with getting new-to-programming scientists to make the transition from interactive code in a notebook to a re-usable module -- one more thing that would break when they did that would be too bad. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Tue Sep 22 01:47:13 2015 From: random832 at fastmail.com (Random832) Date: Mon, 21 Sep 2015 19:47:13 -0400 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> Message-ID: <1442879233.2295250.389924553.305406F9@webmail.messagingengine.com> On Mon, Sep 21, 2015, at 17:48, Guido van Rossum wrote: > This is important when x is a more complex expression that is either > expensive or has a side-effect. E.g.
d.get(key)?.upper() would currently > have to be spelled as (some variant of) "None if d.get(key) is None else > d.get(key).upper()" and the ?? operator doesn't really help for the > repetition -- it would still be "d.get(key) ?? d.get(key).upper()". ?? is meant to use the right if the left *is* null, as I understand it. So this isn't a problem it solves at all. From guido at python.org Tue Sep 22 01:51:50 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 21 Sep 2015 16:51:50 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <1442879233.2295250.389924553.305406F9@webmail.messagingengine.com> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> <1442879233.2295250.389924553.305406F9@webmail.messagingengine.com> Message-ID: On Mon, Sep 21, 2015 at 4:47 PM, Random832 wrote: > On Mon, Sep 21, 2015, at 17:48, Guido van Rossum wrote: > > This is important when x is a more complex expression that is either > > expensive or has a side-effect. E.g. d.get(key)?.upper() would currently > > have to be spelled as (some variant of) "None if d.get(key) is None else > > d.get(key).upper()" and the ?? operator doesn't really help for the > > repetition -- it would still be "d.get(key) ?? d.get(key).upper()". > > ?? is meant to use the right if the left *is* null, as I understand it. > So this isn't a problem it solves at all. > Sorry, my bad. Indeed, x ?? y tries to fix the issue that "x or y" uses y if x is falsey. Still this seems a lesser problem to me than the problem solved by x?.a and x?[y]. Most of the time in my code it is actually fine to use the default if the LHS is an empty string. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stephen at xemacs.org Tue Sep 22 01:56:24 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 22 Sep 2015 08:56:24 +0900 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: <20150921174758.GF31152@ando.pearwood.info> References: <20150919181612.GT31152@ando.pearwood.info> <20150921174758.GF31152@ando.pearwood.info> Message-ID: <87zj0fmimv.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > I wouldn't include punctuation [in the password alphabet] by > default, as too many places still prohibit some, or all, > punctuation characters. Do you really expect users to choose their own random passwords using this function? I would expect that this function would be used for initial system-generated passwords (or system-enforced random passwords), and the system would have control over the admissible set. But users who have to conform to somebody else's rules much prefer obfuscated passwords that pass strength tests to random passwords in my experience. BTW, the last time I had to set a password that didn't allow the full set of 94 printable ASCII characters, uppercase letters were forbidden (silently -- it was documented in the help but not on the password change form, I had no idea why my first three suggestions were rejected). Go figure. From bruce at leban.us Tue Sep 22 02:16:47 2015 From: bruce at leban.us (Bruce Leban) Date: Mon, 21 Sep 2015 17:16:47 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <56008B03.6020704@oddbird.net> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> <56008B03.6020704@oddbird.net> Message-ID: On Mon, Sep 21, 2015 at 3:56 PM, Carl Meyer wrote: > > My jaw dropped a bit when I saw it asserted in this thread that > functions returning "useful value or None" is an anti-pattern. 
I write > functions like that all the time, and I consider it a useful and > necessary Python idiom. I would hate to rewrite all that code to either > deal with exceptions or add default-value-argument boilerplate to all of > them; when "no result" is an expected and normal possibility from a > function, letting the calling code deal with None however it chooses is > much nicer than either of those options. > +1 Some language features are "prescriptive," designed to encourage particular ways of writing things. Others are "respective," recognizing the variety of ways people write things and respecting that variety. Python has None and generally respects use of it. To say that using None is an anti-pattern is something I would strongly disagree with. Yes, NPE errors are a problem, but eliminating null/None does not eliminate those errors. It merely replaces one common error with an assortment of other errors. I like the feature. I have been asking for features like this for years and the number of times I have written the longer forms is too many to count. I like the ?. ?[] ?() ?? syntax. I think: (1) it's strongly related to the . [] () syntax; (2) any syntax that uses a keyword is either not syntactically related to . [] () or mixes a keyword and punctuation, both of which I dislike; (3) it's the same syntax as used in other languages (yes, Python is not C# or Dart but there's a good reason Python uses ^ for xor, ** for power, += for add to, etc.) --- Bruce -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bussonniermatthias at gmail.com Tue Sep 22 02:34:43 2015 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 21 Sep 2015 17:34:43 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> Message-ID: On Mon, Sep 21, 2015 at 2:48 PM, Guido van Rossum wrote: > > In general to avoid this repetition you have to introduce a local variable, > but that's often awkward and interrupts the programmer's "flow". The ? > solves that nicely. The key issue with this proposal to me is how it affects > readability of code that uses it, given that there isn't much uniformity > across languages in what ? means -- it could be part of a method name > indicating a Boolean return value (Ruby) or a conditional operator (C and > most of its descendents) or some kind of shortcut. > > So this is the issue I have to deal with (and thought I had dealt with by > prematurely rejecting the PEP, but I've had a change of heart and am now > waiting for the PEP to be finished). > As we are in the process of writing a PEP and surveying uses of `?`/`??` in other languages, why not speak about the current usage of `?`/`??` in the Python community? (I'll try to state only facts; excuse any passage that might seem like a personal opinion.) Can the PEP include the fact that `?` and `??` have been in use in the Scientific Python community for 10 to 14 years now, and that any Scientific Python user who has touched IPython will tell you that `?` and `??` are for getting help? (`??` tries to pull the source, while `?` does not, but let's not get into details.) This includes the fact that any IDE (like Spyder) which uses IPython under the hood has this feature.
The usage of `?` is even visible on the Python.org main page [3] (imgur screenshot), which invites the user to launch an interactive console saying: > object? -> Details about 'object', use 'object??' for extra details. > In [1]: leading the casual user to think that this is a Python feature. This fact is even included in books and introductions to Python, sometimes without mentioning that the feature is IPython-specific and does not work in the Python REPL or in scripts. For example, Cyrille Rossant's "Learning IPython for Interactive Computing and Data Visualization"[1] introduces Python with the second code/REPL example being about `?`. Book extract: > Some of these commands let you get some help or information about any > Python function or object. For instance, have you ever had a doubt about how > to use the super function to access parent methods in a derived class? Just type > `super?` and you'll find out. Appending `?` to any command or variable gives you all > the information you need about it. > In [1]: super? > Type: type > String Form: > Namespace: Python builtin > ... A Google search also gives, for example, a Python-for-beginners online tutorial[2] which rapidly does the same. Tutorial snippet: > The "?" is very useful. If you type in `?` after a `len?`, you will see the > documentation about the function len. > Typing `?` after a name will give you information about the object attached to that name. They do even worse, replacing the IPython prompt `In[x]:` with `>>>`, literally showing that `>>> len?` works -- which implies that it should work in a plain Python REPL. As someone who has to regularly teach Python and interact with new Python users, it will be hard to explain that `?` and `??` have different meanings depending on the context, and that most books on Scientific Python are wrong/inaccurate.
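[For readers following along outside IPython: the `?`/`??` help behaviour described above can be roughly approximated in plain Python with the `inspect` module. The helper names `q` and `qq` below are hypothetical, chosen only for this sketch; this illustrates the observable behaviour, not how IPython actually implements its help system.]

```python
import inspect

def q(obj):
    """Rough stand-in for IPython's `obj?`: return the cleaned-up docstring."""
    return inspect.getdoc(obj)

def qq(obj):
    """Rough stand-in for `obj??`: prefer the source, fall back to the doc."""
    try:
        return inspect.getsource(obj)
    except (TypeError, OSError):  # builtins and C code expose no Python source
        return inspect.getdoc(obj)

print(q(len))              # the docstring of len
print(qq(inspect.getdoc))  # the Python source of inspect.getdoc
```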
From the current state of the PEP/proposal I'm guessing we should be able to distinguish null coalescing operations (or whatever name you want to give them) in Python 3.6+ from actual help requests, and that this will allow us to keep backward compatibility with 10+ years of code/user habits, but the result will most likely be confusing. It will be even harder if we have to remove the usage of `?`/`??`[4]. I also want to note that the use of `?`/`??` is not restricted to just the beginning or end of identifiers, as it can also be used to search for names: > In [1]: *int*? > FloatingPointError > int > print But this usage is not as widespread as extracting help about objects, and seems less relevant, though I'm not sure: > In [10]: ?Float*Error > FloatingPointError > > In [12]: Uni*Error? > UnicodeDecodeError > UnicodeEncodeError > UnicodeError > UnicodeTranslateError Please take these facts into consideration when making a decision/writing the PEP. Thanks, -- M [1]: That's one of the only books for which I have (legally) the sources, and that I bothered to grep through. [2]: http://www.pythonforbeginners.com/basics/ipython-a-short-introduction [3]: http://imgur.com/d0Vs7Xr [4]: I'll have to hire a bodyguard to prevent people from pursuing me to the end of the earth with a chainsaw. I'm sure you know that feeling. From rosuav at gmail.com Tue Sep 22 02:44:13 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 22 Sep 2015 10:44:13 +1000 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> Message-ID: On Tue, Sep 22, 2015 at 10:34 AM, Matthias Bussonnier wrote: > On Mon, Sep 21, 2015 at 2:48 PM, Guido van Rossum wrote: > As we are in the process of writing a PEP and uses of ?/??
in other languages, > why not speak about thecurrent usage of `?` / `??` in the Python community ? > > Can the PEP include the fact that `? and `??` have been in use in the Scientific > Python community for 10 to 14 years now, and that any Scientific Python user who > have touched IPython will tell you that ? and ?? are for getting help. > (?? try to pull the source, while ? does not, but let's not get into details). > > This include the fact that any IDE (like spyder) which use IPython under > the hood have this feature. > > I also want to note that the use of `?`/`??` is not present to just being or end > of identifiers as it can also be used use to search for names: > >> In [1]: *int*? >> FloatingPointError >> int >> print > > But this usage is not as widespread as extracting help about objects, > and seem less relevant, though I'm not sure: > >> In [10]: ?Float*Error >> FloatingPointError >> >> In [12]: Uni*Error? >> UnicodeDecodeError >> UnicodeEncodeError >> UnicodeError >> UnicodeTranslateError > > Please take these fact into consideration when making a > decision/writing the Pep. Are there any uses like this that would involve a question mark followed by some other punctuation? The main proposal would be for x?.y and similar; if "x ?? y" can't be used because of a conflict with ipython, I'm sure it could be changed ("x ?! y" would be cute). 
ChrisA From bussonniermatthias at gmail.com Tue Sep 22 04:00:26 2015 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 21 Sep 2015 19:00:26 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> Message-ID: Hi Chris, On Mon, Sep 21, 2015 at 5:44 PM, Chris Angelico wrote: > > Are there any uses like this that would involve a question mark > followed by some other punctuation? The main proposal would be for > x?.y and similar; if "x ?? y" can't be used because of a conflict with > ipython, As far as I can tell, no: `?`/`??` behave in IPython like unary operators[1] and don't conflict. We could distinguish them from any use case so far mentioned in this discussion (as far as I can tell). I just hope that the PEP will not slide in the direction of `foo?` alone being valid and equivalent to `maybe(x)`, or returning an object. I'm concerned about teaching/semantics, as for me `x?.y` reads like "the attribute y of the help of x", so roughly `help(x).y`. The `x ?? y` form is less ambiguous (imho) as it is clearly an operator with a space on each side. > I'm sure it could be changed ("x ?! y" would be cute). Is that the WTF operator? [2] Jokes aside, ?! should not conflict either, but `!` and `!!` also have their own meanings in IPython. For example, the following is valid (a useless piece of code, but it shows the principle), if you are interested in the Python syntax extensions we have:

my_files = ! ls ~/*.txt
for i,file in enumerate(my_files):
    raw = !cat $file
    !cat $raw > {'%s-%s'%(i,file.upper())}

Thanks, -- M [1]: but is not really an operator; I'm not sure what a=print? ; a =?print or ?a=print would do.
[2]: http://stackoverflow.com/questions/7825055/what-does-the-c-operator-do > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From steve at pearwood.info Tue Sep 22 04:15:46 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 22 Sep 2015 12:15:46 +1000 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <20150920073157.GV31152@ando.pearwood.info> <20150921144130.GB31152@ando.pearwood.info> Message-ID: <20150922021546.GK31152@ando.pearwood.info> On Mon, Sep 21, 2015 at 04:23:03PM -0400, Terry Reedy wrote: > try: > process(re.match(needle, haystack).group()) > except AttributeError: # no match > no_needle() > > is equivalent unless process can also raise AttributeError. It is difficult to guarantee what exceptions a function will, or won't, raise. Even if process() is documented as "only raising X", I wouldn't be confident that the above might not disguise a bug in process as "no needle". This is why the standard idiom for using re.match is to capture the result first, then test for truthiness (a MatchObject), or None-ness, before processing it. -- Steve From ron3200 at gmail.com Tue Sep 22 04:18:33 2015 From: ron3200 at gmail.com (Ron Adam) Date: Mon, 21 Sep 2015 21:18:33 -0500 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> Message-ID: On 09/21/2015 03:44 PM, Terry Reedy wrote: > On 9/21/2015 11:28 AM, Ron Adam wrote: > >> We could add a "not None" specific boolean operators just by appending ! >> to them. >> >> while! x: <--> while x != None: >> if! x: <--> if x != None: >> >> a or! b <--> b if a != None else a >> a and! 
b <--> a if a != None else b >> not! x <--> x if x != None else None > > '!= None' should be 'is not None' in all examples. Yes > Since 'is not None' > is a property of the object, so I think any abbreviation should be > applied to the object, not the operator. "while x!", etcetera My observation is that because None is the default return value from functions (and methods), it is already a special case. While that isn't directly related to None values in general, I think it does lend weight to treating it specially. Having None specific bool-type operators is both cleaner and more efficient and avoids issues with false values. The byte code might look like this... >>> def value_or(x, y): ... return x or! y ... >>> dis(value_or) 2 0 LOAD_FAST 0 (x) 3 JUMP_IF_NONE_OR_POP 9 6 LOAD_FAST 1 (y) >> 9 RETURN_VALUE It would not be sensitive to False values. Applying the op to the object wouldn't quite work as expected. What would x! return? True, or the object if not None? And if it returns the object, what does this do when the value is 0 or False, or an empty container. result = x! or y # not the same as result = x or! y The maybe(x) function would work the same as x! in this case. I also think a trailing unary operator is kind of weird. But I get the feeling from Guido response about over generalizations that it may be too big of a change, and I agree it is a new concept I haven't seen anywhere else. Maybe one to think about over time, and not to rush into. Cheers, Ron From stephen at xemacs.org Tue Sep 22 04:34:45 2015 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 22 Sep 2015 11:34:45 +0900 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> Message-ID: <87vbb3mbay.fsf@uwakimon.sk.tsukuba.ac.jp> Matthias Bussonnier writes: > Can the PEP include the fact that `? and `??` have been in use in > the Scientific Python community for 10 to 14 years now, and that > any Scientific Python user who have touched IPython will tell you > that ? and ?? are for getting help. But the syntax is extremely restrictive. Both "None? 1 + 2" and "None ?" are SyntaxErrors, as are "a?.attr" and even "*int*? ". Prefixing the help operator also gives help, and in that case whitespace may separate the operator from the word (apparently defined as non-whitespace, and any trailing detritus is ignored). Perhaps the prefix form (a little less natural for people coming directly from natural languages, I guess) should be emphasized -- there are no proposals for unary prefix use of "?" or "??". So, this is the kind of DWIM that I doubt will confuse many users, at least not for very long. Do you envision a problem keeping IPython facilities separate from Python language syntax? Technically, I think the current rules that divide valid IPython requests for help from Python syntax (valid or invalid) should continue to work. Whether it would confuse users, I doubt, but there are possible surprises for some (many?) users, I suppose. Definitely, it should be mentioned in the PEP, but Python syntax is something that Python defines; shells and language variants have to be prepared to deal with Python syntax changes. > leading the casual user thinking that this is a Python feature. Casual users actually expect software to DWIM in my experience. 
The rule "leading help or stuck-right-on-the-end help works, elsewhere it means something else" (including no meaning == SyntaxError) is intuitively understood by them already, I'm sure. Also, I rather doubt that "casual users" will encounter "?." or "??" until they're not so casual anymore. > As someone that have to regularly teach Python, and interact with > new Python users, it will be hard to explain that `?` and `??` have > different meaning depending on the context, I've never had a question about the context-sensitivity of "%" in IPython. Have you? > and that most book on Scientific Python are wrong/inaccurate. I'm afraid that's Scientific Python's cross to bear, not Python's. > It will be even harder if we have to remove the usage of > `?`/`??`[4]. Not to worry about that. IPython can define its own syntax for parsing out help requests vs. Python syntax. I doubt you'll have to modify the current rules in any way. > I also want to note that the use of `?`/`??` is not present to just > being or end of identifiers as it can also be used use to search > for names: > > > In [1]: *int*? But again "*int* ?" is a SyntaxError. This is the kind of thing most casual users can easily work with. (At least speakers of American English. In email text, my Indian students love to separate trailing punctuation from the preceding word for some reason, but they would certainly learn quickly that you can't do that in IPython.) Again, I agree it would be useful to mention this in the PEP, but as far as I can see there really isn't a conflict. The main thing I'd want to know to convince me there's a risk would be if a lot of users are confused by "%quickref" (an IPython command) vs. "3 % 2" (a Python expression). 
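[The expansions debated throughout this thread can be pinned down with ordinary functions. The helpers below are hypothetical, since none of the `??`/`?.` syntax exists in any released Python; they only illustrate the equivalences quoted above, in particular how `??` differs from `or` and how `?.` avoids repeating the left-hand side.]

```python
def coalesce(a, b):
    """Sketch of `a ?? b`: use b only when a is None (unlike `a or b`,
    which also falls through for '', 0, [] and other falsey values)."""
    return b if a is None else a

def maybe_attr(obj, name):
    """Sketch of `obj?.name`: None when obj is None, else getattr(obj, name)."""
    return None if obj is None else getattr(obj, name)

d = {"k": "needle"}

# `or` vs `??`: the empty string is falsey but is not None.
print("" or "default")          # default
print(coalesce("", "default"))  # (prints an empty line)

# d.get(key)?.upper() without evaluating d.get(key) twice:
for key in ("k", "missing"):
    method = maybe_attr(d.get(key), "upper")
    print(None if method is None else method())  # NEEDLE, then None
```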
From steve at pearwood.info Tue Sep 22 05:15:08 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 22 Sep 2015 13:15:08 +1000 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <55FF8CFD.9070906@mail.de> Message-ID: <20150922031507.GL31152@ando.pearwood.info> On Mon, Sep 21, 2015 at 05:23:42PM -0400, Terry Reedy wrote: > I agree with Paul Moore that propagating None is generally a bad idea. As I understand it, you and Paul are describing a basic, simple idiom which is ubiquitous across Python code: using None to stand in for "no such value" when the data type normally used doesn't otherwise have something suitable. Consequently I really don't understand what you and Paul have against it. > It merely avoids the inevitable exception. I think you meant to say it merely *postpones* the inevitable exception. But that's wrong, there's nothing inevitable about an exception here. It's not *hard* to deal with "value-or-None". It's just tedious, which is why a bit of syntactic sugar may appeal. [...] > Instead of trying to turn None into Bottom, I think a better solution > would be a new, contagious, singleton Bottom object with every possible > special method, all returning Bottom. Anyone could write such for their > own use. Someone could put it on pypi to see how useful it > would be. In one of my earlier posts, I discussed this Null object design pattern. I think it is an anti-pattern. If people want to add one to their own code, it's their foot, but I certainly don't want to see it as a built-in. Thank goodness Guido has already ruled that out :-) > I agree with Ron Adam that the narrow issue is that bool(x) is False is > sometimes too broad and people dislike spelling out 'x is not None'. I don't think that is the motivation of the original proposal, nor is it one I particularly care about. I think that there is a level of inconvenience below which it's not worth adding yet more syntax just to save a few characters.
That inconvenience is not necessarily just to do with the typing, it may be conceptual, e.g. we have "x != y" rather than "not x == y". I think that x is not None fails to reach that minimum level of inconvenience to justify syntactic sugar, but obj.method() if x is not None else None does exceed the level. So I am mildly interested in null-coalescing versions of attribute and item/key lookup, but not at all interested in making the "x is not None" part *alone* shorter. > So abbreviate that with a unary operator; 'is not None', is a property > of objects, not operators. I think 'x!' or 'x?', either meaning 'x is > not None', might be better than a new binary operator. The former, x!, > re-uses ! in something close to its normal meaning: x really exists. Bring it back to the original post's motivating use-case. Slightly paraphrased, it was something like: value = obj.method() if obj is not None else None Having x! as a short-cut for "x is not None" makes this a bit shorter to write: value = obj.method() if obj! else None but it is still very boilerplatey and verbose compared to the suggested: value = obj?.method() -- Steve From bussonniermatthias at gmail.com Tue Sep 22 05:21:49 2015 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 21 Sep 2015 20:21:49 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <87vbb3mbay.fsf@uwakimon.sk.tsukuba.ac.jp> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> <87vbb3mbay.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Hi Stephen, Thanks for the response and the time you took to investigate, > On Sep 21, 2015, at 19:34, Stephen J. Turnbull wrote: > > But the syntax is extremely restrictive. Both "None? 1 + 2" and > "None ?" are SyntaxErrors, as are "a?.attr" and even "*int*? ". 
> Prefixing the help operator also gives help, and in that case > whitespace may separate the operator from the word (apparently defined > as non-whitespace, and any trailing detritus is ignored). Perhaps the > prefix form (a little less natural for people coming directly from > natural languages, I guess) should be emphasized -- there are no > proposals for unary prefix use of "?" or "??". Yes, I'm not worried for the time being: the current syntax proposal does not conflict, and I just wanted to describe a few usages and considerations for reference in the PEP. I prefer to give the PEP authors all the cards so that they can work out a proposal that fits the majority of the people. I don't want to let people write a PEP, go through iterations, and once they are happy with it complain that it does not fit my needs. > So, this is the kind of DWIM that I doubt will confuse many users, at > least not for very long. I do not think we are in contact with the same users. Yes, I see users confused by % syntax; just last Thursday, my neighbor reported to me that IPython was printing only the element for a tuple, but was working for a list:

In [1]: print((1))
1

In [2]: print([1])
[1]

Yes, people do get confused by %magics vs modulo vs %-formatting, less so because modulo is for numbers, % for strings. But it gets better with time. > Do you envision a problem keeping IPython > facilities separate from Python language syntax? Technically, I think > the current rules that divide valid IPython requests for help from > Python syntax (valid or invalid) should continue to work. For now, with the current proposal, no: no problem keeping them separate. > Whether it > would confuse users, I doubt, but there are possible surprises for > some (many?) users, I suppose. I can see the Python `??`/`?` vs IPython `?`/`??` being one explicit point in our docs/teaching/tutorial.
I cannot say how much confusion this will cause in our users' heads; I think the greater confusion will be the double meaning, plus the fact that it's a Python 3.6+ only feature. So I doubt it will be taught before a few years. Though it is still another small difficulty. I guess we will start to get this kind of experience with 3.5, now that @ is there both for decorators and __matmul__ (thanks for 3.5 in general BTW) > Definitely, it should be mentioned in the PEP, Thanks, > but Python syntax is > something that Python defines; shells and language variants have to be > prepared to deal with Python syntax changes. Yes, we are prepared, but our users don't always understand :-) I'm wondering if there wouldn't be a way for the interpreter to actually help Python beta-test some syntax changes, at least at the REPL level. Like websites do user testing. > >> leading the casual user thinking that this is a Python feature. > > Casual users actually expect software to DWIM in my experience. The > rule "leading help or stuck-right-on-the-end help works, elsewhere it > means something else" (including no meaning == SyntaxError) is > intuitively understood by them already, I'm sure. Also, I rather > doubt that "casual users" will encounter "?." or "??" until they're > not so casual anymore. In my domain people get confronted with advanced syntax really rapidly. One piece of feedback that I have about such weird syntax, especially when you are new to Python, is that you don't even know how to Google for this kind of thing (especially for non-English speakers). Trying to put a name on *args and **kwargs is hard; Google is starting to get better, but still ignores chars like ?, +, -. The Google searches for `Python` and `Python ??` seem to be identical.
>> As someone who has to regularly teach Python, and interact with >> new Python users, it will be hard to explain that `?` and `??` have >> different meanings depending on the context, > > I've never had a question about the context-sensitivity of "%" in > IPython. Have you? Cf. above: yes, but more from the side where people don't get what modulo means; fair enough, they were non-native English speakers, and they were confused by indent/implement/increment being roughly the same word with 3 completely different meanings. Though the number of people I see using modulo is low; they use numpy.mod on arrays once we get to numerics. And in string formatting we push for .format(*args, **kwargs). But I'll try to gather some statistics. > >> and that most books on Scientific Python are wrong/inaccurate. > > I'm afraid that's Scientific Python's cross to bear, not Python's. > >> It will be even harder if we have to remove the usage of >> `?`/`??`[4]. > > Not to worry about that. IPython can define its own syntax for > parsing out help requests vs. Python syntax. I doubt you'll have to > modify the current rules in any way. I hope that won't change in the final PEP, and I'm not too worried; worst case we use more regexes and shift the problem elsewhere :-) > >> I also want to note that the use of `?`/`??` is not limited to just >> the beginning or end of identifiers, as it can also be used to search >> for names: >> >>> In [1]: *int*? > > But again "*int* ?" is a SyntaxError. This is the kind of thing most > casual users can easily work with. (At least speakers of American > English. In email text, my Indian students love to separate trailing > punctuation from the preceding word for some reason, but they would > certainly learn quickly that you can't do that in IPython.) French speakers also separate punctuation; I'm still torn on that. But if we have to slightly change the rules, so be it.
> > Again, I agree it would be useful to mention this in the PEP, but as > far as I can see there really isn't a conflict. I'm happy you agree on point 1, and that you confirm point 2. > The main thing I'd > want to know to convince me there's a risk would be if a lot of users > are confused by "%quickref" (an IPython command) vs. "3 % 2" (a Python > expression). Will try to get more qualitative info. Thanks, -- M From steve at pearwood.info Tue Sep 22 05:23:02 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 22 Sep 2015 13:23:02 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> <20150921162226.GE31152@ando.pearwood.info> Message-ID: <20150922032302.GM31152@ando.pearwood.info> On Mon, Sep 21, 2015 at 05:32:44PM -0400, Terry Reedy wrote: > On 9/21/2015 12:22 PM, Steven D'Aprano wrote: > >On Sun, Sep 20, 2015 at 09:00:08AM +0300, Serhiy Storchaka wrote: > > >>randbelow() is just an alias for randrange() with single argument. > >>randint(a, b) == randrange(a, b+1). > >> > >>These functions are redundant and they have non-zero cost. > > > >But they already exist in the random module, so adding them to secrets > >doesn't cost anything extra. > > I think the redundancy in random is a mistake. The cost is confusion > and extra memory load, and the need to more often refer to the manual, for essentially zero gain. Sorry, I don't understand what you mean. Do you mean that it is a mistake for the random module to have randint and randrange? Or that it is a mistake for the secrets module to include functions that the random module includes? > When I read two names, I expect them to do > two different things. The question is whether to propagate the mistake > to a new module. If you are referring to randint versus randrange, they do do different things. Look at their signatures.
randint(a, b) follows the ubiquitous API of "generate a random integer from the closed range a through b inclusive". randrange([start,] end [, step]) follows the Python practice of specifying a half-open interval, and has a more complex signature. Even though randrange is more Pythonic, I've never actually used it. randint is always what I've wanted. E.g.

def die():
    # Roll a die.
    return randint(1, 6)

is far more natural than randrange(1, 7), Pythonic half-open intervals or not. But I'm satisfied that others may think differently, and by Tim's argument that excluding one or the other will be more confusing than including them both. -- Steve From steve at pearwood.info Tue Sep 22 05:34:36 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 22 Sep 2015 13:34:36 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: <20150919181612.GT31152@ando.pearwood.info> References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: <20150922033436.GN31152@ando.pearwood.info> I have discovered that there is already a "secrets" module on PyPI: https://pypi.python.org/pypi/secrets (Thanks to Robert Collins who has brought this to my attention.) Personally, I don't think we should necessarily rule out re-using the name in the standard library. Does anyone have strong feelings either way? -- Steve From steve at pearwood.info Tue Sep 22 05:40:44 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 22 Sep 2015 13:40:44 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: <87zj0fmimv.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20150919181612.GT31152@ando.pearwood.info> <20150921174758.GF31152@ando.pearwood.info> <87zj0fmimv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20150922034044.GO31152@ando.pearwood.info> On Tue, Sep 22, 2015 at 08:56:24AM +0900, Stephen J.
Turnbull wrote: > Steven D'Aprano writes: > > > I wouldn't include punctuation [in the password alphabet] by > > default, as too many places still prohibit some, or all, > > punctuation characters. > > Do you really expect users to choose their own random passwords using > this function? I don't know. Perhaps they will. I'm not entirely sure what the use-case of this password generator is, since I'm pretty sure that "real" password generators have to deal with far more complicated rules. > I would expect that this function would be used for > initial system-generated passwords (or system-enforced random > passwords), and the system would have control over the admissible set. Perhaps so. But then how does the application get the password to the user? Via unencrypted email, like mailman does? I expect that the only use-case for an application generating a password for the user would be "low security" applications where the password has low value. But maybe others disagree. I don't really have a strong opinion one way or another. -- Steve From brenbarn at brenbarn.net Tue Sep 22 05:40:59 2015 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Mon, 21 Sep 2015 20:40:59 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <56008B03.6020704@oddbird.net> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> <56008B03.6020704@oddbird.net> Message-ID: <5600CDCB.1010707@brenbarn.net> On 2015-09-21 15:56, Carl Meyer wrote: > My jaw dropped a bit when I saw it asserted in this thread that > functions returning "useful value or None" is an anti-pattern. I write > functions like that all the time, and I consider it a useful and > necessary Python idiom.
I would hate to rewrite all that code to either > deal with exceptions or add default-value-argument boilerplate to all of > them; when "no result" is an expected and normal possibility from a > function, letting the calling code deal with None however it chooses is > much nicer than either of those options. I agree that it's a fine thing. The thing is, it's an API choice. If your API is "return such-and-such or None", then anyone who calls your function knows they have to check for None and do the right thing. I think this is fine if None really does indicate something like "no result". (The re module uses None return values this way.) It seems to me that a lot of the "problem" that these null-coalescing proposals are trying to solve is dealing with APIs that return None when they really ought to be raising an exception or returning some kind of context-appropriate empty value. If you're doing result = someFunction() and then result.attr.upper() and it's failing because result.attr is None, to me that's often a sign that the API is fragile, and the result object that someFunction returns should have its attr set to an empty string, not None. In other words, if you really want "a null result that I can call all kinds of string methods on and treat it like a string", you should be returning an empty string. If you want "a null result I can subscript and get an integer", you should be returning some kind of defaultdict-like object that has a default zero value. Or whatever. There isn't really such a thing as "an object to which I want to be able to do absolutely anything and have it work", because there's no type-general notion of what "work" means. From a duck-typing perspective, if you expect users to try to do anything with a value you return, what they might reasonably want to do should be a clue as to what kind of value you should return. 
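To make the contrast concrete, here is a tiny sketch of the two API choices (the function names and the dict-based record are hypothetical, just for illustration):

```python
from typing import Optional

def title_or_none(record: dict) -> Optional[str]:
    # "useful value or None" API: every caller owns the None check
    return record.get("title")

def title_or_empty(record: dict) -> str:
    # context-appropriate empty value: callers can treat the result
    # as a string unconditionally
    return record.get("title") or ""

record = {"author": "anon"}  # no "title" key
t = title_or_none(record)
upper = t.upper() if t is not None else None  # caller-side boilerplate
assert upper is None
assert title_or_empty(record).upper() == ""   # no check needed
```

With the second signature the "null result" already supports every string operation the caller might reasonably attempt, which is exactly the duck-typing point above.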
That still leaves the use-case where you're trying to interoperate with some external system that may have missing values, but I don't see that as super compelling. Getting an exception when you do some['big']['json']['object']['value'] and one of the intermediate ones isn't there is a feature; the bug is the JavaScripty mentality of just silently passing around "undefined". To my mind, Python APIs that wrap such external data sources should ideally take the opportunity to improve on them and make them more Pythonic, by providing sensible, context-relevant defaults instead of propagating a generic "null" value willy-nilly. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From abarnert at yahoo.com Tue Sep 22 05:59:07 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 21 Sep 2015 20:59:07 -0700 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <87vbb3mbay.fsf@uwakimon.sk.tsukuba.ac.jp> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> <87vbb3mbay.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: > On Sep 21, 2015, at 19:34, Stephen J. Turnbull wrote: > > Definitely, it should be mentioned in the PEP, but Python syntax is > something that Python defines; shells and language variants have to be > prepared to deal with Python syntax changes. IIRC, back when the ternary conditional was suggested, and the C-style ?: was proposed, Guido declared that no way was his language ever going to use ? as an operator. So if IPython took that as a promise that they can use ? without fear of ambiguity, you can't blame them too much....
I'm not saying they have a right to expect/demand that Guido never change his mind about anything anywhere ever, just that maybe they get a little extra consideration on backward compatibility with their use of ? than with their use of ! or % (which have been in use as operators or parts of operators for decades). From p.f.moore at gmail.com Tue Sep 22 10:01:23 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 22 Sep 2015 09:01:23 +0100 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: <20150922033436.GN31152@ando.pearwood.info> References: <20150919181612.GT31152@ando.pearwood.info> <20150922033436.GN31152@ando.pearwood.info> Message-ID: On 22 September 2015 at 04:34, Steven D'Aprano wrote: > I have discovered that there is already a "secrets" module on PyPI: > > https://pypi.python.org/pypi/secrets > > > (Thanks to Robert Collins who has brought this to my attention.) > > Personally, I don't think we should necessarily rule out re-using the > name in the standard library. Does anyone have strong feelings either > way? The package appears to have no releases, and the home page gave me a 404. I would say it's OK to reuse the name. Paul From p.f.moore at gmail.com Tue Sep 22 10:12:00 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 22 Sep 2015 09:12:00 +0100 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <55FC9812.2090503@mrabarnett.plus.com> <20150919034112.GQ31152@ando.pearwood.info> <20150920073157.GV31152@ando.pearwood.info> <55FF8CFD.9070906@mail.de> <87vbb3mbay.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 21 September 2015 at 23:56, Carl Meyer wrote: > My jaw dropped a bit when I saw it asserted in this thread that > functions returning "useful value or None" is an anti-pattern. I write > functions like that all the time, and I consider it a useful and > necessary Python idiom. 
I would hate to rewrite all that code to either > deal with exceptions or add default-value-argument boilerplate to all of > them; when "no result" is an expected and normal possibility from a > function, letting the calling code deal with None however it chooses is > much nicer than either of those options. Maybe my use of the phrase "anti-pattern" was too strong (I thought it implied a relatively mild "this causes problems"). Having the caller deal with problems isn't bad, but in my experience, too often the caller *doesn't* deal with the possibility of a None return. It feels rather like C's practice of returning error codes which never get checked. But as I said, YMMV, and my experience is clearly different from yours. > I don't love the ? syntax, but I would certainly use the feature > discussed here happily and frequently. If we're back to discussing indexing and attribute access rather than ??, maybe -> would work?

obj->attr   meaning  None if obj is None else obj.attr
obj->[n]    meaning  None if obj is None else obj[n]
obj->(args) meaning  None if obj is None else obj(args)

I think Matthias Bussonnier's point that ? and ?? are heavily used in IPython is a good one. Python traditionally doesn't introduce new punctuation (@ for decorators was AFAIK the last one). I thought that was precisely to leave the space of unused characters available for 3rd party tools. Paul From j.wielicki at sotecware.net Tue Sep 22 10:26:13 2015 From: j.wielicki at sotecware.net (Jonas Wielicki) Date: Tue, 22 Sep 2015 10:26:13 +0200 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> Message-ID: <560110A5.8010503@sotecware.net> On 20.09.2015 02:27, Chris Angelico wrote: > On Sun, Sep 20, 2015 at 10:19 AM, Tim Peters wrote: >> [Chris Angelico ] >>> token_bytes "obviously" should return a bytes, >> >> Which os.urandom() does in Python 3. I'm not writing docs, just >> suggesting the functions.
>> >>> and token_alpha equally obviously should be returning a str. >> >> Which part of "string" doesn't suggest "str"? >> >>> (Or maybe it should return the same type as alphabet, which >>> could be either?) >>> >>> : What about the other two? >> >> Which part of "ASCII" is ambiguous? >> >>> Also, if you ask for 4 bytes from token_hex, do you get 4 hex >>> digits or 8 (four bytes of entropy)? >> >> And which part of "same"? ;-) >> >> Bikeshed away.; I'm outta this now ;-) > > Heh :) > > My personal preference for shed colour: token_bytes returns a > bytestring, its length being the number provided. All the others > return Unicode strings, their lengths again being the number provided. > So they're all text bar the one that explicitly says it's in bytes. My personal preference would be for the number of bytes to rather reflect the entropy in the result. This would be a safer use when migrating from using e.g. token_url to token_alpha with the base32 alphabet [1], for example because you want to have better readable tokens. Speaking of which, a token_base32 would probably make sense, too. regards, jwi [1]: https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt From ncoghlan at gmail.com Tue Sep 22 13:56:02 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 22 Sep 2015 21:56:02 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: <87zj0fmimv.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20150919181612.GT31152@ando.pearwood.info> <20150921174758.GF31152@ando.pearwood.info> <87zj0fmimv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 22 September 2015 at 09:56, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > I wouldn't include punctuation [in the password alphabet] by > > default, as too many places still prohibit some, or all, > > punctuation characters. > > Do you really expect users to choose their own random passwords using > this function? 
I would expect that this function would be used for > initial system-generated passwords (or system-enforced random > passwords), and the system would have control over the admissible set. > But users who have to conform to somebody else's rules much prefer > obfuscated passwords that pass strength tests to random passwords in > my experience. Right, the primary use case here is "web developer creating a default password for an automatically created admin account" (for example), not "end user creating a password for an arbitrary service". We don't want to overgeneralise the canned recipes - keep them dirt simple, and if folks want something slightly different, we can go the itertools path and have recipes in the documentation. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Sep 22 14:03:06 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 22 Sep 2015 22:03:06 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: <560110A5.8010503@sotecware.net> References: <20150919181612.GT31152@ando.pearwood.info> <560110A5.8010503@sotecware.net> Message-ID: On 22 September 2015 at 18:26, Jonas Wielicki wrote: > On 20.09.2015 02:27, Chris Angelico wrote: >> My personal preference for shed colour: token_bytes returns a >> bytestring, its length being the number provided. All the others >> return Unicode strings, their lengths again being the number provided. >> So they're all text bar the one that explicitly says it's in bytes. > > My personal preference would be for the number of bytes to rather > reflect the entropy in the result. This would be a safer use when > migrating from using e.g. token_url to token_alpha with the base32 > alphabet [1], for example because you want to have better readable tokens. 
This isn't something to decide by personal preference, it's something to be decided by considering the consequences of someone misunderstanding the API and not noticing that the result isn't what they expected.

Scenario 1: API specifies bytes of entropy
Consequence of misunderstanding: result is twice as long as expected, with more entropy than expected

Scenario 2: API specifies length of result
Consequence of misunderstanding: result is half as long as expected, with less entropy than expected

Scenario 1 fails safe, scenario 2 doesn't, so for the APIs that are just reversible data transforms around os.urandom, it makes the most sense to specify the number of bytes of entropy you want. Building a password from an alphabet is different, as that involves repeated applications of secrets.choice() to the given alphabet, so you need to specify the result length directly. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Tue Sep 22 14:07:59 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 22 Sep 2015 21:07:59 +0900 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: <20150922034044.GO31152@ando.pearwood.info> References: <20150919181612.GT31152@ando.pearwood.info> <20150921174758.GF31152@ando.pearwood.info> <87zj0fmimv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150922034044.GO31152@ando.pearwood.info> Message-ID: <87lhbylkrk.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > On Tue, Sep 22, 2015 at 08:56:24AM +0900, Stephen J. Turnbull wrote: > I don't know. Perhaps they will. I'm not entirely sure what the > use-case of this password generator is, since I'm pretty sure that > "real" password generators have to deal with far more complicated > rules. Actually, I think they'll do what randrange does: take a seed from urandom() and values from a (CS)PRNG based on that seed, and throw away an out-of-range subset.
Ie, they'll just generate passwords based on a simple rule about the alphabet and keep trying until they get one that passes the strength tester. > > I would expect that this function would be used for > > initial system-generated passwords (or system-enforced random > > passwords), and the system would have control over the admissible set. > > Perhaps so. But then how does the application get the password to the > user? Via unencypted email, like mailman does? Well, I hand them out to my students in class on business cards. But an HTTPS connection could also work. > I expect that the only use-case for an application generating a > password for the user would be "low security" applications where > the password has low value. That could very well be true. From random832 at fastmail.com Tue Sep 22 14:51:10 2015 From: random832 at fastmail.com (Random832) Date: Tue, 22 Sep 2015 08:51:10 -0400 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> <560110A5.8010503@sotecware.net> Message-ID: <1442926270.3642748.390398377.68DC4B29@webmail.messagingengine.com> On Tue, Sep 22, 2015, at 08:03, Nick Coghlan wrote: > Building a password from an alphabet is different, as that involves > repeated applications of secrets.choice() to the given alphabet, so > you need to specify the result length directly. Well, in principle, the length could be calculated from the number of bytes of entropy desired by using ceil(nbytes*log(256)/log(len(alphabet))), if all that matters is to "fail safe" [i.e. longer] rather than to not be surprising. Being calculated by repeated application of choice rather than some other algorithm is an implementation detail. From eric at trueblade.com Tue Sep 22 17:04:55 2015 From: eric at trueblade.com (Eric V. 
Smith) Date: Tue, 22 Sep 2015 11:04:55 -0400 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> <20150921162226.GE31152@ando.pearwood.info> Message-ID: <56016E17.4000509@trueblade.com> Sorry to jump in by replying to a random message, but I can't find the message where this originally showed up:

>>>> Bound methods of a SystemRandom instance
>>>>     .randrange()
>>>>     .randint()
>>>>     .randbits()
>>>>         renamed from .getrandbits()
>>>>     .randbelow(exclusive_upper_bound)
>>>>         renamed from private ._randbelow()
>>>>     .choice()

While we're bikeshedding, can we pick better names than randXXX? How about random_range(), etc.? I'd rather have clarity than save a few chars. I think it's more approachable for new users to a new module. Eric. From brett at python.org Tue Sep 22 18:01:51 2015 From: brett at python.org (Brett Cannon) Date: Tue, 22 Sep 2015 16:01:51 +0000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> <20150921174758.GF31152@ando.pearwood.info> <87zj0fmimv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, 22 Sep 2015 at 04:56 Nick Coghlan wrote: > On 22 September 2015 at 09:56, Stephen J. Turnbull > wrote: > > Steven D'Aprano writes: > > > > > I wouldn't include punctuation [in the password alphabet] by > > > default, as too many places still prohibit some, or all, > > > punctuation characters. > > > > Do you really expect users to choose their own random passwords using > > this function? I would expect that this function would be used for > > initial system-generated passwords (or system-enforced random > > passwords), and the system would have control over the admissible set. > > But users who have to conform to somebody else's rules much prefer > > obfuscated passwords that pass strength tests to random passwords in > > my experience.
> > Right, the primary use case here is "web developer creating a default > password for an automatically created admin account" (for example), > not "end user creating a password for an arbitrary service". > > We don't want to overgeneralise the canned recipes - keep them dirt > simple, and if folks want something slightly different, we can go the > itertools path and have recipes in the documentation. > Out of this whole proposal, this password function is the one I'm most worried about. As someone who has a project whose entire job is to generate consistent passwords, I can tell you it's a messy business that will just lead to never-ending complaints about "why didn't you include this as part of password alphabet" or "why did you choose that length". It just isn't worth the hassle when it isn't going to impact a majority of Python users. This can be something that web frameworks and other folks worry about. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Tue Sep 22 18:25:46 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 22 Sep 2015 18:25:46 +0200 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> <20150921174758.GF31152@ando.pearwood.info> <87zj0fmimv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <5601810A.3070302@egenix.com> On 22.09.2015 18:01, Brett Cannon wrote: > On Tue, 22 Sep 2015 at 04:56 Nick Coghlan wrote: > >> On 22 September 2015 at 09:56, Stephen J. Turnbull >> wrote: >>> Steven D'Aprano writes: >>> >>> > I wouldn't include punctuation [in the password alphabet] by >>> > default, as too many places still prohibit some, or all, >>> > punctuation characters. >>> >>> Do you really expect users to choose their own random passwords using >>> this function? 
I would expect that this function would be used for >>> initial system-generated passwords (or system-enforced random >>> passwords), and the system would have control over the admissible set. >>> But users who have to conform to somebody else's rules much prefer >>> obfuscated passwords that pass strength tests to random passwords in >>> my experience. >> >> Right, the primary use case here is "web developer creating a default >> password for an automatically created admin account" (for example), >> not "end user creating a password for an arbitrary service". >> >> We don't want to overgeneralise the canned recipes - keep them dirt >> simple, and if folks want something slightly different, we can go the >> itertools path and have recipes in the documentation. >> > > Out of this whole proposal, this password function is the one I'm most > worried about. As someone who has a project whose entire job is to generate > consistent passwords, I can tell you it's a messy business that will just > lead to never-ending complaints about "why didn't you include this as part > of password alphabet" or "why did you choose that length". It just isn't > worth the hassle when it isn't going to impact a majority of Python users. > This can be something that web frameworks and other folks worry about. Agreed. There are too many policies and regulations for passwords out there. The stdlib is not the right place for this. But the general purpose functionality of having a function which returns a string of given length and characters from a given set is useful for building routines which implement such policies. Just don't call it a password function :-) How about: randstr(length, alphabet) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 22 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... 
http://zope.egenix.com/ ________________________________________________________________________ 2015-09-14: Released mxODBC Plone/Zope DA 2.2.3 http://egenix.com/go84 2015-09-26: Python Meeting Duesseldorf Sprint 2015 4 days to go 2015-10-21: Python Meeting Duesseldorf ... 29 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From steve at pearwood.info Tue Sep 22 19:05:34 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 23 Sep 2015 03:05:34 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> <20150921174758.GF31152@ando.pearwood.info> <87zj0fmimv.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20150922170534.GR31152@ando.pearwood.info> On Tue, Sep 22, 2015 at 04:01:51PM +0000, Brett Cannon wrote: > Out of this whole proposal, this password function is the one I'm most > worried about. As someone who has a project whose entire job is to generate > consistent passwords, I can tell you it's a messy business that will just > lead to never-ending complaints about "why didn't you include this as part > of password alphabet" or "why did you choose that length". It just isn't > worth the hassle when it isn't going to impact a majority of Python users. > This can be something that web frameworks and other folks worry about. I too feel a quiet unease about password(), although I don't have anything concrete to pin it on. I'm happy to be guided by people with more experience in this realm. What if we called it simple_password() and made it clear that it wasn't intended as an all-singing, all-dancing password generator? 
-- Steve From tim.peters at gmail.com Tue Sep 22 19:41:44 2015 From: tim.peters at gmail.com (Tim Peters) Date: Tue, 22 Sep 2015 12:41:44 -0500 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: <20150922170534.GR31152@ando.pearwood.info> References: <20150919181612.GT31152@ando.pearwood.info> <20150921174758.GF31152@ando.pearwood.info> <87zj0fmimv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150922170534.GR31152@ando.pearwood.info> Message-ID: [Steven D'Aprano ] > I too feel a quiet unease about password(), although I don't have > anything concrete to pin it on. I'm happy to be guided by people with > more experience in this realm. > > What if we called it simple_password() and made it clear that it wasn't > intended as an all-singing, all-dancing password generator? Just drop it. Nobody I recall has said anything in favor of it ;-) It would be easy to give it as an example in the docs instead, building directly on choice(). That would steer people who need fancier stuff in the right direction. From steve at pearwood.info Tue Sep 22 19:47:55 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 23 Sep 2015 03:47:55 +1000 Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library In-Reply-To: <560110A5.8010503@sotecware.net> References: <20150919181612.GT31152@ando.pearwood.info> <560110A5.8010503@sotecware.net> Message-ID: <20150922174755.GS31152@ando.pearwood.info> On Tue, Sep 22, 2015 at 10:26:13AM +0200, Jonas Wielicki wrote: > > On 20.09.2015 02:27, Chris Angelico wrote: > >>> Also, if you ask for 4 bytes from token_hex, do you get 4 hex > >>> digits or 8 (four bytes of entropy)? I think the answer there has to be 8. I interpret Tim's reference to "same" as that the intent of token_hex is to call os.urandom(nbytes), then convert it to a hex string. 
So the implementation might be as simple as:

    def token_hex(nbytes):
        return binascii.hexlify(os.urandom(nbytes))

modulo a call to .decode('ascii') if we want it to return a string.

One obvious question is, how many bytes is enough? Perhaps we should
set a default value for nbytes, with the understanding that the
default value will increase in the future.

> > My personal preference for shed colour: token_bytes returns a
> > bytestring, its length being the number provided. All the others
> > return Unicode strings, their lengths again being the number provided.
> > So they're all text bar the one that explicitly says it's in bytes.
>
> My personal preference would be for the number of bytes to rather
> reflect the entropy in the result. This would be a safer use when
> migrating from using e.g. token_url to token_alpha with the base32
> alphabet [1], for example because you want to have better readable tokens.
>
> Speaking of which, a token_base32 would probably make sense, too.

Oh oh, scope creep already! And so it begins... *wink*

What you are referring to isn't the standard base32, which already
exists in the stdlib (in base64.py, together with base16). It is
referred to by its creators as z-base-32, and the reasoning they give
seems sound. It's not intended as a replacement for RFC-3458 base32,
but an alternative.

If the std lib already included a z-base-32 implementation, I would be
happy to include token_zbase32 in the same spirit as token_base64. But
it doesn't. So first you would have to convince somebody to add
zbase32 to the standard library.
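[Spelling out the sketch above as runnable code. The function name follows the discussion, but the 32-byte default and the decoded return type are assumptions here, not settled stdlib API.]

```python
import binascii
import os

def token_hex(nbytes=32):
    """Return a hex string encoding `nbytes` bytes of OS randomness.

    The result is twice as long as `nbytes`, since each random byte
    becomes two hex digits -- so asking for 4 bytes yields 8 digits,
    answering the "4 or 8?" question quoted earlier in the thread.
    """
    return binascii.hexlify(os.urandom(nbytes)).decode('ascii')
```

That is, token_hex(4) carries 4 bytes of entropy but is 8 characters long.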
> [1]: https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt

-- Steve

From tim.peters at gmail.com Tue Sep 22 20:05:43 2015
From: tim.peters at gmail.com (Tim Peters)
Date: Tue, 22 Sep 2015 13:05:43 -0500
Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library
In-Reply-To: <20150922174755.GS31152@ando.pearwood.info>
References: <20150919181612.GT31152@ando.pearwood.info> <560110A5.8010503@sotecware.net> <20150922174755.GS31152@ando.pearwood.info>
Message-ID:

>>>>> Also, if you ask for 4 bytes from token_hex, do you get 4 hex
>>>>> digits or 8 (four bytes of entropy)?

[Steven D'Aprano]
> I think the answer there has to be 8. I interpret Tim's reference to
> "same" as that the intent of token_hex is to call os.urandom(nbytes),
> then convert it to a hex string.

Absolutely. If we're trying to "fail safe", it's the number of
unpredictable source bytes that's important, not the length of the
string produced. And, e.g., in the case of a URL-safe base64 encoding,
passing "number of characters in the string" would be plain idiotic ;-)

> So the implementation might be as simple as:
>
>     def token_hex(nbytes):
>         return binascii.hexlify(os.urandom(nbytes))
>
> modulo a call to .decode('ascii') if we want it to return a string.

Nick Coghlan already posted implementations of these things, before
this thread started. They're all easy, _provided that_ you know which
obscure functions to call; e.g.,

    def token_url(nbytes):
        return base64.urlsafe_b64encode(os.urandom(nbytes)).decode("ascii")

From srkunze at mail.de Tue Sep 22 20:22:42 2015
From: srkunze at mail.de (Sven R.
Kunze)
Date: Tue, 22 Sep 2015 20:22:42 +0200
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: <20150922031507.GL31152@ando.pearwood.info>
References: <55FF8CFD.9070906@mail.de> <20150922031507.GL31152@ando.pearwood.info>
Message-ID: <56019C72.9000806@mail.de>

On 22.09.2015 05:15, Steven D'Aprano wrote:
> On Mon, Sep 21, 2015 at 05:23:42PM -0400, Terry Reedy wrote:
>
>> I agree with Paul Moore that propagating None is generally a bad idea.
>
> As I understand it, you and Paul are describing a basic, simple idiom
> which is ubiquitous across Python code: using None to stand in for "no
> such value"

There is not a single "no such value". As I mentioned before, when
discussing NULL values on the RDF mailing list, we discovered 6 or 7
domain-agnostic meanings.

> when the data type normally used doesn't otherwise have
> something suitable. Consequently I really don't understand what you and
> Paul have against it.

I can tell from what I've seen that people use None for: all kinds of
various interesting semantics depending on the variable, on the
supposed type and on the function such as:

- +infinity for datetimes but only if it signifies the end of a timespan
- current datetime
- mixing both
- default item in a list like [1, 2, None, 4, 9] (putting in 5 would
  have done the trick)
- ...

Really? Just imagine a world where Python and other systems would have
never invented None, NULLs or anything like that.

> I think you meant to say it merely *postpones* the inevitable
> exception. But that's wrong, there's nothing inevitable about an
> exception here. It's not *hard* to deal with "value-or-None". It's
> just tedious, which is why a bit of syntactic sugar may appeal.

It's a sign of bad design. So, syntactic sugar does not help when
doing toilet paper programming (hope that translation works for
English).
Best, Sven From rosuav at gmail.com Wed Sep 23 00:53:00 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 23 Sep 2015 08:53:00 +1000 Subject: [Python-ideas] Null coalescing operators In-Reply-To: <56019C72.9000806@mail.de> References: <55FF8CFD.9070906@mail.de> <20150922031507.GL31152@ando.pearwood.info> <56019C72.9000806@mail.de> Message-ID: On Wed, Sep 23, 2015 at 4:22 AM, Sven R. Kunze wrote: > I can tell from what I've seen that people use None for: all kinds of > various interesting semantics depending on the variable, on the supposed > type and on the function such as: > > - +infinity for datetimes but only if it signifies the end of a timespan What this means is that your boundaries can be a datetime or None, where None means "no boundary at this end". > - current datetime > - mixing both I don't know of a situation where None means "now"; can you give an example? > - default item in a list like [1, 2, None, 4, 9] (putting in 5 would have > done the trick) What does this mean? Is this where you're taking an average or somesuch, and pretending that the None doesn't exist? That seems fairly consistent with SQL. Mostly, this does still represent "no such value". ChrisA From fperez.net at gmail.com Wed Sep 23 03:21:11 2015 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 22 Sep 2015 18:21:11 -0700 Subject: [Python-ideas] Null coalescing operators References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <87vbb3mbay.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 2015-09-22 03:59:07 +0000, Andrew Barnert via Python-ideas said: > I'm not saying they have a right to expect/demand that Guido never > change his mind about anything anywhere ever, just that maybe they get > a little extra consideration on backward compatibility with their use > of ? than with their use of ! or % (which have been in use as operators > or parts of operators for decades). 
I just wanted to quickly comment on what my original stance was
regarding IPython's extensions to the base Python language. This was
where I stood as I made decisions when the project was basically just
me, and over time we've mostly adopted this as project policy.

We fully acknowledge that IPython has to be a strict superset of the
Python language, and we are most emphatically *not* a fork of the
language intended to be incompatible. We've added some extensions by
hijacking a few characters that are invalid in the base language for
things we deemed to be useful while working interactively, but we
always accept that, if the language moves in our direction, it's our
job to pack up and move again to a new location.

In fact, that already happened once: before Python 2.4, our prefix for
"magic functions" was the @ character, and when that was introduced as
the decorator prefix, we had to scramble. We carefully decided to pick
%, knowing that an existing binary operator would be unlikely to be
added also as a new unary prefix.

Now, accepting that as our reality doesn't mean that at least we don't
want to *inform* you folks of what our uses are, so that at least you
can consider them in your decision-making process. Since in some
cases, that means there's an established ~ 15 years of a community
with a habit of using a particular syntax for something, that may be
confused if things change. So at least, we want to let you know.

Due precisely to these recent conversations (I had a very similar
thread a few days ago with Nick about the ! operator, which we also
use in all kinds of nasty ways), we have started documenting more
precisely all these differences, so the question "where exactly does
IPython go beyond Python" can be answered in one place. You can see
the progress here:

https://github.com/ipython/ipython/pull/8821

We hope this will be merged soon into our docs, and it should help you
folks have a quick reference for these questions.
Finally, I want to emphasize that these things aren't really changing
much anymore, this is all fairly stable. All these choices have by now
stabilized, we only introduced the @ -> % transition when python 2.4
forced us, and more recently we introduced the notion of having a
double-%% marker for "cell magics", but that was ~ 4 years ago, and it
didn't require a new character, only allowing it to be doubled.

Best,

f
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From python at mrabarnett.plus.com Wed Sep 23 04:43:50 2015
From: python at mrabarnett.plus.com (MRAB)
Date: Wed, 23 Sep 2015 03:43:50 +0100
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <87vbb3mbay.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <560211E6.1000907@mrabarnett.plus.com>

On 2015-09-23 02:21, Fernando Perez wrote:
> On 2015-09-22 03:59:07 +0000, Andrew Barnert via Python-ideas said:
>
>
> I'm not saying they have a right to expect/demand that Guido never
> change his mind about anything anywhere ever, just that maybe they get a
> little extra consideration on backward compatibility with their use of ?
> than with their use of ! or % (which have been in use as operators or
> parts of operators for decades).
>
>
> I just wanted to quickly comment on what my original stance was
> regarding IPython's extensions to the base Python language. This was
> where I stood as I made decisions when the project was basically just
> me, and over time we've mostly adopted this as project policy.
>
>
> We fully acknowledge that IPython has to be a strict superset of the
> Python language, and we are most emphatically *not* a fork of the
> language intended to be incompatible.
> We've added some extensions by
> hijacking a few characters that are invalid in the base language for
> things we deemed to be useful while working interactively, but we always
> accept that, if the language moves in our direction, it's our job to
> pack up and move again to a new location.
>
>
> In fact, that already happened once: before Python 2.4, our prefix for
> "magic functions" was the @ character, and when that was introduced as
> the decorator prefix, we had to scramble. We carefully decided to pick %,
> knowing that an existing binary operator would be unlikely to be added
> also as a new unary prefix.
>
>
> Now, accepting that as our reality doesn't mean that at least we don't
> want to *inform* you folks of what our uses are, so that at least you
> can consider them in your decision-making process. Since in some cases,
> that means there's an established ~ 15 years of a community with a habit
> of using a particular syntax for something, that may be confused if
> things change. So at least, we want to let you know.
>
>
> Due precisely to these recent conversations (I had a very similar thread
> a few days ago with Nick about the ! operator, which we also use in all
> kinds of nasty ways), we have started documenting more precisely all
> these differences, so the question "where exactly does IPython go beyond
> Python" can be answered in one place. You can see the progress here:
>
>
> https://github.com/ipython/ipython/pull/8821
>
>
> We hope this will be merged soon into our docs, and it should help you
> folks have a quick reference for these questions.
>
>
> Finally, I want to emphasize that these things aren't really changing
> much anymore, this is all fairly stable. All these choices have by now
> stabilized, we only introduced the @ -> % transition when python 2.4
> forced us, and more recently we introduced the notion of having a
> double-%% marker for "cell magics", but that was ~ 4 years ago, and it
> didn't require a new character, only allowing it to be doubled.
>

From the examples I've seen, the "?" and "??" occur at the end of the
line. The proposed 'operators' "?.", "?[", "?(" and "??" wouldn't
occur at the end of the line (or, if they did, they'd be inside
parentheses, brackets, or braces). So is there really a conflict, in
practice?

From bussonniermatthias at gmail.com Wed Sep 23 06:56:54 2015
From: bussonniermatthias at gmail.com (Matthias Bussonnier)
Date: Tue, 22 Sep 2015 21:56:54 -0700
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: <560211E6.1000907@mrabarnett.plus.com>
References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <87vbb3mbay.fsf@uwakimon.sk.tsukuba.ac.jp> <560211E6.1000907@mrabarnett.plus.com>
Message-ID: <67F64367-9413-4AD6-9B16-66CA1547E44D@gmail.com>

> On Sep 22, 2015, at 19:43, MRAB wrote:
>
> On 2015-09-23 02:21, Fernando Perez wrote:
>> On 2015-09-22 03:59:07 +0000, Andrew Barnert via Python-ideas said:
>>
>> ...
>>
>> Finally, I want to emphasize that these things aren't really changing
>> much anymore, this is all fairly stable. All these choices have by now
>> stabilized, we only introduced the @ -> % transition when python 2.4
>> forced us, and more recently we introduced the notion of having a
>> double-%% marker for "cell magics", but that was ~ 4 years ago, and it
>> didn't require a new character, only allowing it to be doubled.
>>
> From the examples I've seen, the "?" and "??" occur at the end of the line.

beginning of line can happen too. ?print is equivalent to print?

> The proposed 'operators' "?.", "?[", "?(" and "??"
> wouldn't occur at
> the end of the line (or, if they did, they'd be inside parentheses,
> brackets, or braces).
>
> So is there really a conflict, in practice?

As stated in previous mails, with the current state of the proposal,
no, it does not conflict; we should be able to distinguish the two
cases. We are just informing the PEP authors and contributors of the
syntax hijack that we did and currently have in IPython.

-- M

>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From chris.barker at noaa.gov Wed Sep 23 08:09:36 2015
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Tue, 22 Sep 2015 23:09:36 -0700
Subject: [Python-ideas] add a single __future__ for py3?
In-Reply-To: <030BCA1C-5B19-4DD5-BBE1-E8C344BB4716@yahoo.com>
References: <55FF8758.70406@canterbury.ac.nz> <20150921130813.GZ31152@ando.pearwood.info> <1442843138.3321719.389368849.52DEB79B@webmail.messagingengine.com> <030BCA1C-5B19-4DD5-BBE1-E8C344BB4716@yahoo.com>
Message-ID: <1716796771240402588@unknownmsgid>

On Sep 22, 2015, at 6:43 PM, Andrew Barnert wrote:

On Sep 21, 2015, at 10:59, Gregory P. Smith wrote:

I think people should stick with *from __future__ import
absolute_import* regardless of what code they are writing.

If the py3 way of handling Absolute vs relative import isn't better
--- why is it in Py3????

Anyway, the point of this is to get your py2 code working as similarly
as possible on py3. So better or worse, or not all that different, you
still want that behavior.

But again, it looks like this ship has sailed... Thanks for indulging me.
-Chris

They will eventually create a file innocuously called something like
calendar.py (the same name as a standard library module) in the same
directory as their main binary and their debugging of the mysterious
failures they just started getting from the tarfile module will
suddenly require leveling up to be able to figure it out. ;)

But they'll get the same problems either way. If calendar.py isn't on
sys.path, it won't interfere with tarfile. And if it is on sys.path,
so it does interfere with tarfile, then it's already an absolute
import, so enabling absolute_import doesn't help.

I suppose if they've done something extra stupid, like putting a
package directory on sys.path as well as putting something called
calendar.py in that package and importing it with an unqualified
import, then maybe it'll be easier for someone to explain all the
details of everything they did wrong (including why they shouldn't
have put the package on sys.path) if they're using absolute_imports,
but beyond that, I don't see how it helps this case.

-gps

On Mon, Sep 21, 2015 at 8:18 AM Guido van Rossum wrote:

> It's just about these four imports, right?
>
> from __future__ import absolute_import
> from __future__ import division
> from __future__ import print_function
> from __future__ import unicode_literals
>
> I think the case is overblown.
>
> - absolute_import is rarely an issue; the only thing it does (despite the
> name) is give an error message when you attempt a relative import without
> using a "." in the import. A linter can find this easily for you, and a
> little discipline plus the right example can do a lot of good here.
>
> - division is important.
>
> - print_function is important.
>
> - unicode_literals is useless IMO. It breaks some things (yes there are
> still APIs that don't take unicode in 2.7) and it doesn't do nearly as much
> as what would be useful -- e.g. repr() and .readline() still return
> 8-bit strings.
> I recommend just using u-literals and abandoning Python 3.2.
>
> --
> --Guido van Rossum (python.org/~guido)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chris.barker at noaa.gov Wed Sep 23 08:19:37 2015
From: chris.barker at noaa.gov (Chris Barker - NOAA Federal)
Date: Tue, 22 Sep 2015 23:19:37 -0700
Subject: [Python-ideas] Null coalescing operators
In-Reply-To: <1394693929.57358.1442988294597.JavaMail.mobile-sync@iogg1>
References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <87vbb3mbay.fsf@uwakimon.sk.tsukuba.ac.jp> <1394693929.57358.1442988294597.JavaMail.mobile-sync@iogg1>
Message-ID: <1578039969583045795@unknownmsgid>

Sent from my iPhone

On Sep 22, 2015, at 6:21 PM, Fernando Perez

From ncoghlan at gmail.com Wed Sep 23 09:46:18 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 23 Sep 2015 17:46:18 +1000
Subject: [Python-ideas] Pre-PEP Adding A Secrets Module To The Standard Library
In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> <20150921174758.GF31152@ando.pearwood.info> <87zj0fmimv.fsf@uwakimon.sk.tsukuba.ac.jp> <20150922170534.GR31152@ando.pearwood.info>
Message-ID:

On 23 September 2015 at 03:41, Tim Peters wrote:
> [Steven D'Aprano ]
>> I too feel a quiet unease about password(), although I don't have
>> anything concrete to pin it on. I'm happy to be guided by people with
>> more experience in this realm.
>>
>> What if we called it simple_password() and made it clear that it wasn't
>> intended as an all-singing, all-dancing password generator?
> > Just drop it. Nobody I recall has said anything in favor of it ;-) I think I may have been the one to suggest it originally, since one of the things we're trying to address is the plethora of bad advice found when Googling for "python password generator", but I'm OK with dropping it from the initial version of the module, just on the general principle that adding things later is relatively easy, while taking them away is hard. > It would be easy to give it as an example in the docs instead, > building directly on choice(). That would steer people who need > fancier stuff in the right direction. Yeah, addressing the default password generation problem should work just as well as a recipe in the secrets module documentation - I see the core goal here as being to help guide folks towards using the right random number generator for security sensitive tasks, and "use the RNG in the secrets module for random secrets, and the RNG in the random module for modelling and simulation" is a much easier story to tell than explaining the technical differences between random.Random and random.SystemRandom. Raymond Hettinger's philosophy with itertools is likely a good guiding principle here: provide a small set of useful primitives, and otherwise favour recipes in the documentation. If we end up with a "more-secrets" module on PyPI akin to "more-itertools", I think that's fine (and also provides an easy way of backporting future secrets module additions to earlier Python versions) Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Sep 23 11:00:41 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 23 Sep 2015 19:00:41 +1000 Subject: [Python-ideas] Using "||" (doubled pipe) as the null coalescing operator? 
Message-ID:

This may just be my C programmer brain talking, but reading the
examples in PEP 505 makes me think of the existing use of "|" as the
bitwise-or operator in both Python and C, and "||" as the logical-or
operator in C.

Using || for None-coalescence would still introduce a third "or"
variant into Python as PEP 505 proposes (for good reasons), but
without introducing a new symbolic character that relates to "OR"
operations:

    x | y: bitwise OR (doesn't short circuit)
    x or y: logical OR (short circuits based on bool(x))
    x || y: logical OR (short circuits based on "x is not None")

(An analogy with C pointers works fairly well here, as "x || y" in C
is a short-circuiting operator that switches on "x != NULL" in the
pointer case)

Taking some key examples from the PEP:

    data = data ?? []
    headers = headers ?? {}
    data ?= []
    headers ?= {}

When written using a doubled pipe instead:

    data = data || []
    headers = headers || {}
    data ||= []
    headers ||= {}

Translations would be the same as proposed in PEP 505 (for simplicity,
this shows evaluating the LHS multiple times, in practice that
wouldn't happen):

    data = data if data is not None else []
    headers = headers if headers is not None else {}
    data = data if data is not None else []
    headers = headers if headers is not None else {}

One additional wrinkle is that a single "|" would conflict with the
bitwise-or notation in the case of None-aware index access, so the
proposal for both that and attribute access would be to make the
notation "!|", borrowing the logical negation "!" from "!=". In this
approach, where "||" would be the new short-circuiting binary operator
standing for "LHS if LHS is not None else RHS", in "!|" the logical
negations cancel out to give "LHS if LHS is None else LHS".
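[Since the proposed "||" operator doesn't exist in Python, its intended semantics can be sketched with a plain function. The function name is mine, not part of the proposal, and unlike a real short-circuiting operator this helper evaluates both arguments eagerly:]

```python
def none_coalesce(lhs, rhs):
    # "lhs || rhs": substitute rhs only when lhs is None,
    # unlike "or", which substitutes on any falsey value.
    return lhs if lhs is not None else rhs

# An empty dict passes through "||" untouched...
assert none_coalesce({}, {"default": True}) == {}
assert none_coalesce(None, {"default": True}) == {"default": True}
# ...but "or" replaces it, since bool({}) is False.
assert ({} or {"default": True}) == {"default": True}
```

The distinction matters for default-argument patterns like "env = env or {}", which silently replaces a passed-in empty dict as well as None.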
PEP 505 notation:

    title?.upper()
    person?['name']

Using the "is not not None" pipe-based notation:

    title!|.upper()
    person!|['name']

And the gist of the translation:

    title if title is None else title.upper()
    person if person is None else person['name']

If this particular syntax were to be chosen, I also came up with the
following possible mnemonics that may be useful as an explanatory
tool:

    "||" is a barrier to prevent None passing through an expression
    "!|" explicitly allows None to pass without error

Regards,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From abarnert at yahoo.com Wed Sep 23 11:21:44 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 23 Sep 2015 02:21:44 -0700
Subject: [Python-ideas] Using "||" (doubled pipe) as the null coalescing operator?
In-Reply-To: References: Message-ID:

On Sep 23, 2015, at 02:00, Nick Coghlan wrote:
>
> This may just be my C programmer brain talking, but reading the
> examples in PEP 505 makes me think of the existing use of "|" as the
> bitwise-or operator in both Python and C, and "||" as the logical-or
> operator in C.

The connection with || as a falsey-coalescing operator in C--and C#,
Swift, etc., which have a separate null-coalescing operator that's
spelled ??--seems like it could be misleading. Otherwise, I like it,
but that's a pretty big otherwise.

> One additional wrinkle is that a single "|" would conflict with the
> bitwise-or notation in the case of None-aware index access, so the
> proposal for both that and attribute access would be to make the
> notation "!|", borrowing the logical negation "!" from "!=".

Maybe you should have given the examples first, because written on its
own like this it looks unspeakably ugly, but in context below it's a
lot nicer...

> title!|.upper()
> person!|['name']

This actually makes me think of the !
from Swift and other languages ("I know this optionally-null object is not null even if the type checker can't prove it, so let me use it that way"), more than negation. Which makes the whole thing make sense, but in a maybe-unpythonically out-of-order way: the bang-or means "either title is not None so I get title.upper(), or it is so I get None". I'm not sure whether other people will read it that way--or, if they do, whether it will be helpful or harmful mnemonically. > If this particular syntax were to be chosen, I also came up with the > following possible mnemonics that may be useful as an explanatory > tool: > > "||" is a barrier to prevent None passing through an expression > "!|" explicitly allows None to pass without error That's definitely easy to understand and remember. But since Python doesn't exist in isolation, and null coalescing and null conditional operators exist in other languages and are being added to many new ones, it might be useful to use similar terms to other languages. (See https://msdn.microsoft.com/en-us/library/ms173224.aspx and https://msdn.microsoft.com/en-us/library/dn986595.aspx for how C# describes them.) From ncoghlan at gmail.com Wed Sep 23 12:53:00 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 23 Sep 2015 20:53:00 +1000 Subject: [Python-ideas] Using "||" (doubled pipe) as the null coalescing operator? In-Reply-To: References: Message-ID: On 23 September 2015 at 19:21, Andrew Barnert wrote: > On Sep 23, 2015, at 02:00, Nick Coghlan wrote: >> >> This may just be my C programmer brain talking, but reading the >> examples in PEP 505 makes me think of the existing use of "|" as the >> bitwise-or operator in both Python and C, and "||" as the logical-or >> operator in C. > > The connection with || as a falsey-coalescing operator in C--and C#, Swift, etc., which have a separate null-coalescing operator that's spelled ??--seems like it could be misleading. Otherwise, I like it, but that's a pretty big otherwise. 
One of the problems I occasionally see with folks migrating to Python
from other languages is with our relatively expansive definition of
"false" values. In particular, C/C++ developers expect all strings and
containers (i.e. non-NULL pointers) to be truthy, with only primitive
types (i.e. pointers and numbers) able to be false in a boolean
context.

Accordingly, the difference between C's || and a null-coalescing || in
Python would be adequately covered by "Python has no primitive types,
everything's an object or a reference to an object, so || in Python is
like || with pointers in C/C++, where a reference to None is Python's
closest equivalent to NULL".

For example, a C/C++ dev might be tempted to write code like this:

    def example(env=None):
        env = env || {}
        ...

With || as a null coalescing operator, that code's actually correct,
while the same code with "or" would be incorrect:

    def example(env=None):
        env = env or {}  # Also replaces a passed in empty dict
        ...

>> One additional wrinkle is that a single "|" would conflict with the
>> bitwise-or notation in the case of None-aware index access, so the
>> proposal for both that and attribute access would be to make the
>> notation "!|", borrowing the logical negation "!" from "!=".
>
> Maybe you should have given the examples first, because written on its
> own like this it looks unspeakably ugly, but in context below it's a
> lot nicer...
>
>> title!|.upper()
>> person!|['name']
>
> This actually makes me think of the ! from Swift and other languages
> ("I know this optionally-null object is not null even if the type
> checker can't prove it, so let me use it that way"), more than
> negation. Which makes the whole thing make sense, but in a
> maybe-unpythonically out-of-order way: the bang-or means "either title
> is not None so I get title.upper(), or it is so I get None".

It could also just be a "!" on its own, as the pipe isn't really
adding much here:

    title!.upper()
    person!['name']

Then the "!"
is saying "I know this may not exist, if it doesn't just bail out of this whole subexpression and produce None". That said, it's mainly the doubled "??" operator that I'm not fond of, I'm more OK with the "gracefully tolerate this being None" aspect of the proposal: title?.upper() person?['name'] > I'm not sure whether other people will read it that way--or, if they do, whether it will be helpful or harmful mnemonically. > >> If this particular syntax were to be chosen, I also came up with the >> following possible mnemonics that may be useful as an explanatory >> tool: >> >> "||" is a barrier to prevent None passing through an expression >> "!|" explicitly allows None to pass without error > > That's definitely easy to understand and remember. But since Python doesn't exist in isolation, and null coalescing and null conditional operators exist in other languages and are being added to many new ones, it might be useful to use similar terms to other languages. (See https://msdn.microsoft.com/en-us/library/ms173224.aspx and https://msdn.microsoft.com/en-us/library/dn986595.aspx for how C# describes them.) Those mnemonics are the "How would I try to explain this to a 10 year old?" version, rather than the "How would I try to explain this to a computer science student?" version. Assuming a null coalescing operator is added, I'd expect to see more formal language than that used in the language reference, regardless of the spelling chosen. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rymg19 at gmail.com Wed Sep 23 16:37:37 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Wed, 23 Sep 2015 09:37:37 -0500 Subject: [Python-ideas] Using "||" (doubled pipe) as the null coalescing operator? In-Reply-To: References: Message-ID: *cough* Ruby and Perl *cough* Ruby has two 'or' operators. 
One is used normally: myval = a == 1 || a == 2 # same as myval = (a == 1 || a == 2) The other one is a bit different: myval = a == 1 or a == 2 # same as (myval = a == 1) or (a == 2) It's used for simple nil and false elision, since Ruby has a stricter concept of falseness than Python. But it's a bug magnet!! That's what I hated about Ruby. Type the wrong operator and get a hidden error. Sometimes, when I code in C++ a lot and then do something in Python, I'll do: if a || b: Then I realize my mistake and fix it. BUT, with this change, it wouldn't be a mistake. It would just do something entirely different. On September 23, 2015 4:00:41 AM CDT, Nick Coghlan wrote: >This may just be my C programmer brain talking, but reading the >examples in PEP 505 makes me think of the existing use of "|" as the >bitwise-or operator in both Python and C, and "||" as the logical-or >operator in C. > >Using || for None-coalescence would still introduce a third "or" >variant into Python as PEP 505 proposes (for good reasons), but >without introducing a new symbolic character that relates to "OR" >operations: > > x | y: bitwise OR (doesn't short circuit) > x or y: logical OR (short circuits based on bool(x)) > x || y: logical OR (short circuits based on "x is not None") > >(An analogy with C pointers works fairly well here, as "x || y" in C >is a short-circuiting operator that switches on "x != NULL" in the >pointer case) > >Taking some key examples from the PEP: > > data = data ?? [] > headers = headers ?? 
{} > data ?= [] > headers ?= {} > >When written using a doubled pipe instead: > > data = data || [] > headers = headers || {} > data ||= [] > headers ||= {} > >Translations would be the same as proposed in PEP 505 (for simplicity, >this shows evaluating the LHS multiple times, in practice that >wouldn't happen): > > data = data if data is not None else [] > headers = headers if headers is not None else {} > data = data if data is not None else [] > headers = headers if headers is not None else {} > >One additional wrinkle is that a single "|" would conflict with the >bitwise-or notation in the case of None-aware index access, so the >proposal for both that and attribute access would be to make the >notation "!|", borrowing the logical negation "!" from "!=". > >In this approach, where "||" would be the new short-circuiting binary >operator standing for "LHS if LHS is not None else RHS", in "!|" the >logical negations cancel out to give "LHS if LHS is None else >LHS". > >PEP 505 notation: > > title?.upper() > person?['name'] > >Using the "is not not None" pipe-based notation: > > title!|.upper() > person!|['name'] > >And the gist of the translation: > > title if title is None else title.upper() > person if person is None else person['name'] > >If this particular syntax were to be chosen, I also came up with the >following possible mnemonics that may be useful as an explanatory >tool: > > "||" is a barrier to prevent None passing through an expression > "!|" explicitly allows None to pass without error > >Regards, >Nick. > >-- >Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Wed Sep 23 17:59:56 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 24 Sep 2015 01:59:56 +1000 Subject: [Python-ideas] Using "||" (doubled pipe) as the null coalescing operator? In-Reply-To: References: Message-ID: On 24 September 2015 at 00:37, Ryan Gonzalez wrote: > *cough* Ruby and Perl *cough* > > Ruby has two 'or' operators. One is used normally: > > myval = a == 1 || a == 2 > # same as > myval = (a == 1 || a == 2) > > The other one is a bit different: > > myval = a == 1 or a == 2 > # same as > (myval = a == 1) or (a == 2) > > It's used for simple nil and false elision, since Ruby has a stricter > concept of falseness than Python. The Perl, Ruby and PHP situation is a bit different from the one proposed here - "or" and "||" are semantically identical in those languages aside from operator precedence. That said, it does still count as a point in favour of "??" as the binary operator spelling - experienced developers are unlikely to assume they already know what that means, while the "||" spelling means they're more likely to think "oh, that's just a higher precedence spelling of 'or'". The only other potential spelling of the coalescence case that comes to mind is to make "?" available in conditional expressions as a reference to the LHS: data = data if ? is not None else [] headers = headers if ? is not None else {} title = user_title if ? is not None else local_default_title if ? is not None else global_default_title title?.upper() person?['name'] The expansions of the latter two would then be: title if ? is None else ?.upper() person if ? is None else ?['name'] Augmented assignment would still be a shorthand for the first two examples: data ?= [] headers ?= {} Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Wed Sep 23 18:47:00 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 24 Sep 2015 02:47:00 +1000 Subject: [Python-ideas] PEP 505 [was Re: Null coalescing operators] In-Reply-To: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> Message-ID: <20150923164651.GU31152@ando.pearwood.info> I've now read PEP 505, and I would like to comment. Executive summary: - I have very little interest in the ?? and ?= operators, but don't object to them: vote +0 - I have a little more interest in ?. and ?[ provided that the precedence allows coalescing multiple look-ups from a single question mark: vote +1 - if it uses the (apparent?) Dart semantics, I am opposed: vote -1 - if the syntax chosen uses || or !| as per Nick's suggestion, I feel the cryptic and ugly syntax is worse than the benefit: vote -1 In more detail: I'm sympathetic to the idea of reducing typing, but I think it is critical to recognise that reducing typing is not always a good thing. If it were, we would always name our variables "a", "b" etc, the type of [] would be "ls", and we would say "frm col impt ODt". And if you have no idea what that last one means, that's exactly my point. Reducing typing is a good thing *up to a point*, at which time it becomes excessively terse and cryptic. One of the things I like about Python is that it is not Perl: it doesn't have an excess of punctuation and short-cuts. Too much syntactic sugar is a bad thing. The PEP suggests a handful of new operators: (1) Null Coalescing Operator spam ?? eggs equivalent to a short-circuiting: spam if spam is not None else eggs I'm ambivalent about this. I don't object to it, but nor does it excite me in the least. I don't think the abbreviated syntax gains us enough in expressiveness to make up for the increase in terseness. 
In its favour, it can reduce code duplication, and also act as a more correct alternative to `spam or eggs`. (See the PEP for details.) So I'm a very luke-warm +0 on this part of the PEP. (2) None coalescing assignment spam ?= eggs being equivalent to: if spam is None: spam = eggs For the same reasons as above, I'm luke-warm on this: +0. (3) Null-Aware Member Access Operator spam?.attr being equivalent to spam.attr if spam is not None else None To me, this passes the test "does it add more than it costs in cryptic punctuation?", so I'm a little more positive about this. If my reading is correct, the PEP underspecifies the behaviour of this when there is a chain of attribute accesses. Consider: spam?.eggs.cheese This can be interpreted two ways: (a) (spam.eggs.cheese) if spam is not None else None (b) (spam.eggs if spam is not None else None).cheese but the PEP doesn't make it clear which behaviour they have in mind. Dart appears to interpret it as (b), as the reference given in the PEP shows this example: [quote] You can chain ?. calls, for example: obj?.child?.child?.getter [quote] http://blog.sethladd.com/2015/07/null-aware-operators-in-dart.html That would seem to imply that obj?.child.child.getter would end up trying to evaluate null.child if the first ?. operator returned null. I don't think the Dart semantics is useful, indeed it is actively harmful in that it can hide bugs: Suppose we have an object which may be None, but if not, it must have an attribute spam which in turn must have an attribute eggs. This implies that spam must not be None. We want: obj.spam.eggs if obj is not None else None Using the Dart semantics, we chain ?. operators and get this: obj?.spam?.eggs If obj is None, the expression correctly returns None. If obj is not None, and obj.spam is not None, the expression correctly returns eggs. 
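The bug-hiding behaviour described above can be made concrete with a small sketch. Obj here is a hypothetical class standing in for the buggy object, and the two functions spell out the competing expansions of obj?.spam?.eggs:

```python
class Obj:
    # Buggy state: spam is supposed to be non-None whenever obj exists.
    spam = None

def dart_style(obj):
    # Per-step expansion: every ?. independently maps None to None,
    # so the buggy obj.spam silently becomes None instead of erroring.
    tmp = obj if obj is not None else None
    tmp = tmp.spam if tmp is not None else None
    return tmp.eggs if tmp is not None else None

def short_circuit_style(obj):
    # Whole-expression guard: only obj itself is checked against None;
    # obj.spam.eggs is then evaluated normally, so the bug surfaces.
    return obj.spam.eggs if obj is not None else None

assert dart_style(Obj()) is None      # bug silently hidden
assert dart_style(None) is None       # legitimate None still handled
try:
    short_circuit_style(Obj())        # bug surfaces here...
except AttributeError:
    pass                              # ...as the AttributeError we wanted
```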
So I'm -1 with the Dart semantics, and +1 otherwise. (4) Null-Aware Index Access Operator spam?[item] being similar to spam.attr. Same reasoning applies to this as for attribute access. Nick has suggested using || instead of ??, and similar for the other operators. I don't think this is attractive at all, but the deciding factor which makes Nick's syntax a -1 for me is that it is inconsistent and confusing. He has to introduce a !| variation, so the user has to remember when to use two |s and when to use a ! instead, whether the ! goes before or after the | and that !! is never used. -- Steve From mehaase at gmail.com Wed Sep 23 19:22:11 2015 From: mehaase at gmail.com (Mark E. Haase) Date: Wed, 23 Sep 2015 13:22:11 -0400 Subject: [Python-ideas] PEP 505 [was Re: Null coalescing operators] In-Reply-To: <20150923164651.GU31152@ando.pearwood.info> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <20150923164651.GU31152@ando.pearwood.info> Message-ID: Steven, thanks for the reply. Just to clarify: the current PEP draft was not meant to be read -- it was just a placeholder to get a PEP # assigned. I didn't realize that new PEPs are published in an RSS feed! I do appreciate your detailed feedback, though. Your interpretation of Dart's semantics is correct, and I agree that's absolutely the wrong way to do it. C# does have the short-circuit semantics that you're looking for. To you, and to everybody else in this thread: I am reading every single message, and I'm working on a draft worthy of your time and attention that incorporates all of these viewpoints and offers several competing alternatives. I will announce the draft on this list when I'm further along. Proposing a new operator is tremendously difficult in Python, because this community doesn't like complex or ugly punctuation. (And adding a new keyword won't happen any time soon.) A similar debate surrounded the ternary operator PEP[1]. 
That PEP's author eventually held a vote on the competing alternatives, including an option for "don't do anything". I'm hoping to hold a similar referendum on this PEP once I've had time to work on it a bit more. [1] https://www.python.org/dev/peps/pep-0308/ On Wed, Sep 23, 2015 at 12:47 PM, Steven D'Aprano wrote: > I've now read PEP 505, and I would like to comment. > > Executive summary: > > - I have very little interest in the ?? and ?= operators, but don't > object to them: vote +0 > > - I have a little more interest in ?. and ?[ provided that the > precedence allows coalescing multiple look-ups from a single > question mark: vote +1 > > - if it uses the (apparent?) Dart semantics, I am opposed: vote -1 > > - if the syntax chosen uses || or !| as per Nick's suggestion, > I feel the cryptic and ugly syntax is worse than the benefit: > vote -1 > > > In more detail: > > I'm sympathetic to the idea of reducing typing, but I think it is > critical to recognise that reducing typing is not always a good thing. > If it were, we would always name our variables "a", "b" etc, the type of > [] would be "ls", and we would say "frm col impt ODt". And if you have > no idea what that last one means, that's exactly my point. > > Reducing typing is a good thing *up to a point*, at which time it > becomes excessively terse and cryptic. One of the things I like about > Python is that it is not Perl: it doesn't have an excess of punctuation > and short-cuts. Too much syntactic sugar is a bad thing. > > The PEP suggests a handful of new operators: > > (1) Null Coalescing Operator > > spam ?? eggs > > equivalent to a short-circuiting: > > spam if spam is not None else eggs > > I'm ambivalent about this. I don't object to it, but nor does it excite > me in the least. I don't think the abbreviated syntax gains us enough in > expressiveness to make up for the increase in terseness. 
In its favour, > it can reduce code duplication, and also act as a more correct > alternative to `spam or eggs`. (See the PEP for details.) > > So I'm a very luke-warm +0 on this part of the PEP. > > > > (2) None coalescing assignment > > spam ?= eggs > > being equivalent to: > > if spam is None: > spam = eggs > > For the same reasons as above, I'm luke-warm on this: +0. > > > > (3) Null-Aware Member Access Operator > > spam?.attr > > being equivalent to > > spam.attr if spam is not None else None > > To me, this passes the test "does it add more than it costs in cryptic > punctuation?", so I'm a little more positive about this. > > If my reading is correct, the PEP underspecifies the behaviour of this > when there is a chain of attribute accesses. Consider: > > spam?.eggs.cheese > > This can be interpreted two ways: > > (a) (spam.eggs.cheese) if spam is not None else None > > (b) (spam.eggs if spam is not None).cheese > > but the PEP doesn't make it clear which behaviour they have in mind. > Dart appears to interpret it as (b), as the reference given in the > PEP shows this example: > > [quote] > You can chain ?. calls, for example: > obj?.child?.child?.getter > [quote] > > http://blog.sethladd.com/2015/07/null-aware-operators-in-dart.html > > That would seem to imply that obj?.child.child.getter would end up > trying to evaluate null.child if the first ?. operator returned null. > > I don't think the Dart semantics is useful, indeed it is actively > harmful in that it can hide bugs: > > Suppose we have an object which may be None, but if not, it must > have an attribute spam which in turn must have an attribute eggs. This > implies that spam must not be None. We want: > > obj.spam.eggs if obj is not None else None > > Using the Dart semantics, we chain ?. operators and get this: > > obj?.spam?.eggs > > If obj is None, the expression correctly returns None. If obj is not > None, and obj.spam is not None, the expression correctly returns eggs. 
> But it is over-eager, and hides a bug: if obj.spam is None, you want to > get an AttributeError, but instead the error is silenced and you get > None. > > So I'm -1 with the Dart semantics, and +1 otherwise. > > > > (3) Null-Aware Index Access Operator > > spam?[item] > > being similar to spam.attr. Same reasoning applies to this as for > attribute access. > > > > Nick has suggested using || instead of ??, and similar for the other > operators. I don't think this is attractive at all, but the deciding > factor which makes Nick's syntax a -1 for me is that it is inconsistent > and confusing. He has to introduce a !| variation, so the user has to > remember when to use two |s and when to use a ! instead, whether the ! > goes before or after the | and that !! is never used. > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Mark E. Haase 202-815-0201 -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Wed Sep 23 19:30:12 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Wed, 23 Sep 2015 19:30:12 +0200 Subject: [Python-ideas] Null coalescing operators In-Reply-To: References: <55FF8CFD.9070906@mail.de> <20150922031507.GL31152@ando.pearwood.info> <56019C72.9000806@mail.de> Message-ID: <5602E1A4.6040807@mail.de> On 23.09.2015 00:53, Chris Angelico wrote: > On Wed, Sep 23, 2015 at 4:22 AM, Sven R. Kunze wrote: >> I can tell from what I've seen that people use None for: all kinds of >> various interesting semantics depending on the variable, on the supposed >> type and on the function such as: >> >> - +infinity for datetimes but only if it signifies the end of a timespan > What this means is that your boundaries can be a datetime or None, > where None means "no boundary at this end". Yes. 
> >> - current datetime >> - mixing both > I don't know of a situation where None means "now"; can you give an example? range_start = range_end = So, if you need something that ranges from 2015-01-01 to now, you basically say the range is expanding. Depending on the function/method, it either means until forever, or now. > >> - default item in a list like [1, 2, None, 4, 9] (putting in 5 would have >> done the trick) > What does this mean? Is this where you're taking an average or > somesuch, and pretending that the None doesn't exist? That seems > fairly consistent with SQL. Imagine you render (as in HTML and the like) 1, 2, 4, 9 and instead of the None you render a 3. Now, the rendering engine needs to special-check None to put in a pre-defined value. Furthermore, all places where you need that list [1, 2, None, 9], you basically need to special-check None and act appropriately. (Of course it was not that simple but you get the idea. The numbers stand for fairly complex objects drawn from the database.) > > Mostly, this does still represent "no such value". Point was "no such value" sucks. It can be a blend of every other value and semantics depending on the function, type and so forth. It's too convenient that people would not use it. As the example with the list shows us, the 3 could have easily be put into the database as it behaves exactly the same as the other objects. The same goes for the special datetime objects. The lack of thinking and appropriate default objects, lead to the usage of None. People tend to use None for everything that is special and you end up with something really nasty to debug. Not why people don't find it problematic when I said "we found 6/7 domain-agnostic semantics for NULL". The "no such value" can be any of them OR a blend of them. That, I don't want to see in the code; that's all. Btw. 
having the third issue of above, I could add another domain-agnostic meaning for None: "too lazy to create a pre-defined object but instead using None". > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From rymg19 at gmail.com Wed Sep 23 19:34:14 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Wed, 23 Sep 2015 12:34:14 -0500 Subject: [Python-ideas] PEP 505 [was Re: Null coalescing operators] In-Reply-To: <20150923164651.GU31152@ando.pearwood.info> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <20150923164651.GU31152@ando.pearwood.info> Message-ID: On Wed, Sep 23, 2015 at 11:47 AM, Steven D'Aprano wrote: > I've now read PEP 505, and I would like to comment. > > Executive summary: > > - I have very little interest in the ?? and ?= operators, but don't > object to them: vote +0 > > - I have a little more interest in ?. and ?[ provided that the > precedence allows coalescing multiple look-ups from a single > question mark: vote +1 > > - if it uses the (apparent?) Dart semantics, I am opposed: vote -1 > > - if the syntax chosen uses || or !| as per Nick's suggestion, > I feel the cryptic and ugly syntax is worse than the benefit: > vote -1 > > > In more detail: > > I'm sympathetic to the idea of reducing typing, but I think it is > critical to recognise that reducing typing is not always a good thing. > If it were, we would always name our variables "a", "b" etc, the type of > [] would be "ls", and we would say "frm col impt ODt". And if you have > no idea what that last one means, that's exactly my point. > from collections import OrderedDict > > Reducing typing is a good thing *up to a point*, at which time it > becomes excessively terse and cryptic. 
One of the things I like about > Python is that it is not Perl: it doesn't have an excess of punctuation > and short-cuts. Too much syntactic sugar is a bad thing. > > The PEP suggests a handful of new operators: > > (1) Null Coalescing Operator > > spam ?? eggs > > equivalent to a short-circuiting: > > spam if spam is not None else eggs > > I'm ambivalent about this. I don't object to it, but nor does it excite > me in the least. I don't think the abbreviated syntax gains us enough in > expressiveness to make up for the increase in terseness. In its favour, > it can reduce code duplication, and also act as a more correct > alternative to `spam or eggs`. (See the PEP for details.) > > So I'm a very luke-warm +0 on this part of the PEP. > > > > (2) None coalescing assignment > > spam ?= eggs > > being equivalent to: > > if spam is None: > spam = eggs > > For the same reasons as above, I'm luke-warm on this: +0. > > > > (3) Null-Aware Member Access Operator > > spam?.attr > > being equivalent to > > spam.attr if spam is not None else None > > To me, this passes the test "does it add more than it costs in cryptic > punctuation?", so I'm a little more positive about this. > > If my reading is correct, the PEP underspecifies the behaviour of this > when there is a chain of attribute accesses. Consider: > > spam?.eggs.cheese > > This can be interpreted two ways: > > (a) (spam.eggs.cheese) if spam is not None else None > > (b) (spam.eggs if spam is not None).cheese > > but the PEP doesn't make it clear which behaviour they have in mind. > Dart appears to interpret it as (b), as the reference given in the > PEP shows this example: > > [quote] > You can chain ?. calls, for example: > obj?.child?.child?.getter > [quote] > > http://blog.sethladd.com/2015/07/null-aware-operators-in-dart.html > > That would seem to imply that obj?.child.child.getter would end up > trying to evaluate null.child if the first ?. operator returned null. 
> > I don't think the Dart semantics is useful, indeed it is actively > harmful in that it can hide bugs: > > Suppose we have an object which may be None, but if not, it must > have an attribute spam which in turn must have an attribute eggs. This > implies that spam must not be None. We want: > > obj.spam.eggs if obj is not None else None > > Using the Dart semantics, we chain ?. operators and get this: > > obj?.spam?.eggs > > If obj is None, the expression correctly returns None. If obj is not > None, and obj.spam is not None, the expression correctly returns eggs. > But it is over-eager, and hides a bug: if obj.spam is None, you want to > get an AttributeError, but instead the error is silenced and you get > None. > > So I'm -1 with the Dart semantics, and +1 otherwise. > > I have to kind of agree here. In reality, I don't see any issues like this with approach (a). > > > (3) Null-Aware Index Access Operator > > spam?[item] > > being similar to spam.attr. Same reasoning applies to this as for > attribute access. > > > > Nick has suggested using || instead of ??, and similar for the other > operators. I don't think this is attractive at all, but the deciding > factor which makes Nick's syntax a -1 for me is that it is inconsistent > and confusing. He has to introduce a !| variation, so the user has to > remember when to use two |s and when to use a ! instead, whether the ! > goes before or after the | and that !! is never used. > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.f.moore at gmail.com Wed Sep 23 20:46:15 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 23 Sep 2015 19:46:15 +0100 Subject: [Python-ideas] PEP 505 [was Re: Null coalescing operators] In-Reply-To: <20150923164651.GU31152@ando.pearwood.info> References: <747c20ca-960e-4d0e-83c7-f17a61e3d42d@googlegroups.com> <20150923164651.GU31152@ando.pearwood.info> Message-ID: On 23 September 2015 at 17:47, Steven D'Aprano wrote: > I've now read PEP 505, and I would like to comment. Having read the various messages in this thread, and then your summary (which was interesting, because it put a lot of the various options next to each other) I have to say: 1. The "expanded" versions using if..else are definitely pretty unreadable and ugly (for all the variations). But in practice, I'd be very unlikely to use if expressions in this case - I'd be more likely to expand the whole construct, probably involving an if *statement*. Comparing a multi-line statement to an operator is much harder to do in a general manner. So I guess I can see the benefits, but I suspect the operators won't be used in practice as much as people are implying (in much the same way that the use of the if expression is pretty rare in real Python code, as opposed to examples). 2. All of the punctuation-based suggestions remain ugly to my eyes. ? is too visually striking, and has too many other associations for me ("help" in IPython, and as a suffix for variable names from Lisp). Nick's || version looked plausible, but the inconsistent !| variations bother me. 3. People keep referring to "obj ?? default" in comparison to "obj or default". The comparison is fine - as is the notion that we are talking about a version that simply replaces a truth test with an "is None" test. But to me it also says that we should be looking for a keyword, not a punctuation operator - the "or" version reads nicely, and the punctuation version looks very cryptic in comparison. 
I can't think of a good keyword, or a viable way to use a keyword for the ?. ?[ and ?( variations, but I wish I could. Summary - I don't mind the addition of the functionality, although I don't think it's crucial. But I really dislike the punctuation. The benefits don't justify the cost for me. Paul From ncoghlan at gmail.com Thu Sep 24 05:00:39 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 24 Sep 2015 13:00:39 +1000 Subject: [Python-ideas] Using "||" (doubled pipe) as the null coalescing operator? In-Reply-To: References: Message-ID: On 24 Sep 2015 01:59, "Nick Coghlan" wrote: > > The only other potential spelling of the coalescence case that comes > to mind is to make "?" available in conditional expressions as a > reference to the LHS: > > data = data if ? is not None else [] > headers = headers if ? is not None else {} > title = user_title if ? is not None else local_default_title if ? > is not None else global_default_title One advantage of this more explicit spelling is that it permits sentinels other than None in the expanded form: data = data if ? is not sentinel else default() Only the shorthand cases (augmented assignment, attribute access, subscript lookup) would be restricted to checking specifically against None. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at lucidity.plus.com Fri Sep 25 00:28:45 2015 From: python at lucidity.plus.com (Erik) Date: Thu, 24 Sep 2015 23:28:45 +0100 Subject: [Python-ideas] Using "||" (doubled pipe) as the null coalescing operator? In-Reply-To: References: Message-ID: <5604791D.3000204@lucidity.plus.com> On 24/09/15 04:00, Nick Coghlan wrote: > data = data if ? is not sentinel else default() This reads OK in a short example like this and when using word-based operators such as "is not". However, it's a bit clumsy looking when using operators spelled with punctuation: data = data if ? != None else default() data = data if foo <= ? 
<= bar else default() > title = user_title if ? is not None else local_default_title if ? is not None else global_default_title I don't think I like the way '?' changes its target during the line in this example. For example, the equivalent of the admittedly-contrived expression: foo = bar if foo is None else baz if baz is not None else foo.frobnicate() is: foo = bar if ? is None else baz if ? is not None else foo.frobnicate() ... so you still have to spell 'foo' repeatedly (and only due to the subtle switch of the '?' target, which might go away (or be added) during code maintenance or refactoring). Also, if '?' is sort of a short-cut way of referencing the LHS, then one might naively expect to be able to write this: [(x, y) for x in range(5) if ? < 3 for y in range(5) if ? > 2] Regs, E. From guido at python.org Fri Sep 25 00:30:45 2015 From: guido at python.org (Guido van Rossum) Date: Thu, 24 Sep 2015 15:30:45 -0700 Subject: [Python-ideas] Using "||" (doubled pipe) as the null coalescing operator? In-Reply-To: <5604791D.3000204@lucidity.plus.com> References: <5604791D.3000204@lucidity.plus.com> Message-ID: Using "?" as a (pro)noun is even worse than using it as an operator/modifier. On Thu, Sep 24, 2015 at 3:28 PM, Erik wrote: > On 24/09/15 04:00, Nick Coghlan wrote: > >> data = data if ? is not sentinel else default() >> > > This reads OK in a short example like this and when using word-based > operators such as "is not". However, it's a bit clumsy looking when using > operators spelled with punctuation: > > data = data if ? != None else default() > data = data if foo <= ? <= bar else default() > > > title = user_title if ? is not None else local_default_title if ? is not > None else global_default_title > > I don't think I like the way '?' changes its target during the line in > this example. 
> > For example, the equivalent of the admittedly-contrived expression: > > foo = bar if foo is None else baz if baz is not None else foo.frobnicate() > > is: > > foo = bar if ? is None else baz if ? is not None else foo.frobnicate() > > ... so you still have to spell 'foo' repeatedly (and only due to the > subtle switch of the '?' target, which might go away (or be added) during > code maintenance or refactoring). > > > Also, if '?' is sort of a short-cut way of referencing the LHS, then one > might naively expect to be able to write this: > > [(x, y) for x in range(5) if ? < 3 for y in range(5) if ? > 2] > > Regs, E. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at lucidity.plus.com Fri Sep 25 00:40:51 2015 From: python at lucidity.plus.com (Erik) Date: Thu, 24 Sep 2015 23:40:51 +0100 Subject: [Python-ideas] Using "||" (doubled pipe) as the null coalescing operator? In-Reply-To: References: <5604791D.3000204@lucidity.plus.com> Message-ID: <56047BF3.6090308@lucidity.plus.com> On 24/09/15 23:30, Guido van Rossum wrote: > Using "?" as a (pro)noun is even worse than using it as an > operator/modifier. That was what I was trying to say, but you did it more correctly and using far fewer characters. How very Pythonic of you ... ;) E. From youtux at gmail.com Fri Sep 25 01:07:53 2015 From: youtux at gmail.com (Alessio Bogon) Date: Fri, 25 Sep 2015 01:07:53 +0200 Subject: [Python-ideas] Using `or?` as the null coalescing operator Message-ID: <6C2E5579-42A0-423F-AB8C-01B49FA59D67@gmail.com> I really like PEP 0505. The only thing that does not convince me is the `??` operator. I would like to know what you think of an alternative like `or?`: a_list = some_list or? 
[]
a_dict = some_dict or? {}

The rationale behind it is to let `or` do its job with "truthy" values, while
`or?` would require non-None values. The rest of the PEP looks good to me.

I apologise in advance if this was already proposed and I missed it.

Regards,
Alessio

From gokoproject at gmail.com  Fri Sep 25 01:13:29 2015
From: gokoproject at gmail.com (John Wong)
Date: Thu, 24 Sep 2015 19:13:29 -0400
Subject: [Python-ideas] Using "||" (doubled pipe) as the null coalescing
 operator?
In-Reply-To: <56047BF3.6090308@lucidity.plus.com>
References: <5604791D.3000204@lucidity.plus.com>
 <56047BF3.6090308@lucidity.plus.com>
Message-ID: 

I just read the PEP. As a user, I would prefer || until...

[Nick]
> If this particular syntax were to be chosen, I also came up with the
> following possible mnemonics that may be useful as an explanatory
> tool:
> "||" is a barrier to prevent None passing through an expression
> "!|" explicitly allows None to pass without error

I am like eating my own words: !| is pretty hard to read, especially during
code review. The two symbols look too similar. Do we really need to have one
that doesn't raise an exception and one that does?

Next, the example title?.upper() in the PEP is also kind of ugly, and it's
unclear to me what the purpose is. I do appreciate the idea of
short-circuiting, but I don't feel the syntax is right. To me this is the
debate between defaultdict and a primitive dict (but in that debate you
don't have the option to raise or not raise an exception, while Nick's
proposal does).

> data = [] if data is None else data

Looks like a valid case for a short-cut operator. The argument about the
"undesirable effect of putting the operands in an unintuitive order" is not
so bad. Once you have seen it once it should make sense. I will probably
poke Star Wars and say "python awesome, it is" and our brains will adapt.
At least that line is still readable.

> data = data ?? []

I would prefer || again, simply because of no new syntax.
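As an aside on the `or` vs. `??`/`or?` distinction being debated here: the proposed None-coalescing behaviour can be approximated today with a small helper (illustrative only — `coalesce` is not part of PEP 505, and unlike the proposed operator a function call evaluates all of its arguments eagerly, so it does not short-circuit):

```python
def coalesce(*values):
    """Return the first argument that is not None, like a chained `??`."""
    for value in values:
        if value is not None:
            return value
    return None

# `or` falls through on *any* falsy value; coalescing only skips None:
assert (0 or 42) == 42          # 0 is falsy, so `or` discards it
assert coalesce(0, 42) == 0     # but 0 is not None, so `??` would keep it
assert coalesce(None, "", "x") == ""
```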
Actually, the price computing example in PEP 505 is not too convincing from
a contract standpoint. The proposal is shorter than writing "if
requested_quantity is not None", but if you have to think about using a
null coalescing operator, then aren't you already spotting a case you need
to handle? The example shows how the bug can be prevented, so maybe
requested_quantity should really default to 0 from the beginning, not None.
None shouldn't appear, and if it appears it should be a bug; using null
coalescing in this very example is actually a bug from my view. You are
just avoiding ever having to think about taking care of such cases in your
code. But then you have negative numbers to avoid too... so that still
requires a sanity check somewhere.

Just my four cents.

On Thu, Sep 24, 2015 at 6:40 PM, Erik wrote:

> On 24/09/15 23:30, Guido van Rossum wrote:
>
>> Using "?" as a (pro)noun is even worse than using it as an
>> operator/modifier.
>
> That was what I was trying to say, but you did it more correctly and using
> far fewer characters. How very Pythonic of you ... ;)
>
> E.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From python at lucidity.plus.com  Fri Sep 25 02:30:54 2015
From: python at lucidity.plus.com (Erik)
Date: Fri, 25 Sep 2015 01:30:54 +0100
Subject: [Python-ideas] Using "||" (doubled pipe) as the null coalescing
 operator?
In-Reply-To: 
References: <5604791D.3000204@lucidity.plus.com>
 <56047BF3.6090308@lucidity.plus.com>
Message-ID: <560495BE.3070600@lucidity.plus.com>

Throwing this one out there in case it inspires someone to come up with
a better variation (or to get it explicitly rejected):

    object.(<accessor> if <condition> else <default>)

...
where 'accessor' is anything normally allowed after 'object' ([], (),
attr) and 'condition' can omit the LHS of any conditional expression
(which is taken to be the associated object) or not (i.e., can be a
complete condition independent of the associated object):

    foo = bar.((param0, param1) if not None else default())
    foo = bar.([idx] if != sentinel else default())

And the perhaps more off-the-wall (as 'bar' is not involved in the
condition):

    foo = bar.(attr if secrets.randint(0, 1023) & 1 else default())

E.

From steve at pearwood.info  Fri Sep 25 03:35:04 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 25 Sep 2015 11:35:04 +1000
Subject: [Python-ideas] Using "||" (doubled pipe) as the null coalescing
 operator?
In-Reply-To: <560495BE.3070600@lucidity.plus.com>
References: <5604791D.3000204@lucidity.plus.com>
 <56047BF3.6090308@lucidity.plus.com> <560495BE.3070600@lucidity.plus.com>
Message-ID: <20150925013500.GD23642@ando.pearwood.info>

On Fri, Sep 25, 2015 at 01:30:54AM +0100, Erik wrote:

> Throwing this one out there in case it inspires someone to come up with
> a better variation (or to get it explicitly rejected):
>
> object.(<accessor> if <condition> else <default>)
>
> ... where 'accessor' is anything normally allowed after 'object' ([],
> (), attr) and 'condition' can omit the LHS of any conditional expression
> (which is taken to be the associated object) or not (i.e., can be a
> complete condition independent of the associated object):
>
> foo = bar.((param0, param1) if not None else default())

I think that your intention is for that to be equivalent to:

    if bar not None:  # missing "is" operator
        foo = bar(param0, param1)
    else:
        foo = default()

I had to read your description three times before I got to the point
where I could understand it. Some problems:

I thought `bar.( ...)` meant attribute access, so I initially expected
the true branch to evaluate to:

    foo = bar.(param0, param1)

which of course is a syntax error.
Presumably you would write `bar.(attr if ...)` for attribute access and
not `bar.(.attr if ...)`.

I'm still confused about the missing `is`. Maybe you meant:

    if not None:  # evaluates to True

... which is a problem with your suggestion that the left hand side of
the condition is optional -- it makes it harder to catch errors in
typing. Worse, it's actually ambiguous in some cases:

    spam = eggs.(cheese if - x else "aardvark")

can be read as:

    if eggs - x:  # implied bool(eggs - x)
        spam = eggs.cheese
    else:
        spam = "aardvark"

or as this:

    if -x:  # implied bool(-x)
        spam = eggs.cheese
    else:
        spam = "aardvark"

> foo = bar.([idx] if != sentinel else default())

I **really** hate this syntax. It almost makes me look more fondly at
the || / !| syntax. Looking at this, I really want to interpret the
last part as

    foo = bar.default()

so I can see this being a really common error. "Why isn't my method
being called?"

-1 on this.

-- 
Steve

From python at lucidity.plus.com  Fri Sep 25 04:01:12 2015
From: python at lucidity.plus.com (Erik)
Date: Fri, 25 Sep 2015 03:01:12 +0100
Subject: [Python-ideas] Using "||" (doubled pipe) as the null coalescing
 operator?
In-Reply-To: <20150925013500.GD23642@ando.pearwood.info>
References: <5604791D.3000204@lucidity.plus.com>
 <56047BF3.6090308@lucidity.plus.com> <560495BE.3070600@lucidity.plus.com>
 <20150925013500.GD23642@ando.pearwood.info>
Message-ID: <5604AAE8.50808@lucidity.plus.com>

Hi Steven,

On 25/09/15 02:35, Steven D'Aprano wrote:
> I think that your intention is for that to be equivalent to:
>
> if bar not None:  # missing "is" operator
> foo = bar(param0, param1)
> else:
> foo = default()

Yes, you are correct. I omitted the 'is'.

> I thought `bar.( ...)` meant attribute access, so I initially
> expected the true branch to evaluate to:
>
> foo = bar.(param0, param1)
>
> which of course is a syntax error.
I chose ".()" on purpose because it was a syntax error. Not including
the "." meant it would look like a function call, so that wasn't
workable. ".()" was supposed to read "I'm doing something with this
object, but what I'm doing is conditional, so read on".

> I'm still confused about the missing `is`. Maybe you meant:

No, I meant to write 'is'.

> Worse, it's actually ambiguous in some cases:

Hmmm. Yes, OK, I see the problem here.

>> foo = bar.([idx] if != sentinel else default())
>
> I **really** hate this syntax.

"hate" is a very strong word. You've prefixed it with "really" (and
emphasised that with several asterisks) - are you trying to tell me
something? ;)

> I really want to interpret the
> last part as
>
> foo = bar.default()

Yes, I can see that's a reasonable interpretation. I never expected my
suggestion to be embraced as-is, but perhaps it will inspire someone
else to come up with a more enlightened suggestion - I did say that at
the top of the post ;)

E.

From tim.peters at gmail.com  Fri Sep 25 06:02:30 2015
From: tim.peters at gmail.com (Tim Peters)
Date: Thu, 24 Sep 2015 23:02:30 -0500
Subject: [Python-ideas] PEP 504: Using the system RNG by default
In-Reply-To: 
References: <1442341539.574404.384456273.435775D6@webmail.messagingengine.com>
 <87mvwnxful.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

[Tim]
> ...
> "Password generators" should be the least of our worries. Best I can
> tell, the PHP paper's highly technical MT attack against those has
> scant chance of working in Python except when random.choice(x) is
> known to have len(x) a power of 2. Then it's a very powerful attack.

Ha! That's actually its worst case, although everyone missed that. I
wrote a solver, and bumped into this while testing it. The rub is this
line in _randbelow():

    k = n.bit_length()  # don't use (n-1) here because n can be 1

If n == 2**i, k is i+1 then, and ._randbelow() goes on to throw away
half of all 32-bit MT outputs. Everyone before assumed it wouldn't
throw any away.
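Tim's observation about the rejection loop can be sketched in a few lines (an illustrative model in the style of `Random._randbelow`, not the exact CPython source; the draw counter is added for demonstration):

```python
import random

def randbelow_sketch(n, rng=random):
    # Draw k = n.bit_length() random bits and retry until the value is
    # below n, counting how many getrandbits() outputs get consumed.
    k = n.bit_length()  # the line quoted above
    r = rng.getrandbits(k)
    draws = 1
    while r >= n:
        r = rng.getrandbits(k)
        draws += 1
    return r, draws

# For n == 2**i, k == i + 1, so on average half of all draws are rejected;
# for n == 2**i - 1, k == i and only 1 draw in 2**i is rejected.
value, draws = randbelow_sketch(64)
assert 0 <= value < 64
```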
The best case for this kind of solver is when .choice(x) has len(x) one
less than a power of 2, say 2**i - 1. Then k = i, and ._randbelow()
throws away 1 of each of 2**i MT outputs (on average).

For small i (say, len(x) == 63), every time I tried, the solver (which
can only record bits from MT outputs it _knows_ were produced) found
itself stuck with inconsistent equations.

If len(x) = 2**20 - 1, _then_ it has a great chance of succeeding.
There's about a chance in a million then that a single .choice() call
will consume 2 32-bit MT outputs. It takes around 1,250 consecutive
observations (of .choice() results) to deduce the starting state then,
assuming .choice() never skips an MT output. The chance that no output
was in fact skipped is about:

    >>> (1 - 1./2**20) ** 1250
    0.9988086167972104

So that attack is very likely to succeed.

So, until the "secrets" module is released, and you're too dense to use
os.urandom(), don't pick passwords from a million-character alphabet ;-)

From steve at pearwood.info  Sat Sep 26 15:07:15 2015
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 26 Sep 2015 23:07:15 +1000
Subject: [Python-ideas] PEP 506 (secrets module) and token functions
In-Reply-To: <20150919181612.GT31152@ando.pearwood.info>
References: <20150919181612.GT31152@ando.pearwood.info>
Message-ID: <20150926130715.GG23642@ando.pearwood.info>

I'm looking for guidance and/or consensus on two issues regarding token*
functions in secrets: output type, and default values.

The idea is that the module will include a few functions for generating
tokens, suitable for (say) password recovery, with the following
signatures:

    def token_bytes(nbytes:int) -> bytes:
        """Return nbytes random bytes."""

    def token_hex(nbytes:int) -> ???? :
        """Return nbytes random bytes, encoded to hex"""

    def token_url(nbytes:int) -> ???? :
        """Return nbytes random bytes, URL-safe base64 encoded."""
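For concreteness, one possible shape for these three functions, built on `os.urandom` (a sketch only — the return type of the encoded variants is exactly the open question below, and str is assumed here, as is a 32-byte default):

```python
import base64
import binascii
import os

def token_bytes(nbytes=32):
    return os.urandom(nbytes)

def token_hex(nbytes=32):
    return binascii.hexlify(token_bytes(nbytes)).decode('ascii')

def token_url(nbytes=32):
    # URL-safe base64 with the (non-random) '=' padding stripped
    raw = base64.urlsafe_b64encode(token_bytes(nbytes))
    return raw.rstrip(b'=').decode('ascii')

# 32 random bytes encode to 64 hex digits, or 43 base64 characters
assert len(token_hex()) == 64
assert len(token_url()) == 43
```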
Question two: - Many people will have no idea how many bytes should be used to be confident that it will be hard for an attacker to guess. Earlier, I suggested that the three functions include default values for nbytes, and there were no objections. Do we have consensus on this, and if so, what default value should we use? Question three: - If we have default values, do we need some sort of documented exception to the general backwards-compatibility requirement? E.g. suppose we release the module in 3.6.0 with defaults of 32 bytes, and in 3.6.2 we discover that's too small and we should have used 64 bytes. Can we change the default in 3.6.3 without notice? -- Steve From storchaka at gmail.com Sat Sep 26 15:56:09 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 26 Sep 2015 16:56:09 +0300 Subject: [Python-ideas] PEP 506 (secrets module) and token functions In-Reply-To: <20150926130715.GG23642@ando.pearwood.info> References: <20150919181612.GT31152@ando.pearwood.info> <20150926130715.GG23642@ando.pearwood.info> Message-ID: On 26.09.15 16:07, Steven D'Aprano wrote: > Question one: > > - token_bytes obviously should return bytes. What should the others > return, bytes or str? Why don't left conversion to the user? You can provide simple receipts in the documentation. def token_hex(nbytes): return token_bytes(nbytes).hex() def token_url(nbytes): return base64.urlsafe_b64encode(token_bytes(nbytes)).rstrip(b'=') We don't know what functions are needed by users. After the secrets module is widely used, we could gather the statistics of most popular patterns and add some of them in the stdlib. > Question two: > > - Many people will have no idea how many bytes should be used to be > confident that it will be hard for an attacker to guess. Earlier, I > suggested that the three functions include default values for nbytes, > and there were no objections. Do we have consensus on this, and if so, > what default value should we use? 
I would make the nbytes argument mandatory, and expose recommended
values in examples.

    >>> secrets.token_bytes(32)
    b'\xf8\x80Ejh\x1ck\xfbL\xc3l\xd3ev\x1bT\xbe\x983\x072\xbbP\xe2\xee\xf8\xdc\xaf\xe4\xddJ#'

From rosuav at gmail.com  Sat Sep 26 16:04:49 2015
From: rosuav at gmail.com (Chris Angelico)
Date: Sun, 27 Sep 2015 00:04:49 +1000
Subject: [Python-ideas] PEP 506 (secrets module) and token functions
In-Reply-To: <20150926130715.GG23642@ando.pearwood.info>
References: <20150919181612.GT31152@ando.pearwood.info>
 <20150926130715.GG23642@ando.pearwood.info>
Message-ID: 

On Sat, Sep 26, 2015 at 11:07 PM, Steven D'Aprano wrote:
> Question one:
>
> - token_bytes obviously should return bytes. What should the others
> return, bytes or str?

str. The point of encoding them is to turn the entropy into some form
of text, so IMO it makes sense to treat this as text.

> Question two:
>
> - Many people will have no idea how many bytes should be used to be
> confident that it will be hard for an attacker to guess. Earlier, I
> suggested that the three functions include default values for nbytes,
> and there were no objections. Do we have consensus on this, and if so,
> what default value should we use?
>
> Question three:
>
> - If we have default values, do we need some sort of documented
> exception to the general backwards-compatibility requirement?
>
> E.g. suppose we release the module in 3.6.0 with defaults of 32 bytes,
> and in 3.6.2 we discover that's too small and we should have used 64
> bytes. Can we change the default in 3.6.3 without notice?

So as I understand you, there are three options:

1) No default. Whenever you want entropy, you say how much. Simple.

2) Fixed default, covered by backward-compatibility promises.

3) Variable default with an implication that using the default entropy
is "secure enough" for most purposes.

Can you adequately define "secure enough" across all purposes?
The precise number would never be documented specifically (if you want to know what your version does, try it interactively), and then it can indeed be changed in 3.6.3 - or even without a version number bump at all (in ten years' time, Red Hat might choose to continue shipping CPython 3.6.1, but change the default entropy value). Otherwise, I would be inclined toward not having a default at all. Having one that can be changed only in 3.7 seems like the worst of both worlds - programs can't depend on the value being constant, but a security enhancement can't be done on an already-released version. ChrisA From vxgmichel at gmail.com Sat Sep 26 16:29:12 2015 From: vxgmichel at gmail.com (Vincent Michel) Date: Sat, 26 Sep 2015 16:29:12 +0200 Subject: [Python-ideas] Submitting a job to an asyncio event loop Message-ID: Hi, I noticed there is currently no standard solution to submit a job from a thread to an asyncio event loop. Here's what the asyncio documentation says about concurrency and multithreading: > To schedule a callback from a different thread, the BaseEventLoop.call_soon_threadsafe() method should be used. > Example to schedule a coroutine from a different thread: > loop.call_soon_threadsafe(asyncio.async, coro_func()) The issue with this method is the loss of the coroutine result. One way to deal with this issue is to connect the asyncio.Future returned by async (or ensure_future) to a concurrent.futures.Future. It is then possible to use a subclass of concurrent.futures.Executor to submit a callback to an asyncio event loop. Such an executor can also be used to set up communication between two event loops using run_in_executor. I posted an implementation called LoopExecutor on GitHub: https://github.com/vxgmichel/asyncio-loopexecutor The repo contains the loopexecutor module along with tests for several use cases. The README describes the whole thing (context, examples, issues, implementation). 
It is interesting to note that this executor is a bit different from
ThreadPoolExecutor and ProcessPoolExecutor since it can also submit a
coroutine function. Example:

    with LoopExecutor(loop) as executor:
        future = executor.submit(operator.add, 1, 2)
        assert future.result() == 3
        future = executor.submit(asyncio.sleep, 0.1, result=3)
        assert future.result() == 3

This works in both cases because submit always casts the given function
to a coroutine. That means it would also work with a function that
returns a Future.

Here are a few topics related to the current implementation that might
be interesting to discuss:

- possible drawback of casting the callback to a coroutine
- possible drawback of concurrent.futures.Future using
  asyncio.Future._copy_state
- does LoopExecutor need to implement the shutdown method?
- removing the limitation in run_in_executor (can't submit a coroutine
  function)
- adding a generic Future connection function in asyncio
- reimplementing wrap_future with the generic connection
- adding LoopExecutor to asyncio (or concurrent.futures)

At the moment, the interaction between asyncio and concurrent.futures
only goes one way. It would be nice to have a standard solution
(LoopExecutor or something else) to make it bidirectional.

Thanks,

Vincent
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From abarnert at yahoo.com  Sat Sep 26 23:00:01 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 26 Sep 2015 14:00:01 -0700
Subject: [Python-ideas] PEP 506 (secrets module) and token functions
In-Reply-To: <20150926130715.GG23642@ando.pearwood.info>
References: <20150919181612.GT31152@ando.pearwood.info>
 <20150926130715.GG23642@ando.pearwood.info>
Message-ID: 

On Sep 26, 2015, at 06:07, Steven D'Aprano wrote:
>
> Question three:
>
> - If we have default values, do we need some sort of documented
> exception to the general backwards-compatibility requirement?
Why not just use a default value of None, and document that None picks an appropriate value? Then, if it changes to a different appropriate value in 3.7 or 3.6.3 or some custom build of CPython, it hasn't broken backward compatibility. > > E.g. suppose we release the module in 3.6.0 with defaults of 32 bytes, > and in 3.6.2 we discover that's too small and we should have used 64 > bytes. Can we change the default in 3.6.3 without notice? From guido at python.org Sun Sep 27 04:52:15 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 26 Sep 2015 19:52:15 -0700 Subject: [Python-ideas] Submitting a job to an asyncio event loop In-Reply-To: References: Message-ID: Hi Vincent, I've read your write-up with interest. You're right that it's a bit awkward to make calls from the threaded world into the asyncio world. Interestingly, there's much better support for passing work off from the asyncio event loop to a thread (run_in_executor()). Perhaps that's because the use case there was obvious from the start: some things that may block for I/O just don't have an async interface yet, so in order to use them from an asyncio task they must be off-loaded to a separate thread or else the entire event loop is blocked. (This is used for calling getaddrinfo(), for example.) I'm curious where you have encountered the opposite use case? I think if I had to do this myself I would go for a more minimalist interface: something like your submit() method but without the call to asyncio.coroutine(fn). Having the caller pass in the already-called coroutine object might simplify the signature even further. I'm not sure I see the advantage of trying to make this an executor -- but perhaps I'm missing something? --Guido On Sat, Sep 26, 2015 at 7:29 AM, Vincent Michel wrote: > Hi, > > I noticed there is currently no standard solution to submit a job from a > thread to an asyncio event loop. 
> > Here's what the asyncio documentation says about concurrency and > multithreading: > > > To schedule a callback from a different thread, the > BaseEventLoop.call_soon_threadsafe() method should be used. > > Example to schedule a coroutine from a different thread: > > loop.call_soon_threadsafe(asyncio.async, coro_func()) > > The issue with this method is the loss of the coroutine result. > > One way to deal with this issue is to connect the asyncio.Future returned > by async (or ensure_future) to a concurrent.futures.Future. It is then > possible to use a subclass of concurrent.futures.Executor to submit a > callback to an asyncio event loop. Such an executor can also be used to set > up communication between two event loops using run_in_executor. > > I posted an implementation called LoopExecutor on GitHub: > https://github.com/vxgmichel/asyncio-loopexecutor > The repo contains the loopexecutor module along with tests for several use > cases. The README describes the whole thing (context, examples, issues, > implementation). > > It is interesting to note that this executor is a bit different than > ThreadPoolExecutor and ProcessPoolExecutor since it can also submit a > coroutine function. Example: > > with LoopExecutor(loop) as executor: > future = executor.submit(operator.add, 1, 2) > assert future.result() == 3 > future = executor.submit(asyncio.sleep, 0.1, result=3) > assert future.result() == 3 > > This works in both cases because submit always cast the given function to > a coroutine. That means it would also work with a function that returns a > Future. > > Here's a few topic related to the current implementation that might be > interesting to discuss: > > - possible drawback of casting the callback to a coroutine > - possible drawback of concurrent.future.Future using > asyncio.Future._copy_state > - does LoopExecutor need to implement the shutdown method? 
> - removing the limitation in run_in_executor (can't submit a coroutine > function) > - adding a generic Future connection function in asyncio > - reimplementing wrap_future with the generic connection > - adding LoopExecutor to asyncio (or concurrent.futures) > > At the moment, the interaction between asyncio and concurrent.futures only > goes one way. It would be nice to have a standard solution (LoopExecutor or > something else) to make it bidirectional. > > Thanks, > > Vincent > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Sep 27 15:28:13 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 Sep 2015 23:28:13 +1000 Subject: [Python-ideas] PEP 506 (secrets module) and token functions In-Reply-To: <20150926130715.GG23642@ando.pearwood.info> References: <20150919181612.GT31152@ando.pearwood.info> <20150926130715.GG23642@ando.pearwood.info> Message-ID: On 26 September 2015 at 23:07, Steven D'Aprano wrote: > I'm looking for guidance and/or consensus on two issues regarding token* > functions in secrets: output type, and default values. > > The idea is that the module will include a few functions for generating > tokens, suitable for (say) password recovery, with the > following signatures: > > def token_bytes(nbytes:int) -> bytes: > """Return nbytes random bytes.""" > def token_hex(nbytes:int) -> ???? : > """Return nbytes random bytes, encoded to hex""" > > def token_url(nbytes:int) -> ???? : > """Return nbytes random bytes, URL-safe base64 encoded.""" > > > Question one: > > - token_bytes obviously should return bytes. What should the others > return, bytes or str? 
token_hex and token_url are inspired by Pyramid's and Django's token
generators (albeit with a different implementation technique in the
latter case), so I'd look at what type those return.

The Django token generator is django.utils.crypto.get_random_string,
and returns text. The Pyramid CSRF token generator in
sessions.BaseCookieSessionFactory.CookieSession.new_csrf_token also
returns text.

However, I'm starting to think we should just pick one of the two
algorithms and call it "token_str" (with the shorter output from the
URL-safe base64 with any trailing "=" removed being my preference). For
folks that want or need to use a different token generation algorithm,
we can offer the Pyramid and Django generation algorithms as recipes in
the documentation.

> Question two:
>
> - Many people will have no idea how many bytes should be used to be
> confident that it will be hard for an attacker to guess. Earlier, I
> suggested that the three functions include default values for nbytes,
> and there were no objections. Do we have consensus on this, and if so,
> what default value should we use?

32 bytes (256 bits of entropy) seems like a reasonable default to me.

> Question three:
>
> - If we have default values, do we need some sort of documented
> exception to the general backwards-compatibility requirement?
>
> E.g. suppose we release the module in 3.6.0 with defaults of 32 bytes,
> and in 3.6.2 we discover that's too small and we should have used 64
> bytes. Can we change the default in 3.6.3 without notice?

I like Andrew's suggestion of making the default None, and saying that
passing None means we'll choose an appropriate length, which will be 32
bytes for now, but may change in maintenance releases to increase the
length if we decide 256 bits of entropy isn't enough. Changes in the
default length could be indicated through "versionchanged" notes in the
"token_bytes" documentation.

Cheers,
Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Sep 27 15:30:04 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 Sep 2015 23:30:04 +1000 Subject: [Python-ideas] PEP 506 (secrets module) and token functions In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> <20150926130715.GG23642@ando.pearwood.info> Message-ID: On 26 September 2015 at 23:56, Serhiy Storchaka wrote: > On 26.09.15 16:07, Steven D'Aprano wrote: >> >> Question one: >> >> - token_bytes obviously should return bytes. What should the others >> return, bytes or str? > > > Why don't left conversion to the user? You can provide simple receipts in > the documentation. > > def token_hex(nbytes): > return token_bytes(nbytes).hex() > > def token_url(nbytes): > return base64.urlsafe_b64encode(token_bytes(nbytes)).rstrip(b'=') > > We don't know what functions are needed by users. After the secrets module > is widely used, we could gather the statistics of most popular patterns and > add some of them in the stdlib. We already have those patterns based on what web frameworks use - the hex token generator pattern is taken from Pyramid's token generator, while the base64 one is inspired by Django's (the latter actually uses the "choosing from an alphabet" implementation style, but the proposed base64 approach makes the same general trade-off of encoding more bits of entropy per character to make the overall output shorter). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Sep 27 15:35:33 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 Sep 2015 23:35:33 +1000 Subject: [Python-ideas] PEP 506 (secrets module) and token functions In-Reply-To: References: <20150919181612.GT31152@ando.pearwood.info> <20150926130715.GG23642@ando.pearwood.info> Message-ID: On 27 September 2015 at 00:04, Chris Angelico wrote: > Can you adequately define "secure enough" across all purposes? 
If so,
> I would support that. The precise number would never be documented
> specifically (if you want to know what your version does, try it
> interactively), and then it can indeed be changed in 3.6.3 - or even
> without a version number bump at all (in ten years' time, Red Hat
> might choose to continue shipping CPython 3.6.1, but change the
> default entropy value).

We backported PEP 466 with its "the default SSL context settings may
change in maintenance releases" behaviour to the Python 2.7.5 based
system Python in RHEL 7.2, so I expect we'd be OK with backporting
changes to default entropy settings in the secrets module.

The default settings in the system provided OpenSSL have also long been
subject to change (that's one of the reasons CPython defaults to
dynamically linking to OpenSSL on *nix systems).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From vxgmichel at gmail.com  Sun Sep 27 15:36:05 2015
From: vxgmichel at gmail.com (Vincent Michel)
Date: Sun, 27 Sep 2015 15:36:05 +0200
Subject: [Python-ideas] Submitting a job to an asyncio event loop
In-Reply-To: 
References: 
Message-ID: 

Hi Guido,

Thanks for your interest. I work for a synchrotron and we use the
distributed control system TANGO. The main implementation is in C++,
but we use a Python binding called PyTango. The current server
implementation (on the C++ side) does not feature an event loop but
instead creates a different thread for each client.

TANGO: http://www.tango-controls.org/
PyTango: http://www.esrf.eu/computing/cs/tango/tango_doc/kernel_doc/pytango/latest/index.html
Instead, it is possible to use an asyncio event loop, run the server
through run_in_executor (just like you mentioned in your mail), and
redirect all the client callbacks to the event loop. That's the part where
job submission from a different thread comes in handy.

A very similar solution has been developed using gevent, but I like
explicit coroutines better :p

Another use case is the communication between two event loops. From what
I've seen, the current context (get/set event loop) is only related to the
current thread. It makes it easy to run different event loops in different
threads. Even though I'm not sure what the use case is, I suppose it's been
done intentionally. Then the executor interface is useful to run things
like:

    executor = LoopExecutor(other_loop)
    result = await my_loop.run_in_executor(executor, coro_func, *args)

There is a working example in the test directory:
https://github.com/vxgmichel/asyncio-loopexecutor/blob/master/test/test_multi_loop.py

***

The coroutine(fn) cast only makes sense if a subclass of Executor is used,
in order to be consistent with the Executor.submit signature. Otherwise,
passing an already-called coroutine is perfectly fine. I think it is a good
idea to define a simple submit function like you recommended:

    def submit_to_loop(loop, coro):
        future = concurrent.futures.Future()
        callback = partial(schedule, coro, destination=future)
        loop.call_soon_threadsafe(callback)
        return future

And then use the executor interface if we realize it is actually useful.
It's really not a lot of code anyway:

    class LoopExecutor(concurrent.futures.Executor):

        def __init__(self, loop=None):
            self.loop = loop or asyncio.get_event_loop()

        def submit(self, fn, *args, **kwargs):
            coro = asyncio.coroutine(fn)(*args, **kwargs)
            return submit_to_loop(self.loop, coro)

I'll update the repository.

Cheers,

Vincent

2015-09-27 4:52 GMT+02:00 Guido van Rossum :
>
> Hi Vincent,
>
> I've read your write-up with interest.
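[Editor's note: the submit_to_loop() recipe quoted above relies on a schedule() helper that is never defined in the thread. A self-contained sketch of the whole bridge might look like the following; the body of schedule() here is an assumption, not Vincent's actual implementation.]

```python
import asyncio
import concurrent.futures
from functools import partial

def schedule(coro, destination):
    # Runs in the event loop's own thread: wrap the coroutine in a task
    # and copy its outcome into the waiting concurrent.futures.Future.
    task = asyncio.ensure_future(coro)

    def copy_result(task):
        if task.cancelled():
            destination.cancel()
        elif task.exception() is not None:
            destination.set_exception(task.exception())
        else:
            destination.set_result(task.result())

    task.add_done_callback(copy_result)

def submit_to_loop(loop, coro):
    # Safe to call from any thread: call_soon_threadsafe is the documented
    # thread-safe entry point into a running loop.
    future = concurrent.futures.Future()
    loop.call_soon_threadsafe(partial(schedule, coro, destination=future))
    return future
```

From a worker thread, submit_to_loop(loop, some_coro()).result() then blocks that thread (not the loop) until the loop has finished the coroutine.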
You're right that it's a bit awkward to make calls from the threaded world into the asyncio world. Interestingly, there's much better support for passing work off from the asyncio event loop to a thread (run_in_executor()). Perhaps that's because the use case there was obvious from the start: some things that may block for I/O just don't have an async interface yet, so in order to use them from an asyncio task they must be off-loaded to a separate thread or else the entire event loop is blocked. (This is used for calling getaddrinfo(), for example.) > > I'm curious where you have encountered the opposite use case? > > I think if I had to do this myself I would go for a more minimalist interface: something like your submit() method but without the call to asyncio.coroutine(fn). Having the caller pass in the already-called coroutine object might simplify the signature even further. I'm not sure I see the advantage of trying to make this an executor -- but perhaps I'm missing something? > > --Guido > > > > On Sat, Sep 26, 2015 at 7:29 AM, Vincent Michel wrote: >> >> Hi, >> >> I noticed there is currently no standard solution to submit a job from a thread to an asyncio event loop. >> >> Here's what the asyncio documentation says about concurrency and multithreading: >> >> > To schedule a callback from a different thread, the BaseEventLoop.call_soon_threadsafe() method should be used. >> > Example to schedule a coroutine from a different thread: >> > loop.call_soon_threadsafe(asyncio.async, coro_func()) >> >> The issue with this method is the loss of the coroutine result. >> >> One way to deal with this issue is to connect the asyncio.Future returned by async (or ensure_future) to a concurrent.futures.Future. It is then possible to use a subclass of concurrent.futures.Executor to submit a callback to an asyncio event loop. Such an executor can also be used to set up communication between two event loops using run_in_executor. 
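[Editor's note: the bridge Vincent describes here — scheduling a coroutine on a loop from a foreign thread and getting a concurrent.futures.Future back — is essentially what asyncio.run_coroutine_threadsafe() now provides (documented as new in Python 3.5.1, shortly after this thread). A minimal sketch:]

```python
import asyncio
import threading

# A dedicated event loop running in a background thread.
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever)
thread.start()

# Schedule a coroutine on that loop from this (foreign) thread; the returned
# concurrent.futures.Future carries the result, exception, or cancellation,
# so nothing is lost the way it is with a bare call_soon_threadsafe().
future = asyncio.run_coroutine_threadsafe(asyncio.sleep(0.01, result=42), loop)
result = future.result(timeout=5)  # blocks this thread, not the loop

loop.call_soon_threadsafe(loop.stop)
thread.join()
loop.close()
```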
>>
>> I posted an implementation called LoopExecutor on GitHub:
>> https://github.com/vxgmichel/asyncio-loopexecutor
>> The repo contains the loopexecutor module along with tests for several
>> use cases. The README describes the whole thing (context, examples,
>> issues, implementation).
>>
>> It is interesting to note that this executor is a bit different than
>> ThreadPoolExecutor and ProcessPoolExecutor since it can also submit a
>> coroutine function. Example:
>>
>>     with LoopExecutor(loop) as executor:
>>         future = executor.submit(operator.add, 1, 2)
>>         assert future.result() == 3
>>         future = executor.submit(asyncio.sleep, 0.1, result=3)
>>         assert future.result() == 3
>>
>> This works in both cases because submit always casts the given function
>> to a coroutine. That means it would also work with a function that
>> returns a Future.
>>
>> Here are a few topics related to the current implementation that might
>> be interesting to discuss:
>>
>> - possible drawback of casting the callback to a coroutine
>> - possible drawback of concurrent.futures.Future using
>>   asyncio.Future._copy_state
>> - does LoopExecutor need to implement the shutdown method?
>> - removing the limitation in run_in_executor (can't submit a coroutine
>>   function)
>> - adding a generic Future connection function in asyncio
>> - reimplementing wrap_future with the generic connection
>> - adding LoopExecutor to asyncio (or concurrent.futures)
>>
>> At the moment, the interaction between asyncio and concurrent.futures
>> only goes one way. It would be nice to have a standard solution
>> (LoopExecutor or something else) to make it bidirectional.
>> >> Thanks, >> >> Vincent >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > -- > --Guido van Rossum (python.org/~guido) From guido at python.org Sun Sep 27 18:42:46 2015 From: guido at python.org (Guido van Rossum) Date: Sun, 27 Sep 2015 09:42:46 -0700 Subject: [Python-ideas] Submitting a job to an asyncio event loop In-Reply-To: References: Message-ID: OK, I think I understand your primary use case -- the C++ library calls callbacks in their own threads but you want the callback code to run in your event loop, where presumably it is structured as a coroutine and may use `yield from` or `await` to wait for other coroutines, tasks or futures. Then when that coroutine is done it returns a value which your machinery passes back as the result of a concurrent.futures.Future on which the callback thread is waiting. I don't think the use case involving multiple event loops in different threads is as clear. I am still waiting for someone who is actually trying to use this. It might be useful on a system where there is a system event loop that must be used for UI events (assuming this event loop can somehow be wrapped in a custom asyncio loop) and where an app might want to have a standard asyncio event loop for network I/O. Come to think of it, the ProactorEventLoop on Windows has both advantages and disadvantages, and some app might need to use both that and SelectorEventLoop. But this is a real pain (because you can't share any mutable state between event loops). On Sun, Sep 27, 2015 at 6:36 AM, Vincent Michel wrote: > Hi Guido, > > Thanks for your interest, > > I work for a synchrotron and we use the distributed control system > TANGO. The main implementation is in C++, but we use a python binding > called PyTango. 
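[Editor's note: the opposite, well-supported direction mentioned in this thread — off-loading a blocking call from the loop to a thread with run_in_executor, as asyncio itself does for getaddrinfo() — can be sketched as follows. blocking_io here is a made-up stand-in for any call with no async interface.]

```python
import asyncio
import time

def blocking_io(x):
    # Stand-in for a blocking call with no async interface
    # (e.g. getaddrinfo, or a C library call).
    time.sleep(0.01)
    return x * 2

async def main(loop):
    # None selects the loop's default thread pool executor; the loop stays
    # responsive while the blocking call runs in a worker thread.
    return await loop.run_in_executor(None, blocking_io, 21)

loop = asyncio.new_event_loop()
result = loop.run_until_complete(main(loop))
loop.close()
```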
The current server implementation (on the C++ side) > does not feature an event loop but instead create a different thread > for each client. > > TANGO: http://www.tango-controls.org/ > PyTango: > http://www.esrf.eu/computing/cs/tango/tango_doc/kernel_doc/pytango/latest/index.html > > I wanted to add asyncio support to the library, so that we can benefit > from single-threaded asynchronous programming. The problem is that > client callbacks run in different threads and there is not much we can > do about it until a pure python implementation is developed (and it's > a lot of work). Instead, it is possible to use an asyncio event loop, > run the server through run_in_executor (juste like you mentioned in > your mail), and redirect all the client callbacks to the event loop. > That's the part where job submission from a different thread comes in > handy. > > A very similar solution has been developed using gevent, but I like > explicit coroutines better :p > > Another use case is the communication between two event loops. From > what I've seen, the current context (get/set event loop) is only > related to the current thread. It makes it easy to run different event > loops in different threads. Even though I'm not sure what the use case > is, I suppose it's been done intentionally. Then the executor > interface is useful to run things like: > > executor = LoopExecutor(other_loop) > result = await my_loop.run_in_executor(executor, coro_func, *args) > > There is working example in the test directory: > > https://github.com/vxgmichel/asyncio-loopexecutor/blob/master/test/test_multi_loop.py > > *** > > The coroutine(fn) cast only makes sense if a subclass of Executor is > used, in order to be consistent with the Executor.submit signature. > Otherwise, passing an already-called coroutine is perfectly fine. 
I > think it is a good idea to define a simple submit function like you > recommended: > > def submit_to_loop(loop, coro): > future = concurrent.futures.Future() > callback = partial(schedule, coro, destination=future) > loop.call_soon_threadsafe(callback) > return future > > And then use the executor interface if we realize it is actually > useful. It's really not a lot of code anyway: > > class LoopExecutor(concurrent.futures.Executor): > > def __init__(self, loop=None): > self.loop = loop or asyncio.get_event_loop() > > def submit(self, fn, *args, **kwargs): > coro = asyncio.coroutine(fn)(*args, **kwargs) > return submit_to_loop(self.loop, coro) > > I'll update the repository. > > Cheers, > > Vincent > > 2015-09-27 4:52 GMT+02:00 Guido van Rossum : > > > > Hi Vincent, > > > > I've read your write-up with interest. You're right that it's a bit > awkward to make calls from the threaded world into the asyncio world. > Interestingly, there's much better support for passing work off from the > asyncio event loop to a thread (run_in_executor()). Perhaps that's because > the use case there was obvious from the start: some things that may block > for I/O just don't have an async interface yet, so in order to use them > from an asyncio task they must be off-loaded to a separate thread or else > the entire event loop is blocked. (This is used for calling getaddrinfo(), > for example.) > > > > I'm curious where you have encountered the opposite use case? > > > > I think if I had to do this myself I would go for a more minimalist > interface: something like your submit() method but without the call to > asyncio.coroutine(fn). Having the caller pass in the already-called > coroutine object might simplify the signature even further. I'm not sure I > see the advantage of trying to make this an executor -- but perhaps I'm > missing something? 
> > > > --Guido > > > > > > > > On Sat, Sep 26, 2015 at 7:29 AM, Vincent Michel > wrote: > >> > >> Hi, > >> > >> I noticed there is currently no standard solution to submit a job from > a thread to an asyncio event loop. > >> > >> Here's what the asyncio documentation says about concurrency and > multithreading: > >> > >> > To schedule a callback from a different thread, the > BaseEventLoop.call_soon_threadsafe() method should be used. > >> > Example to schedule a coroutine from a different thread: > >> > loop.call_soon_threadsafe(asyncio.async, coro_func()) > >> > >> The issue with this method is the loss of the coroutine result. > >> > >> One way to deal with this issue is to connect the asyncio.Future > returned by async (or ensure_future) to a concurrent.futures.Future. It is > then possible to use a subclass of concurrent.futures.Executor to submit a > callback to an asyncio event loop. Such an executor can also be used to set > up communication between two event loops using run_in_executor. > >> > >> I posted an implementation called LoopExecutor on GitHub: > >> https://github.com/vxgmichel/asyncio-loopexecutor > >> The repo contains the loopexecutor module along with tests for several > use cases. The README describes the whole thing (context, examples, issues, > implementation). > >> > >> It is interesting to note that this executor is a bit different than > ThreadPoolExecutor and ProcessPoolExecutor since it can also submit a > coroutine function. Example: > >> > >> with LoopExecutor(loop) as executor: > >> future = executor.submit(operator.add, 1, 2) > >> assert future.result() == 3 > >> future = executor.submit(asyncio.sleep, 0.1, result=3) > >> assert future.result() == 3 > >> > >> This works in both cases because submit always cast the given function > to a coroutine. That means it would also work with a function that returns > a Future. 
> >> > >> Here's a few topic related to the current implementation that might be > interesting to discuss: > >> > >> - possible drawback of casting the callback to a coroutine > >> - possible drawback of concurrent.future.Future using > asyncio.Future._copy_state > >> - does LoopExecutor need to implement the shutdown method? > >> - removing the limitation in run_in_executor (can't submit a coroutine > function) > >> - adding a generic Future connection function in asyncio > >> - reimplementing wrap_future with the generic connection > >> - adding LoopExecutor to asyncio (or concurrent.futures) > >> > >> At the moment, the interaction between asyncio and concurrent.futures > only goes one way. It would be nice to have a standard solution > (LoopExecutor or something else) to make it bidirectional. > >> > >> Thanks, > >> > >> Vincent > >> > >> > >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > >> https://mail.python.org/mailman/listinfo/python-ideas > >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > > > > > > -- > > --Guido van Rossum (python.org/~guido) > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From vxgmichel at gmail.com Sun Sep 27 22:29:05 2015 From: vxgmichel at gmail.com (Vincent Michel) Date: Sun, 27 Sep 2015 22:29:05 +0200 Subject: [Python-ideas] Submitting a job to an asyncio event loop In-Reply-To: References: Message-ID: Yes that's exactly it. No problem for the multiple event loops, it was a fun thing to play with. Then there's probably no reason to have a loop executor either. I think the important part is really the interface between asyncio futures and concurrent futures, since it is not trivial to write and maintain. In particular, getting exceptions and cancellation to work safely can be a bit tricky. 
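[Editor's note: the tricky state-copying Vincent refers to, from a finished asyncio future to a concurrent one, reduces to three cases. This is only an illustrative sketch; asyncio's own wrap_future machinery does the equivalent with additional thread-safety care.]

```python
import concurrent.futures

def copy_future_state(source, dest):
    # source: a *finished* asyncio.Future (or Task);
    # dest: a concurrent.futures.Future.
    # Checking cancelled() first matters: calling exception() on a
    # cancelled future would raise CancelledError instead of returning.
    assert source.done()
    if source.cancelled():
        dest.cancel()
    elif source.exception() is not None:
        dest.set_exception(source.exception())
    else:
        dest.set_result(source.result())
```

In a real bridge this runs in a done-callback on the loop side, and the caller's thread simply blocks on dest.result().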
2015-09-27 18:42 GMT+02:00 Guido van Rossum : > OK, I think I understand your primary use case -- the C++ library calls > callbacks in their own threads but you want the callback code to run in your > event loop, where presumably it is structured as a coroutine and may use > `yield from` or `await` to wait for other coroutines, tasks or futures. Then > when that coroutine is done it returns a value which your machinery passes > back as the result of a concurrent.futures.Future on which the callback > thread is waiting. > > I don't think the use case involving multiple event loops in different > threads is as clear. I am still waiting for someone who is actually trying > to use this. It might be useful on a system where there is a system event > loop that must be used for UI events (assuming this event loop can somehow > be wrapped in a custom asyncio loop) and where an app might want to have a > standard asyncio event loop for network I/O. Come to think of it, the > ProactorEventLoop on Windows has both advantages and disadvantages, and some > app might need to use both that and SelectorEventLoop. But this is a real > pain (because you can't share any mutable state between event loops). > > > > On Sun, Sep 27, 2015 at 6:36 AM, Vincent Michel wrote: >> >> Hi Guido, >> >> Thanks for your interest, >> >> I work for a synchrotron and we use the distributed control system >> TANGO. The main implementation is in C++, but we use a python binding >> called PyTango. The current server implementation (on the C++ side) >> does not feature an event loop but instead create a different thread >> for each client. >> >> TANGO: http://www.tango-controls.org/ >> PyTango: >> http://www.esrf.eu/computing/cs/tango/tango_doc/kernel_doc/pytango/latest/index.html >> >> I wanted to add asyncio support to the library, so that we can benefit >> from single-threaded asynchronous programming. 
The problem is that >> client callbacks run in different threads and there is not much we can >> do about it until a pure python implementation is developed (and it's >> a lot of work). Instead, it is possible to use an asyncio event loop, >> run the server through run_in_executor (juste like you mentioned in >> your mail), and redirect all the client callbacks to the event loop. >> That's the part where job submission from a different thread comes in >> handy. >> >> A very similar solution has been developed using gevent, but I like >> explicit coroutines better :p >> >> Another use case is the communication between two event loops. From >> what I've seen, the current context (get/set event loop) is only >> related to the current thread. It makes it easy to run different event >> loops in different threads. Even though I'm not sure what the use case >> is, I suppose it's been done intentionally. Then the executor >> interface is useful to run things like: >> >> executor = LoopExecutor(other_loop) >> result = await my_loop.run_in_executor(executor, coro_func, *args) >> >> There is working example in the test directory: >> >> https://github.com/vxgmichel/asyncio-loopexecutor/blob/master/test/test_multi_loop.py >> >> *** >> >> The coroutine(fn) cast only makes sense if a subclass of Executor is >> used, in order to be consistent with the Executor.submit signature. >> Otherwise, passing an already-called coroutine is perfectly fine. I >> think it is a good idea to define a simple submit function like you >> recommended: >> >> def submit_to_loop(loop, coro): >> future = concurrent.futures.Future() >> callback = partial(schedule, coro, destination=future) >> loop.call_soon_threadsafe(callback) >> return future >> >> And then use the executor interface if we realize it is actually >> useful. 
It's really not a lot of code anyway: >> >> class LoopExecutor(concurrent.futures.Executor): >> >> def __init__(self, loop=None): >> self.loop = loop or asyncio.get_event_loop() >> >> def submit(self, fn, *args, **kwargs): >> coro = asyncio.coroutine(fn)(*args, **kwargs) >> return submit_to_loop(self.loop, coro) >> >> I'll update the repository. >> >> Cheers, >> >> Vincent >> >> 2015-09-27 4:52 GMT+02:00 Guido van Rossum : >> > >> > Hi Vincent, >> > >> > I've read your write-up with interest. You're right that it's a bit >> > awkward to make calls from the threaded world into the asyncio world. >> > Interestingly, there's much better support for passing work off from the >> > asyncio event loop to a thread (run_in_executor()). Perhaps that's because >> > the use case there was obvious from the start: some things that may block >> > for I/O just don't have an async interface yet, so in order to use them from >> > an asyncio task they must be off-loaded to a separate thread or else the >> > entire event loop is blocked. (This is used for calling getaddrinfo(), for >> > example.) >> > >> > I'm curious where you have encountered the opposite use case? >> > >> > I think if I had to do this myself I would go for a more minimalist >> > interface: something like your submit() method but without the call to >> > asyncio.coroutine(fn). Having the caller pass in the already-called >> > coroutine object might simplify the signature even further. I'm not sure I >> > see the advantage of trying to make this an executor -- but perhaps I'm >> > missing something? >> > >> > --Guido >> > >> > >> > >> > On Sat, Sep 26, 2015 at 7:29 AM, Vincent Michel >> > wrote: >> >> >> >> Hi, >> >> >> >> I noticed there is currently no standard solution to submit a job from >> >> a thread to an asyncio event loop. 
>> >> >> >> Here's what the asyncio documentation says about concurrency and >> >> multithreading: >> >> >> >> > To schedule a callback from a different thread, the >> >> > BaseEventLoop.call_soon_threadsafe() method should be used. >> >> > Example to schedule a coroutine from a different thread: >> >> > loop.call_soon_threadsafe(asyncio.async, coro_func()) >> >> >> >> The issue with this method is the loss of the coroutine result. >> >> >> >> One way to deal with this issue is to connect the asyncio.Future >> >> returned by async (or ensure_future) to a concurrent.futures.Future. It is >> >> then possible to use a subclass of concurrent.futures.Executor to submit a >> >> callback to an asyncio event loop. Such an executor can also be used to set >> >> up communication between two event loops using run_in_executor. >> >> >> >> I posted an implementation called LoopExecutor on GitHub: >> >> https://github.com/vxgmichel/asyncio-loopexecutor >> >> The repo contains the loopexecutor module along with tests for several >> >> use cases. The README describes the whole thing (context, examples, issues, >> >> implementation). >> >> >> >> It is interesting to note that this executor is a bit different than >> >> ThreadPoolExecutor and ProcessPoolExecutor since it can also submit a >> >> coroutine function. Example: >> >> >> >> with LoopExecutor(loop) as executor: >> >> future = executor.submit(operator.add, 1, 2) >> >> assert future.result() == 3 >> >> future = executor.submit(asyncio.sleep, 0.1, result=3) >> >> assert future.result() == 3 >> >> >> >> This works in both cases because submit always cast the given function >> >> to a coroutine. That means it would also work with a function that returns a >> >> Future. 
>> >> >> >> Here's a few topic related to the current implementation that might be >> >> interesting to discuss: >> >> >> >> - possible drawback of casting the callback to a coroutine >> >> - possible drawback of concurrent.future.Future using >> >> asyncio.Future._copy_state >> >> - does LoopExecutor need to implement the shutdown method? >> >> - removing the limitation in run_in_executor (can't submit a coroutine >> >> function) >> >> - adding a generic Future connection function in asyncio >> >> - reimplementing wrap_future with the generic connection >> >> - adding LoopExecutor to asyncio (or concurrent.futures) >> >> >> >> At the moment, the interaction between asyncio and concurrent.futures >> >> only goes one way. It would be nice to have a standard solution >> >> (LoopExecutor or something else) to make it bidirectional. >> >> >> >> Thanks, >> >> >> >> Vincent >> >> >> >> >> >> _______________________________________________ >> >> Python-ideas mailing list >> >> Python-ideas at python.org >> >> https://mail.python.org/mailman/listinfo/python-ideas >> >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > >> > >> > >> > >> > -- >> > --Guido van Rossum (python.org/~guido) > > > > > -- > --Guido van Rossum (python.org/~guido) From guido at python.org Sun Sep 27 22:39:12 2015 From: guido at python.org (Guido van Rossum) Date: Sun, 27 Sep 2015 13:39:12 -0700 Subject: [Python-ideas] Submitting a job to an asyncio event loop In-Reply-To: References: Message-ID: Do you want to propose a minimal patch to asyncio? A PR for https://github.com/python/asyncio would be the best thing to do. I'd leave the LoopExecutor out of it for now. The code could probably live at the bottom of futures.py. On Sun, Sep 27, 2015 at 1:29 PM, Vincent Michel wrote: > Yes that's exactly it. No problem for the multiple event loops, it was > a fun thing to play with. Then there's probably no reason to have a > loop executor either. 
> > I think the important part is really the interface between asyncio > futures and concurrent futures, since it is not trivial to write and > maintain. In particular, getting exceptions and cancellation to work > safely can be a bit tricky. > > > 2015-09-27 18:42 GMT+02:00 Guido van Rossum : > > OK, I think I understand your primary use case -- the C++ library calls > > callbacks in their own threads but you want the callback code to run in > your > > event loop, where presumably it is structured as a coroutine and may use > > `yield from` or `await` to wait for other coroutines, tasks or futures. > Then > > when that coroutine is done it returns a value which your machinery > passes > > back as the result of a concurrent.futures.Future on which the callback > > thread is waiting. > > > > I don't think the use case involving multiple event loops in different > > threads is as clear. I am still waiting for someone who is actually > trying > > to use this. It might be useful on a system where there is a system event > > loop that must be used for UI events (assuming this event loop can > somehow > > be wrapped in a custom asyncio loop) and where an app might want to have > a > > standard asyncio event loop for network I/O. Come to think of it, the > > ProactorEventLoop on Windows has both advantages and disadvantages, and > some > > app might need to use both that and SelectorEventLoop. But this is a real > > pain (because you can't share any mutable state between event loops). > > > > > > > > On Sun, Sep 27, 2015 at 6:36 AM, Vincent Michel > wrote: > >> > >> Hi Guido, > >> > >> Thanks for your interest, > >> > >> I work for a synchrotron and we use the distributed control system > >> TANGO. The main implementation is in C++, but we use a python binding > >> called PyTango. The current server implementation (on the C++ side) > >> does not feature an event loop but instead create a different thread > >> for each client. 
> >> > >> TANGO: http://www.tango-controls.org/ > >> PyTango: > >> > http://www.esrf.eu/computing/cs/tango/tango_doc/kernel_doc/pytango/latest/index.html > >> > >> I wanted to add asyncio support to the library, so that we can benefit > >> from single-threaded asynchronous programming. The problem is that > >> client callbacks run in different threads and there is not much we can > >> do about it until a pure python implementation is developed (and it's > >> a lot of work). Instead, it is possible to use an asyncio event loop, > >> run the server through run_in_executor (juste like you mentioned in > >> your mail), and redirect all the client callbacks to the event loop. > >> That's the part where job submission from a different thread comes in > >> handy. > >> > >> A very similar solution has been developed using gevent, but I like > >> explicit coroutines better :p > >> > >> Another use case is the communication between two event loops. From > >> what I've seen, the current context (get/set event loop) is only > >> related to the current thread. It makes it easy to run different event > >> loops in different threads. Even though I'm not sure what the use case > >> is, I suppose it's been done intentionally. Then the executor > >> interface is useful to run things like: > >> > >> executor = LoopExecutor(other_loop) > >> result = await my_loop.run_in_executor(executor, coro_func, *args) > >> > >> There is working example in the test directory: > >> > >> > https://github.com/vxgmichel/asyncio-loopexecutor/blob/master/test/test_multi_loop.py > >> > >> *** > >> > >> The coroutine(fn) cast only makes sense if a subclass of Executor is > >> used, in order to be consistent with the Executor.submit signature. > >> Otherwise, passing an already-called coroutine is perfectly fine. 
I > >> think it is a good idea to define a simple submit function like you > >> recommended: > >> > >> def submit_to_loop(loop, coro): > >> future = concurrent.futures.Future() > >> callback = partial(schedule, coro, destination=future) > >> loop.call_soon_threadsafe(callback) > >> return future > >> > >> And then use the executor interface if we realize it is actually > >> useful. It's really not a lot of code anyway: > >> > >> class LoopExecutor(concurrent.futures.Executor): > >> > >> def __init__(self, loop=None): > >> self.loop = loop or asyncio.get_event_loop() > >> > >> def submit(self, fn, *args, **kwargs): > >> coro = asyncio.coroutine(fn)(*args, **kwargs) > >> return submit_to_loop(self.loop, coro) > >> > >> I'll update the repository. > >> > >> Cheers, > >> > >> Vincent > >> > >> 2015-09-27 4:52 GMT+02:00 Guido van Rossum : > >> > > >> > Hi Vincent, > >> > > >> > I've read your write-up with interest. You're right that it's a bit > >> > awkward to make calls from the threaded world into the asyncio world. > >> > Interestingly, there's much better support for passing work off from > the > >> > asyncio event loop to a thread (run_in_executor()). Perhaps that's > because > >> > the use case there was obvious from the start: some things that may > block > >> > for I/O just don't have an async interface yet, so in order to use > them from > >> > an asyncio task they must be off-loaded to a separate thread or else > the > >> > entire event loop is blocked. (This is used for calling > getaddrinfo(), for > >> > example.) > >> > > >> > I'm curious where you have encountered the opposite use case? > >> > > >> > I think if I had to do this myself I would go for a more minimalist > >> > interface: something like your submit() method but without the call to > >> > asyncio.coroutine(fn). Having the caller pass in the already-called > >> > coroutine object might simplify the signature even further. 
I'm not > sure I > >> > see the advantage of trying to make this an executor -- but perhaps > I'm > >> > missing something? > >> > > >> > --Guido > >> > > >> > > >> > > >> > On Sat, Sep 26, 2015 at 7:29 AM, Vincent Michel > >> > wrote: > >> >> > >> >> Hi, > >> >> > >> >> I noticed there is currently no standard solution to submit a job > from > >> >> a thread to an asyncio event loop. > >> >> > >> >> Here's what the asyncio documentation says about concurrency and > >> >> multithreading: > >> >> > >> >> > To schedule a callback from a different thread, the > >> >> > BaseEventLoop.call_soon_threadsafe() method should be used. > >> >> > Example to schedule a coroutine from a different thread: > >> >> > loop.call_soon_threadsafe(asyncio.async, coro_func()) > >> >> > >> >> The issue with this method is the loss of the coroutine result. > >> >> > >> >> One way to deal with this issue is to connect the asyncio.Future > >> >> returned by async (or ensure_future) to a concurrent.futures.Future. > It is > >> >> then possible to use a subclass of concurrent.futures.Executor to > submit a > >> >> callback to an asyncio event loop. Such an executor can also be used > to set > >> >> up communication between two event loops using run_in_executor. > >> >> > >> >> I posted an implementation called LoopExecutor on GitHub: > >> >> https://github.com/vxgmichel/asyncio-loopexecutor > >> >> The repo contains the loopexecutor module along with tests for > several > >> >> use cases. The README describes the whole thing (context, examples, > issues, > >> >> implementation). > >> >> > >> >> It is interesting to note that this executor is a bit different than > >> >> ThreadPoolExecutor and ProcessPoolExecutor since it can also submit a > >> >> coroutine function. 
Example: > >> >> > >> >> with LoopExecutor(loop) as executor: > >> >> future = executor.submit(operator.add, 1, 2) > >> >> assert future.result() == 3 > >> >> future = executor.submit(asyncio.sleep, 0.1, result=3) > >> >> assert future.result() == 3 > >> >> > >> >> This works in both cases because submit always cast the given > function > >> >> to a coroutine. That means it would also work with a function that > returns a > >> >> Future. > >> >> > >> >> Here's a few topic related to the current implementation that might > be > >> >> interesting to discuss: > >> >> > >> >> - possible drawback of casting the callback to a coroutine > >> >> - possible drawback of concurrent.future.Future using > >> >> asyncio.Future._copy_state > >> >> - does LoopExecutor need to implement the shutdown method? > >> >> - removing the limitation in run_in_executor (can't submit a > coroutine > >> >> function) > >> >> - adding a generic Future connection function in asyncio > >> >> - reimplementing wrap_future with the generic connection > >> >> - adding LoopExecutor to asyncio (or concurrent.futures) > >> >> > >> >> At the moment, the interaction between asyncio and concurrent.futures > >> >> only goes one way. It would be nice to have a standard solution > >> >> (LoopExecutor or something else) to make it bidirectional. > >> >> > >> >> Thanks, > >> >> > >> >> Vincent > >> >> > >> >> > >> >> _______________________________________________ > >> >> Python-ideas mailing list > >> >> Python-ideas at python.org > >> >> https://mail.python.org/mailman/listinfo/python-ideas > >> >> Code of Conduct: http://python.org/psf/codeofconduct/ > >> > > >> > > >> > > >> > > >> > -- > >> > --Guido van Rossum (python.org/~guido) > > > > > > > > > > -- > > --Guido van Rossum (python.org/~guido) > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vxgmichel at gmail.com Sun Sep 27 23:16:42 2015 From: vxgmichel at gmail.com (Vincent Michel) Date: Sun, 27 Sep 2015 23:16:42 +0200 Subject: [Python-ideas] Submitting a job to an asyncio event loop In-Reply-To: References: Message-ID: Great, I'll do that! 2015-09-27 22:39 GMT+02:00 Guido van Rossum : > Do you want to propose a minimal patch to asyncio? A PR for > https://github.com/python/asyncio would be the best thing to do. I'd leave > the LoopExecutor out of it for now. The code could probably live at the > bottom of futures.py. > > On Sun, Sep 27, 2015 at 1:29 PM, Vincent Michel wrote: >> >> Yes that's exactly it. No problem for the multiple event loops, it was >> a fun thing to play with. Then there's probably no reason to have a >> loop executor either. >> >> I think the important part is really the interface between asyncio >> futures and concurrent futures, since it is not trivial to write and >> maintain. In particular, getting exceptions and cancellation to work >> safely can be a bit tricky. >> >> >> 2015-09-27 18:42 GMT+02:00 Guido van Rossum : >> > OK, I think I understand your primary use case -- the C++ library calls >> > callbacks in their own threads but you want the callback code to run in >> > your >> > event loop, where presumably it is structured as a coroutine and may use >> > `yield from` or `await` to wait for other coroutines, tasks or futures. >> > Then >> > when that coroutine is done it returns a value which your machinery >> > passes >> > back as the result of a concurrent.futures.Future on which the callback >> > thread is waiting. >> > >> > I don't think the use case involving multiple event loops in different >> > threads is as clear. I am still waiting for someone who is actually >> > trying >> > to use this. 
It might be useful on a system where there is a system >> > event >> > loop that must be used for UI events (assuming this event loop can >> > somehow >> > be wrapped in a custom asyncio loop) and where an app might want to have >> > a >> > standard asyncio event loop for network I/O. Come to think of it, the >> > ProactorEventLoop on Windows has both advantages and disadvantages, and >> > some >> > app might need to use both that and SelectorEventLoop. But this is a >> > real >> > pain (because you can't share any mutable state between event loops). >> > >> > >> > >> > On Sun, Sep 27, 2015 at 6:36 AM, Vincent Michel >> > wrote: >> >> >> >> Hi Guido, >> >> >> >> Thanks for your interest, >> >> >> >> I work for a synchrotron and we use the distributed control system >> >> TANGO. The main implementation is in C++, but we use a python binding >> >> called PyTango. The current server implementation (on the C++ side) >> >> does not feature an event loop but instead creates a different thread >> >> for each client. >> >> >> >> TANGO: http://www.tango-controls.org/ >> >> PyTango: >> >> >> >> http://www.esrf.eu/computing/cs/tango/tango_doc/kernel_doc/pytango/latest/index.html >> >> >> >> I wanted to add asyncio support to the library, so that we can benefit >> >> from single-threaded asynchronous programming. The problem is that >> >> client callbacks run in different threads and there is not much we can >> >> do about it until a pure python implementation is developed (and it's >> >> a lot of work). Instead, it is possible to use an asyncio event loop, >> >> run the server through run_in_executor (just like you mentioned in >> >> your mail), and redirect all the client callbacks to the event loop. >> >> That's the part where job submission from a different thread comes in >> >> handy. >> >> >> >> A very similar solution has been developed using gevent, but I like >> >> explicit coroutines better :p >> >> >> >> Another use case is the communication between two event loops.
From >> >> what I've seen, the current context (get/set event loop) is only >> >> related to the current thread. It makes it easy to run different event >> >> loops in different threads. Even though I'm not sure what the use case >> >> is, I suppose it's been done intentionally. Then the executor >> >> interface is useful to run things like: >> >> >> >> executor = LoopExecutor(other_loop) >> >> result = await my_loop.run_in_executor(executor, coro_func, *args) >> >> >> >> There is working example in the test directory: >> >> >> >> >> >> https://github.com/vxgmichel/asyncio-loopexecutor/blob/master/test/test_multi_loop.py >> >> >> >> *** >> >> >> >> The coroutine(fn) cast only makes sense if a subclass of Executor is >> >> used, in order to be consistent with the Executor.submit signature. >> >> Otherwise, passing an already-called coroutine is perfectly fine. I >> >> think it is a good idea to define a simple submit function like you >> >> recommended: >> >> >> >> def submit_to_loop(loop, coro): >> >> future = concurrent.futures.Future() >> >> callback = partial(schedule, coro, destination=future) >> >> loop.call_soon_threadsafe(callback) >> >> return future >> >> >> >> And then use the executor interface if we realize it is actually >> >> useful. It's really not a lot of code anyway: >> >> >> >> class LoopExecutor(concurrent.futures.Executor): >> >> >> >> def __init__(self, loop=None): >> >> self.loop = loop or asyncio.get_event_loop() >> >> >> >> def submit(self, fn, *args, **kwargs): >> >> coro = asyncio.coroutine(fn)(*args, **kwargs) >> >> return submit_to_loop(self.loop, coro) >> >> >> >> I'll update the repository. >> >> >> >> Cheers, >> >> >> >> Vincent >> >> >> >> 2015-09-27 4:52 GMT+02:00 Guido van Rossum : >> >> > >> >> > Hi Vincent, >> >> > >> >> > I've read your write-up with interest. You're right that it's a bit >> >> > awkward to make calls from the threaded world into the asyncio world. 
>> >> > Interestingly, there's much better support for passing work off from >> >> > the >> >> > asyncio event loop to a thread (run_in_executor()). Perhaps that's >> >> > because >> >> > the use case there was obvious from the start: some things that may >> >> > block >> >> > for I/O just don't have an async interface yet, so in order to use >> >> > them from >> >> > an asyncio task they must be off-loaded to a separate thread or else >> >> > the >> >> > entire event loop is blocked. (This is used for calling >> >> > getaddrinfo(), for >> >> > example.) >> >> > >> >> > I'm curious where you have encountered the opposite use case? >> >> > >> >> > I think if I had to do this myself I would go for a more minimalist >> >> > interface: something like your submit() method but without the call >> >> > to >> >> > asyncio.coroutine(fn). Having the caller pass in the already-called >> >> > coroutine object might simplify the signature even further. I'm not >> >> > sure I >> >> > see the advantage of trying to make this an executor -- but perhaps >> >> > I'm >> >> > missing something? >> >> > >> >> > --Guido >> >> > >> >> > >> >> > >> >> > On Sat, Sep 26, 2015 at 7:29 AM, Vincent Michel >> >> > wrote: >> >> >> >> >> >> Hi, >> >> >> >> >> >> I noticed there is currently no standard solution to submit a job >> >> >> from >> >> >> a thread to an asyncio event loop. >> >> >> >> >> >> Here's what the asyncio documentation says about concurrency and >> >> >> multithreading: >> >> >> >> >> >> > To schedule a callback from a different thread, the >> >> >> > BaseEventLoop.call_soon_threadsafe() method should be used. >> >> >> > Example to schedule a coroutine from a different thread: >> >> >> > loop.call_soon_threadsafe(asyncio.async, coro_func()) >> >> >> >> >> >> The issue with this method is the loss of the coroutine result. 
>> >> >> >> >> >> One way to deal with this issue is to connect the asyncio.Future >> >> >> returned by async (or ensure_future) to a concurrent.futures.Future. >> >> >> It is >> >> >> then possible to use a subclass of concurrent.futures.Executor to >> >> >> submit a >> >> >> callback to an asyncio event loop. Such an executor can also be used >> >> >> to set >> >> >> up communication between two event loops using run_in_executor. >> >> >> >> >> >> I posted an implementation called LoopExecutor on GitHub: >> >> >> https://github.com/vxgmichel/asyncio-loopexecutor >> >> >> The repo contains the loopexecutor module along with tests for >> >> >> several >> >> >> use cases. The README describes the whole thing (context, examples, >> >> >> issues, >> >> >> implementation). >> >> >> >> >> >> It is interesting to note that this executor is a bit different than >> >> >> ThreadPoolExecutor and ProcessPoolExecutor since it can also submit >> >> >> a >> >> >> coroutine function. Example: >> >> >> >> >> >> with LoopExecutor(loop) as executor: >> >> >> future = executor.submit(operator.add, 1, 2) >> >> >> assert future.result() == 3 >> >> >> future = executor.submit(asyncio.sleep, 0.1, result=3) >> >> >> assert future.result() == 3 >> >> >> >> >> >> This works in both cases because submit always cast the given >> >> >> function >> >> >> to a coroutine. That means it would also work with a function that >> >> >> returns a >> >> >> Future. >> >> >> >> >> >> Here's a few topic related to the current implementation that might >> >> >> be >> >> >> interesting to discuss: >> >> >> >> >> >> - possible drawback of casting the callback to a coroutine >> >> >> - possible drawback of concurrent.future.Future using >> >> >> asyncio.Future._copy_state >> >> >> - does LoopExecutor need to implement the shutdown method? 
>> >> >> - removing the limitation in run_in_executor (can't submit a >> >> >> coroutine >> >> >> function) >> >> >> - adding a generic Future connection function in asyncio >> >> >> - reimplementing wrap_future with the generic connection >> >> >> - adding LoopExecutor to asyncio (or concurrent.futures) >> >> >> >> >> >> At the moment, the interaction between asyncio and >> >> >> concurrent.futures >> >> >> only goes one way. It would be nice to have a standard solution >> >> >> (LoopExecutor or something else) to make it bidirectional. >> >> >> >> >> >> Thanks, >> >> >> >> >> >> Vincent >> >> >> >> >> >> >> >> >> _______________________________________________ >> >> >> Python-ideas mailing list >> >> >> Python-ideas at python.org >> >> >> https://mail.python.org/mailman/listinfo/python-ideas >> >> >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > >> >> > >> >> > >> >> > >> >> > -- >> >> > --Guido van Rossum (python.org/~guido) >> > >> > >> > >> > >> > -- >> > --Guido van Rossum (python.org/~guido) > > > > > -- > --Guido van Rossum (python.org/~guido) From eric at trueblade.com Mon Sep 28 03:23:30 2015 From: eric at trueblade.com (Eric V. Smith) Date: Sun, 27 Sep 2015 21:23:30 -0400 Subject: [Python-ideas] Binary f-strings Message-ID: <56089692.1080303@trueblade.com> Now that f-strings are in the 3.6 branch, I'd like to turn my attention to binary f-strings (fb'' or bf''). The idea is that: >>> bf'datestamp:{datetime.datetime.now():%Y%m%d}\r\n' Might be translated as: >>> (b'datestamp:' + ... bytes(format(datetime.datetime.now(), ... str(b'%Y%m%d', 'ascii')), ... 'ascii') + ... b'\r\n') Which would result in: b'datestamp:20150927\r\n' The only real question is: what encoding to use for the second parameter to bytes()? Since an object must return unicode from __format__(), I need to convert that to bytes in order to join everything together. But how? Here I suggest 'ascii'. 
Unfortunately, this would give an error if __format__ returned anything with a char greater than 127. I think we've learned that an API that only raises an exception with certain specific inputs is fragile. Guido has suggested using 'utf-8' as the encoding. That has some appeal, but if we're designing this for wire protocols, not all protocols will be using utf-8. Another idea would be to extend the "conversion char" from just 's', 'r', or 'a', which don't make much sense for bytes, to instead be a string that specifies the encoding. The default could be ascii, and if you want to specify something else: bf'datestamp:{datetime.datetime.now()!utf-8:%Y%m%d}\r\n' That would work for any encoding that doesn't have ':', '{', or '}' in the encoding name. Which seems like a reasonable restriction. And I might be over-generalizing here, but you'd presumably want to make the encoding a non-constant: bf'datestamp:{datetime.datetime.now()!{encoding}:%Y%m%d}\r\n' I think my initial proposal will be to use 'ascii', and not support any conversion characters at all for fb-strings, not even 's', 'r', and 'a'. In the future, if we want to support encodings other than 'ascii', we could then add !conversions mapping to encodings. My reasoning for using 'ascii' is that 'utf-8' could easily be an error for non-utf-8 protocols. And by using 'ascii', at least we'd give a runtime error and not put possibly bogus data into the resulting binary string. Granted, the tradeoff is that we now have a case where whether or not the code raises an exception is dependent upon the values being formatted. If 'ascii' is the default, we could later switch to 'utf-8', but we couldn't go the other way. The only place this is likely to be a problem is when formatting unicode string values. No other built-in type is going to have a non-ascii compatible character in its __format__, unless you do tricky things with datetime format_specs. 
Of course user-defined types can return any unicode chars from __format__. Once we make a decision, I can apply the same logic to b''.format(), if that's desirable. I'm open to suggestions on this. Thanks for reading. -- Eric. From steve at pearwood.info Mon Sep 28 04:09:58 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 28 Sep 2015 12:09:58 +1000 Subject: [Python-ideas] Binary f-strings In-Reply-To: <56089692.1080303@trueblade.com> References: <56089692.1080303@trueblade.com> Message-ID: <20150928020957.GK23642@ando.pearwood.info> On Sun, Sep 27, 2015 at 09:23:30PM -0400, Eric V. Smith wrote: > Now that f-strings are in the 3.6 branch, I'd like to turn my attention > to binary f-strings (fb'' or bf''). > > The idea is that: > > >>> bf'datestamp:{datetime.datetime.now():%Y%m%d}\r\n' > > Might be translated as: > > >>> (b'datestamp:' + > ... bytes(format(datetime.datetime.now(), > ... str(b'%Y%m%d', 'ascii')), > ... 'ascii') + > ... b'\r\n') What's wrong with this? f'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'.encode('ascii') This eliminates all your questions about which encoding we should guess is more useful (ascii? utf-8? something else?), allows the caller to set an error handler without inventing yet more cryptic format codes, and is nicely explicit. If people are worried about the length of ".encode(...)", a helper function works great: def b(s): return bytes(s, 'utf-8') # or whatever encoding makes sense for them b(f'datestamp:{datetime.datetime.now():%Y%m%d}\r\n') > Which would result in: > b'datestamp:20150927\r\n' > > The only real question is: what encoding to use for the second parameter > to bytes()? Since an object must return unicode from __format__(), I > need to convert that to bytes in order to join everything together. But how? > > Here I suggest 'ascii'. Unfortunately, this would give an error if > __format__ returned anything with a char greater than 127. 
I think we've > learned that an API that only raises an exception with certain specific > inputs is fragile. > > Guido has suggested using 'utf-8' as the encoding. That has some appeal, > but if we're designing this for wire protocols, not all protocols will > be using utf-8. Using UTF-8 is not sufficient, since there are strings that can't be encoded into UTF-8 because they contain surrogates: py> '\uDA11'.encode('utf-8') Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeEncodeError: 'utf-8' codec can't encode character '\uda11' in position 0: surrogates not allowed but we surely don't want to suppress such errors by default. Sometimes they will be an error that needs fixing. -- Steve From njs at pobox.com Mon Sep 28 04:41:50 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 27 Sep 2015 19:41:50 -0700 Subject: [Python-ideas] Binary f-strings In-Reply-To: <56089692.1080303@trueblade.com> References: <56089692.1080303@trueblade.com> Message-ID: Naively, I'd expect that since f-strings and .format share the same infrastructure, fb-strings should work the same way as bytes.format -- and in particular, either both should be supported or neither. Since bytes.format apparently got rejected during the PEP 460/PEP 461 discussions: https://bugs.python.org/issue3982#msg224023 I guess you'd need to dig up those earlier discussions and see what the issues were? -n On Sun, Sep 27, 2015 at 6:23 PM, Eric V. Smith wrote: > Now that f-strings are in the 3.6 branch, I'd like to turn my attention > to binary f-strings (fb'' or bf''). > > The idea is that: > >>>> bf'datestamp:{datetime.datetime.now():%Y%m%d}\r\n' > > Might be translated as: > >>>> (b'datestamp:' + > ... bytes(format(datetime.datetime.now(), > ... str(b'%Y%m%d', 'ascii')), > ... 'ascii') + > ... b'\r\n') > > > Which would result in: > b'datestamp:20150927\r\n' > > The only real question is: what encoding to use for the second parameter > to bytes()?
Since an object must return unicode from __format__(), I > need to convert that to bytes in order to join everything together. But how? > > Here I suggest 'ascii'. Unfortunately, this would give an error if > __format__ returned anything with a char greater than 127. I think we've > learned that an API that only raises an exception with certain specific > inputs is fragile. > > Guido has suggested using 'utf-8' as the encoding. That has some appeal, > but if we're designing this for wire protocols, not all protocols will > be using utf-8. > > Another idea would be to extend the "conversion char" from just 's', > 'r', or 'a', which don't make much sense for bytes, to instead be a > string that specifies the encoding. The default could be ascii, and if > you want to specify something else: > bf'datestamp:{datetime.datetime.now()!utf-8:%Y%m%d}\r\n' > > That would work for any encoding that doesn't have ':', '{', or '}' in > the encoding name. Which seems like a reasonable restriction. > > And I might be over-generalizing here, but you'd presumably want to make > the encoding a non-constant: > bf'datestamp:{datetime.datetime.now()!{encoding}:%Y%m%d}\r\n' > > I think my initial proposal will be to use 'ascii', and not support any > conversion characters at all for fb-strings, not even 's', 'r', and 'a'. > In the future, if we want to support encodings other than 'ascii', we > could then add !conversions mapping to encodings. > > My reasoning for using 'ascii' is that 'utf-8' could easily be an error > for non-utf-8 protocols. And by using 'ascii', at least we'd give a > runtime error and not put possibly bogus data into the resulting binary > string. Granted, the tradeoff is that we now have a case where whether > or not the code raises an exception is dependent upon the values being > formatted. If 'ascii' is the default, we could later switch to 'utf-8', > but we couldn't go the other way. 
> > The only place this is likely to be a problem is when formatting unicode > string values. No other built-in type is going to have a non-ascii > compatible character in its __format__, unless you do tricky things with > datetime format_specs. Of course user-defined types can return any > unicode chars from __format__. > > Once we make a decision, I can apply the same logic to b''.format(), if > that's desirable. > > I'm open to suggestions on this. > > Thanks for reading. > > -- > Eric. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Nathaniel J. Smith -- http://vorpus.org From rosuav at gmail.com Mon Sep 28 05:03:32 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 28 Sep 2015 13:03:32 +1000 Subject: [Python-ideas] Binary f-strings In-Reply-To: References: <56089692.1080303@trueblade.com> Message-ID: On Mon, Sep 28, 2015 at 12:41 PM, Nathaniel Smith wrote: > Naively, I'd expect that since f-strings and .format share the same > infrastructure, fb-strings should work the same way as bytes.format -- > and in particular, either both should be supported or neither. Since > bytes.format apparently got rejected during the PEP 460/PEP 461 > discussions: > https://bugs.python.org/issue3982#msg224023 > I guess you'd need to dig up those earlier discussions and see what > the issues were? The biggest issues are summarized into PEP 461: https://www.python.org/dev/peps/pep-0461/#proposed-variations Since the __format__ machinery is all based around text strings, there'll need to be some (explicit or implicit) encode step. Hence this thread. How bad would it be to simply say "there are no bf strings"? As Steven says, you can simply use a normal f''.encode() operation, with no confusion. Otherwise, there'll be these "format-like" operations that can do things that format() can't do... 
and then there'd be edge cases, too, like a string with a b-prefix that contains non-ASCII characters in it: >>> восток = 1961 >>> apollo = 1969 >>> print(f"It took {apollo-восток} years to get from orbit to the moon.") It took 8 years to get from orbit to the moon. >>> print(b"It took {apollo-восток} years to get from orbit to the moon.") File "<stdin>", line 1 SyntaxError: bytes can only contain ASCII literal characters. If that were a binary f-string, those Cyrillic characters should still be legal (as they define an identifier, rather than ending up in the code). Would it confuse (a) humans, or (b) tools, to have these "texty bits" inside a byte string? In any case, bf strings can be added later, but once they're added, their semantics would be locked in. I'd be inclined to leave them out for 3.6 and see what people say. A bit of real-world usage of f-strings might show a clear front-runner in terms of expectations (UTF-8, ASCII, or something else). ChrisA From steve at pearwood.info Mon Sep 28 05:28:45 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 28 Sep 2015 13:28:45 +1000 Subject: [Python-ideas] Binary f-strings In-Reply-To: References: <56089692.1080303@trueblade.com> Message-ID: <20150928032845.GM23642@ando.pearwood.info> On Mon, Sep 28, 2015 at 01:03:32PM +1000, Chris Angelico wrote: [...] > >>> восток = 1961 > >>> apollo = 1969 > >>> print(f"It took {apollo-восток} years to get from orbit to the moon.") > It took 8 years to get from orbit to the moon. > >>> print(b"It took {apollo-восток} years to get from orbit to the moon.") > File "<stdin>", line 1 > SyntaxError: bytes can only contain ASCII literal characters. > > If that were a binary f-string, those Cyrillic characters should still > be legal (as they define an identifier, rather than ending up in the > code). Would it confuse (a) humans, or (b) tools, to have these "texty > bits" inside a byte string? It would confuse the heck out of me.
I leave it to the reader to decide whether I am a human or a tool. -- Steve From abarnert at yahoo.com Mon Sep 28 05:48:02 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 27 Sep 2015 20:48:02 -0700 Subject: [Python-ideas] Binary f-strings In-Reply-To: <56089692.1080303@trueblade.com> References: <56089692.1080303@trueblade.com> Message-ID: On Sep 27, 2015, at 18:23, Eric V. Smith wrote: > > The only place this is likely to be a problem is when formatting unicode > string values. No other built-in type is going to have a non-ascii > compatible character in its __format__, unless you do tricky things with > datetime format_specs. Of course user-defined types can return any > unicode chars from __format__. The fact that it can't handle bytes and bytes-like types makes this much less useful than %. Beyond that, the fact that it only works reliably for the same types as %, minus bytes, plus a few others including datetime means the benefit isn't nearly as large as for f-strings and str.format, which work reliably for every type in the world, and extensibly so for many types. And meanwhile, the cost is much higher, from code that seems to work if you don't test it well to even higher performance costs (and usually in code that needs performance more). Of course you could create a __bformat__(*args, encoding, errors, **kw) protocol (where object.__bformat__ just returns self.__format__(*args, **kw).encode(encoding, errors)), which has the same effect as your proposal except that types that need to know they're being bytes-formatted to do something reasonable, or that just want to know so they can optimize, can do so. And this of course lets you add __bformat__ to bytes, etc.--although it doesn't seem to help for types that support the buffer protocol, so it's still not as good as %b. But I don't think anyone will want that. 
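The two-step spelling suggested in this thread -- format as text first, then encode with an explicitly chosen codec -- can be sketched as a small helper. The `bformat` name and signature below are hypothetical illustrations only, not an existing or proposed API:

```python
import datetime


def bformat(template, *args, encoding="ascii", errors="strict", **kwargs):
    # Format as text first, then encode explicitly: the caller, not the
    # language, decides which codec the wire protocol uses.
    return template.format(*args, **kwargs).encode(encoding, errors)


# ASCII-only output works with the conservative default:
stamp = bformat("datestamp:{:%Y%m%d}\r\n", datetime.datetime(2015, 9, 27))
assert stamp == b"datestamp:20150927\r\n"

# Non-ASCII formatted values fail loudly under 'ascii' -- the
# value-dependent error discussed in the thread:
try:
    bformat("name:{}\r\n", "caf\u00e9")
except UnicodeEncodeError:
    pass
else:
    raise AssertionError("expected UnicodeEncodeError")

# Naming the protocol's encoding explicitly makes the same call succeed:
assert bformat("name:{}\r\n", "caf\u00e9", encoding="utf-8") == b"name:caf\xc3\xa9\r\n"

# Even UTF-8 is not universal: lone surrogates cannot be encoded at all.
try:
    "\uDA11".encode("utf-8")
except UnicodeEncodeError:
    pass
else:
    raise AssertionError("expected UnicodeEncodeError")
```

Keeping the encode step explicit sidesteps the ascii-vs-utf-8 default question entirely, at the cost of a slightly longer spelling.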
From ncoghlan at gmail.com Mon Sep 28 09:13:06 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 28 Sep 2015 17:13:06 +1000 Subject: [Python-ideas] Using `or?` as the null coalescing operator In-Reply-To: <6C2E5579-42A0-423F-AB8C-01B49FA59D67@gmail.com> References: <6C2E5579-42A0-423F-AB8C-01B49FA59D67@gmail.com> Message-ID: On 25 September 2015 at 09:07, Alessio Bogon wrote: > I really like PEP 0505. The only thing that does not convince me is the `??` operator. I would like to know what you think of an alternative like `or?`: > > a_list = some_list or? [] > a_dict = some_dict or? {} > > The rationale behind this is to let `or` do its job with "truthy" values, while `or?` would require non-None values. > The rest of the PEP looks good to me. > > I apologise in advance if this was already proposed and I missed it. It hasn't been suggested that I recall, and yes, I also prefer it to the doubled ?? spelling. One concrete advantage is that it helps convey that this is a short-circuiting control flow operator like 'and' and 'or' rather than a normal binary operator that always evaluates both operands. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From k7hoven at gmail.com Mon Sep 28 12:50:15 2015 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Mon, 28 Sep 2015 13:50:15 +0300 Subject: [Python-ideas] Using `or?` as the null coalescing operator In-Reply-To: References: <6C2E5579-42A0-423F-AB8C-01B49FA59D67@gmail.com> Message-ID: On Mon, Sep 28, 2015 at 10:13 AM, Nick Coghlan wrote: > On 25 September 2015 at 09:07, Alessio Bogon wrote: >> I really like PEP 0505. The only thing that does not convince me is the `??` operator. I would like to know what you think of an alternative like `or?`: >> >> a_list = some_list or? [] >> a_dict = some_dict or? {} >>
a_list = some_list else [] a_list = some_list or [] if None -- Koos From toddrjen at gmail.com Mon Sep 28 13:46:36 2015 From: toddrjen at gmail.com (Todd) Date: Mon, 28 Sep 2015 13:46:36 +0200 Subject: [Python-ideas] maxsplit in os.path.split Message-ID: The "str.split" and "str.rsplit" methods have a useful "maxsplit" option, which lets you set the number of times to split, defaulting to -1 (which is "unlimited"). The corresponding "os.path.split", however, has no "maxsplit" option. It can only split once, which splits the last path segment (the "basename") from the rest (equivalent of "str.rsplit" with "maxsplit=1"). I think it would be useful if "os.path.split" also had a "maxsplit" option. This would default to "1" (the current behavior"), but could be set to any value allowed by "str.split". Using this option would follow the behavior of "str.rsplit" for that value of "maxsplit". -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Sep 28 13:58:41 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 28 Sep 2015 12:58:41 +0100 Subject: [Python-ideas] maxsplit in os.path.split In-Reply-To: References: Message-ID: On 28 September 2015 at 12:46, Todd wrote: > I think it would be useful if "os.path.split" also had a "maxsplit" option. > This would default to "1" (the current behavior"), but could be set to any > value allowed by "str.split". Using this option would follow the behavior > of "str.rsplit" for that value of "maxsplit". In Python 3.6+ (which is the only place a change like this is likely to happen) you're probably better using pathlib. 
There, you can use path.parts, which returns a tuple of the path elements, so you can do things like >>> Path('C:\\what\\ever\\you\\like.txt').parts[-3:] ('ever', 'you', 'like.txt') That's usable now in Python 3.4+, and a backport is available at https://pypi.python.org/pypi/pathlib/ Paul From guido at python.org Mon Sep 28 16:12:09 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Sep 2015 07:12:09 -0700 Subject: [Python-ideas] maxsplit in os.path.split In-Reply-To: References: Message-ID: Also, the similarity between str.*split() and os.path.split() is not close enough to draw conclusions about one from the other. On Mon, Sep 28, 2015 at 4:58 AM, Paul Moore wrote: > On 28 September 2015 at 12:46, Todd wrote: > > I think it would be useful if "os.path.split" also had a "maxsplit" > option. > > This would default to "1" (the current behavior"), but could be set to > any > > value allowed by "str.split". Using this option would follow the > behavior > > of "str.rsplit" for that value of "maxsplit". > > In Python 3.6+ (which is the only place a change like this is likely > to happen) you're probably better using pathlib. There, you can use > path.parts, which returns a tuple of the path elements, so you can do > things like > > >>> Path('C:\\what\\ever\\you\\like.txt').parts[-3:] > ('ever', 'you', 'like.txt') > > That's usable now in Python 3.4+, and a backport is available at > https://pypi.python.org/pypi/pathlib/ > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rymg19 at gmail.com Mon Sep 28 16:33:03 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Mon, 28 Sep 2015 09:33:03 -0500 Subject: [Python-ideas] Using `or?` as the null coalescing operator In-Reply-To: References: <6C2E5579-42A0-423F-AB8C-01B49FA59D67@gmail.com> Message-ID: <078DE967-1A38-4886-9DDB-567E6021F19F@gmail.com> On September 28, 2015 5:50:15 AM CDT, Koos Zevenhoven wrote: >On Mon, Sep 28, 2015 at 10:13 AM, Nick Coghlan >wrote: >> On 25 September 2015 at 09:07, Alessio Bogon >wrote: >>> I really like PEP 0505. The only thing that does not convince me is >the `??` operator. I would like to know what you think of an >alternative like `or?`: >>> >>> a_list = some_list or? [] >>> a_dict = some_dict or? {} >>> > >And have the following syntax options been considered? > >a_list = some_list else [] > This one's ambiguous. How would you parse: x if a else b else c As: x if (a else b) else c Or: x if a else (b else c) >a_list = some_list or [] if None > >-- Koos >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. From jdhardy at gmail.com Mon Sep 28 18:02:54 2015 From: jdhardy at gmail.com (Jeff Hardy) Date: Mon, 28 Sep 2015 09:02:54 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts Message-ID: TL;DR: +1 for the idea -1 on the propagating member-access or index operators +1 on spelling it "or?" C# has had null-coalescing since about 2005, and it's one feature I miss in every other language that I use. I view null/None as a necessary evil, so getting rid of them as soon as possible is a good thing in my book. Nearly every bit of Python I've ever written would have benefitted from it, if just to get rid of the "x if x is not None else []" mess.
That said, I think the other (propagating) operators are a mistake, and I think they were a mistake in C# as well. I'm not sure I've ever had a situation where I wished they existed, in any language. Better to get rid of the Nones as soon as possible than bring them along. It's worth reading the C# design team's notes and subsequent discussion on the associativity of "?." [1] since it goes around and around with no really good answer and no particularly intuitive behaviour. Rather than worry about that, I'd prefer to see just the basic None-coalescing added. I like Alessio's suggestion of "or?" (which seems like it should be read in a calm but threatening tone, a la Liam Neeson). It just seems more Pythonic; ?? is fine in C# but seems punctuation-heavy for Python. It does mean the ?= and ?. and ?[] are probably out, and I'm OK with that. - Jeff [1] https://roslyn.codeplex.com/discussions/543895 -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Mon Sep 28 18:11:45 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 28 Sep 2015 18:11:45 +0200 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: Message-ID: <560966C1.1040704@mail.de> On 28.09.2015 18:02, Jeff Hardy wrote: > TL;DR: > +1 for the idea > -1 on the propagating member-access or index operators > +1 on spelling it "or?" > > C# has had null-coalescing since about 2005, and it's one feature I > miss in every other language that I use. I view null/None as a > necessary evil, so getting rid of them as soon as possible is a good > thing in my book. Nearly every bit of Python I've ever written would > have benefitted from it, if just to get rid of the "x if x is not None > else []" mess. > > That said, I think the other (propagating) operators are a mistake, > and I think they were a mistake in C# as well. I'm not sure I've ever had a > situation where I wished they existed, in any language. 
Better to get > rid of the Nones as soon as possible than bring them along. It's worth > reading the C# design team's notes and subsequent discussion on the > associativity of "?." [1] since it goes around and around with no > really good answer and no particularly intuitive behaviour. > > Rather than worry about that, I'd prefer to see just the basic > None-coalescing added. I like Alessio's suggestion of "or?" (which > seems like it should be read in a calm but threatening tone, a la Liam > Neeson). It just seems more Pythonic; ?? is fine in C# but seems > punctuation-heavy for Python. It does mean the ?= and ?. and ?[] are > probably out, and I'm OK with that. > > - Jeff > > [1] https://roslyn.codeplex.com/discussions/543895 > That sums it all up for me as well, though I would rather use "else" instead of "or?" (see punctuation-heavy). Best, Sven From steve at pearwood.info Mon Sep 28 18:37:33 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 29 Sep 2015 02:37:33 +1000 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <560966C1.1040704@mail.de> References: <560966C1.1040704@mail.de> Message-ID: <20150928163733.GN23642@ando.pearwood.info> On Mon, Sep 28, 2015 at 06:11:45PM +0200, Sven R. Kunze wrote: > That sums it all up for me as well, though I would rather use "else" > instead of "or?" (see punctuation-heavy). `else` is ambiguous. Consider: result = spam if eggs else cheese else aardvark could be interpreted three ways: result = (spam if eggs else cheese) else aardvark result = spam if (eggs else cheese) else aardvark result = spam if eggs else (cheese else aardvark) Whichever precedence you pick, some people will get it wrong and it will silently do the wrong thing and lead to hard-to-diagnose bugs. Using "else" for this will be a bug-magnet. 
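[Editor's note: Steven's three groupings really do compute different things. A quick sketch, modelling the hypothetical `x else y` coalescing as a plain function — `els` is an invented name standing in for the proposed syntax:]

```python
def els(x, y):
    """Invented stand-in for the hypothetical `x else y` (None-coalescing)."""
    return y if x is None else x

spam, eggs, cheese, aardvark = None, None, 'cheese', 'aardvark'

# (spam if eggs else cheese) else aardvark  -- left grouping
left = els(spam if eggs else cheese, aardvark)

# spam if (eggs else cheese) else aardvark  -- middle grouping
middle = spam if els(eggs, cheese) else aardvark

assert left == 'cheese'   # the conditional picks cheese, which is not None
assert middle is None     # els(None, 'cheese') is truthy, so spam (None) wins
```

With these inputs the two groupings already disagree, which is the silent-wrong-answer hazard Steven describes.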
-- Steve From donald at stufft.io Mon Sep 28 18:40:51 2015 From: donald at stufft.io (Donald Stufft) Date: Mon, 28 Sep 2015 12:40:51 -0400 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <20150928163733.GN23642@ando.pearwood.info> References: <560966C1.1040704@mail.de> <20150928163733.GN23642@ando.pearwood.info> Message-ID: could use two words! result = spam or else eggs On September 28, 2015 at 12:38:21 PM, Steven D'Aprano (steve at pearwood.info) wrote: > On Mon, Sep 28, 2015 at 06:11:45PM +0200, Sven R. Kunze wrote: > > > That sums it all up for me as well, though I would rather use "else" > > instead of "or?" (see punctuation-heavy). > > `else` is ambiguous. Consider: > > result = spam if eggs else cheese else aardvark > > could be interpreted three ways: > > result = (spam if eggs else cheese) else aardvark > result = spam if (eggs else cheese) else aardvark > result = spam if eggs else (cheese else aardvark) > > Whichever precedence you pick, some people will get it wrong and it will > silently do the wrong thing and lead to hard-to-diagnose bugs. Using > "else" for this will be a bug-magnet. > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From abarnert at yahoo.com Mon Sep 28 18:41:36 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 28 Sep 2015 09:41:36 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: Message-ID: <1E0CDC0C-CF95-4780-8DFB-BC4D6B5CC192@yahoo.com> On Sep 28, 2015, at 09:02, Jeff Hardy wrote: > > It's worth reading the C# design team's notes and subsequent discussion on the associativity of "?." 
[1] since it goes around and around with no really good answer and no particularly intuitive behaviour. Many of the problems raised there are irrelevant to Python: the fact that C# has "value types" that aren't referenced and can't be null, the fact that its ASTs are often processed by type-driven programming, the fact that it's not considered normal to raise and catch an exception in cases that aren't truly exceptional, the fact that there's (human-reader) ambiguity with ?:, the fact that . and [] are actually operators rather than a different kind of syntax, etc. There may be parallel problems to some of those issues in Python, but just assuming there will be because there are in C# doesn't establish that. The one argument that does carry over is that the "right-associative" version is harder to implement. That's worth discussing in Python terms, but it would be much more useful for someone to write an implementation to prove that the grammar, AST, and compiler code actually aren't that complicated, than to argue that they wouldn't necessarily be so. (The fact that it's harder to see where the exception comes from in something like spam(a?.b.c) is also the same in both languages, but that's already been discussed here, and I don't think that's a real problem in the first place--after all, a.b.c already raises an AttributeError that makes it just as hard to see whether it comes from None.b or None.c.) From cmeyer1969 at gmail.com Mon Sep 28 18:46:56 2015 From: cmeyer1969 at gmail.com (Chris Meyer) Date: Mon, 28 Sep 2015 09:46:56 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <560966C1.1040704@mail.de> <20150928163733.GN23642@ando.pearwood.info> Message-ID: <08F886E8-EF4F-40AB-AAA6-35432C42DB5C@gmail.com> > On Sep 28, 2015, at 9:40 AM, Donald Stufft wrote: > > could use two words! 
> > result = spam or else eggs Could use otherwise: result = spam otherwise eggs > On September 28, 2015 at 12:38:21 PM, Steven D'Aprano (steve at pearwood.info) wrote: >> On Mon, Sep 28, 2015 at 06:11:45PM +0200, Sven R. Kunze wrote: >> >>> That sums it all up for me as well, though I would rather use "else" >>> instead of "or?" (see punctuation-heavy). >> >> `else` is ambiguous. Consider: >> >> result = spam if eggs else cheese else aardvark >> >> could be interpreted three ways: >> >> result = (spam if eggs else cheese) else aardvark >> result = spam if (eggs else cheese) else aardvark >> result = spam if eggs else (cheese else aardvark) >> >> Whichever precedence you pick, some people will get it wrong and it will >> silently do the wrong thing and lead to hard-to-diagnose bugs. Using >> "else" for this will be a bug-magnet. >> >> >> >> -- >> Steve >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > ----------------- > Donald Stufft > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From abarnert at yahoo.com Mon Sep 28 18:46:15 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 28 Sep 2015 09:46:15 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <560966C1.1040704@mail.de> <20150928163733.GN23642@ando.pearwood.info> Message-ID: On Sep 28, 2015, at 09:40, Donald Stufft wrote: > > could use two words! 
> > result = spam or else eggs Unless you change the tokenizer to understand "or else" as a special case, or add another level of lookahead to the parser, how do you handle "spam if eggs or else cheese else aardvark" and vice-versa? It does make the meaning less confusing to a human, but it makes it harder for a human to understand how the compiler parses that meaning. >> On September 28, 2015 at 12:38:21 PM, Steven D'Aprano (steve at pearwood.info) wrote: >>> On Mon, Sep 28, 2015 at 06:11:45PM +0200, Sven R. Kunze wrote: >>> >>> That sums it all up for me as well, though I would rather use "else" >>> instead of "or?" (see punctuation-heavy). >> >> `else` is ambiguous. Consider: >> >> result = spam if eggs else cheese else aardvark >> >> could be interpreted three ways: >> >> result = (spam if eggs else cheese) else aardvark >> result = spam if (eggs else cheese) else aardvark >> result = spam if eggs else (cheese else aardvark) >> >> Whichever precedence you pick, some people will get it wrong and it will >> silently do the wrong thing and lead to hard-to-diagnose bugs. Using >> "else" for this will be a bug-magnet. >> >> >> >> -- >> Steve >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > ----------------- > Donald Stufft > PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From srkunze at mail.de Mon Sep 28 18:47:05 2015 From: srkunze at mail.de (Sven R. 
Kunze) Date: Mon, 28 Sep 2015 18:47:05 +0200 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <20150928163733.GN23642@ando.pearwood.info> References: <560966C1.1040704@mail.de> <20150928163733.GN23642@ando.pearwood.info> Message-ID: <56096F09.40804@mail.de> On 28.09.2015 18:37, Steven D'Aprano wrote: > On Mon, Sep 28, 2015 at 06:11:45PM +0200, Sven R. Kunze wrote: > >> That sums it all up for me as well, though I would rather use "else" >> instead of "or?" (see punctuation-heavy). > `else` is ambiguous. Consider: > > result = spam if eggs else cheese else aardvark > > could be interpreted three ways: > > result = (spam if eggs else cheese) else aardvark > result = spam if (eggs else cheese) else aardvark > result = spam if eggs else (cheese else aardvark) > > Whichever precedence you pick, some people will get it wrong and it will > silently do the wrong thing and lead to hard-to-diagnose bugs. Using > "else" for this will be a bug-magnet. I wouldn't make a mountain out of a molehill. Other existing operators have the same issue. Best, Sven From srkunze at mail.de Mon Sep 28 19:00:46 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 28 Sep 2015 19:00:46 +0200 Subject: [Python-ideas] Binary f-strings In-Reply-To: References: <56089692.1080303@trueblade.com> Message-ID: <5609723E.7000606@mail.de> On 28.09.2015 05:03, Chris Angelico wrote: > >>>> восток = 1961 >>>> apollo = 1969 >>>> print(f"It took {apollo-восток} years to get from orbit to the moon.") > It took 8 years to get from orbit to the moon. >>>> print(b"It took {apollo-восток} years to get from orbit to the moon.") > File "<stdin>", line 1 > SyntaxError: bytes can only contain ASCII literal characters. > > If that were a binary f-string, those Cyrillic characters should still > be legal (as they define an identifier, rather than ending up in the > code). Would it confuse (a) humans, or (b) tools, to have these "texty > bits" inside a byte string? I don't think so. 
"{...}" indicates the injection of whatever "..." stands for, thus is not part of the resulting string. So, no issue here for me. (The only thing that would confuse me is that "восток" is an allowed identifier in the first place. But that seems to be a different matter.) > In any case, bf strings can be added later, but once they're added, > their semantics would be locked in. I'd be inclined to leave them out > for 3.6 and see what people say. A bit of real-world usage of > f-strings might show a clear front-runner in terms of expectations > (UTF-8, ASCII, or something else). > I tend to agree here. Best, Sven From abarnert at yahoo.com Mon Sep 28 19:24:43 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 28 Sep 2015 10:24:43 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <56096F09.40804@mail.de> References: <560966C1.1040704@mail.de> <20150928163733.GN23642@ando.pearwood.info> <56096F09.40804@mail.de> Message-ID: <3819F6B1-3221-41E2-9103-62B71CBA7708@yahoo.com> On Sep 28, 2015, at 09:47, Sven R. Kunze wrote: > >> On 28.09.2015 18:37, Steven D'Aprano wrote: >>> On Mon, Sep 28, 2015 at 06:11:45PM +0200, Sven R. Kunze wrote: >>> >>> That sums it all up for me as well, though I would rather use "else" >>> instead of "or?" (see punctuation-heavy). >> `else` is ambiguous. Consider: >> >> result = spam if eggs else cheese else aardvark >> >> could be interpreted three ways: >> >> result = (spam if eggs else cheese) else aardvark >> result = spam if (eggs else cheese) else aardvark >> result = spam if eggs else (cheese else aardvark) >> >> Whichever precedence you pick, some people will get it wrong and it will >> silently do the wrong thing and lead to hard-to-diagnose bugs. Using >> "else" for this will be a bug-magnet. > > I wouldn't make a mountain out of a molehill. Other existing operators have the same issue. 
Which other keywords or symbols may be either a binary operator or part of a ternary operator depending on context? > > > Best, > Sven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From guido at python.org Mon Sep 28 19:29:02 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Sep 2015 10:29:02 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: Message-ID: On Mon, Sep 28, 2015 at 9:02 AM, Jeff Hardy wrote: > -1 on the propagating member-access or index operators > Can someone explain with examples what this refers to? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl at oddbird.net Mon Sep 28 19:38:03 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 28 Sep 2015 11:38:03 -0600 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: Message-ID: <56097AFB.1040906@oddbird.net> On 09/28/2015 11:29 AM, Guido van Rossum wrote: > On Mon, Sep 28, 2015 at 9:02 AM, Jeff Hardy > wrote: > > -1 on the propagating member-access or index operators > > > Can someone explain with examples what this refers to? "Member-access or index operators" refers to the proposed ?. or ?[ operators. "Propagating" refers to the proposed behavior where use of ?. or ?[ "propagates" through the following chain of operations. For example: x = foo?.bar.spam.eggs Where both `.spam` and `.eggs` would behave like `?.spam` and `?.eggs` (propagating None rather than raising AttributeError), simply because a `?.` had occurred earlier in the chain. 
So the above behaves differently from: temp = foo?.bar x = temp.spam.eggs Which raises questions about whether the propagation escapes parentheses, too: x = (foo?.bar).spam.eggs Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From guido at python.org Mon Sep 28 20:38:38 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Sep 2015 11:38:38 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <56097AFB.1040906@oddbird.net> References: <56097AFB.1040906@oddbird.net> Message-ID: On Mon, Sep 28, 2015 at 10:38 AM, Carl Meyer wrote: > On 09/28/2015 11:29 AM, Guido van Rossum wrote: > > On Mon, Sep 28, 2015 at 9:02 AM, Jeff Hardy > > wrote: > > > > -1 on the propagating member-access or index operators > > > > > > Can someone explain with examples what this refers to? > > "Member-access or index operators" refers to the proposed ?. or ?[ > operators. > Got that. :-) > "Propagating" refers to the proposed behavior where use of ?. or ?[ > "propagates" through the following chain of operations. For example: > > x = foo?.bar.spam.eggs > > Where both `.spam` and `.eggs` would behave like `?.spam` and `?.eggs` > (propagating None rather than raising AttributeError), simply because a > `.?` had occurred earlier in the chain. So the above behaves differently > from: > > temp = foo?.bar > x = temp.spam.eggs > > Which raises questions about whether the propagation escapes > parentheses, too: > > x = (foo?.bar).spam.eggs > Oh, I see. That's evil. The correct behavior here is that "foo?.bar.spam.eggs" should mean the same as (None if foo is None else foo.bar.spam.eggs) (Stop until you understand that is *not* the same as either of the alternatives you describe.) 
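[Editor's note: Guido's short-circuit reading can be checked in plain Python, and its contrast with the naive two-step split shows up as soon as `foo` is None — `foo` below is a throwaway stand-in:]

```python
foo = None

# Short-circuit reading of foo?.bar.spam.eggs, as Guido spells it out:
x = None if foo is None else foo.bar.spam.eggs
assert x is None          # nothing after the ? is evaluated at all

# Naively extracting a temp does NOT mean the same thing:
temp = None if foo is None else foo.bar   # plays the role of foo?.bar
try:
    temp.spam.eggs                        # temp is None -> AttributeError
except AttributeError:
    pass
else:
    raise AssertionError("expected AttributeError")
```

When `foo` is not None the two spellings agree; the divergence above is exactly why Carl notes that `foo?.bar.baz` cannot be mechanically split into `temp = foo?.bar; temp.baz`.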
I can see the confusion that led to the idea of "propagation" -- it probably comes from an attempt to define "foo?.bar" without reference to the context (in this case the relevant context is that it's followed by ".spam.eggs"). It should not escape parentheses. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From emile at fenx.com Mon Sep 28 21:38:04 2015 From: emile at fenx.com (Emile van Sebille) Date: Mon, 28 Sep 2015 12:38:04 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <3819F6B1-3221-41E2-9103-62B71CBA7708@yahoo.com> References: <560966C1.1040704@mail.de> <20150928163733.GN23642@ando.pearwood.info> <56096F09.40804@mail.de> <3819F6B1-3221-41E2-9103-62B71CBA7708@yahoo.com> Message-ID: On 9/28/2015 10:24 AM, Andrew Barnert via Python-ideas wrote: > On Sep 28, 2015, at 09:47, Sven R. Kunze wrote: >> I wouldn't make a mountain out of a molehill. Other existing operators have the same issue. > > Which other keywords or symbols may be either a binary operator or part of a ternary operator depending on context? These come to mind: a = b = c a < b < c Emile From carl at oddbird.net Mon Sep 28 21:43:24 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 28 Sep 2015 13:43:24 -0600 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> Message-ID: <5609985C.40603@oddbird.net> On 09/28/2015 12:38 PM, Guido van Rossum wrote: > On Mon, Sep 28, 2015 at 10:38 AM, Carl Meyer "Propagating" refers to the proposed behavior where use of ?. or ?[ > "propagates" through the following chain of operations. For example: > > x = foo?.bar.spam.eggs > > Where both `.spam` and `.eggs` would behave like `?.spam` and `?.eggs` > (propagating None rather than raising AttributeError), simply because a > `.?` had occurred earlier in the chain. 
So the above behaves differently > from: > > temp = foo?.bar > x = temp.spam.eggs > > Which raises questions about whether the propagation escapes > parentheses, too: > > x = (foo?.bar).spam.eggs > > Oh, I see. That's evil. > > The correct behavior here is that "foo?.bar.spam.eggs" should mean the > same as > > (None if foo is None else foo.bar.spam.eggs) > > (Stop until you understand that is *not* the same as either of the > alternatives you describe.) I see that. The distinction is "short-circuit" vs "propagate." Short-circuit is definitely more comprehensible and palatable. [snip] > It should not escape parentheses. Good. I assume that the short-circuiting would follow the precedence order; that is, nothing with looser precedence than member and index access would be short-circuited. So, for example, foo?.bar['baz'].spam would short-circuit the indexing and the final member access, translating to foo.bar['baz'].spam if foo is not None else None but foo?.bar or 'baz' would mean (foo.bar if foo is not None else None) or 'baz' and would never evaluate to None. Similarly for any operator that binds less tightly than member/index access (which is basically all Python operators). AFAICS, under your proposed semantics what I said above is still true, that x = foo?.bar.baz would necessarily have a different meaning than temp = foo?.bar x = temp.baz Or put differently, that whereas these two are trivially equivalent (the definition of left-to-right binding within a precedence class): foo.bar.baz (foo.bar).baz these two are not equivalent: foo?.bar.baz (foo?.bar).baz I'm having trouble coming up with a parallel example where the existing short-circuit operators break "extractibility" of a sub-expression like that. I guess this is because the proposed short-circuiting still "breaks out of the precedence order" in a way that the existing short-circuiting operators don't. 
Both member access and indexing are within the same left-to-right binding precedence class, but the new operators would have a short-circuit effect that swallows operations beyond where normal left-to-right binding would suggest their effect should reach. Are there existing examples of behavior like this in Python that I'm missing? Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From srkunze at mail.de Mon Sep 28 21:47:10 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Mon, 28 Sep 2015 21:47:10 +0200 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <3819F6B1-3221-41E2-9103-62B71CBA7708@yahoo.com> References: <560966C1.1040704@mail.de> <20150928163733.GN23642@ando.pearwood.info> <56096F09.40804@mail.de> <3819F6B1-3221-41E2-9103-62B71CBA7708@yahoo.com> Message-ID: <5609993E.9010103@mail.de> On 28.09.2015 19:24, Andrew Barnert wrote: > On Sep 28, 2015, at 09:47, Sven R. Kunze wrote: >> >>> result = (spam if eggs else cheese) else aardvark >>> result = spam if (eggs else cheese) else aardvark >>> result = spam if eggs else (cheese else aardvark) >>> >>> Whichever precedence you pick, some people will get it wrong and it will >>> silently do the wrong thing and lead to hard-to-diagnose bugs. Using >>> "else" for this will be a bug-magnet. >> I wouldn't make a mountain out of a molehill. Other existing operators have the same issue. > Which other keywords or symbols may be either a binary operator or part of a ternary operator depending on context? It has nothing to do with either of it. I've seen young students struggling with the op precedence of AND and OR; and I've seen experienced coworkers rather adding superfluous pairs of parentheses just to make sure or because they still don't know better. 
Best, Sven From carl at oddbird.net Mon Sep 28 21:53:05 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 28 Sep 2015 13:53:05 -0600 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <5609985C.40603@oddbird.net> References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> Message-ID: <56099AA1.1010609@oddbird.net> On 09/28/2015 01:43 PM, Carl Meyer wrote: [snip] > I assume that the short-circuiting would follow the precedence > order; that is, nothing with looser precedence than member and index > access would be short-circuited. So, for example, > > foo?.bar['baz'].spam > > would short-circuit the indexing and the final member access, translating to > > foo.bar['baz'].spam if foo is not None else None > > but > > foo?.bar or 'baz' > > would mean > > (foo.bar if foo is not None else None) or 'baz' > > and would never evaluate to None. Similarly for any operator that binds > less tightly than member/index access (which is basically all Python > operators). For a possibly less-intuitive example of this principle (arbitrarily picking the operator that binds next-most-tightly), what should foo?.bar**3 mean? Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From guido at python.org Mon Sep 28 21:53:43 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Sep 2015 12:53:43 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <5609985C.40603@oddbird.net> References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> Message-ID: On Mon, Sep 28, 2015 at 12:43 PM, Carl Meyer wrote: > On 09/28/2015 12:38 PM, Guido van Rossum wrote: > > On Mon, Sep 28, 2015 at 10:38 AM, Carl Meyer > > "Propagating" refers to the proposed behavior where use of ?. or ?[ > > "propagates" through the following chain of operations. 
For example: > > > > x = foo?.bar.spam.eggs > > > > Where both `.spam` and `.eggs` would behave like `?.spam` and > `?.eggs` > > (propagating None rather than raising AttributeError), simply > because a > > `.?` had occurred earlier in the chain. So the above behaves > differently > > from: > > > > temp = foo?.bar > > x = temp.spam.eggs > > > > Which raises questions about whether the propagation escapes > > parentheses, too: > > > > x = (foo?.bar).spam.eggs > > > > Oh, I see. That's evil. > > > > The correct behavior here is that "foo?.bar.spam.eggs" should mean the > > same as > > > > (None if foo is None else foo.bar.spam.eggs) > > > > (Stop until you understand that is *not* the same as either of the > > alternatives you describe.) > > I see that. The distinction is "short-circuit" vs "propagate." > Short-circuit is definitely more comprehensible and palatable. > Right. > [snip] > > It should not escape parentheses. > > Good. I assume that the short-circuiting would follow the precedence > order; that is, nothing with looser precedence than member and index > access would be short-circuited. So, for example, > > foo?.bar['baz'].spam > > would short-circuit the indexing and the final member access, translating > to > > foo.bar['baz'].spam if foo is not None else None > > but > > foo?.bar or 'baz' > > would mean > > (foo.bar if foo is not None else None) or 'baz' > > and would never evaluate to None. Similarly for any operator that binds > less tightly than member/index access (which is basically all Python > operators). > Correct. The scope of ? would be all following .foo, .[stuff], or .(args) -- but stopping at any other operator (including parens). 
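[Editor's note: the scoping rule Guido confirms ("stopping at any other operator") can be spelled out in today's syntax, following Carl's own expansions — `foo` is a throwaway stand-in set to None to exercise the short-circuit:]

```python
foo = None

# foo?.bar['baz'].spam -- trailing attribute/index accesses fall inside the ?:
a = foo.bar['baz'].spam if foo is not None else None
assert a is None              # foo.bar is never evaluated

# foo?.bar or 'baz' -- `or` sits outside the ? scope, so it still applies:
b = (foo.bar if foo is not None else None) or 'baz'
assert b == 'baz'             # this expression can never yield None
```

The second case is the one worth memorizing: the `?` swallows the rest of the access chain, but nothing past the first non-access operator.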
> AFAICS, under your proposed semantics what I said above is still true, that > > x = foo?.bar.baz > > would necessarily have a different meaning than > > temp = foo?.bar > x = temp.baz > > Or put differently, that whereas these two are trivially equivalent (the > definition of left-to-right binding within a precedence class): > > foo.bar.baz > (foo.bar).baz > > these two are not equivalent: > > foo?.bar.baz > (foo?.bar).baz > Right. > I'm having trouble coming up with a parallel example where the existing > short-circuit operators break "extractibility" of a sub-expression like > that. > Why is that an interesting property? > I guess this is because the proposed short-circuiting still "breaks out > of the precedence order" in a way that the existing short-circuiting > operators don't. Both member access and indexing are within the same > left-to-right binding precedence class, but the new operators would have > a short-circuit effect that swallows operations beyond where normal > left-to-right binding would suggest their effect should reach. > > Are there existing examples of behavior like this in Python that I'm > missing? I don't know, but I think you shouldn't worry about this. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Sep 28 21:57:26 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Sep 2015 12:57:26 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <56099AA1.1010609@oddbird.net> References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099AA1.1010609@oddbird.net> Message-ID: On Mon, Sep 28, 2015 at 12:53 PM, Carl Meyer wrote: > On 09/28/2015 01:43 PM, Carl Meyer wrote: > [snip] > > I assume that the short-circuiting would follow the precedence > > order; that is, nothing with looser precedence than member and index > > access would be short-circuited. 
So, for example, > > > > foo?.bar['baz'].spam > > > > would short-circuit the indexing and the final member access, > translating to > > > > foo.bar['baz'].spam if foo is not None else None > > > > but > > > > foo?.bar or 'baz' > > > > would mean > > > > (foo.bar if foo is not None else None) or 'baz' > > > > and would never evaluate to None. Similarly for any operator that binds > > less tightly than member/index access (which is basically all Python > > operators). > > For a possibly less-intuitive example of this principle (arbitrarily > picking the operator that binds next-most-tightly), what should > > foo?.bar**3 > > mean? > It's nonsense -- it means (foo?.bar)**3 but since foo?.bar can return None and None**3 is an error you shouldn't do that. But don't try to then come up with syntax that rejects foo?.bar**something statically, because something might be an object that implements __rpow__. And I still don't see why this "principle" would be important. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl at oddbird.net Mon Sep 28 22:00:47 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 28 Sep 2015 14:00:47 -0600 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> Message-ID: <56099C6F.90700@oddbird.net> On 09/28/2015 01:53 PM, Guido van Rossum wrote: > On Mon, Sep 28, 2015 at 12:43 PM, Carl Meyer Or put differently, that whereas these two are trivially equivalent (the > definition of left-to-right binding within a precedence class): > > foo.bar.baz > > (foo.bar).baz > > these two are not equivalent: > > foo?.bar.baz > > (foo?.bar).baz > > > Right. > > > I'm having trouble coming up with a parallel example where the existing > short-circuit operators break "extractibility" of a sub-expression like > that. > > > Why is that an interesting property? 
Because breaking up an overly-complex expression into smaller expressions by means of extracting sub-expressions into temporary variables is a common programming task (in my experience anyway -- especially when trying to decipher some long-gone programmer's overly-complex code), and it's usually one that can be handled pretty mechanically according to precedence rules, without having to consider that some operators might have action-at-a-distance beyond their precedence. > I guess this is because the proposed short-circuiting still "breaks out > of the precedence order" in a way that the existing short-circuiting > operators don't. Both member access and indexing are within the same > left-to-right binding precedence class, but the new operators would have > a short-circuit effect that swallows operations beyond where normal > left-to-right binding would suggest their effect should reach. > > Are there existing examples of behavior like this in Python that I'm > missing? > > > I don't know, but I think you shouldn't worry about this. I think it's kind of odd, but if nobody else is worried about it, I won't worry about it either :-) Carl -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From carl at oddbird.net Mon Sep 28 22:05:15 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 28 Sep 2015 14:05:15 -0600 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099AA1.1010609@oddbird.net> Message-ID: <56099D7B.4020900@oddbird.net> On 09/28/2015 01:57 PM, Guido van Rossum wrote: > On Mon, Sep 28, 2015 at 12:53 PM, Carl Meyer > wrote: > > On 09/28/2015 01:43 PM, Carl Meyer wrote: > [snip] > > I assume that the short-circuiting would follow the precedence > > order; that is, nothing with looser precedence than member and index > > access would be short-circuited. So, for example, > > > > foo?.bar['baz'].spam > > > > would short-circuit the indexing and the final member access, translating to > > > > foo.bar['baz'].spam if foo is not None else None > > > > but > > > > foo?.bar or 'baz' > > > > would mean > > > > (foo.bar if foo is not None else None) or 'baz' > > > > and would never evaluate to None. Similarly for any operator that binds > > less tightly than member/index access (which is basically all Python > > operators). > > For a possibly less-intuitive example of this principle (arbitrarily > picking the operator that binds next-most-tightly), what should > > foo?.bar**3 > > mean? > > > It's nonsense -- it means (foo?.bar)**3 but since foo?.bar can return > None and None**3 is an error you shouldn't do that. But don't try to > then come up with syntax that rejects foo?.bar**something statically, > because something might be an object implements __rpow__. > > And I still don't see why this "principle" would be important. The only "principle" in question here is "nothing with looser precedence than member and index access would be short-circuited," and you seem to agree with it. 
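To make that principle concrete in today's Python, the hand-desugaring I have in mind looks like this (just a sketch -- `Foo` is a toy class, not anything from the PEP):

```python
class Foo:
    bar = None  # the attribute exists but happens to be None

def desugared(foo):
    # hand-desugaring of the proposed: foo?.bar or 'baz'
    # the ?. short-circuit covers only the member/index trailer chain;
    # the lower-precedence `or` still applies to the result afterwards
    return (foo.bar if foo is not None else None) or 'baz'

assert desugared(None) == 'baz'   # foo is None: ?. yields None, `or` supplies 'baz'
assert desugared(Foo()) == 'baz'  # foo.bar is None: `or` supplies 'baz' again
```

Either way the whole expression can never evaluate to None, exactly as described above.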
I was just making sure that foo?.bar**3 couldn't possibly mean (foo.bar**3 if foo is not None else None) and I'm glad it couldn't. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From guido at python.org Mon Sep 28 22:06:18 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Sep 2015 13:06:18 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <56099C6F.90700@oddbird.net> References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> Message-ID: On Mon, Sep 28, 2015 at 1:00 PM, Carl Meyer wrote: > On 09/28/2015 01:53 PM, Guido van Rossum wrote: > > On Mon, Sep 28, 2015 at 12:43 PM, Carl Meyer > > Or put differently, that whereas these two are trivially equivalent > (the > > definition of left-to-right binding within a precedence class): > > > > foo.bar.baz > > (foo.bar).baz > > > > these two are not equivalent: > > > > foo?.bar.baz > > (foo?.bar).baz > > > > > > Right. > > > > > > I'm having trouble coming up with a parallel example where the > existing > > short-circuit operators break "extractibility" of a sub-expression > like > > that. > > > > > > Why is that an interesting property? > > Because breaking up an overly-complex expression into smaller > expressions by means of extracting sub-expressions into temporary > variables is a common programming task (in my experience anyway -- > especially when trying to decipher some long-gone programmer's > overly-complex code), and it's usually one that can be handled pretty > mechanically according to precedence rules, without having to consider > that some operators might have action-at-a-distance beyond their > precedence. > Well, if just the foo?.bar.baz part is already too complex you probably need to reconsider your career.
:-) Seriously, when breaking things into smaller parts you *have* to understand the shortcut properties. You can't break "foo() or bar()" into

    a = foo()
    b = bar()
    return a or b

either. > > I guess this is because the proposed short-circuiting still "breaks > out > > of the precedence order" in a way that the existing short-circuiting > > operators don't. Both member access and indexing are within the same > > left-to-right binding precedence class, but the new operators would > have > > a short-circuit effect that swallows operations beyond where normal > > left-to-right binding would suggest their effect should reach. > > > > Are there existing examples of behavior like this in Python that I'm > > missing? > > > > > > I don't know, but I think you shouldn't worry about this. > > I think it's kind of odd, but if nobody else is worried about it, I > won't worry about it either :-) > Good idea. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From donald at stufft.io Mon Sep 28 22:15:19 2015 From: donald at stufft.io (Donald Stufft) Date: Mon, 28 Sep 2015 16:15:19 -0400 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <56099C6F.90700@oddbird.net> References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> Message-ID: <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> The ? modifying additional attribute accesses beyond just the immediate one bothers me too and feels more ruby than python to me.
Sent from my iPhone > On Sep 28, 2015, at 4:00 PM, Carl Meyer wrote: > > I think it's kind of odd, but if nobody else is worried about it, I > won't worry about it either :-) From guido at python.org Mon Sep 28 22:24:50 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Sep 2015 13:24:50 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> Message-ID: On Mon, Sep 28, 2015 at 1:15 PM, Donald Stufft wrote: > The ? Modifying additional attribute accesses beyond just the immediate > one bothers me too and feels more ruby than python to me. > Really? Have you thought about it? Suppose I have an object post which may be None or something with a tag attribute which should be a string. And suppose I want to get the lowercased tag, if the object exists, else None. This seems a perfect use case for writing post?.tag.lower() -- this signifies that post may be None but if it exists, post.tag is not expected to be None. So basically I want the equivalent of (post.tag.lower() if post is not None else None). But if post?.tag.lower() were interpreted strictly as (post?.tag).lower(), then I would have to write post?.tag?.lower?(), which is an abomination. OTOH if post?.tag.lower() automatically meant post?.tag?.lower?() then I would silently get no error when post exists but post.tag is None (which in this example is an error). -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ben+python at benfinney.id.au Mon Sep 28 22:27:04 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Tue, 29 Sep 2015 06:27:04 +1000 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> Message-ID: <85si5y2stj.fsf@benfinney.id.au> Carl Meyer writes: > On 09/28/2015 01:53 PM, Guido van Rossum wrote: > > On Mon, Sep 28, 2015 at 12:43 PM, Carl Meyer: > > > I'm having trouble coming up with a parallel example where the > > > existing short-circuit operators break "extractibility" of a > > > sub-expression like that. > > > > Why is that an interesting property? > > Because breaking up an overly-complex expression into smaller > expressions by means of extracting sub-expressions into temporary > variables is a common programming task +1, this is a hugely important tool in the mental toolkit. Making that more difficult is a high cost, thank you for expressing it so explicitly. > it's usually one that can be handled pretty mechanically according to > precedence rules, without having to consider that some operators might > have action-at-a-distance beyond their precedence. > > > I don't know, but I think you shouldn't worry about this. > > I think it's kind of odd, but if nobody else is worried about it, I > won't worry about it either :-) I share the concerns Carl is expressing; action-at-a-distance is something I'm glad Python doesn't have much of, and I would be loath to see that increase. -- \ "A lie can be told in a few words. Debunking that lie can take | `\ pages. That is why my book? is five hundred pages long."
--Chris | _o__) Rodda, 2011-05-05 | Ben Finney From guido at python.org Mon Sep 28 22:32:06 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Sep 2015 13:32:06 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <85si5y2stj.fsf@benfinney.id.au> References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <85si5y2stj.fsf@benfinney.id.au> Message-ID: On Mon, Sep 28, 2015 at 1:27 PM, Ben Finney wrote: > Carl Meyer writes: > > > On 09/28/2015 01:53 PM, Guido van Rossum wrote: > > > On Mon, Sep 28, 2015 at 12:43 PM, Carl Meyer: > > > > I'm having trouble coming up with a parallel example where the > > > > existing short-circuit operators break "extractibility" of a > > > > sub-expression like that. > > > > > > Why is that an interesting property? > > > > Because breaking up an overly-complex expression into smaller > > expressions by means of extracting sub-expressions into temporary > > variables is a common programming task > > +1, this is a hugely important tool in the mental toolkit. Making that > more difficult is a high cost, thank you for expressing it so explicitly. > > > it's usually one that can be handled pretty mechanically according to > > precedence rules, without having to consider that some operators might > > have action-at-a-distance beyond their precedence. > > > > > I don't know, but I think you shouldn't worry about this. > > > > I think it's kind of odd, but if nobody else is worried about it, I > > won't worry about it either :-) > > I share the concerns Carl is expressing; action-at-a-distance is > something I'm glad Python doesn't have much of, and I would be loath to > see that increase. > Really? You would consider a syntactic feature whose scope is limited to things to its immediate right with the most tightly binding pseudo-operators "action-at-a-distance"? The rhetoric around this issue is beginning to sound ridiculous.
-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Mon Sep 28 22:38:38 2015 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 28 Sep 2015 21:38:38 +0100 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> Message-ID: <5609A54E.8050806@mrabarnett.plus.com> On 2015-09-28 21:06, Guido van Rossum wrote: > On Mon, Sep 28, 2015 at 1:00 PM, Carl Meyer > wrote: > > On 09/28/2015 01:53 PM, Guido van Rossum wrote: > > On Mon, Sep 28, 2015 at 12:43 PM, Carl Meyer > > Or put differently, that whereas these two are trivially equivalent (the > > definition of left-to-right binding within a precedence class): > > > > foo.bar.baz > > (foo.bar).baz > > > > these two are not equivalent: > > > > foo?.bar.baz > > (foo?.bar).baz > > > > > > Right. > > > > > > I'm having trouble coming up with a parallel example where the existing > > short-circuit operators break "extractibility" of a sub-expression like > > that. > > > > > > Why is that an interesting property? > > Because breaking up an overly-complex expression into smaller > expressions by means of extracting sub-expressions into temporary > variables is a common programming task (in my experience anyway -- > especially when trying to decipher some long-gone programmer's > overly-complex code), and it's usually one that can be handled pretty > mechanically according to precedence rules, without having to consider > that some operators might have action-at-a-distance beyond their > precedence. > > > Well, if just the foo?.bar.baz part is already too complex you probably > need to reconsider your career. :-) > > Seriously, when breaking things into smaller parts you *have* to > understand the shortcut properties. 
You can't break "foo() or bar()" into > > a = foo() > b = bar() > return a or b > > either. > Exactly. Can you break:

    result = do_this() if test() else do_that()

into parts without changing its meaning/behaviour?

    condition = test()
    true_result = do_this()
    false_result = do_that()
    result = true_result if condition else false_result

The ? 'operators' are syntactic sugar. > > I guess this is because the proposed short-circuiting still "breaks out > > of the precedence order" in a way that the existing short-circuiting > > operators don't. Both member access and indexing are within the same > > left-to-right binding precedence class, but the new operators would have > > a short-circuit effect that swallows operations beyond where normal > > left-to-right binding would suggest their effect should reach. > > > > Are there existing examples of behavior like this in Python that I'm > > missing? > > > > > > I don't know, but I think you shouldn't worry about this. > > I think it's kind of odd, but if nobody else is worried about it, I > won't worry about it either :-) > > > Good idea. > From ben+python at benfinney.id.au Mon Sep 28 22:38:37 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Tue, 29 Sep 2015 06:38:37 +1000 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> Message-ID: <85oagm2saa.fsf@benfinney.id.au> Guido van Rossum writes: > This seems a perfect use case for writing post?.tag.lower() -- this > signifies that post may be None but if it exists, post.tag is not > expected to be None. So basically I want the equivalent of > (post.tag.lower() if post is not None else None). You're deliberately choosing straightforward examples. That's fine for showing the intended use case, but it does mean dismissing the concerns about ambiguity in complex cases.
It also means the use cases are so simple they are easily expressed succinctly with existing syntax, with the advantage of being more explicit in their effect; so they don't argue strongly for the need to add the new syntax. So, the corner case examples in this thread, which mix up precedence, are useful because they show how confusion is increased by making the precedence and binding rules more complicated. -- \ "People are very open-minded about new things, as long as | `\ they're exactly like the old ones." --Charles F. Kettering | _o__) | Ben Finney From donald at stufft.io Mon Sep 28 22:41:52 2015 From: donald at stufft.io (Donald Stufft) Date: Mon, 28 Sep 2015 16:41:52 -0400 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> Message-ID: On September 28, 2015 at 4:25:12 PM, Guido van Rossum (guido at python.org) wrote: > On Mon, Sep 28, 2015 at 1:15 PM, Donald Stufft wrote: > > > The ? Modifying additional attribute accesses beyond just the immediate > > one bothers me too and feels more ruby than python to me. > > > > Really? Have you thought about it? Not extensively, mostly this is a gut feeling. > > Suppose I have an object post which may be None or something with a tag > attribute which should be a string. And suppose I want to get the > lowercased tag, if the object exists, else None. > > This seems a perfect use case for writing post?.tag.lower() -- this > signifies that post may be None but if it exists, post.tag is not expected > to be None. So basically I want the equivalent of (post.tag.lower() if post > is not None else None). > > But if post?.tag.lower() were interpreted strictly as (post?.tag).lower(), > then I would have to write post?.tag?.lower?(), which is an abomination.
> OTOH if post?.tag.lower() automatically meant post?.tag?.lower?() then I > would silently get no error when post exists but post.tag is None (which in > this example is an error). > Does ? propagate past a non None value? If it were post?.tag.name.lower() and post was not None, but tag was None would that be an error or would the ? propagate to the tag as well? ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From guido at python.org Mon Sep 28 22:54:09 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Sep 2015 13:54:09 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <85oagm2saa.fsf@benfinney.id.au> References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> Message-ID: On Mon, Sep 28, 2015 at 1:38 PM, Ben Finney wrote: > Guido van Rossum writes: > > > This seems a perfect use case for writing post?.tag.lower() -- this > > signifies that post may be None but if it exists, post.tag is not > > expected to be None. So basically I want the equivalent of > > (post.tag.lower() if post is not None else None). > > You're deliberately choosing straightforward examples. That's fine for > showing the intended use case, but it does mean dismissing the concerns > about ambiguity in complex cases. > > It also means the use cases are so simply they are easily expressed > succinctly with existing syntax, with the advantage of being more > explicit in their effect; so they don't argue strongly for the need to > add the new syntax. > > So, the corner case examples in this thread, which mix up precedence, > are useful because they show how confusion is increased by making the > precedence and binding rules more complicated. > But your argument seems to boil down to "it is possible to write obfuscated code using this feature".
If you want to dumb down the feature so that foo?.bar.baz means just (foo?.bar).baz then it's useless and I should just reject the PEP. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Sep 28 22:56:22 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Sep 2015 13:56:22 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> Message-ID: On Mon, Sep 28, 2015 at 1:41 PM, Donald Stufft wrote: > On September 28, 2015 at 4:25:12 PM, Guido van Rossum (guido at python.org) > wrote: > > On Mon, Sep 28, 2015 at 1:15 PM, Donald Stufft wrote: > > > > > The ? Modifying additional attribute accesses beyond just the immediate > > > one bothers me too and feels more ruby than python to me. > > > > > > > Really? Have you thought about it? > > Not extensively, mostly this is a gut feeling. > > > > > Suppose I have an object post which may be None or something with a tag > > attribute which should be a string. And suppose I want to get the > > lowercased tag, if the object exists, else None. > > > > This seems a perfect use case for writing post?.tag.lower() -- this > > signifies that post may be None but if it exists, post.tag is not > expected > > to be None. So basically I want the equivalent of (post.tag.lower() if > post > > is not None else None). > > > > But if post?.tag.lower() were interpreted strictly as > (post?.tag).lower(), > > then I would have to write post?.tag?.lower?(), which is an abomination. > > OTOH if post?.tag.lower() automatically meant post?.tag?.lower?() then I > > would silently get no error when post exists but post.tag is None (which > in > > this example is an error). > > > > Does ? propagate past a non None value? 
If it were post?.tag.name.lower() > and post was not None, but tag was None would that be an error or would the > ? propagate to the tag as well? > I was trying to clarify that by saying that foo?.bar.baz means (foo.bar.baz if foo is not None else None). IOW if tag was None that would be an error. The rule then is quite simple: each ? does exactly one None check and divides the expression into exactly two branches -- one for the case where the thing preceding ? is None and one for the case where it isn't. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl at oddbird.net Mon Sep 28 23:04:34 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 28 Sep 2015 15:04:34 -0600 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> Message-ID: <5609AB62.5040503@oddbird.net> On 09/28/2015 02:54 PM, Guido van Rossum wrote: > If you want to dumb down the feature so that foo?.bar.baz means just > (foo?.bar).baz then it's useless and I should just reject the PEP. I think you're right that in practice ?. and ?[ would probably be just fine, because the scope of their action is still quite limited. But even if they are rejected, I think a simple `??` or `or?` (or however it's spelled) operator to reduce the repetition of "x if x is not None else y" is worth consideration on its own merits. This operator is entirely unambiguous, and I think would be useful and frequently used, whether or not ?. and ?[ are added along with it. Carl -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From donald at stufft.io Mon Sep 28 23:06:32 2015 From: donald at stufft.io (Donald Stufft) Date: Mon, 28 Sep 2015 17:06:32 -0400 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> Message-ID: On September 28, 2015 at 4:56:44 PM, Guido van Rossum (guido at python.org) wrote: > On Mon, Sep 28, 2015 at 1:41 PM, Donald Stufft wrote: > > > On September 28, 2015 at 4:25:12 PM, Guido van Rossum (guido at python.org) > > wrote: > > > On Mon, Sep 28, 2015 at 1:15 PM, Donald Stufft wrote: > > > > > > > The ? Modifying additional attribute accesses beyond just the immediate > > > > one bothers me too and feels more ruby than python to me. > > > > > > > > > > Really? Have you thought about it? > > > > Not extensively, mostly this is a gut feeling. > > > > > > > > Suppose I have an object post which may be None or something with a tag > > > attribute which should be a string. And suppose I want to get the > > > lowercased tag, if the object exists, else None. > > > > > > This seems a perfect use case for writing post?.tag.lower() -- this > > > signifies that post may be None but if it exists, post.tag is not > > expected > > > to be None. So basically I want the equivalent of (post.tag.lower() if > > post > > > is not None else None). > > > > > > But if post?.tag.lower() were interpreted strictly as > > (post?.tag).lower(), > > > then I would have to write post?.tag?.lower?(), which is an abomination. > > > OTOH if post?.tag.lower() automatically meant post?.tag?.lower?() then I > > > would silently get no error when post exists but post.tag is None (which > > in > > > this example is an error). > > > > > > > Does ? propagate past a non None value? 
If it were post?.tag.name.lower() > > and post was not None, but tag was None would that be an error or would the > > ? propagate to the tag as well? > > > > I was trying to clarify that by saying that foo?.bar.baz means (foo.bar.baz > if foo is not None else None). IOW if tag was None that would be an error. > > The rule then is quite simple: each ? does exactly one None check and > divides the expression into exactly two branches -- one for the case where > the thing preceding ? is None and one for the case where it isn't. > Ok, that makes me feel less bad; my initial impression was that ? was going to modify all following things so that they were all implicitly ?. Just splitting it into two different branches seems OK. I'm not a big fan of the punctuation though. It took me a minute to realize that post?.tag.lower() was saying if post is None, not if post.tag is None, and I feel like it's easy to miss the ?, especially when combined with other punctuation. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA From bruce at leban.us Mon Sep 28 23:05:44 2015 From: bruce at leban.us (Bruce Leban) Date: Mon, 28 Sep 2015 14:05:44 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> Message-ID: On Mon, Sep 28, 2015 at 1:56 PM, Guido van Rossum wrote: > The rule then is quite simple: each ? does exactly one None check and > divides the expression into exactly two branches -- one for the case where > the thing preceding ? is None and one for the case where it isn't. > I think this is exactly the right rule (when combined with the previously stated rule that ?. ?() ?[] have the same precedence as the standard versions of those operators).
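To spell the rule out in running code, here is a rough hand-desugaring of the `post?.tag.lower()` example from upthread (`Post` is just a stand-in class for illustration):

```python
class Post:
    tag = "NEWS"

def tag_lower(post):
    # proposed meaning of: post?.tag.lower()
    # exactly one None check, on `post`; the rest of the chain runs normally
    return post.tag.lower() if post is not None else None

assert tag_lower(None) is None       # post is None: the whole expression is None
assert tag_lower(Post()) == "news"   # post exists: ordinary attribute chain

# if post exists but post.tag is None, the AttributeError still surfaces --
# the single check does not propagate past `post` itself
```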
--- Bruce Check out my new puzzle book: http://J.mp/ingToConclusions Get it free here: http://J.mp/ingToConclusionsFree (available on iOS) -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Mon Sep 28 21:47:13 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 28 Sep 2015 19:47:13 +0000 (UTC) Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: Message-ID: <1082717728.2280152.1443469633455.JavaMail.yahoo@mail.yahoo.com> On Monday, September 28, 2015 12:05 PM, Guido van Rossum wrote: >On Mon, Sep 28, 2015 at 10:38 AM, Carl Meyer wrote: >>"Propagating" refers to the proposed behavior where use of ?. or ?[ >>"propagates" through the following chain of operations. For example: >> >> x = foo?.bar.spam.eggs >> >>Where both `.spam` and `.eggs` would behave like `?.spam` and `?.eggs` >>(propagating None rather than raising AttributeError), simply because a >>`.?` had occurred earlier in the chain. So the above behaves differently >>from: >> >> temp = foo?.bar >> x = temp.spam.eggs >> >>Which raises questions about whether the propagation escapes >>parentheses, too: >> >> x = (foo?.bar).spam.eggs >> > >Oh, I see. That's evil. > >The correct behavior here is that "foo?.bar.spam.eggs" should mean the same as > > (None if foo is None else foo.bar.spam.eggs) > >(Stop until you understand that is *not* the same as either of the alternatives you describe.) > >I can see the confusion that led to the idea of "propagation" -- it probably comes from an attempt to define "foo?.bar" without reference to the context (in this case the relevant context is that it's followed by ".spam.eggs"). It would really help to have a complete spec, or at least a quick workthrough of how an expression gets parsed and compiled. 
I assume it's something like this: spam?.eggs.cheese becomes this pseudo-AST (I've skipped the loads and maybe some other stuff):

    Expr(
        value=Attribute(
            value=Attribute(
                value=Name(id='spam'), attr='eggs', uptalk=True),
            attr='cheese', uptalk=False))

... which is then compiled as this pseudo-bytecode:

    LOAD_NAME 'spam'
    DUP_TOP
    POP_JUMP_IF_NONE :label
    LOAD_ATTR 'eggs'
    LOAD_ATTR 'cheese'
    :label

I've invented a new opcode POP_JUMP_IF_NONE, but it should be clear what it does. I think it's clear how replacing spam with any other expression works, and how subscripting works. So the only question is whether understanding how .eggs.cheese becomes a pair of LOAD_ATTRs is sufficient to understand how ?.eggs.cheese becomes a JUMP_IF_NONE followed by the same pair of LOAD_ATTRs through the same two steps. I suppose the reference documentation wording is also important here, to explain that an uptalked attributeref or subscription short-circuits the whole primary. From guido at python.org Mon Sep 28 23:48:23 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Sep 2015 14:48:23 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <1082717728.2280152.1443469633455.JavaMail.yahoo@mail.yahoo.com> References: <1082717728.2280152.1443469633455.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Mon, Sep 28, 2015 at 12:47 PM, Andrew Barnert wrote: > On Monday, September 28, 2015 12:05 PM, Guido van Rossum > wrote: > >On Mon, Sep 28, 2015 at 10:38 AM, Carl Meyer wrote: > >>"Propagating" refers to the proposed behavior where use of ?. or ?[ > >>"propagates" through the following chain of operations. For example: > >> > >> x = foo?.bar.spam.eggs > >> > >>Where both `.spam` and `.eggs` would behave like `?.spam` and `?.eggs` > >>(propagating None rather than raising AttributeError), simply because a > >>`.?` had occurred earlier in the chain.
So the above behaves differently > >>from: > >> > >> temp = foo?.bar > >> x = temp.spam.eggs > >> > >>Which raises questions about whether the propagation escapes > >>parentheses, too: > >> > >> x = (foo?.bar).spam.eggs > >> > > > >Oh, I see. That's evil. > > > >The correct behavior here is that "foo?.bar.spam.eggs" should mean the > same as > > > > (None if foo is None else foo.bar.spam.eggs) > > > >(Stop until you understand that is *not* the same as either of the > alternatives you describe.) > > > >I can see the confusion that led to the idea of "propagation" -- it > probably comes from an attempt to define "foo?.bar" without reference to > the context (in this case the relevant context is that it's followed by > ".spam.eggs"). > > > It would really help to have a complete spec, or at least a quick > workthrough of how an expression gets parsed and compiled. > Isn't the PEP author still planning to do that? But it hasn't happened yet. :-( > I assume it's something like this: > > spam?.eggs.cheese becomes this pseudo-AST (I've skipped the loads and > maybe some other stuff): > > Expr( > value=Attribute( > value=Attribute( > value=Name(id='spam'), attr='eggs', uptalk=True), > attr='cheese', uptalk=False)) > Hm, I think the problem is that this way of representing the tree encourages thinking that each attribute (with or without ?) can be treated on its own. ? which is then compiled as this pseudo-bytecode: > > LOAD_NAME 'spam' > DUP_TOP > POP_JUMP_IF_NONE :label > LOAD_ATTR 'eggs' > LOAD_ATTR 'cheese' > :label > > > I've invented a new opcode POP_JUMP_IF_NONE, but it should be clear what > it does. I think it's clear how replacing spam with any other expression > works, and how subscripting works. So the only question is whether > understanding how .eggs.cheese becomes a pair of LOAD_ATTRs is sufficient > to understand how ?.eggs.cheese becomes a JUMP_IF_NONE followed by the same > pair of LOAD_ATTRs through the same two steps. 
> To most people of course that's indecipherable mumbo-jumbo. :-) > I suppose the reference documentation wording is also important here, to > explain that an uptalked attributeref or subscription short-circuits the > whole primary. > Apparently clarifying that is the entire point of this thread. :-) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From luciano at ramalho.org Mon Sep 28 23:48:49 2015 From: luciano at ramalho.org (Luciano Ramalho) Date: Mon, 28 Sep 2015 18:48:49 -0300 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <1082717728.2280152.1443469633455.JavaMail.yahoo@mail.yahoo.com> References: <1082717728.2280152.1443469633455.JavaMail.yahoo@mail.yahoo.com> Message-ID: Glyph tweeted yesterday that everyone should watch the "Nothing is Something" 35' talk by Sandi Metz at RailsConf 2015. It's great and, in a way, relevant to this discussion. https://www.youtube.com/watch?v=29MAL8pJImQ BTW, so far, `or?` is the least horrible token suggested, IMHO. I like the basic semantics, though. Cheers, Luciano On Mon, Sep 28, 2015 at 4:47 PM, Andrew Barnert via Python-ideas wrote: > On Monday, September 28, 2015 12:05 PM, Guido van Rossum wrote: >>On Mon, Sep 28, 2015 at 10:38 AM, Carl Meyer wrote: >>>"Propagating" refers to the proposed behavior where use of ?. or ?[ >>>"propagates" through the following chain of operations. For example: >>> >>> x = foo?.bar.spam.eggs >>> >>>Where both `.spam` and `.eggs` would behave like `?.spam` and `?.eggs` >>>(propagating None rather than raising AttributeError), simply because a >>>`.?` had occurred earlier in the chain. So the above behaves differently >>>from: >>> >>> temp = foo?.bar >>> x = temp.spam.eggs >>> >>>Which raises questions about whether the propagation escapes >>>parentheses, too: >>> >>> x = (foo?.bar).spam.eggs >>> >> >>Oh, I see. That's evil. 
>> >>The correct behavior here is that "foo?.bar.spam.eggs" should mean the same as >> >> (None if foo is None else foo.bar.spam.eggs) >> >>(Stop until you understand that is *not* the same as either of the alternatives you describe.) >> >>I can see the confusion that led to the idea of "propagation" -- it probably comes from an attempt to define "foo?.bar" without reference to the context (in this case the relevant context is that it's followed by ".spam.eggs"). > > > It would really help to have a complete spec, or at least a quick workthrough of how an expression gets parsed and compiled. > > I assume it's something like this: > > spam?.eggs.cheese becomes this pseudo-AST (I've skipped the loads and maybe some other stuff): > > Expr( > value=Attribute( > value=Attribute( > value=Name(id='spam'), attr='eggs', uptalk=True), > attr='cheese', uptalk=False)) > > > ? which is then compiled as this pseudo-bytecode: > > LOAD_NAME 'spam' > DUP_TOP > POP_JUMP_IF_NONE :label > LOAD_ATTR 'eggs' > LOAD_ATTR 'cheese' > :label > > > I've invented a new opcode POP_JUMP_IF_NONE, but it should be clear what it does. I think it's clear how replacing spam with any other expression works, and how subscripting works. So the only question is whether understanding how .eggs.cheese becomes a pair of LOAD_ATTRs is sufficient to understand how ?.eggs.cheese becomes a JUMP_IF_NONE followed by the same pair of LOAD_ATTRs through the same two steps. > > I suppose the reference documentation wording is also important here, to explain that an uptalked attributeref or subscription short-circuits the whole primary. 
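To make the short-circuit reading quoted above concrete, here is a plain-Python sketch (today's syntax, not PEP 505; the sample objects are illustrative): foo?.bar.spam.eggs performs a single None test on foo that guards the entire trailer chain:

```python
from types import SimpleNamespace

def get_eggs(foo):
    # Short-circuit reading of foo?.bar.spam.eggs: one None test on foo
    # guards the whole chain; .bar, .spam and .eggs still raise
    # AttributeError normally when foo is not None.
    return None if foo is None else foo.bar.spam.eggs

foo = SimpleNamespace(bar=SimpleNamespace(spam=SimpleNamespace(eggs='scrambled')))
print(get_eggs(foo))   # scrambled
print(get_eggs(None))  # None
```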
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-- 
Luciano Ramalho
| Author of Fluent Python (O'Reilly, 2015)
| http://shop.oreilly.com/product/0636920032519.do
| Professor em: http://python.pro.br
| Twitter: @ramalhoorg

From guido at python.org Mon Sep 28 23:49:27 2015
From: guido at python.org (Guido van Rossum)
Date: Mon, 28 Sep 2015 14:49:27 -0700
Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts
In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io>
Message-ID: 

On Mon, Sep 28, 2015 at 2:06 PM, Donald Stufft wrote:

> I'm not a big fan of the punctuation though. It took me a minute to
> realize that post?.tag.lower() was saying if post is None, not if post.tag
> is None and I feel like it's easy to miss the ?, especially when combined
> with other punctuation.
>

But that's a different point (for the record I'm not a big fan of the ? either).

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From niilos at gmx.com Tue Sep 29 00:10:34 2015
From: niilos at gmx.com (Niilos)
Date: Tue, 29 Sep 2015 00:10:34 +0200
Subject: [Python-ideas] list as parameter for the split function
Message-ID: <5609BADA.8060801@gmx.com>

Hello everyone,

I was wondering how to split a string with multiple separators.
For instance, if I edit some subtitle file and I want the string
'00:02:34,452 --> 00:02:37,927' to become ['00', '02', '34', '452',
'00', '02', '37', '927'] I have to call split too many times and I didn't
find a "clean" way to do it.
I imagined the split function with an iterator as parameter. The string
would be split each time its substring is in the iterator.
Here is the syntax I considered for this :

>>> '00:02:34,452 --> 00:02:37,927'.split([ ':', ' --> ', ',' ])
['00', '02', '34', '452', '00', '02', '37', '927']

Is it a relevant idea ? What do you think about it ?

Regards,
Niilos.

From rymg19 at gmail.com Tue Sep 29 00:23:27 2015
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Mon, 28 Sep 2015 17:23:27 -0500
Subject: [Python-ideas] list as parameter for the split function
In-Reply-To: <5609BADA.8060801@gmx.com>
References: <5609BADA.8060801@gmx.com>
Message-ID: 

import re
parts = re.split(r':| --> |,', '00:02:34...')

On Mon, Sep 28, 2015 at 5:10 PM, Niilos wrote:

> Hello everyone,
>
> I was wondering how to split a string with multiple separators.
> For instance, if I edit some subtitle file and I want the string
> '00:02:34,452 --> 00:02:37,927' to become ['00', '02', '34', '452', '00',
> '02', '37', '927'] I have to use split too much time and I didn't find a
> "clean" way to do it.
> I imagined the split function with an iterator as parameter. The string
> would be split each time its substring is in the iterator.
>
> Here is the syntax I considered for this :
>
> >>> '00:02:34,452 --> 00:02:37,927'.split([ ':', ' --> ', ',' ])
> ['00', '02', '34', '452', '00', '02', '37', '927']
>
> Is it a relevant idea ? What do you think about it ?
>
> Regards,
> Niilos.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

-- 
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your
program. Something's wrong.
http://kirbyfan64.github.io/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From emile at fenx.com Tue Sep 29 00:23:47 2015
From: emile at fenx.com (Emile van Sebille)
Date: Mon, 28 Sep 2015 15:23:47 -0700
Subject: [Python-ideas] list as parameter for the split function
In-Reply-To: <5609BADA.8060801@gmx.com>
References: <5609BADA.8060801@gmx.com>
Message-ID: 

On 9/28/2015 3:10 PM, Niilos wrote:

> '00:02:34,452 --> 00:02:37,927'.split([ ':', ' --> ', ',' ])

'00:02:34,452 --> 00:02:37,927'.replace(",",":").replace(" --> ",":").split(":")

Emile

From p.f.moore at gmail.com Tue Sep 29 00:24:26 2015
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 28 Sep 2015 23:24:26 +0100
Subject: [Python-ideas] list as parameter for the split function
In-Reply-To: <5609BADA.8060801@gmx.com>
References: <5609BADA.8060801@gmx.com>
Message-ID: 

On 28 September 2015 at 23:10, Niilos wrote:
> I was wondering how to split a string with multiple separators.
> For instance, if I edit some subtitle file and I want the string
> '00:02:34,452 --> 00:02:37,927' to become ['00', '02', '34', '452', '00',
> '02', '37', '927'] I have to use split too much time and I didn't find a
> "clean" way to do it.

You can use re.split:

>>> re.split(r':|,| --> ', '00:02:34,452 --> 00:02:37,927')
['00', '02', '34', '452', '00', '02', '37', '927']

Paul

From rymg19 at gmail.com Tue Sep 29 00:26:03 2015
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Mon, 28 Sep 2015 17:26:03 -0500
Subject: [Python-ideas] list as parameter for the split function
In-Reply-To: References: <5609BADA.8060801@gmx.com>
Message-ID: 

Really, you could also just use re.match:

import re
pat = re.compile(r'(\d\d):(\d\d):(\d\d),(\d{3}) --> (\d\d):(\d\d):(\d\d),(\d{3})')

def parse(string):
    m = pat.match(string)
    return list(m.groups()) if m else None

...

print(parse('00:02:34,452 --> 00:02:37,927'))
# prints ['00', '02', '34', '452', '00', '02', '37', '927']

That way, if the input is invalid, `None` will be returned, so you have free
error checking (sort of).
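As a sketched variant of the same match-based idea (the pattern and the `parse_stamp` name are illustrative, not from the thread), named groups label the fields and fullmatch also rejects trailing junk:

```python
import re

# Illustrative pattern: named groups make the extracted fields self-describing.
STAMP = re.compile(
    r'(?P<h1>\d\d):(?P<m1>\d\d):(?P<s1>\d\d),(?P<ms1>\d{3})'
    r' --> '
    r'(?P<h2>\d\d):(?P<m2>\d\d):(?P<s2>\d\d),(?P<ms2>\d{3})'
)

def parse_stamp(line):
    # fullmatch (unlike match) fails on trailing garbage as well
    m = STAMP.fullmatch(line)
    return m.groupdict() if m else None

print(parse_stamp('00:02:34,452 --> 00:02:37,927')['s2'])  # 37
print(parse_stamp('not a timestamp'))                      # None
```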
On Mon, Sep 28, 2015 at 5:23 PM, Ryan Gonzalez wrote: > import re > parts = re.split(':|(-->)|,', '00:02:34...') > > > > On Mon, Sep 28, 2015 at 5:10 PM, Niilos wrote: > >> Hello everyone, >> >> I was wondering how to split a string with multiple separators. >> For instance, if I edit some subtitle file and I want the string >> '00:02:34,452 --> 00:02:37,927' to become ['00', '02', '34', '452', '00', >> '02', '37', '927'] I have to use split too much time and I didn't find a >> "clean" way to do it. >> I imagined the split function with an iterator as parameter. The string >> would be split each time its substring is in the iterator. >> >> Here is the syntax I considered for this : >> >> >>> '00:02:34,452 --> 00:02:37,927'.split([ ':', ' --> ', ',' ]) >> ['00', '02', '34', '452', '00', '02', '37', '927'] >> >> Is it a relevant idea ? What do you think about it ? >> >> Regards, >> Niilos. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > > -- > Ryan > [ERROR]: Your autotools build scripts are 200 lines longer than your > program. Something?s wrong. > http://kirbyfan64.github.io/ > > -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something?s wrong. http://kirbyfan64.github.io/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Tue Sep 29 00:27:23 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 29 Sep 2015 08:27:23 +1000 Subject: [Python-ideas] list as parameter for the split function In-Reply-To: <5609BADA.8060801@gmx.com> References: <5609BADA.8060801@gmx.com> Message-ID: On Tue, Sep 29, 2015 at 8:10 AM, Niilos wrote: > I was wondering how to split a string with multiple separators. 
> For instance, if I edit some subtitle file and I want the string > '00:02:34,452 --> 00:02:37,927' to become ['00', '02', '34', '452', '00', > '02', '37', '927'] I have to use split too much time and I didn't find a > "clean" way to do it. > I imagined the split function with an iterator as parameter. The string > would be split each time its substring is in the iterator. > > Here is the syntax I considered for this : > >>>> '00:02:34,452 --> 00:02:37,927'.split([ ':', ' --> ', ',' ]) > ['00', '02', '34', '452', '00', '02', '37', '927'] > > Is it a relevant idea ? What do you think about it ? Two possibilities: 1) Replace all separators with the same one. '00:02:34,452 --> 00:02:37,927'.replace(",",":").replace(" --> ",":").split(":") 2) Use a regular expression. re.split(":|,| --> ",'00:02:34,452 --> 00:02:37,927') # or working the other way: find all the digit strings re.findall("[0-9]+",'00:02:34,452 --> 00:02:37,927') You could also consider a more full parser; presumably splitting into strings is just the first step. I don't have anything handy in Python, but there would be ways of doing the whole thing in less steps. ChrisA From tjreedy at udel.edu Tue Sep 29 00:40:41 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 28 Sep 2015 18:40:41 -0400 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <560966C1.1040704@mail.de> <20150928163733.GN23642@ando.pearwood.info> <56096F09.40804@mail.de> <3819F6B1-3221-41E2-9103-62B71CBA7708@yahoo.com> Message-ID: On 9/28/2015 3:38 PM, Emile van Sebille wrote: > On 9/28/2015 10:24 AM, Andrew Barnert via Python-ideas wrote: >> On Sep 28, 2015, at 09:47, Sven R. Kunze >> wrote: > > >>> I wouldn't make a mountain out of a molehill. Other existing >>> operators have the same issue. >> >> Which other keywords or symbols may be either a binary operator or >> part of a ternary operator depending on context? 
> These come to mind:
>
> a = b = c
> a < b < c

These are chained comparisons, which get separated, not ternary operators.
a < b == c < e > f in g is also syntactically valid, and I don't think
anything is gained by calling it a pentanary operator.

-- 
Terry Jan Reedy

From abarnert at yahoo.com Tue Sep 29 00:45:11 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 28 Sep 2015 15:45:11 -0700
Subject: [Python-ideas] list as parameter for the split function
In-Reply-To: <5609BADA.8060801@gmx.com>
References: <5609BADA.8060801@gmx.com>
Message-ID: <35EC0367-E741-4640-96FD-4B709F81550C@yahoo.com>

On Sep 28, 2015, at 15:10, Niilos wrote:
>
> Hello everyone,
>
> I was wondering how to split a string with multiple separators.
> For instance, if I edit some subtitle file and I want the string '00:02:34,452 --> 00:02:37,927' to become ['00', '02', '34', '452', '00', '02', '37', '927'] I have to use split too much time and I didn't find a "clean" way to do it.
> I imagined the split function with an iterator as parameter. The string would be split each time its substring is in the iterator.

As a side note, a list is not an iterator. It's an iterable, but an iterator
is a special kind of iterable that only allows one pass, which is definitely
not what you want here. In fact, what you probably want is a sequence (or
maybe just a container, since the only thing you want to do is test "in").

Also, the way you've defined this ("each time its substring is in the
iterator") is either ambiguous, or inherently expensive, depending on how
you read it. And once you work out what you actually mean, it's hard to
express it better than as a regular expression, which is why half a dozen
people jumped to that answer.

From srkunze at mail.de Tue Sep 29 00:52:40 2015
From: srkunze at mail.de (Sven R.
Kunze) Date: Tue, 29 Sep 2015 00:52:40 +0200 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <1082717728.2280152.1443469633455.JavaMail.yahoo@mail.yahoo.com> Message-ID: <5609C4B8.5080501@mail.de> On 28.09.2015 23:48, Luciano Ramalho wrote: > Glyph tweeted yesterday that everyone should watch the "Nothing is > Something" 35' talk by Sandi Metz at RailsConf 2015. It's great and, > in a way, relevant to this discussion. > > https://www.youtube.com/watch?v=29MAL8pJImQ Nice watch. It's completely in line with our internal guidelines. Great to see that people with practical experience come to the same conclusion. Best, Sven From guido at python.org Tue Sep 29 01:03:22 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Sep 2015 16:03:22 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <560966C1.1040704@mail.de> <20150928163733.GN23642@ando.pearwood.info> <56096F09.40804@mail.de> <3819F6B1-3221-41E2-9103-62B71CBA7708@yahoo.com> Message-ID: On Mon, Sep 28, 2015 at 3:40 PM, Terry Reedy wrote: > On 9/28/2015 3:38 PM, Emile van Sebille wrote: > >> On 9/28/2015 10:24 AM, Andrew Barnert via Python-ideas wrote: >> >>> On Sep 28, 2015, at 09:47, Sven R. Kunze >>> wrote: >>> >> >> >> I wouldn't make a mountain out of a molehill. Other existing >>>> operators have the same issue. >>>> >>> >>> Which other keywords or symbols may be either a binary operator or >>> part of a ternary operator depending on context? >>> >> >> These come to mind: >> >> a = b = c >> a < b < c >> > > These are chained comparisons, which get separated, not ternary operators. > a < b = c < e > f in g is also syntactically valid, and I don't think > anything is gained by calling it a pentanary operator. > But a < b < c is an excellent example of something that cannot be mindlessly refactored into (a < b) < c. 
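A quick concrete check of why that refactoring changes the meaning:

```python
a, b, c = 1, 5, 2

print(a < b < c)    # False: chained form means (a < b) and (b < c), and 5 < 2 fails
print((a < b) < c)  # True: (a < b) is True, and True < 2 holds because True == 1
```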
-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tjreedy at udel.edu Tue Sep 29 02:38:39 2015
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 28 Sep 2015 20:38:39 -0400
Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts
In-Reply-To: References: <1082717728.2280152.1443469633455.JavaMail.yahoo@mail.yahoo.com>
Message-ID: 

On 9/28/2015 5:48 PM, Luciano Ramalho wrote:
> Glyph tweeted yesterday that everyone should watch the "Nothing is
> Something" 35' talk by Sandi Metz at RailsConf 2015. It's great and,
> in a way, relevant to this discussion.
>
> https://www.youtube.com/watch?v=29MAL8pJImQ

I understood Metz as advocating avoiding the nil (None) problem by giving
every class an 'active nothing' that has the methods of the class. We do
that for most builtin classes -- 0, (), {}, etc. She also used the
identity function with a particular signature in various roles.

-- 
Terry Jan Reedy

From random832 at fastmail.com Tue Sep 29 04:22:09 2015
From: random832 at fastmail.com (Random832)
Date: Mon, 28 Sep 2015 22:22:09 -0400
Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts
In-Reply-To: References: <1082717728.2280152.1443469633455.JavaMail.yahoo@mail.yahoo.com>
Message-ID: <1443493329.2298568.396067929.05FAD0A3@webmail.messagingengine.com>

On Mon, Sep 28, 2015, at 17:48, Guido van Rossum wrote:
> > Expr(
> >     value=Attribute(
> >         value=Attribute(
> >             value=Name(id='spam'), attr='eggs', uptalk=True),
> >         attr='cheese', uptalk=False))
> >
>
> Hm, I think the problem is that this way of representing the tree
> encourages thinking that each attribute (with or without ?) can be
> treated
> on its own.

How else would you represent it? Maybe some sort of expression that
represents a _list_ of attribute/item/call "operators" that are each
applied, and if one of them results in None and has uptalk=True it can
yield early.

Something like...
AtomExpr(atom=Name('spam'), trailers=[Attribute('eggs', uptalk=True),
Attribute('cheese', uptalk=False)])

For a more complex example:

a?.b.c?[12](34).f(56)?(78)

AtomExpr(Name('a'), [
    Attribute('b', True),
    Attribute('c', False),
    Subscript(12, True),
    Call([34], False),
    Attribute('f', False),
    Call([56], False),
    Call([78], True)])

I almost sent this with it called "Thing", but I checked the grammar and
found an element this thing actually maps to.

From guido at python.org Tue Sep 29 05:11:00 2015
From: guido at python.org (Guido van Rossum)
Date: Mon, 28 Sep 2015 20:11:00 -0700
Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts
In-Reply-To: <1443493329.2298568.396067929.05FAD0A3@webmail.messagingengine.com>
References: <1082717728.2280152.1443469633455.JavaMail.yahoo@mail.yahoo.com> <1443493329.2298568.396067929.05FAD0A3@webmail.messagingengine.com>
Message-ID: 

I would at least define different classes for the uptalk versions.

But my main complaint is using the parse tree as a spec at all -- it has
way too much noise for a clear description. We don't describe a < b < c by
first translating it to Comparison(a, Comparison(b, c, chained=False),
chained=True) either: the reference manual uses a postfix * (i.e.
repetition) operator to describe chained comparisons -- while for other
operators it favors a recursive definition.

On Mon, Sep 28, 2015 at 7:22 PM, Random832 wrote:

> On Mon, Sep 28, 2015, at 17:48, Guido van Rossum wrote:
> > > Expr(
> > >     value=Attribute(
> > >         value=Attribute(
> > >             value=Name(id='spam'), attr='eggs', uptalk=True),
> > >         attr='cheese', uptalk=False))
> > >
> >
> > Hm, I think the problem is that this way of representing the tree
> > encourages thinking that each attribute (with or without ?) can be
> > treated
> > on its own.
>
> How else would you represent it?
Maybe some sort of expression that > represents a _list_ of attribute/item/call "operators" that are each > applied, and if one of them results in none and has uptalk=True it can > yield early. > > Something like... > > AtomExpr(atom=Name('spam'), trailers=[Attribute('eggs', uptalk=True), > Attribute('cheese', uptalk=False)]) > > For a more complex example: > > a?.b.c?[12](34).f(56)?(78) > > AtomExpr(Name('a'), [ > Attribute('b', True), > Attribute('c', False), > Subscript(12, True), > Call([34], False), > Attribute('f', False), > Call([56], False), > Call([78], True)]) > > I almost sent this with it called "Thing", but I checked the grammar and > found an element this thing actually maps to. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue Sep 29 05:43:18 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 29 Sep 2015 13:43:18 +1000 Subject: [Python-ideas] list as parameter for the split function In-Reply-To: <5609BADA.8060801@gmx.com> References: <5609BADA.8060801@gmx.com> Message-ID: <20150929034316.GO23642@ando.pearwood.info> On Tue, Sep 29, 2015 at 12:10:34AM +0200, Niilos wrote: > Hello everyone, > > I was wondering how to split a string with multiple separators. > For instance, if I edit some subtitle file and I want the string > '00:02:34,452 --> 00:02:37,927' to become ['00', '02', '34', '452', > '00', '02', '37', '927'] I have to use split too much time and I didn't > find a "clean" way to do it. > I imagined the split function with an iterator as parameter. The string > would be split each time its substring is in the iterator. 
> Here is the syntax I considered for this :
>
> >>> '00:02:34,452 --> 00:02:37,927'.split([ ':', ' --> ', ',' ])
> ['00', '02', '34', '452', '00', '02', '37', '927']
>
> Is it a relevant idea ? What do you think about it ?

Quite a few string methods take multiple arguments, e.g.:

py> "spam".startswith(("a", "+", "sp"))
True

and I've often wished that split would be one of them. The substring
argument could accept a string (as it does now) or a tuple of strings.

There are other solutions, but they have issues:

(1) Writing your own custom string mini-parser and getting it right is
harder than it sounds. Certainly it's not simple enough to reinvent this
particular tool each time you want it.

(2) Using replace to change all the substrings to one:

py> text = "aaa,bbb ccc;ddd,eee fff"
py> text.replace(",", " ").replace(";", " ").split()
['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff']

works well enough for simple cases, but if you have a lot of text, having
to call replace multiple times can be expensive.

(3) Using a regular expression is probably the "right" answer, at least
from a comp sci theoretical perspective. This is precisely the sort of
thing that regexes are designed for. Unfortunately, regex syntax is itself
a programming language[1], and a particularly cryptic and unforgiving one,
so even quite experienced coders can have trouble. At first it seems easy:

py> re.split(r";|-|~", "aaa~bbb-ccc;ddd;eee")
['aaa', 'bbb', 'ccc', 'ddd', 'eee']

but then seemingly minor changes make it misbehave:

py> re.split(r";|-|^", "aaa^bbb-ccc^ddd;eee")
['aaa^bbb', 'ccc^ddd', 'eee']

py> re.split(r";|-|.", "aaa.bbb-ccc;ddd;eee")
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']

The solution is to escape the metacharacters, but people who aren't
familiar with regexes won't necessarily know which they are.

So really, in my opinion, there is no good built-in solution to the
*general* problem of splitting a string on multiple arbitrary substrings.
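The escaping fix mentioned in (3) can be wrapped up as a small helper; `multisplit` here is a hypothetical name, not an existing stdlib function:

```python
import re

def multisplit(text, substrings, maxsplit=0):
    # Split on any of several literal separators. re.escape neutralises
    # metacharacters such as '.', '^' and '|', avoiding the surprises above.
    pattern = '|'.join(re.escape(s) for s in substrings)
    return re.split(pattern, text, maxsplit=maxsplit)

print(multisplit('00:02:34,452 --> 00:02:37,927', [':', ' --> ', ',']))
# ['00', '02', '34', '452', '00', '02', '37', '927']
print(multisplit('aaa.bbb-ccc;ddd', ['.', '-', ';']))
# ['aaa', 'bbb', 'ccc', 'ddd']
```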
Perhaps str.split can act as an interface to the re module, automatically
escaping the substrings:

# Pseudo-implementation
def split(self, substrings, maxsplit=0):
    if isinstance(substrings, str):
        # use the current implementation
        ...
    elif isinstance(substrings, tuple):
        regex = '|'.join(re.escape(s) for s in substrings)
        return re.split(regex, self, maxsplit=maxsplit)

[1] Albeit not a Turing Complete one, at least not Python's version.

-- 
Steve

From jdhardy at gmail.com Tue Sep 29 06:04:54 2015
From: jdhardy at gmail.com (Jeff Hardy)
Date: Mon, 28 Sep 2015 21:04:54 -0700
Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts
In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io>
Message-ID: 

On Mon, Sep 28, 2015 at 1:56 PM, Guido van Rossum wrote:

> On Mon, Sep 28, 2015 at 1:41 PM, Donald Stufft wrote:
>
>> On September 28, 2015 at 4:25:12 PM, Guido van Rossum (guido at python.org)
>> wrote:
>> > On Mon, Sep 28, 2015 at 1:15 PM, Donald Stufft wrote:
>> >
>> > > The ? Modifying additional attribute accesses beyond just the
>> immediate
>> > > one bothers me too and feels more ruby than python to me.
>> > >
>> >
>> > Really? Have you thought about it?
>>
>> Not extensively, mostly this is a gut feeling.
>>
>> >
>> > Suppose I have an object post which may be None or something with a tag
>> > attribute which should be a string. And suppose I want to get the
>> > lowercased tag, if the object exists, else None.
>> >
>> > This seems a perfect use case for writing post?.tag.lower() -- this
>> > signifies that post may be None but if it exists, post.tag is not
>> expected
>> > to be None. So basically I want the equivalent of (post.tag.lower() if
>> post
>> > is not None else None).
>> >
>> > But if post?.tag.lower() were interpreted strictly as
>> (post?.tag).lower(),
>> > then I would have to write post?.tag?.lower?(), which is an abomination.
>> > OTOH if post?.tag.lower() automatically meant post?.tag?.lower?() then I >> > would silently get no error when post exists but post.tag is None >> (which in >> > this example is an error). >> > >> >> Does ? propagate past a non None value? If it were post?.tag.name.lower() >> and post was not None, but tag was None would that be an error or would the >> ? propagate to the tag as well? >> > > I was trying to clarify that by saying that foo?.bar.baz means > (foo.bar.baz if foo is not None else None). IOW if tag was None that would > be an error. > > The rule then is quite simple: each ? does exactly one None check and > divides the expression into exactly two branches -- one for the case where > the thing preceding ? is None and one for the case where it isn't. > This whole line of discussion is why I'd prefer the PEP be split to have ?? in one and ?., ?[, etc. in another (the thread I linked isn't even the longest one discussing the associativity - there were many that preceded it). I agree that the short circuit behaviour is the only one that makes any sense, but I also don't want to see the very useful ?? operator lost because of discussions over or implementation difficulties of ?. or ?[. And if it's going to be done anyway, I'd to see ?( as well. - Jeff -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.com Tue Sep 29 06:22:38 2015 From: random832 at fastmail.com (Random832) Date: Tue, 29 Sep 2015 00:22:38 -0400 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts References: <1082717728.2280152.1443469633455.JavaMail.yahoo@mail.yahoo.com> <1443493329.2298568.396067929.05FAD0A3@webmail.messagingengine.com> Message-ID: Guido van Rossum writes: > I would at least define different classes for the uptalk versions. 
> > But my main complaint is using the parse tree as a spec at all

Like I said, I actually came up with that structure *before* seeing that
it mirrored a grammar element - it honestly seems like the most natural
way to embody the fact that evaluating it requires the whole context as
a unit and can short-circuit halfway through the list, depending on if
the 'operator' at that position is an uptalk version.

The evaluation given this structure can be described in pseudocode:

def evaluate(expr):
    value = expr.atom.evaluate()
    for trailer in expr.trailers:
        if trailer.uptalk and value is None:
            return None
        value = trailer.evaluate_step(value)
    return value

The code generation could work the same way, iterating over this and
generating whatever instructions each trailer implies. In CPython, the
difference between the uptalk and non-uptalk version would be that
immediately after the left-hand value is on the stack, insert opcodes:
DUP_TOP LOAD_CONST(None) COMPARE_OP(is) POP_JUMP_IF_TRUE(end), with the
jump being to the location where the final value of the expression is
expected on the stack.
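The pseudocode above can be fleshed out into a runnable sketch (the class names and `evaluate_step` protocol are illustrative; the atom is passed in pre-evaluated for brevity):

```python
from types import SimpleNamespace

class Attribute:
    # Illustrative trailer node: .name, or ?.name when uptalk=True
    def __init__(self, name, uptalk=False):
        self.name, self.uptalk = name, uptalk
    def evaluate_step(self, value):
        return getattr(value, self.name)

class Subscript:
    # Illustrative trailer node: [key], or ?[key] when uptalk=True
    def __init__(self, key, uptalk=False):
        self.key, self.uptalk = key, uptalk
    def evaluate_step(self, value):
        return value[self.key]

def evaluate(atom_value, trailers):
    value = atom_value
    for trailer in trailers:
        if trailer.uptalk and value is None:
            return None  # short-circuit: skip the rest of the chain
        value = trailer.evaluate_step(value)
    return value

# spam?.eggs.cheese
chain = [Attribute('eggs', uptalk=True), Attribute('cheese')]
print(evaluate(None, chain))  # None: ?. short-circuits the whole chain
spam = SimpleNamespace(eggs=SimpleNamespace(cheese='cheddar'))
print(evaluate(spam, chain))  # cheddar
```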
> > > > But my main complaint is using the parse tree as a spec at all > > Like I said, I actually came up with that structure *before* seeing that > it mirrored a grammar element - it honestly seems like the most natural > way to embody the fact that evaluating it requires the whole context as > a unit and can short-circuit halfway through the list, depending on if > the 'operator' at that position is an uptalk version. > > The evaluation given this structure can be described in pseudocode: > > def evaluate(expr): > value = expr.atom.evaluate() > for trailer in trailers: > if trailer.uptalk and value is None: > return None > value = trailer.evaluate_step(value) > > The code generation could work the same way, iterating over this and > generating whatever instructions each trailer implies. In CPython, The > difference between the uptalk and non-uptalk version would be that > immediately after the left-hand value is on the stack, insert opcodes: > DUP_TOP LOAD_CONST(None) COMPARE_OP(is) POP_JUMP_IF_TRUE(end), with the > jump being to the location where the final value of the expression is > expected on the stack. > > Assuming I'm understanding the meaning of each opcode correctly, this > sequence would basically be equivalent to a hypothetical JUMP_IF_NONE > opcode. > > I don't think a recursive definition for the structure would work, > because evaluating / code-generating an uptalk operator needs to have > the top-level expression in order to escape from it to yield None. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (on iPad) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From 375956667 at qq.com Tue Sep 29 08:31:38 2015 From: 375956667 at qq.com (=?gb18030?B?1cXJ8sX0?=) Date: Tue, 29 Sep 2015 14:31:38 +0800 Subject: [Python-ideas] Maybe python should support arrow syntax for easier use async call ? Message-ID: # The example works with tornado dev verison & python3.5 import tornado from tornado.httpclient import AsyncHTTPClient from tornado.concurrent import Future from tornado.gen import convert_yielded from functools import wraps Future.__or__ = Future.add_done_callback def future(func): @wraps(func) def _(*args, **kwds): return convert_yielded(func(*args, **kwds)) return _ ############## @future async def ping(url): httpclient = AsyncHTTPClient() r = await httpclient.fetch(url) return r.body.decode('utf-8') ping("http://baidu.com") | ( lambda r:print(r.result()) ) """ Maybe python should support arrow syntax for easier use async call ? Now lambda only can write one line and must have parentheses ... FOR EXAMPLE ping("http://baidu.com") | r -> print(r.result()) print("something else") I saw some discuss in https://wiki.python.org/moin/AlternateLambdaSyntax """ tornado.ioloop.IOLoop.instance().start() From abarnert at yahoo.com Tue Sep 29 08:40:12 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 28 Sep 2015 23:40:12 -0700 Subject: [Python-ideas] Maybe python should support arrow syntax for easier use async call ? In-Reply-To: References: Message-ID: From a quick glance, it looks like you're converting from coroutines back to callbacks just so you can partially hide the callbacks. Why not just stick with coroutines? Compare: ping("http://baidu.com") | r -> print(r.result()) print("something else") r = await ping("http://baidu.com") print(r.result()) print("something else") And this doesn't require a new operator, or multiline lambdas, or a new operator that does its thing and also introduces a multiline lambda, or anything else. Sent from my iPhone > On Sep 28, 2015, at 23:31, ??? 
<375956667 at qq.com> wrote: > > # The example works with tornado dev verison & python3.5 > > > import tornado > from tornado.httpclient import AsyncHTTPClient > from tornado.concurrent import Future > from tornado.gen import convert_yielded > from functools import wraps > > Future.__or__ = Future.add_done_callback > > def future(func): > @wraps(func) > def _(*args, **kwds): > return convert_yielded(func(*args, **kwds)) > return _ > > > > ############## > > @future > async def ping(url): > httpclient = AsyncHTTPClient() > r = await httpclient.fetch(url) > return r.body.decode('utf-8') > > > ping("http://baidu.com") | ( > lambda r:print(r.result()) > ) > > """ > > Maybe python should support arrow syntax for easier use async call ? > > Now lambda only can write one line and must have parentheses ... > > FOR EXAMPLE > > ping("http://baidu.com") | r -> > print(r.result()) > print("something else") > > > I saw some discuss in https://wiki.python.org/moin/AlternateLambdaSyntax > """ > > > tornado.ioloop.IOLoop.instance().start() > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From greg.ewing at canterbury.ac.nz Tue Sep 29 07:55:44 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 29 Sep 2015 18:55:44 +1300 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <1082717728.2280152.1443469633455.JavaMail.yahoo@mail.yahoo.com> Message-ID: <560A27E0.7080800@canterbury.ac.nz> Guido van Rossum wrote: > On Mon, Sep 28, 2015 at 12:47 PM, Andrew Barnert > wrote: > > Expr( > value=Attribute( > value=Attribute( > value=Name(id='spam'), attr='eggs', uptalk=True), > attr='cheese', uptalk=False)) > > Hm, I think the problem is that this way of representing the tree > encourages thinking that each attribute (with or without ?) can be > treated on its own. It's hard to think of any other way of representing this in an AST that makes the short-circuiting behaviour any clearer. I suspect that displaying an AST isn't really going to be helpful as a way of documenting the semantics. Because the semantics aren't really in the AST itself, they're in the compiler code that interprets the AST. -- Greg From ncoghlan at gmail.com Tue Sep 29 10:09:50 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 29 Sep 2015 18:09:50 +1000 Subject: [Python-ideas] Using `or?` as the null coalescing operator In-Reply-To: References: <6C2E5579-42A0-423F-AB8C-01B49FA59D67@gmail.com> Message-ID: On 28 September 2015 at 20:50, Koos Zevenhoven wrote: > On Mon, Sep 28, 2015 at 10:13 AM, Nick Coghlan wrote: >> On 25 September 2015 at 09:07, Alessio Bogon wrote: >>> I really like PEP 0505. The only thing that does not convince me is the `??` operator. I would like to know what you think of an alternative like `or?`: >>> >>> a_list = some_list or? [] >>> a_dict = some_dict or? {} >>> > > And have the following syntax options been considered? 
>
> a_list = some_list else []

In addition to the syntactic ambiguity Ryan notes, there's no hint
here that we're using "some_list is not None" as the condition rather
than "bool(some_list)"

> a_list = some_list or [] if None

This one isn't supportable at the language grammar level - by the time
we get to the "if" token, the parser will have already interpreted the
first part as "some_list or []".

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From mal at egenix.com Tue Sep 29 11:49:17 2015
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 29 Sep 2015 11:49:17 +0200
Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts
In-Reply-To: 
References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net>
 <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io>
Message-ID: <560A5E9D.3070808@egenix.com>

On 28.09.2015 23:49, Guido van Rossum wrote:
> On Mon, Sep 28, 2015 at 2:06 PM, Donald Stufft wrote:
>
>> I'm not a big fan of the punctuation though. It took me a minute to
>> realize that post?.tag.lower() was saying if post is None, not if post.tag
>> is None and I feel like it's easy to miss the ?, especially when combined
>> with other punctuation.
>>
>
> But that's a different point (for the record I'm not a big fan of the ?
> either).

Me neither.

The proposal simply doesn't have the right balance between usefulness
and complexity added to the language (esp. for new Python programmers
to learn in order to be able to read a Python program).

In practice, you can very often write "x or y" instead of
having to use "x if x is not None else y", simply because you're
not only interested in catching the x is None case, but also
want to override an empty string or sequence value with
a default. If you really need to specifically check for None,
"x if x is not None else y" is way more expressive than "x ?? y".
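The difference between the two idioms is easy to demonstrate with a quick sketch (illustrative helper names, not existing functions):

```python
def or_default(x, default):
    # Truthiness-based: None, '', 0, [], {} are all replaced by the default.
    return x or default

def none_default(x, default):
    # None-based: only None is replaced; other falsy values pass through.
    return x if x is not None else default

print(or_default('', 'fallback'))      # 'fallback' -- empty string overridden
print(none_default('', 'fallback'))    # '' -- empty string preserved
print(none_default(None, 'fallback'))  # 'fallback'
```

Which behaviour you want depends on whether a falsy-but-valid value like '' or 0 is meaningful input.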
For default parameters with mutable types as values, I usually write: def func(x=None): if x is None: x = [] ... IMO, that's better than any of the above, but perhaps that's just because I don't believe in the "write everything in a single line" pattern as something we should strive for in Python. The other variants (member and index access) look like typos to me ;-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 29 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ 2015-09-25: Started a Python blog ... ... http://malemburg.com/ 2015-10-21: Python Meeting Duesseldorf ... 22 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From abarnert at yahoo.com Tue Sep 29 12:11:30 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 29 Sep 2015 03:11:30 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <560A27E0.7080800@canterbury.ac.nz> References: <1082717728.2280152.1443469633455.JavaMail.yahoo@mail.yahoo.com> <560A27E0.7080800@canterbury.ac.nz> Message-ID: On Sep 28, 2015, at 22:55, Greg Ewing wrote: > > Guido van Rossum wrote: >> On Mon, Sep 28, 2015 at 12:47 PM, Andrew Barnert > wrote: >> Expr( >> value=Attribute( >> value=Attribute( >> value=Name(id='spam'), attr='eggs', uptalk=True), >> attr='cheese', uptalk=False)) >> Hm, I think the problem is that this way of representing the tree encourages thinking that each attribute (with or without ?) can be treated on its own. 
> > It's hard to think of any other way of representing this in > an AST that makes the short-circuiting behaviour any clearer. > > I suspect that displaying an AST isn't really going to be > helpful as a way of documenting the semantics. Because the > semantics aren't really in the AST itself, they're in the > compiler code that interprets the AST. That's why I gave both an AST and the bytecode (and how it differs from the AST and bytecode with non-uptalked attribution). I think that makes it obvious and unambiguous what the semantics are, to anyone who knows how the compiler handles attribution ASTs, and understands the resulting bytecode. Of course, as Guido points out, that "anyone who..." is a pretty restricted set, so maybe this wasn't as useful as I intended, and we have to wait for someone to write up the details in a way that's still unambiguous, but also human-friendly. From steve at pearwood.info Tue Sep 29 14:43:39 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 29 Sep 2015 22:43:39 +1000 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> Message-ID: <20150929124339.GS23642@ando.pearwood.info> On Mon, Sep 28, 2015 at 01:54:09PM -0700, Guido van Rossum wrote: > If you want to dumb down the feature so that foo?.bar.baz means just > (foo?.bar).baz then it's useless and I should just reject the PEP. In case anyone missed it, according to the PEP author Mark Haase, that's the behaviour of Dart, and it is useless: "Your interpretation of Dart's semantics is correct, and I agree that's absolutely the wrong way to do it. C# does have the short-circuit semantics that you're looking for." 
https://mail.python.org/pipermail/python-ideas/2015-September/036495.html

--
Steve

From ericsnowcurrently at gmail.com Tue Sep 29 15:40:21 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Tue, 29 Sep 2015 07:40:21 -0600
Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts
In-Reply-To: <560A5E9D.3070808@egenix.com>
References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net>
 <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io>
 <560A5E9D.3070808@egenix.com>
Message-ID: 

On Tue, Sep 29, 2015 at 3:49 AM, M.-A. Lemburg wrote:
> On 28.09.2015 23:49, Guido van Rossum wrote:
>> But that's a different point (for the record I'm not a big fan of the ?
>> either).
>
> Me neither.

Same here.

>
> The proposal simply doesn't have the right balance between usefulness
> and complexity added to the language (esp. for new Python programmers
> to learn in order to be able to read a Python program).

+1

>
> In practice, you can very often write "x or y" instead of
> having to use "x if x is not None else y", simply because you're
> not only interested in catching the x is None case, but also
> want to override an empty string or sequence value with
> a default. If you really need to specifically check for None,
> "x if x is not None else y" is way more expressive than "x ?? y".
>
> For default parameters with mutable types as values,
> I usually write:
>
> def func(x=None):
>     if x is None:
>         x = []
>     ...

I do the same. It has the right amount of explicitness and makes the
default-case branch more obvious (subjectively, of course) than the
proposed alternative:

def func(x=None):
    x = x ?? []
    ...

>
> IMO, that's better than any of the above, but perhaps that's
> just because I don't believe in the "write everything
> in a single line" pattern as something we should strive
> for in Python.

Yeah, the language has been pretty successful at striking the right
balance here. IMO, the proposed syntax doesn't pay off.
-eric From p.f.moore at gmail.com Tue Sep 29 16:56:23 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 29 Sep 2015 15:56:23 +0100 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <560A5E9D.3070808@egenix.com> Message-ID: On 29 September 2015 at 14:40, Eric Snow wrote: >> For default parameters with mutable types as values, >> I usually write: >> >> def func(x=None): >> if x is None: >> x = [] >> ... > > I do the same. It has the right amount of explicitness and makes the > default-case branch more obvious (subjectively, of course) than the > proposed alternative: > > def func(x=None): > x = x ?? [] Looking at those two cases in close proximity like that, I have to say that the explicit if statement wins hands down. But it's not quite as obvious with multiple arguments where the target isn't the same as the parameter (for example with a constructor): def __init__(self, vertices=None, edges=None, weights=None, source_nodes=None): if vertices is None: self.vertices = [] else: self.vertices = vertices if edges is None: self.edges = [] else: self.edges = edges if weights is None: self.weights = {} else: self.weights = weights if source_nodes is None: self.source_nodes = [] else: self.source_nodes = source_nodes vs def __init__(self, vertices=None, edges=None, weights=None, source_nodes=None): self.vertices = vertices or? [] self.edges = edges or? [] self.weights = weights or? {} self.source_nodes = source_nodes or? 
[] Having said all of that, short circuiting is not important here, so def default(var, dflt): if var is None: return dflt return var def __init__(self, vertices=None, edges=None, weights=None, source_nodes=None): self.vertices = default(vertices, []) self.edges = default(edges, []) self.weights = default(weights, {}) self.source_nodes = default(source_nodes, []) is also an option. In this case, my preference is probably (1) a default() function, (2) or?, (3) multi-line if. The default() function approach can be used for cases where the condition is something *other* than "is None" so that one edges ahead of or? because it's more flexible... (although (1) wouldn't be an option if short-circuiting really mattered...) In practice, of course, I never write a default() function at the moment, I just use multi-line ifs. Whether that means I'd use an or? operator, I don't know. Probably - but I'd likely consider it a bit of a "too many ways of doing the same thing" wart at the same time... Paul From chris.barker at noaa.gov Tue Sep 29 17:30:45 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 29 Sep 2015 08:30:45 -0700 Subject: [Python-ideas] list as parameter for the split function In-Reply-To: <20150929034316.GO23642@ando.pearwood.info> References: <5609BADA.8060801@gmx.com> <20150929034316.GO23642@ando.pearwood.info> Message-ID: On Mon, Sep 28, 2015 at 8:43 PM, Steven D'Aprano wrote: > (3) Using a regular expression is probably the "right" answer, at least > from a comp sci theorectical perspective. This is precisely the sort of > thing that regexes are designed for. Unfortunately, regex syntax is > itself a programming language[1], and a particularly cryptic and > unforgiving one, so even quite experienced coders can have trouble. > indeed -- we all know the old maxim: "I had a problem, and thought "I know, I'll use regular expressions" -- now I have two problems. 
And the Python "obvious way to do it" has always been for simple string manipulation, see if what you need is in a string method before you bring out the big guns of REs After all, if "use REs" was the answer to simple string manipulation problems, the string object would have a lot fewer methods. So: I've frequently had this use-case, too -- it would be a nice enhancement that would had substantial utility to strings. Whether it used an re under the hood or not should be an implementation detail. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Sep 29 17:36:27 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 29 Sep 2015 08:36:27 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <5609993E.9010103@mail.de> References: <560966C1.1040704@mail.de> <20150928163733.GN23642@ando.pearwood.info> <56096F09.40804@mail.de> <3819F6B1-3221-41E2-9103-62B71CBA7708@yahoo.com> <5609993E.9010103@mail.de> Message-ID: On Mon, Sep 28, 2015 at 12:47 PM, Sven R. Kunze wrote: > I've seen experienced coworkers rather adding superfluous pairs of > parentheses just to make sure or because they still don't know better. nothing wrong with superfluous parentheses -- it makes it clear and it's more robust in the face of refactoring. It fact, my answer to "precedence could be confusing here" is "use an extra parentheses and don't worry about it." -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rob.cliffe at btinternet.com Tue Sep 29 18:20:45 2015 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Tue, 29 Sep 2015 17:20:45 +0100 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <560A5E9D.3070808@egenix.com> Message-ID: <560ABA5D.1070602@btinternet.com> On 29/09/2015 15:56, Paul Moore wrote: > On 29 September 2015 at 14:40, Eric Snow wrote: >>> For default parameters with mutable types as values, >>> I usually write: >>> >>> def func(x=None): >>> if x is None: >>> x = [] >>> ... >> I do the same. It has the right amount of explicitness and makes the >> default-case branch more obvious (subjectively, of course) than the >> proposed alternative: >> >> def func(x=None): >> x = x ?? [] > Looking at those two cases in close proximity like that, I have to say > that the explicit if statement wins hands down. > > But it's not quite as obvious with multiple arguments where the target > isn't the same as the parameter (for example with a constructor): > > def __init__(self, vertices=None, edges=None, weights=None, source_nodes=None): > if vertices is None: > self.vertices = [] > else: > self.vertices = vertices > if edges is None: > self.edges = [] > else: > self.edges = edges > if weights is None: > self.weights = {} > else: > self.weights = weights > if source_nodes is None: > self.source_nodes = [] > else: > self.source_nodes = source_nodes > vs > > def __init__(self, vertices=None, edges=None, weights=None, source_nodes=None): > self.vertices = vertices or? [] > self.edges = edges or? [] > self.weights = weights or? {} > self.source_nodes = source_nodes or? 
[] > > Having said all of that, short circuiting is not important here, so > > def default(var, dflt): > if var is None: > return dflt > return var > > def __init__(self, vertices=None, edges=None, weights=None, source_nodes=None): > self.vertices = default(vertices, []) > self.edges = default(edges, []) > self.weights = default(weights, {}) > self.source_nodes = default(source_nodes, []) > > is also an option. > > Why not def __init__(self, vertices=None, edges=None, weights=None, source_nodes=None): self.vertices = vertices if vertices is not None else [] self.edges = edges if edges is not None else [] self.weights = weights if weights is not None else {} self.source_nodes = source_nodes if source_nodes is not None else [] Completely explicit. Self-contained (you don't need to look up a helper function). Reasonably compact (at least vertically). Easy to make a change to one of the lines if the logic of that line changes. Doesn't need a language change. And if you align the lines (as I have attempted to, although different proportional fonts may make it look ragged), it highlights the common structure of the lines *and* their differences (you can see that one line has "{}" instead of "[]" because it stands out). Rob Cliffe From srkunze at mail.de Tue Sep 29 18:35:16 2015 From: srkunze at mail.de (Sven R. Kunze) Date: Tue, 29 Sep 2015 18:35:16 +0200 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <1082717728.2280152.1443469633455.JavaMail.yahoo@mail.yahoo.com> Message-ID: <560ABDC4.7000308@mail.de> On 29.09.2015 02:38, Terry Reedy wrote: > On 9/28/2015 5:48 PM, Luciano Ramalho wrote: >> Glyph tweeted yesterday that everyone should watch the "Nothing is >> Something" 35' talk by Sandi Metz at RailsConf 2015. It's great and, >> in a way, relevant to this discussion. 
>>
>> https://www.youtube.com/watch?v=29MAL8pJImQ
>
> I understood Metz as advocating avoiding the nil (None) problem by
> giving every class an 'active nothing' that has the methods of the
> class. We do that for most builtin classes -- 0, (), {}, etc. She
> also used the identity function with a particular signature in various
> roles.
>

I might stress here that nobody said there's a single "active nothing".
There are far more "special case objects" (as Robert C. Martin calls
it) than 0, (), {}, etc.

I fear, however, the stdlib cannot account for every special case
object possible. Without None available in the first place, users
would be forced to create their domain-specific special case objects.
None being available though, people need to be taught to avoid it,
which btw. she did a really good job of.

Best,
Sven

From g.brandl at gmx.net Tue Sep 29 18:37:46 2015
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 29 Sep 2015 18:37:46 +0200
Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts
In-Reply-To: 
References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net>
 <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io>
 <560A5E9D.3070808@egenix.com>
Message-ID: 

On 09/29/2015 03:40 PM, Eric Snow wrote:
> On Tue, Sep 29, 2015 at 3:49 AM, M.-A. Lemburg wrote:
>> On 28.09.2015 23:49, Guido van Rossum wrote:
>>> But that's a different point (for the record I'm not a big fan of the ?
>>> either).
>>
>> Me neither.
>
> Same here.
>
>>
>> The proposal simply doesn't have the right balance between usefulness
>> and complexity added to the language (esp. for new Python programmers
>> to learn in order to be able to read a Python program).
>
> +1

I agree as well.

>> In practice, you can very often write "x or y" instead of
>> having to use "x if x is not None else y", simply because you're
>> not only interested in catching the x is None case, but also
>> want to override an empty string or sequence value with
>> a default. If you really need to specifically check for None,
>> "x if x is not None else y" is way more expressive than "x ?? y".
>>
>> For default parameters with mutable types as values,
>> I usually write:
>>
>> def func(x=None):
>>     if x is None:
>>         x = []
>>     ...
>
> I do the same. It has the right amount of explicitness and makes the
> default-case branch more obvious (subjectively, of course) than the
> proposed alternative:
>
> def func(x=None):
>     x = x ?? []

Looking at this, I think people might call ?? the "WTF operator". Not
a good sign :)

Georg

From emile at fenx.com Tue Sep 29 18:58:40 2015
From: emile at fenx.com (Emile van Sebille)
Date: Tue, 29 Sep 2015 09:58:40 -0700
Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts
In-Reply-To: <560ABA5D.1070602@btinternet.com>
References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net>
 <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io>
 <560A5E9D.3070808@egenix.com> <560ABA5D.1070602@btinternet.com>
Message-ID: 

On 9/29/2015 9:20 AM, Rob Cliffe wrote:
> Why not
>
> def __init__(self, vertices=None, edges=None, weights=None,
> source_nodes=None):
>     self.vertices = vertices if vertices is not None
> else []
>     self.edges = edges if edges is not None
> else []
>     self.weights = weights if weights is not None
> else {}
>     self.source_nodes = source_nodes if source_nodes is not None
> else []

I don't understand why not:

self.vertices = vertices or []
self.edges = edges or []
self.weights = weights or {}
self.source_nodes = source_nodes or []

Emile

From ericsnowcurrently at gmail.com Tue Sep 29 19:15:36 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Tue, 29 Sep 2015 11:15:36 -0600
Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts
In-Reply-To: 
References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net>
 <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io>
 <560A5E9D.3070808@egenix.com>
Message-ID: 

On Tue, Sep 29, 2015 at 8:56
AM, Paul Moore wrote: > But it's not quite as obvious with multiple arguments where the target > isn't the same as the parameter (for example with a constructor): > > def __init__(self, vertices=None, edges=None, weights=None, source_nodes=None): > if vertices is None: > self.vertices = [] > else: > self.vertices = vertices > if edges is None: > self.edges = [] > else: > self.edges = edges > if weights is None: > self.weights = {} > else: > self.weights = weights > if source_nodes is None: > self.source_nodes = [] > else: > self.source_nodes = source_nodes Personally I usually keep the defaults handling separate, like so: def __init__(self, vertices=None, edges=None, weights=None, source_nodes=None): if vertices is None: vertices = [] if edges is None: edges = [] if weights is None: weights = {} if source_nodes is None: source_nodes = [] self.vertices = vertices self.edges = edges self.weights = weights self.source_nodes = source_nodes ...and given the alternatives presented here, I'd likely continue doing so. To me the others are less distinct about how defaults are set and invite more churn if you have to do anything extra down the road when composing a default. Regardless, YMMV. [snip] > In practice, of course, I never write a default() function at the > moment, I just use multi-line ifs. Whether that means I'd use an or? > operator, I don't know. Probably - but I'd likely consider it a bit of > a "too many ways of doing the same thing" wart at the same time... Right. And it doesn't really pay for itself when measured against that cost. -eric From srkunze at mail.de Tue Sep 29 19:30:21 2015 From: srkunze at mail.de (Sven R. 
Kunze)
Date: Tue, 29 Sep 2015 19:30:21 +0200
Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts
In-Reply-To: 
References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net>
 <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io>
 <560A5E9D.3070808@egenix.com> <560ABA5D.1070602@btinternet.com>
Message-ID: <560ACAAD.4020705@mail.de>

On 29.09.2015 18:58, Emile van Sebille wrote:
> On 9/29/2015 9:20 AM, Rob Cliffe wrote:
>> Why not
>>
>> def __init__(self, vertices=None, edges=None, weights=None,
>> source_nodes=None):
>>     self.vertices = vertices if vertices is not None
>> else []
>>     self.edges = edges if edges is not None
>> else []
>>     self.weights = weights if weights is not None
>> else {}
>>     self.source_nodes = source_nodes if source_nodes is not None
>> else []
>
> I don't understand why not:
>
> self.vertices = vertices or []
> self.edges = edges or []
> self.weights = weights or {}
> self.source_nodes = source_nodes or []

People fear that when you pass a special object that behaves like False
into the constructor, that special object is replaced by [] or {}.

I for one don't think it's a real issue. However, it has been said that
people got bitten by this in the past. I don't know what the heck they
did, but I presume they tried something really really nasty.
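One plausible way to get bitten is aliasing (a contrived sketch, not an example from the thread): the caller passes in an empty list it expects the object to keep using, and `or` silently swaps in a fresh one:

```python
class Graph:
    def __init__(self, vertices=None):
        self.vertices = vertices or []  # replaces *any* falsy argument


shared = []          # caller intends to observe later mutations
g = Graph(shared)
g.vertices.append('a')

print(shared)        # [] -- the caller's list was silently dropped
print(g.vertices)    # ['a']
```

With an explicit `is None` test, `g.vertices` would still be the caller's `shared` list.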
Best, Sven From barry at python.org Tue Sep 29 19:35:42 2015 From: barry at python.org (Barry Warsaw) Date: Tue, 29 Sep 2015 13:35:42 -0400 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> Message-ID: <20150929133542.4d04f6dd@anarchist.wooz.org> On Sep 28, 2015, at 03:04 PM, Carl Meyer wrote: >But even if they are rejected, I think a simple `??` or `or?` (or >however it's spelled) operator to reduce the repetition of "x if x is >not None else y" is worth consideration on its own merits. This operator >is entirely unambiguous, and I think would be useful and frequently >used, whether or not ?. and ?[ are added along with it. But why is it an improvement? The ternary operator is entirely obvious and readable, and at least in my experience, is rare enough that the repetition doesn't hurt my fingers that much. It seems like such a radical, ugly new syntax unjustified by the frequency of use and readability improvement. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From carl at oddbird.net Tue Sep 29 19:57:24 2015 From: carl at oddbird.net (Carl Meyer) Date: Tue, 29 Sep 2015 11:57:24 -0600 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <20150929133542.4d04f6dd@anarchist.wooz.org> References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> <20150929133542.4d04f6dd@anarchist.wooz.org> Message-ID: <560AD104.8030007@oddbird.net> Hi Barry, On 09/29/2015 11:35 AM, Barry Warsaw wrote: > On Sep 28, 2015, at 03:04 PM, Carl Meyer wrote: > >> But even if they are rejected, I think a simple `??` or `or?` (or >> however it's spelled) operator to reduce the repetition of "x if x is >> not None else y" is worth consideration on its own merits. This operator >> is entirely unambiguous, and I think would be useful and frequently >> used, whether or not ?. and ?[ are added along with it. > > But why is it an improvement? The ternary operator is entirely obvious and > readable, and at least in my experience, is rare enough that the repetition > doesn't hurt my fingers that much. It seems like such a radical, ugly new > syntax unjustified by the frequency of use and readability improvement. I find the repetition irritating enough that I'm tempted to use 'or' instead, even when I know it's not technically the semantics I want. (In most cases, the difference probably doesn't matter, and when it actually does, I probably know that and write out the full ternary.) And I find plenty of other code using `or` when it ought to be using a ternary with `is None` (but again, most of the time in practice it's fine.) Most of this code is in defaults-handling; there've been plenty of examples in the thread. 
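For what it's worth, the repetition can already be factored out today with a tiny helper, much like Paul's `default()` upthread (a sketch; `if_none` is just an illustrative name, not a stdlib function):

```python
def if_none(value, default):
    """Return default only when value is None (unlike `or`)."""
    return default if value is None else value


class Graph:
    def __init__(self, vertices=None, edges=None, weights=None):
        # Defaults are evaluated per call, so each instance gets
        # a fresh container rather than a shared mutable default.
        self.vertices = if_none(vertices, [])
        self.edges = if_none(edges, [])
        self.weights = if_none(weights, {})


g = Graph(vertices=[1, 2])
print(g.vertices, g.edges, g.weights)  # [1, 2] [] {}
```

Falsy-but-valid arguments like `0` or `''` pass through untouched, which is exactly the behaviour `or` gets wrong.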
I find the explicit if-block painful if there's more than one argument with a None-default to be handled; YMMV. I agree that I don't love any of the syntax suggestions so far, and without a less ugly syntax, it's probably dead. If it was in the language, I'd use it, but I don't feel strongly about it. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From twangist at gmail.com Tue Sep 29 20:04:38 2015 From: twangist at gmail.com (Brian O'Neill) Date: Tue, 29 Sep 2015 14:04:38 -0400 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts Message-ID: <0D5667BE-EF2D-4469-9C61-FB982CE8AE01@gmail.com> > On 9/29/2015 9:20 AM, Rob Cliffe wrote: > Why not > > > > def __init__(self, vertices=None, edges=None, weights=None, > > source_nodes=None): > > self.vertices = vertices if vertices is not None else [] > > self.edges = edges if edges is not None else [] > > self.weights = weights if weights is not None else {} > > self.source_nodes = source_nodes if source_nodes is not None else [] > > I don't understand why not: > > self.vertices = vertices or [] > self.edges = edges or [] > self.weights = weights or {} > self.source_nodes = source_nodes or [] > > > Emile A further virtue of self.vertices = vertices or [] and the like is that they coerce falsy parameters of the wrong type to the falsy object of the correct type. E.g. if vertices is '' or 0, self.vertices will be set to [], whereas the ternary expression only tests for not-None so self.vertices will be set to a probably crazy value. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From carl at oddbird.net Tue Sep 29 20:07:10 2015 From: carl at oddbird.net (Carl Meyer) Date: Tue, 29 Sep 2015 12:07:10 -0600 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <0D5667BE-EF2D-4469-9C61-FB982CE8AE01@gmail.com> References: <0D5667BE-EF2D-4469-9C61-FB982CE8AE01@gmail.com> Message-ID: <560AD34E.8010905@oddbird.net> On 09/29/2015 12:04 PM, Brian O'Neill wrote: > A further virtue of > > self.vertices = vertices or [] > > and the like is that they coerce falsy parameters of the wrong type to the falsy object of the correct type. > > E.g. if vertices is '' or 0, self.vertices will be set to [], whereas the ternary expression only tests > > for not-None so self.vertices will be set to a probably crazy value. Doesn't seem like a virtue to me, seems like it's probably hiding a bug in the calling code, which may have other ramifications. Better to have the "crazy value" visible and fail faster, so you can go fix that bug. Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: OpenPGP digital signature URL: From twangist at gmail.com Tue Sep 29 20:33:36 2015 From: twangist at gmail.com (Brian O'Neill) Date: Tue, 29 Sep 2015 14:33:36 -0400 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts Message-ID: <293BDA57-0C74-4241-AEC7-4AE1E7DA66A1@gmail.com> > > A further virtue of > > > > self.vertices = vertices or [] > > > > and the like is that they coerce falsy parameters of the wrong type to the falsy object of the correct type. > > E.g. if vertices is '' or 0, self.vertices will be set to [], whereas the ternary expression only tests > > > > for not-None so self.vertices will be set to a probably crazy value. > > Doesn't seem like a virtue to me, seems like it's probably hiding a bug > in the calling code, which may have other ramifications. 
Better to have > the "crazy value" visible and fail faster, so you can go fix that bug. > > Carl I have to agree. It isn't a "virtue", and it's best not to mask such mistakes. But it *is* a... property of the shorter construct that it's more forgiving, and doesn't have the same semantics. PS -- My first post, and I lost the "Re:" in the Subject, hence this orphan thread which I'm content to see go no further. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Tue Sep 29 20:48:05 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 29 Sep 2015 19:48:05 +0100 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <560A5E9D.3070808@egenix.com> Message-ID: On 29 September 2015 at 18:15, Eric Snow wrote: > Personally I usually keep the defaults handling separate, like so: [...] > ...and given the alternatives presented here, I'd likely continue > doing so. To me the others are less distinct about how defaults are > set and invite more churn if you have to do anything extra down the > road when composing a default. Regardless, YMMV. Agreed, there's many ways, and the new operator doesn't really add a huge amount (other than yet another way of doing things). > [snip] >> In practice, of course, I never write a default() function at the >> moment, I just use multi-line ifs. Whether that means I'd use an or? >> operator, I don't know. Probably - but I'd likely consider it a bit of >> a "too many ways of doing the same thing" wart at the same time... > > Right. And it doesn't really pay for itself when measured against that cost. Precisely. 
Paul From abarnert at yahoo.com Tue Sep 29 22:33:08 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 29 Sep 2015 13:33:08 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <560A5E9D.3070808@egenix.com> <560ABA5D.1070602@btinternet.com> Message-ID: <87FB8972-4E5B-4E8E-8967-466E6B95FBB6@yahoo.com> On Sep 29, 2015, at 09:58, Emile van Sebille wrote: > >> On 9/29/2015 9:20 AM, Rob Cliffe wrote: >> Why not >> >> def __init__(self, vertices=None, edges=None, weights=None, >> source_nodes=None): >> self.vertices = vertices if vertices is not None else [] >> self.edges = edges if edges is not None else [] >> self.weights = weights if weights is not None else {} >> self.source_nodes = source_nodes if source_nodes is not None else [] > > I don't understand why not: > > self.vertices = vertices or [] > self.edges = edges or [] > self.weights = weights or {} > self.source_nodes = source_nodes or [] Because empty containers are just as falsey as None. So, if I pass in a shared list, your "vertices or []" will replace it with a new, unshared list; if I pass a tuple because I need an immutable graph, you'll replace it with a mutable list; if I pass in a blist.sortedlist, you'll replace it with a plain list. Worse, this will only happen if the argument I pass happens to be empty, which I may not have thought to test for. This is the same reason you don't use "if spam:" when you meant "if spam is not None:", which is explained in PEP 8. Also, I believe the PEP for ternary if-else explains why this is an "attractive nuisance" misuse of or, as one of the major arguments for why a ternary expression should be added. 
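The shared-list hazard described above can be demonstrated concretely (a minimal sketch; the `Graph` class is hypothetical):

```python
class Graph:
    def __init__(self, vertices=None):
        # The idiom under discussion: any empty container is
        # silently swapped for a brand-new list.
        self.vertices = vertices or []

shared = []              # a list the caller intends to keep mutating
g = Graph(shared)
shared.append("v1")      # the caller adds a vertex later...
assert g.vertices == []  # ...but the graph never sees it
assert g.vertices is not shared

t = Graph((1, 2)).vertices
assert isinstance(t, tuple)  # a non-empty tuple passes through unchanged
```

Note that the bug only triggers when the passed container happens to be empty, which is precisely the case a quick test is least likely to cover.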
From jdhardy at gmail.com Tue Sep 29 22:43:58 2015 From: jdhardy at gmail.com (Jeff Hardy) Date: Tue, 29 Sep 2015 13:43:58 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <20150929133542.4d04f6dd@anarchist.wooz.org> References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> <20150929133542.4d04f6dd@anarchist.wooz.org> Message-ID: On Tue, Sep 29, 2015 at 10:35 AM, Barry Warsaw wrote: > On Sep 28, 2015, at 03:04 PM, Carl Meyer wrote: > > >But even if they are rejected, I think a simple `??` or `or?` (or > >however it's spelled) operator to reduce the repetition of "x if x is > >not None else y" is worth consideration on its own merits. This operator > >is entirely unambiguous, and I think would be useful and frequently > >used, whether or not ?. and ?[ are added along with it. > > But why is it an improvement? The ternary operator is entirely obvious and > readable, and at least in my experience, is rare enough that the repetition > doesn't hurt my fingers that much. It seems like such a radical, ugly new > syntax unjustified by the frequency of use and readability improvement. > I use it all over the place in C# code, where it makes null checks much cleaner, and the punctuation choice makes sense: var x = foo != null ? foo : ""; var y = foo ?? ""; (it also has other uses in C# relating to nullable types that aren't relevant in Python.) I'd argue the same is true in Python, if a decent way to spell it can be found: x = foo if foo is not None else "" y = foo or? "" It's pure syntactic sugar, but it *is* pretty sweet. (It would also make get-with-default unnecessary, but since it already exists that's not a useful argument.) - Jeff -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From emile at fenx.com Tue Sep 29 22:50:43 2015 From: emile at fenx.com (Emile van Sebille) Date: Tue, 29 Sep 2015 13:50:43 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <87FB8972-4E5B-4E8E-8967-466E6B95FBB6@yahoo.com> References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <560A5E9D.3070808@egenix.com> <560ABA5D.1070602@btinternet.com> <87FB8972-4E5B-4E8E-8967-466E6B95FBB6@yahoo.com> Message-ID: Thanks -- I think I've got a better handle now on the why of this discussion. Emile On 9/29/2015 1:33 PM, Andrew Barnert via Python-ideas wrote: > On Sep 29, 2015, at 09:58, Emile van Sebille wrote: >> >>> On 9/29/2015 9:20 AM, Rob Cliffe wrote: >>> Why not >>> >>> def __init__(self, vertices=None, edges=None, weights=None, >>> source_nodes=None): >>> self.vertices = vertices if vertices is not None else [] >>> self.edges = edges if edges is not None else [] >>> self.weights = weights if weights is not None else {} >>> self.source_nodes = source_nodes if source_nodes is not None else [] >> >> I don't understand why not: >> >> self.vertices = vertices or [] >> self.edges = edges or [] >> self.weights = weights or {} >> self.source_nodes = source_nodes or [] > > Because empty containers are just as falsey as None. > > So, if I pass in a shared list, your "vertices or []" will replace it with a new, unshared list; if I pass a tuple because I need an immutable graph, you'll replace it with a mutable list; if I pass in a blist.sortedlist, you'll replace it with a plain list. Worse, this will only happen if the argument I pass happens to be empty, which I may not have thought to test for. > > This is the same reason you don't use "if spam:" when you meant "if spam is not None:", which is explained in PEP 8. 
> > Also, I believe the PEP for ternary if-else explains why this is an "attractive nuisance" misuse of or, as one of the major arguments for why a ternary expression should be added. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From emile at fenx.com Tue Sep 29 22:55:04 2015 From: emile at fenx.com (Emile van Sebille) Date: Tue, 29 Sep 2015 13:55:04 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> <20150929133542.4d04f6dd@anarchist.wooz.org> Message-ID: On 9/29/2015 1:43 PM, Jeff Hardy wrote: > I'd argue the same is true in Python, if a decent way to spell it can be > found: > > x = foo if foo is not None else "" > y = foo or? "" > > It's pure syntactic sugar, but it *is* pretty sweet. as to or? variants -- I'd rather see nor: x = foo nor 'foo was None' Just to add my two cents worth. Emile From tjreedy at udel.edu Tue Sep 29 23:48:22 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 29 Sep 2015 17:48:22 -0400 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <560ABDC4.7000308@mail.de> References: <1082717728.2280152.1443469633455.JavaMail.yahoo@mail.yahoo.com> <560ABDC4.7000308@mail.de> Message-ID: On 9/29/2015 12:35 PM, Sven R. Kunze wrote: > On 29.09.2015 02:38, Terry Reedy wrote: >> On 9/28/2015 5:48 PM, Luciano Ramalho wrote: >>> Glyph tweeted yesterday that everyone should watch the "Nothing is >>> Something" 35' talk by Sandi Metz at RailsConf 2015. It's great and, >>> in a way, relevant to this discussion. 
>>> >>> https://www.youtube.com/watch?v=29MAL8pJImQ >> >> I understood Metz as advocating avoiding the nil (None) problem by >> giving every class an 'active nothing' that has the methods of the >> class. We do that for most builtin classes -- 0, (), {}, etc. She >> also used the identity function with a particular signature in various >> roles. >> > > I might stress here that nobody said there's a single "active nothing". Ruby's nil and Python's None are passive nothings. Any operation other than those inherited from Object raises an exception. > There are far more "special case objects" (as Robert C. Martin calls it) > than 0, (), {}, etc. Metz's point is that there is potentially one for most classes that one might write. Some people have wondered why Python does not come with a builtin identity function. The answer has been that one is not needed much and it is easy to create one. Metz's answer is that they are very useful for generalizing classes. But she also at least implied that they should be specific to each situation. Certainly in Python, if code were to check signature, and even type annotation, then a matching id function would be needed. > I fear, however, the stdlib cannot account for > every special case object possible. Right. It is not possible to create a null instance of a class that does not yet exist. > Without None available in the first place, The problem of a general object is that it is general. It should either be a ghost that does nothing, as with None, or a borg that does everything, as with the Bottom of some languages. > users would be forced to create their domain-specific special > case objects. Metz recommends doing this voluntarily ;-) perhaps after an initial prototype. > None being available though, people need to be taught to avoid it, > which btw. she did a really good job of. I think None works really well as the always-returned value for functions that are really procedures.
The problem comes with returning something or None, versus something or raise, or something or null of the class of something. -- Terry Jan Reedy From python-ideas at mgmiller.net Wed Sep 30 00:08:24 2015 From: python-ideas at mgmiller.net (Mike Miller) Date: Tue, 29 Sep 2015 15:08:24 -0700 Subject: [Python-ideas] list as parameter for the split function In-Reply-To: References: <5609BADA.8060801@gmx.com> <20150929034316.GO23642@ando.pearwood.info> Message-ID: <560B0BD8.1090203@mgmiller.net> +1 for the feature, given as a tuple. Agreed with original, parent, and grandparent posts, have encountered this numerous times over the years. Reaching for the re module and docs.python and/or stackoverflow to split a string (with two delimiters) feels like swatting a fly with a sledge-hammer. ;) Conversely, I've not encountered the need as often with .startswith, which does support it. -Mike From greg.ewing at canterbury.ac.nz Wed Sep 30 01:27:05 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 30 Sep 2015 12:27:05 +1300 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> <20150929133542.4d04f6dd@anarchist.wooz.org> Message-ID: <560B1E49.7050102@canterbury.ac.nz> Emile van Sebille wrote: > x = foo nor 'foo was None' Cute, but unfortunately it conflicts with established usage of the word 'nor', which would suggest that a nor b == not (a or b). 
-- Greg From abarnert at yahoo.com Wed Sep 30 01:53:58 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 29 Sep 2015 16:53:58 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <1082717728.2280152.1443469633455.JavaMail.yahoo@mail.yahoo.com> <560ABDC4.7000308@mail.de> Message-ID: On Sep 29, 2015, at 14:48, Terry Reedy wrote: > >> On 9/29/2015 12:35 PM, Sven R. Kunze wrote: >>> On 29.09.2015 02:38, Terry Reedy wrote: >>>> On 9/28/2015 5:48 PM, Luciano Ramalho wrote: >>>> Glyph tweeted yesterday that everyone should watch the "Nothing is >>>> Something" 35' talk by Sandi Metz at RailsConf 2015. It's great and, >>>> in a way, relevant to this discussion. >>>> >>>> https://www.youtube.com/watch?v=29MAL8pJImQ >>> >>> I understood Metz as advocating avoiding the nil (None) problem by >>> giving every class an 'active nothing' that has the methods of the >>> class. We do that for most builtin classes -- 0, (), {}, etc. She >>> also used the identity function with a particular signature in various >>> roles. >> >> I might stress here that nobody said there's a single "active nothing". > > Ruby's nil and Python's None are passive nothings. Any operation other than those inherited from Object raises an exception. > >> There are far more "special case objects" (as Robert C. Martin calls it) >> than 0, (), {}, etc. > > Metz's point is that there is potentially one for most classes that one might write. I don't think this is true. First, "int" and "float" are such general-use/low-semantics types that "0" or "0.0" doesn't always mean "nothing". If you're talking about counts, or distances from some preferred origin, then yes, 0 is nothing; if you're talking about Unix timestamps, or ratings from 0 to 5 stars, then it's not.
That's exactly why there's so much code in C and such languages that passes around -1 for nothing (but of course that only works when your real data is unsigned but small enough to waste a bit using a signed int), and the fact that Python idiomatically uses None instead of -1 is a strength, not a weakness. Likewise, sometimes "" makes a perfectly good null string, but sometimes it doesn't; it's often worth distinguishing between "" (has no middle name) and None (we haven't asked for the middle name yet), for example. Also, list, set, dict, and most user-defined types are mutable. This means that using [] or Spam() as a type-specific nothing means your nothings are distinct, mutable objects. Sometimes that's OK, sometimes it's even explicitly a good thing, but sometimes it definitely isn't. In a language that encouraged use of more finely-grained types (so you never use "int", you use "Rating", which is constrained to 0-5), the idea that each type that's nullable should have its own null makes some sense, and even more so for a pure-immutable language and idioms around type-driven programming. But that's not even close to Python. > Some people have wondered why Python does not come with a builtin identity function. The answer has been that one is not needed much and it is easy to create one. Metz's answer is that they are very useful for generalizing classes. But she also at least implied that they should be specific to each situation. Certainly in Python, if code were to check signature, and even type annotation, then a matching id function would be needed. Having to create an identity function for each type seems like a horrible idea. Even more so in a language that encourages granular typing. Fortunately, any such language that anyone would actually use probably has parametric genericity, so you could just write a single id function from any type A to the same A, and let the compiler deal with specializing it for each type instead of the programmer.
(Or you could make type class definitions provide an id by default, or something else equivalent.) >> I fear, however, the stdlib cannot account for > > every special case object possible. > > Right. It is not possible to create a null instance of a class that does not yet exist. > >> Without None available in the first place, > > The problem of a general object is that it is general. It should either be a ghost that does nothing, as with None, or a borg that does everything, as with the Bottom of some languages. It might be worth having both. But maybe not; personally, while I've occasionally created a Python-like ghost in Smalltalk, I've never wanted a Smalltalk-like borg in Python. What I have wanted, quite often, is to write code that locally, explicitly, treats the ghost as a borg. And that's exactly what "is not None" tests are for, and we've all used them. This proposal isn't adding the concept to the language or idiom, just providing syntactic sugar to make an already widely-used feature easier to use. > > users would be forced to create their domain-specific special > > case objects. > > Metz recommends doing this voluntarily ;-) > perhaps after an initial prototype. > > > None being available though, people need to be taught to avoid it, > > which btw. she did a really good job of. > > I think None works really well as the always-returned value for functions that are really procedures. The problem comes with returning something or None, versus something or raise, or something or null of the class of something. I think the problem comes with assuming that there is a universal answer here. Sometimes something vs. None is appropriate. Sometimes, raising is appropriate. Sometimes, returning a special value is appropriate. Occasionally, even building a Maybe type and using that (with or without collapsing) is appropriate (even without syntactic support for pattern matching and fmapping, although it's much nicer with...).
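For concreteness, a bare-bones Maybe type along the lines mentioned above might look like this in Python (purely illustrative, not a proposal; the class and method names are made up):

```python
class Maybe:
    """Minimal Maybe/Option type: holds a value or marks its absence."""
    def __init__(self, value=None, present=True):
        self.present = present and value is not None
        self.value = value if self.present else None

    def map(self, fn):
        # Apply fn only when a value is present; otherwise stay Nothing.
        return Maybe(fn(self.value)) if self.present else NOTHING

    def get_or(self, default):
        return self.value if self.present else default

NOTHING = Maybe(present=False)

assert Maybe(3).map(lambda x: x + 1).get_or(0) == 4
assert NOTHING.map(lambda x: x + 1).get_or(0) == 0
```

Without syntactic support, each step through the wrapper is explicit, which is part of why the pattern stays rare in idiomatic Python.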
Python idiomatically uses all of the first three extensively, differently in different situations, but rarely the fourth. Some languages use a different subset. Suggesting that one of these must always be the answer doesn't seem to be motivated by any real concerns. Does Python code really have a lot more problems with this than C or Swift or some other language that idiomatically uses only one or two different answers? Even if it does (which I doubt), would eliminating one of the three but changing as little else as possible about the language and ecosystem actually help? And would it be even remotely feasible to do so? From abarnert at yahoo.com Wed Sep 30 01:57:39 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 29 Sep 2015 16:57:39 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <560B1E49.7050102@canterbury.ac.nz> References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> <20150929133542.4d04f6dd@anarchist.wooz.org> <560B1E49.7050102@canterbury.ac.nz> Message-ID: On Sep 29, 2015, at 16:27, Greg Ewing wrote: > > Emile van Sebille wrote: > >> x = foo nor 'foo was None' > > Cute, but unfortunately it conflicts with established > usage of the word 'nor', which would suggest that > a nor b == not (a or b). Agreed. If this is going to be a keyword rather than a symbol, it really has to read like English, or at least like abbreviated English, with the right meaning--something like "foo, falling back to 'foo was None' if needed". Something that reads like English with a completely different meaning is a bad idea.
From alexander.belopolsky at gmail.com Wed Sep 30 02:24:50 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 29 Sep 2015 20:24:50 -0400 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> <20150929133542.4d04f6dd@anarchist.wooz.org> <560B1E49.7050102@canterbury.ac.nz> Message-ID: On Tue, Sep 29, 2015 at 7:57 PM, Andrew Barnert via Python-ideas < python-ideas at python.org> wrote: > If this is going to be a keyword rather than a symbol, it really has to > read like English, or at least like abbreviated English, I am -1 on this whole idea, but the keyword that comes to mind is "def": x def [] may be read as x DEFaulting to []. If this was a Python 4 idea, I would suggest repurposing the rarely used xor operator: ^ and make x ^ y return the non-None of x and y or None if both are None. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Wed Sep 30 02:27:22 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 29 Sep 2015 20:27:22 -0400 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> <20150929133542.4d04f6dd@anarchist.wooz.org> <560B1E49.7050102@canterbury.ac.nz> Message-ID: On Tue, Sep 29, 2015 at 8:24 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > ... suggest repurposing the rarely used xor operator: ^ and make x ^ y > return the non-None of x and y or None if both are None. .. and x if both are not None to allow x ^= y to work as expected. 
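The semantics proposed for the repurposed `^` can be pinned down as a plain function (a hypothetical `coalesce` helper, just to make the behaviour explicit):

```python
def coalesce(x, y):
    # Models the proposed `x ^ y`: the non-None of x and y,
    # None only when both are None, and x when both are present.
    return x if x is not None else y

assert coalesce(1, 2) == 1           # both present -> left operand
assert coalesce(None, 2) == 2        # left missing -> right operand
assert coalesce(None, None) is None  # both missing -> None
assert coalesce(0, "fallback") == 0  # unlike `or`, falsy 0 is kept
```

The last assertion is the whole point of the thread: this helper, like the proposed operator, tests for None rather than falsiness.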
-------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Wed Sep 30 02:39:11 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Tue, 29 Sep 2015 19:39:11 -0500 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> <20150929133542.4d04f6dd@anarchist.wooz.org> <560B1E49.7050102@canterbury.ac.nz> Message-ID: What about 'otherwise'? x = a otherwise b On September 29, 2015 6:57:39 PM CDT, Andrew Barnert via Python-ideas wrote: >LOn Sep 29, 2015, at 16:27, Greg Ewing >wrote: >> >> Emile van Sebille wrote: >> >>> x = foo nor 'foo was None' >> >> Cute, but unfortunately it conflicts with established >> usage of the word 'nor', which would suggest that >> a nor b == not (a or b). > >Agreed. If this is going to be a keyword rather than a symbol, it >really has to read like English, or at least like abbreviated English, >with the right meaning--something like "foo, falling back to 'foo was >None' if needed". Something that reads like English with a completely >different meaning is a bad idea. > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. CURRENTLY LISTENING TO: Vermilion Fire (Final Fantasy Type-0) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From neatnate at gmail.com Wed Sep 30 02:46:55 2015 From: neatnate at gmail.com (Nathan Schneider) Date: Wed, 30 Sep 2015 01:46:55 +0100 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> <20150929133542.4d04f6dd@anarchist.wooz.org> <560B1E49.7050102@canterbury.ac.nz> Message-ID: On Wed, Sep 30, 2015 at 1:24 AM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > If this was a Python 4 idea, I would suggest repurposing the rarely used > xor operator: ^ and make x ^ y return the non-None of x and y or None if > both are None. > I find ^ quite useful for sets, and would rather not see it repurposed in this way. Another possibility: There could be a binary version of the tilde operator, which is currently only unary: x ~ y to mean "x if x is not None else y". But I am also -1 on the whole idea, as I rarely encounter situations that would benefit from this construct. Nathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed Sep 30 03:02:15 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 30 Sep 2015 11:02:15 +1000 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <560A5E9D.3070808@egenix.com> Message-ID: <20150930010215.GU23642@ando.pearwood.info> On Tue, Sep 29, 2015 at 06:37:46PM +0200, Georg Brandl wrote: > > x = x ?? [] > > Looking at this, I think people might call ?? the "WTF operator". Not a > good sign :) I see your smiley, but C# has this operator. What do C# programmers call it? ("Null coalescing operator" is the formal name, but that's way too long for everyday use.) 
-- Steve From steve at pearwood.info Wed Sep 30 03:32:36 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 30 Sep 2015 11:32:36 +1000 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <560AD34E.8010905@oddbird.net> References: <0D5667BE-EF2D-4469-9C61-FB982CE8AE01@gmail.com> <560AD34E.8010905@oddbird.net> Message-ID: <20150930013236.GV23642@ando.pearwood.info> On Tue, Sep 29, 2015 at 12:07:10PM -0600, Carl Meyer wrote: > On 09/29/2015 12:04 PM, Brian O'Neill wrote: > > A further virtue of > > > > self.vertices = vertices or [] > > > > and the like is that they coerce falsy parameters of the wrong type to the falsy object of the correct type. > > > > E.g. if vertices is '' or 0, self.vertices will be set to [], whereas the ternary expression only tests > > > > for not-None so self.vertices will be set to a probably crazy value. > > Doesn't seem like a virtue to me, seems like it's probably hiding a bug > in the calling code, which may have other ramifications. Better to have > the "crazy value" visible and fail faster, so you can go fix that bug. Agreed. Assuming the function is intended to, and documented as, using the passed in "vertices", using `or` is simply wrong, in two ways: - if vertices is a falsey value of the wrong type, say, 0.0, it will be silently replaced by [] instead of triggering an exception (usually a TypeError or AttributeError); - if vertices is a falsey value of the right type, say collections.deque(), it will be silently replaced by []. In the first case, the code is hiding a bug in the caller. In the second case, it's a bug in the called code. I am very sad to see how many people still use the error-prone `x or y` idiom inappropriately, a full decade after PEP 308 was approved. (Depending on where you are in the world, it was ten years ago today, or yesterday.) `x or y` still has its uses, but testing for None is not one of them.
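Both failure modes described above can be shown in a few lines (a sketch using the error-prone idiom under discussion; `set_vertices` is a hypothetical stand-in for the constructor):

```python
import collections

def set_vertices(vertices=None):
    return vertices or []        # the error-prone idiom in question

# Wrong type: a falsy float is coerced to [] instead of failing
# loudly later with a TypeError or AttributeError.
assert set_vertices(0.0) == []

# Right kind of type: an empty deque is silently replaced too.
d = collections.deque()
assert set_vertices(d) == []
assert set_vertices(d) is not d  # the caller's deque was dropped
```

In both cases `x if x is not None else []` would preserve the caller's intent: the float would surface as an error downstream, and the deque would be kept.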
-- Steve From rob.cliffe at btinternet.com Wed Sep 30 13:00:19 2015 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Wed, 30 Sep 2015 12:00:19 +0100 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> <20150929133542.4d04f6dd@anarchist.wooz.org> <560B1E49.7050102@canterbury.ac.nz> Message-ID: <560BC0C3.4040502@btinternet.com> Or: x = a orelse b # Visual Basic has a short-circuiting OrElse operator for boolean operands x = a orifNone b On 30/09/2015 01:39, Ryan Gonzalez wrote: > What about 'otherwise'? > > x = a otherwise b > > > On September 29, 2015 6:57:39 PM CDT, Andrew Barnert via Python-ideas > wrote: > > LOn Sep 29, 2015, at 16:27, Greg Ewing wrote: > > Emile van Sebille wrote: > > x = foo nor 'foo was None' > > Cute, but unfortunately it conflicts with established usage of > the word 'nor', which would suggest that a nor b == not (a or b). > > > Agreed. If this is going to be a keyword rather than a symbol, it really has to read like English, or at least like abbreviated English, with the right meaning--something like "foo, falling back to 'foo was None' if needed". Something that reads like English with a completely different meaning is a bad idea. > > ------------------------------------------------------------------------ > > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct:http://python.org/psf/codeofconduct/ > > -- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity. 
> CURRENTLY LISTENING TO: Vermilion Fire (Final Fantasy Type-0) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Wed Sep 30 17:28:10 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 30 Sep 2015 08:28:10 -0700 (PDT) Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence Message-ID: What are the pros and cons of making enumerate a sequence if its argument is a sequence? I found myself writing: for vertex, height in zip( self.cache.height_to_vertex[height_slice], range(height_slice.start, height_slice.stop)): I would have preferred: for height, vertex in enumerate( self.cache.height_to_vertex)[height_slice]: -------------- next part -------------- An HTML attachment was scrubbed... URL: From jdhardy at gmail.com Wed Sep 30 18:34:06 2015 From: jdhardy at gmail.com (Jeff Hardy) Date: Wed, 30 Sep 2015 09:34:06 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <20150930010215.GU23642@ando.pearwood.info> References: <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <560A5E9D.3070808@egenix.com> <20150930010215.GU23642@ando.pearwood.info> Message-ID: On Tue, Sep 29, 2015 at 6:02 PM, Steven D'Aprano wrote: > On Tue, Sep 29, 2015 at 06:37:46PM +0200, Georg Brandl wrote: > > > > x = x ?? [] > > > > Looking at this, I think people might call ?? the "WTF operator". Not a > > good sign :) > > I see your smiley, but C# has this operator. What do C# programmers call > it? ("Null coalescing operator" is the formal name, but that's way too > long for everyday use.) 
> I've never seen it referred to as anything other than "the null coalescing operator" (or occasionally the "double question mark operator"). C# devs aren't necessarily the most creative bunch... :) - Jeff -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Wed Sep 30 18:36:43 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 30 Sep 2015 19:36:43 +0300 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: References: Message-ID: On 30.09.15 18:28, Neil Girdhar wrote: > What are the pros and cons of making enumerate a sequence if its > argument is a sequence? > > I found myself writing: > > for vertex, height in zip( > self.cache.height_to_vertex[height_slice], > range(height_slice.start, height_slice.stop)): > > I would have preferred: > > for height, vertex in enumerate( > self.cache.height_to_vertex)[height_slice]: You can write: for height, vertex in enumerate( self.cache.height_to_vertex[height_slice], height_slice.start): From jdhardy at gmail.com Wed Sep 30 18:41:53 2015 From: jdhardy at gmail.com (Jeff Hardy) Date: Wed, 30 Sep 2015 09:41:53 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> <20150929133542.4d04f6dd@anarchist.wooz.org> <560B1E49.7050102@canterbury.ac.nz> Message-ID: On Tue, Sep 29, 2015 at 5:24 PM, Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > On Tue, Sep 29, 2015 at 7:57 PM, Andrew Barnert via Python-ideas < > python-ideas at python.org> wrote: > >> If this is going to be a keyword rather than a symbol, it really has to >> read like English, or at least like abbreviated English, > > > I am -1 on this whole idea, but the keyword that comes to mind is 
"def": > > x def [] > > may be read as x DEFaulting to []. > 'def' is currently short for 'define', which would be too confusing. Spelling out 'default' isn't so bad, though: self.x = x default [] And if it's going to be that long anyway, we might as well just put a `default` function in the builtins: self.x = default(x, []) I actually really like 'otherwise', but it's certainly not brief: self.x = x if x is not None else [] self.x = x otherwise [] That said, it's not used *that* often either, so maybe it's acceptable. - Jeff -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Wed Sep 30 18:43:12 2015 From: brett at python.org (Brett Cannon) Date: Wed, 30 Sep 2015 16:43:12 +0000 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: References: Message-ID: On Wed, 30 Sep 2015 at 09:38 Neil Girdhar wrote: > In fairness, one is a superset of the other. You always get an Iterable. > You sometimes get a Sequence. It's a bit like multiplication? with > integers you get integers, with floats, you get floats. > No, it's not like multiplication. =) I hate saying this since I think it's tossed around too much, but int/float substitution doesn't lead to a Liskov substitution violation like substituting out a sequence for an iterator (which is what will happen if the type of the argument to `enumerate` changes). And since you can just call `list` or `tuple` on enumerate and get exactly what you're after without potential bugs cropping up if you don't realize from afar you're affecting an assumption someone made, I'm -1000 on this idea. -Brett > > On Wed, Sep 30, 2015 at 12:35 PM Brett Cannon wrote: > >> On Wed, 30 Sep 2015 at 08:28 Neil Girdhar wrote: >> >>> What are the pros and cons of making enumerate a sequence if its >>> argument is a sequence? 
>>> >>> I found myself writing: >>> >>> for vertex, height in zip( >>> self.cache.height_to_vertex[height_slice], >>> range(height_slice.start, height_slice.stop)): >>> >>> I would have preferred: >>> >>> for height, vertex in enumerate( >>> self.cache.height_to_vertex)[height_slice]: >>> >> >> Because you now suddenly have different types and semantics of what >> enumerate() returns based on its argument which is easy to mess up if >> self.cache.height_to_vertex became an iterator object itself instead of a >> sequence object. It's also not hard to simply do `tuple(enumerate(...))` to >> get the exact semantics you want: TOOWTDI. >> >> IOW all I see are cons. =) >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed Sep 30 18:47:44 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 1 Oct 2015 02:47:44 +1000 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> <20150929133542.4d04f6dd@anarchist.wooz.org> <560B1E49.7050102@canterbury.ac.nz> Message-ID: On Thu, Oct 1, 2015 at 2:41 AM, Jeff Hardy wrote: > 'def' is currently short for 'define', which would be too confusing. > Spelling out 'default' isn't so bad, though: > > self.x = x default [] > > And if it's going to be that long anyway, we might as well just put a > `default` function in the builtins: > > self.x = default(x, []) I'd prefer it to have language support rather than a builtin, so it can shortcircuit. It won't often be important, but it would be nice to be able to put a function call in there or something. ChrisA From eric at trueblade.com Wed Sep 30 18:49:49 2015 From: eric at trueblade.com (Eric V. 
Smith) Date: Wed, 30 Sep 2015 12:49:49 -0400 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> <20150929133542.4d04f6dd@anarchist.wooz.org> <560B1E49.7050102@canterbury.ac.nz> Message-ID: <560C12AD.90305@trueblade.com> On 09/30/2015 12:41 PM, Jeff Hardy wrote: > 'def' is currently short for 'define', which would be too confusing. > Spelling out 'default' isn't so bad, though: > > self.x = x default [] > > And if it's going to be that long anyway, we might as well just put a > `default` function in the builtins: > > self.x = default(x, []) You lose the short circuiting. > I actually really like 'otherwise', but it's certainly not brief: > > self.x = x if x is not None else [] > self.x = x otherwise [] I'm -1 on needing syntax for this, but if we're going to do it, this is my favorite version I've seen so far. The usual caveats about adding a keyword apply. Eric. From mistersheik at gmail.com Wed Sep 30 18:53:47 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 30 Sep 2015 16:53:47 +0000 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: References: Message-ID: Can you help understand how this is a Liskov substitution violation? A Sequence is an Iterator. Getting the sequence back should never hurt. The current interface doesn't promise that the returned object won't have additional methods or implement additional interfaces, does it? On Wed, Sep 30, 2015 at 12:43 PM Brett Cannon wrote: > On Wed, 30 Sep 2015 at 09:38 Neil Girdhar wrote: > >> In fairness, one is a superset of the other. You always get an >> Iterable. You sometimes get a Sequence. It's a bit like multiplication? >> with integers you get integers, with floats, you get floats. 
>> > > No, it's not like multiplication. =) I hate saying this since I think it's > tossed around too much, but int/float substitution doesn't lead to a Liskov > substitution violation like substituting out a sequence for an iterator > (which is what will happen if the type of the argument to `enumerate` > changes). And since you can just call `list` or `tuple` on enumerate and > get exactly what you're after without potential bugs cropping up if you > don't realize from afar you're affecting an assumption someone made, I'm > -1000 on this idea. > > -Brett > > >> >> On Wed, Sep 30, 2015 at 12:35 PM Brett Cannon wrote: >> >>> On Wed, 30 Sep 2015 at 08:28 Neil Girdhar wrote: >>> >>>> What are the pros and cons of making enumerate a sequence if its >>>> argument is a sequence? >>>> >>>> I found myself writing: >>>> >>>> for vertex, height in zip( >>>> self.cache.height_to_vertex[height_slice], >>>> range(height_slice.start, height_slice.stop)): >>>> >>>> I would have preferred: >>>> >>>> for height, vertex in enumerate( >>>> self.cache.height_to_vertex)[height_slice]: >>>> >>> >>> Because you now suddenly have different types and semantics of what >>> enumerate() returns based on its argument which is easy to mess up if >>> self.cache.height_to_vertex became an iterator object itself instead of a >>> sequence object. It's also not hard to simply do `tuple(enumerate(...))` to >>> get the exact semantics you want: TOOWTDI. >>> >>> IOW all I see are cons. =) >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Sep 30 19:03:04 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 30 Sep 2015 10:03:04 -0700 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: References: Message-ID: On Wed, Sep 30, 2015 at 9:53 AM, Neil Girdhar wrote: > Can you help understand how this is a Liskov substitution violation? A > Sequence is an Iterator. 
Getting the sequence back should never hurt. > no but getting a non-sequence iterator back when you expect a sequence sure can hurt. which is why I said that if you want a sequence back from enumerate, it should always return a sequence. which could (should) be lazy-evaluated. I think Neil's point is that calling list() or tuple() on it requires that the entire sequence be evaluated and stored -- if you really only want one item (and especially not one at the end), that could be a pretty big performance hit. Which makes me wonder why ALL iterators couldn't support indexing? It might work like crap in some cases, but wouldn't it always be as good or better than wrapping it in a tuple? And then some cases (like enumerate) could do an index operation efficiently when they are working with "real" sequences. Maybe a generic lazy_sequence object that could be wrapped around an iterator to create a lazy-evaluating sequence?? -CHB > On Wed, Sep 30, 2015 at 12:43 PM Brett Cannon wrote: > >> On Wed, 30 Sep 2015 at 09:38 Neil Girdhar wrote: >> >>> In fairness, one is a superset of the other. You always get an >>> Iterable. You sometimes get a Sequence. It's a bit like multiplication? >>> with integers you get integers, with floats, you get floats. >>> >> >> No, it's not like multiplication. =) I hate saying this since I think >> it's tossed around too much, but int/float substitution doesn't lead to a >> Liskov substitution violation like substituting out a sequence for an >> iterator (which is what will happen if the type of the argument to >> `enumerate` changes). And since you can just call `list` or `tuple` on >> enumerate and get exactly what you're after without potential bugs cropping >> up if you don't realize from afar you're affecting an assumption someone >> made, I'm -1000 on this idea. 
>> >> -Brett >> >> >>> >>> On Wed, Sep 30, 2015 at 12:35 PM Brett Cannon wrote: >>> >>>> On Wed, 30 Sep 2015 at 08:28 Neil Girdhar >>>> wrote: >>>> >>>>> What are the pros and cons of making enumerate a sequence if its >>>>> argument is a sequence? >>>>> >>>>> I found myself writing: >>>>> >>>>> for vertex, height in zip( >>>>> self.cache.height_to_vertex[height_slice], >>>>> range(height_slice.start, height_slice.stop)): >>>>> >>>>> I would have preferred: >>>>> >>>>> for height, vertex in enumerate( >>>>> self.cache.height_to_vertex)[height_slice]: >>>>> >>>> >>>> Because you now suddenly have different types and semantics of what >>>> enumerate() returns based on its argument which is easy to mess up if >>>> self.cache.height_to_vertex became an iterator object itself instead of a >>>> sequence object. It's also not hard to simply do `tuple(enumerate(...))` to >>>> get the exact semantics you want: TOOWTDI. >>>> >>>> IOW all I see are cons. =) >>>> >>> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Wed Sep 30 19:15:46 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 30 Sep 2015 13:15:46 -0400 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: References: Message-ID: On Wed, Sep 30, 2015 at 12:53 PM, Neil Girdhar wrote: > A Sequence is an Iterator. 
No, a Sequence is an Iterable, not an Iterator: >>> issubclass(collections.Sequence, collections.Iterator) False >>> issubclass(collections.Sequence, collections.Iterable) True -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Wed Sep 30 19:18:44 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 30 Sep 2015 17:18:44 +0000 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: References: Message-ID: Ah good point. Well, in the case of a sequence argument, an enumerate object could be both a sequence and an iterator. On Wed, Sep 30, 2015 at 1:15 PM Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > On Wed, Sep 30, 2015 at 12:53 PM, Neil Girdhar > wrote: > >> A Sequence is an Iterator. > > > No, a Sequence is an Iterable, not an Iterator: > > >>> issubclass(collections.Sequence, collections.Iterator) > False > >>> issubclass(collections.Sequence, collections.Iterable) > True > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Wed Sep 30 19:19:53 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 30 Sep 2015 17:19:53 +0000 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: References: Message-ID: I guess, I'm just asking for enumerate to go through the same change that range went through. Why wasn't it a problem for range? On Wed, Sep 30, 2015 at 1:18 PM Neil Girdhar wrote: > Ah good point. Well, in the case of a sequence argument, an enumerate > object could be both a sequence and an iterator. > > On Wed, Sep 30, 2015 at 1:15 PM Alexander Belopolsky < > alexander.belopolsky at gmail.com> wrote: > >> >> On Wed, Sep 30, 2015 at 12:53 PM, Neil Girdhar >> wrote: >> >>> A Sequence is an Iterator. 
>> >> >> No, a Sequence is an Iterable, not an Iterator: >> >> >>> issubclass(collections.Sequence, collections.Iterator) >> False >> >>> issubclass(collections.Sequence, collections.Iterable) >> True >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokoproject at gmail.com Wed Sep 30 19:27:15 2015 From: gokoproject at gmail.com (John Wong) Date: Wed, 30 Sep 2015 13:27:15 -0400 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <560C12AD.90305@trueblade.com> References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> <20150929133542.4d04f6dd@anarchist.wooz.org> <560B1E49.7050102@canterbury.ac.nz> <560C12AD.90305@trueblade.com> Message-ID: On Wed, Sep 30, 2015 at 12:49 PM, Eric V. Smith wrote: > > > > > self.x = x if x is not None else [] > > self.x = x otherwise [] > > I'm -1 on needing syntax for this, but if we're going to do it, this is > my favorite version I've seen so far. The usual caveats about adding a > keyword apply. > > Also feel this is the most intuitive... every other syntax seems really hard to read, however, the caveat I am thinking here is synonym of else and otherwise. We really should go any more complex. Those ? and null with/without exception seems awfully complex to reason. I'd spell out 10 lines if I had to. If fact, is it bad if we make else working for such brevity? BTW, this syntax just defeats the example in the PEP: [PEP 0505 - https://www.python.org/dev/peps/pep-0505/] This particular formulation has the undesirable effect of putting the > operands in an unintuitive order: the brain thinks, "use data if possible > and use [] as a fallback," but the code puts the fallback * before * the > preferred value. John -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.barker at noaa.gov Wed Sep 30 19:28:48 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 30 Sep 2015 10:28:48 -0700 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: References: Message-ID: On Wed, Sep 30, 2015 at 10:19 AM, Neil Girdhar wrote: > I guess, I'm just asking for enumerate to go through the same change that > range went through. Why wasn't it a problem for range? > well, range is simpler -- you don't pass arbitrary iterables into it. It always has to compute integer values according to start, stop, step -- easy to implement as either iteration or indexing. enumerate, on the other hand, takes an arbitrary iterable -- so it can't just index into that iterable if asked for an index. You are right, of course, that it COULD do that if it was passed a sequence in the first place, but then you have an intera e whereby you get a different kind of object depending on how you created it, which is pretty ugly. But again, we could add indexing to enumerate, and have it do the ugly inefficient thing when it's using an underlying non-indexable iterator, and do the efficient thing when it has a sequence to work with, thereby providing the same API regardless. -CHB > On Wed, Sep 30, 2015 at 1:18 PM Neil Girdhar > wrote: > >> Ah good point. Well, in the case of a sequence argument, an enumerate >> object could be both a sequence and an iterator. >> >> On Wed, Sep 30, 2015 at 1:15 PM Alexander Belopolsky < >> alexander.belopolsky at gmail.com> wrote: >> >>> >>> On Wed, Sep 30, 2015 at 12:53 PM, Neil Girdhar >>> wrote: >>> >>>> A Sequence is an Iterator. 
>>> >>> >>> No, a Sequence is an Iterable, not an Iterator: >>> >>> >>> issubclass(collections.Sequence, collections.Iterator) >>> False >>> >>> issubclass(collections.Sequence, collections.Iterable) >>> True >>> >> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Wed Sep 30 19:33:05 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 30 Sep 2015 20:33:05 +0300 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: References: Message-ID: On 30.09.15 20:18, Neil Girdhar wrote: > Ah good point. Well, in the case of a sequence argument, an enumerate > object could be both a sequence and an iterator. It can't be. For sequence: >>> x = 'abcd' >>> list(zip(x, x)) [('a', 'a'), ('b', 'b'), ('c', 'c'), ('d', 'd')] For iterator: >>> x = iter('abcd') >>> list(zip(x, x)) [('a', 'b'), ('c', 'd')] From anthony at xtfx.me Wed Sep 30 19:35:49 2015 From: anthony at xtfx.me (C Anthony Risinger) Date: Wed, 30 Sep 2015 12:35:49 -0500 Subject: [Python-ideas] Submitting a job to an asyncio event loop In-Reply-To: References: Message-ID: On Sun, Sep 27, 2015 at 11:42 AM, Guido van Rossum wrote: > [...] > > I don't think the use case involving multiple event loops in different > threads is as clear. I am still waiting for someone who is actually trying > to use this. 
It might be useful on a system where there is a system event > loop that must be used for UI events (assuming this event loop can somehow > be wrapped in a custom asyncio loop) and where an app might want to have a > standard asyncio event loop for network I/O. Come to think of it, the > ProactorEventLoop on Windows has both advantages and disadvantages, and > some app might need to use both that and SelectorEventLoop. But this is a > real pain (because you can't share any mutable state between event loops). > I'm not currently solving the problem this way, but I wanted to do something like this recently for a custom Mesos framework. The framework uses a pure-python library called "pesos" that in turn uses a pure-python libprocess library called "compactor". compactor runs user code in a private event loop (Mesos registration, etc). I also wanted to run my own private loop in another thread that interacts with Redis. This loop is expected to process some incoming updates as commands that must influence the compactor loop (start reconciliation or some other Mesos-related thing) and the most straightforward thing to me sounded exactly like this thread: submitting jobs from one loop to another. I haven't really delved into making the Redis part an async loop (it's just threaded right now) as I'm less experienced with writing such code, so maybe I am overlooking and/or conflating things, but seems reasonable. -- C Anthony -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Sep 30 19:47:29 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 30 Sep 2015 10:47:29 -0700 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: References: Message-ID: On Wed, Sep 30, 2015 at 10:33 AM, Serhiy Storchaka wrote: > On 30.09.15 20:18, Neil Girdhar wrote: > >> Ah good point. 
Well, in the case of a sequence argument, an enumerate >> object could be both a sequence and an iterator. >> > > It can't be. > > For sequence: > > >>> x = 'abcd' > >>> list(zip(x, x)) > [('a', 'a'), ('b', 'b'), ('c', 'c'), ('d', 'd')] > > For iterator: > > >>> x = iter('abcd') > >>> list(zip(x, x)) > [('a', 'b'), ('c', 'd')] well, that's because zip is using the same iterator it two places. would that ever be the case with enumerate? -CHB > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimjjewett at gmail.com Wed Sep 30 19:55:49 2015 From: jimjjewett at gmail.com (Jim J. Jewett) Date: Wed, 30 Sep 2015 13:55:49 -0400 Subject: [Python-ideas] secrets module -- secret.keeper? Message-ID: Will the secrets module offer any building blocks to actually protect a secret? e.g., an easy way to encrypt a file with a given password? an encrypted datastore? a getpass that works even in IDLE? -jJ From mistersheik at gmail.com Wed Sep 30 20:10:33 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 30 Sep 2015 18:10:33 +0000 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: References: Message-ID: Ah, that's a great point. Thanks. I guess range was never an Iterator, which is a key difference. On Wed, Sep 30, 2015 at 1:33 PM Serhiy Storchaka wrote: > On 30.09.15 20:18, Neil Girdhar wrote: > > Ah good point. Well, in the case of a sequence argument, an enumerate > > object could be both a sequence and an iterator. > > It can't be. 
> > For sequence: > > >>> x = 'abcd' > >>> list(zip(x, x)) > [('a', 'a'), ('b', 'b'), ('c', 'c'), ('d', 'd')] > > For iterator: > > >>> x = iter('abcd') > >>> list(zip(x, x)) > [('a', 'b'), ('c', 'd')] > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/x1omibxxcMw/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Wed Sep 30 20:11:00 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 30 Sep 2015 20:11:00 +0200 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: References: Message-ID: <560C25B4.5050000@egenix.com> On 30.09.2015 19:19, Neil Girdhar wrote: > I guess, I'm just asking for enumerate to go through the same change that > range went through. Why wasn't it a problem for range? range() returns a list in Python 2 and a generator in Python 3. enumerate() has never returned a sequence. It was one of the first builtin APIs in Python to return a generator: https://www.python.org/dev/peps/pep-0279/ after iterators and generators were introduced to the language: https://www.python.org/dev/peps/pep-0234/ https://www.python.org/dev/peps/pep-0255/ The main purpose of enumerate() is to allow enumeration of objects in a sequence or other iterable. If you need a sequence, simply wrap it with list(), e.g. list(enumerate(sequence)). 
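[Editorial note: the two workarounds mentioned in this thread can be compared side by side. A small sketch — `height_to_vertex` here is a hypothetical list standing in for the cache attribute from the original post.]

```python
height_to_vertex = ['a', 'b', 'c', 'd', 'e']
height_slice = slice(1, 4)

# Materialize enumerate() with list(), then slice the result:
pairs = list(enumerate(height_to_vertex))[height_slice]
assert pairs == [(1, 'b'), (2, 'c'), (3, 'd')]

# Or slice first and pass the slice start as enumerate()'s offset,
# which avoids building the pairs that fall outside the slice:
pairs2 = list(enumerate(height_to_vertex[height_slice], height_slice.start))
assert pairs2 == pairs
```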
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 30 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ 2015-09-25: Started a Python blog ... ... http://malemburg.com/ 2015-10-21: Python Meeting Duesseldorf ... 21 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From python-ideas at mgmiller.net Wed Sep 30 20:13:14 2015 From: python-ideas at mgmiller.net (Mike Miller) Date: Wed, 30 Sep 2015 11:13:14 -0700 Subject: [Python-ideas] secrets module -- secret.keeper? In-Reply-To: References: Message-ID: <560C263A.1010600@mgmiller.net> Somewhat related, there is a keyring module, the functionality of which I've sometimes wished were part of the stdlib: https://pypi.python.org/pypi/keyring It supports the big three OSs. -Mike On 2015-09-30 10:55, Jim J. Jewett wrote: > Will the secrets module offer any building blocks to actually protect a secret? > > e.g., > > an easy way to encrypt a file with a given password? > an encrypted datastore? > a getpass that works even in IDLE? > > -jJ From abarnert at yahoo.com Wed Sep 30 20:16:33 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 30 Sep 2015 11:16:33 -0700 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: References: Message-ID: <054DB797-6E9A-46E4-BF5D-05717C5B1060@yahoo.com> On Sep 30, 2015, at 10:47, Chris Barker wrote: > >> On Wed, Sep 30, 2015 at 10:33 AM, Serhiy Storchaka wrote: >>> On 30.09.15 20:18, Neil Girdhar wrote: >>> Ah good point. 
Well, in the case of a sequence argument, an enumerate >>> object could be both a sequence and an iterator. >> >> It can't be. >> >> For sequence: >> >> >>> x = 'abcd' >> >>> list(zip(x, x)) >> [('a', 'a'), ('b', 'b'), ('c', 'c'), ('d', 'd')] >> >> For iterator: >> >> >>> x = iter('abcd') >> >>> list(zip(x, x)) >> [('a', 'b'), ('c', 'd')] > > well, that's because zip is using the same iterator it two places. would that ever be the case with enumerate? The point is that _nothing_ can be an iterator and a sequence at the same time. (And therefore, an enumerate object can't be both at the same time.) The zip function is just a handy way of demonstrating the problem; it's not the actual problem. You could also demonstrate it by, e.g., calling len(x), next(x), list(x): If x is an iterator, next(x) will use up the 'a' so list will only give you ['b', 'c', 'd'], even though len gave you 4. Conceptually: iterators are inherently one-shot iterables; sequences are inherently reusable iterables. While there's no explicit rule that __iter__ can't return self for a sequence, there's no reasonable way to make a sequence that does so. Which means no sequence can be an iterator. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alexander.belopolsky at gmail.com Wed Sep 30 20:22:29 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 30 Sep 2015 14:22:29 -0400 Subject: [Python-ideas] Fwd: Consider making enumerate a sequence if its argument is a sequence In-Reply-To: References: Message-ID: On Wed, Sep 30, 2015 at 11:28 AM, Neil Girdhar wrote: > I found myself writing: > > for vertex, height in zip( > self.cache.height_to_vertex[height_slice], > range(height_slice.start, height_slice.stop)): > > I would have preferred: > > for height, vertex in enumerate( > self.cache.height_to_vertex)[height_slice]: > This does not seem to be a big improvement over for height, vertex in enumerate( self.cache.height_to_vertex[height_slice], height_slice.start): -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Wed Sep 30 20:26:03 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 30 Sep 2015 11:26:03 -0700 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: <560C25B4.5050000@egenix.com> References: <560C25B4.5050000@egenix.com> Message-ID: <147B2DEC-310B-498A-B468-03F0053F55B7@yahoo.com> On Sep 30, 2015, at 11:11, M.-A. Lemburg wrote: > >> On 30.09.2015 19:19, Neil Girdhar wrote: >> I guess, I'm just asking for enumerate to go through the same change that >> range went through. Why wasn't it a problem for range? > > range() returns a list in Python 2 and a generator in Python 3. No it doesn't. It returns a (lazy) sequence. Not a generator, or any other kind of iterator. I don't know why so many people seem to believe it returns a generator. (And, when you point out what it returns, most of them say, "Why was that changed from 2.x xrange, which returned a generator?" but xrange never returned a generator either--it returned a lazy almost-a-sequence from the start.) 
There's no conceptual reason that Python couldn't have more lazy sequences, and tools to build your own lazy sequences more easily. However, things do get messy once you get into the details. For example, zip can return a lazy sequence if given only sequences, but what if it's given iterators, or other iterables that aren't sequences; filter can return something that's sort of like a sequence in that it can be repeatedly iterated but it can't be randomly-accessed. You really need a broader concept that integrates iteration and indexing, as in the C++ standard library. Swift provides the perfect example of how you could do something like that without losing the natural features of Python indexing and iteration. But it turns out to be complicated to explain, and to work with, and you end up writing multiple implementations for each iterable-processing function. I don't think the benefit is worth the cost. Another alternative is just to wrap any iterable in a caching LazyList type. This runs into complications because there are different choices that make sense for different uses (obviously you have to handle negative indexing, and obviously you have to handle infinite lists, so... Oops!), so it makes more sense to leave that up to the application to supply whatever lazy list type it needs and use it explicitly. 
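One possible minimal sketch of such a caching wrapper follows. The name LazyList and its particular trade-offs (positive indexing is lazy; negative indexing forces full evaluation and therefore never returns for an infinite source) are just one of the design choices described above, not a canonical implementation:

```python
import itertools


class LazyList:
    """Wrap any iterable and cache values as they are demanded."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = []

    def _fill_to(self, n):
        # Pull from the underlying iterator until index n is cached.
        while len(self._cache) <= n:
            self._cache.append(next(self._it))  # StopIteration if exhausted

    def __getitem__(self, i):
        if i < 0:
            # One possible choice: negative indices force full evaluation
            # (this never returns for an infinite source).
            self._cache.extend(self._it)
            return self._cache[i]
        try:
            self._fill_to(i)
        except StopIteration:
            raise IndexError(i) from None
        return self._cache[i]

    def __iter__(self):
        i = 0
        while True:
            try:
                yield self[i]
            except IndexError:
                return
            i += 1


# Indexing into an infinite source computes only what it needs.
squares = LazyList(x * x for x in itertools.count())
assert squares[4] == 16
assert squares._cache == [0, 1, 4, 9, 16]
# A finite source behaves like a cached-on-first-pass list.
letters = LazyList('abc')
assert list(letters) == ['a', 'b', 'c']
assert list(letters) == ['a', 'b', 'c']  # reusable, unlike an iterator
```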
From Nikolaus at rath.org Wed Sep 30 20:27:18 2015 From: Nikolaus at rath.org (Nikolaus Rath) Date: Wed, 30 Sep 2015 11:27:18 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <560B1E49.7050102@canterbury.ac.nz> (Greg Ewing's message of "Wed, 30 Sep 2015 12:27:05 +1300") References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> <20150929133542.4d04f6dd@anarchist.wooz.org> <560B1E49.7050102@canterbury.ac.nz> Message-ID: <87r3lf9309.fsf@thinkpad.rath.org> On Sep 30 2015, Greg Ewing wrote: > Emile van Sebille wrote: > >> x = foo nor 'foo was None' > > Cute, but unfortunately it conflicts with established > usage of the word 'nor', which would suggest that > a nor b == not (a or b). The idea of using a named operator instead of some symbol has merit though. What about "a orn b" or "a orin b" (*or* *i*f *n*one)? The latter might be especially appealing to Tolkien readers :-). Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F "Time flies like an arrow, fruit flies like a Banana." 
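Whatever spelling wins, the proposed operator differs from plain `or` exactly when the left operand is falsey but not None. A function sketch of those semantics (a function, unlike a real operator, cannot short-circuit its right operand):

```python
def orn(a, b):
    """What a hypothetical 'a orn b' would evaluate to:
    the left operand unless it is None."""
    return a if a is not None else b


# 'or' replaces every falsey value; 'orn' replaces only None.
assert ([] or ['default']) == ['default']   # [] is falsey, so 'or' drops it
assert orn([], ['default']) == []           # [] is not None, so 'orn' keeps it
assert orn(None, ['default']) == ['default']
assert (0 or 42) == 42
assert orn(0, 42) == 0
```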
From abarnert at yahoo.com Wed Sep 30 20:32:52 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 30 Sep 2015 18:32:52 +0000 (UTC) Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: <147B2DEC-310B-498A-B468-03F0053F55B7@yahoo.com> References: <147B2DEC-310B-498A-B468-03F0053F55B7@yahoo.com> Message-ID: <1541280537.3561058.1443637972521.JavaMail.yahoo@mail.yahoo.com> I just remembered that the last few times related things came up, I wrote some blog posts going into details that I didn't want to have to dump on the list: * http://stupidpythonideas.blogspot.com/2013/08/lazy-restartable-iteration.html * http://stupidpythonideas.blogspot.com/2014/07/swift-style-map-and-filter-views.html * http://stupidpythonideas.blogspot.com/2014/07/lazy-cons-lists.html * http://stupidpythonideas.blogspot.com/2014/07/lazy-python-lists.html * http://stupidpythonideas.blogspot.com/2015/07/creating-new-sequence-type-is-easy.html The one about Swift-style map and filter views is, I think, the most interesting here. The tl;dr is that views (lazy sequences) are nifty, and there's nothing actually stopping Python from using them in more places, but they do add complexity, and the benefits probably don't outweigh the costs. > On Wednesday, September 30, 2015 11:26 AM, Andrew Barnert wrote: > > On Sep 30, 2015, at 11:11, M.-A. Lemburg wrote: > >> >>> On 30.09.2015 19:19, Neil Girdhar wrote: >>> I guess, I'm just asking for enumerate to go through the same > change that >>> range went through. Why wasn't it a problem for range? >> >> range() returns a list in Python 2 and a generator in Python 3. > > No it doesn't. It returns a (lazy) sequence. Not a generator, or any other > kind of iterator. > > I don't know why so many people seem to believe it returns a generator. > (And, when you point out what it returns, most of them say, "Why was that > changed from 2.x xrange, which returned a generator?" 
but xrange never > returned a generator either--it returned a lazy almost-a-sequence from the > start.) > > There's no conceptual reason that Python couldn't have more lazy > sequences, and tools to build your own lazy sequences more easily. > > However, things do get messy once you get into the details. For example, zip can > return a lazy sequence if given only sequences, but what if it's given > iterators, or other iterables that aren't sequences; filter can return > something that's sort of like a sequence in that it can be repeatedly > iterated but it can't be randomly-accessed. You really need a broader > concept that integrates iteration and indexing, as in the C++ standard library. > Swift provides the perfect example of how you could do something like that > without losing the natural features of Python indexing and iteration. But it > turns out to be complicated to explain, and to work with, and you end up writing > multiple implementations for each iterable-processing function. I don't > think the benefit is worth the cost. > > Another alternative is just to wrap any iterable in a caching LazyList type. > This runs into complications because there are different choices that make sense > for different uses (obviously you have to handle negative indexing, and > obviously you have to handle infinite lists, so... Oops!), so it makes more > sense to leave that up to the application to supply whatever lazy list type it > needs and use it explicitly. 
> From mistersheik at gmail.com Wed Sep 30 20:37:24 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 30 Sep 2015 18:37:24 +0000 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: <1541280537.3561058.1443637972521.JavaMail.yahoo@mail.yahoo.com> References: <147B2DEC-310B-498A-B468-03F0053F55B7@yahoo.com> <1541280537.3561058.1443637972521.JavaMail.yahoo@mail.yahoo.com> Message-ID: Yup, the swift-style map is a great blog entry Andrew and exactly what I was proposing for enumerate. I 100% agree that "views (lazy sequences) are nifty, and there's nothing actually stopping Python for using them in more places, but they do add complexity, and the benefits probably don't outweigh the costs." However, I wonder what Python will look like 5 years from now. Maybe it will be time for more sequences. On Wed, Sep 30, 2015 at 2:32 PM Andrew Barnert wrote: > I just remembered that the last few times related things came up, I wrote > some blog posts going into details that I didn't want to have to dump on > the list: > > * > http://stupidpythonideas.blogspot.com/2013/08/lazy-restartable-iteration.html > * > > http://stupidpythonideas.blogspot.com/2014/07/swift-style-map-and-filter-views.html > * > http://stupidpythonideas.blogspot.com/2014/07/lazy-cons-lists.html > * > http://stupidpythonideas.blogspot.com/2014/07/lazy-python-lists.html > * > > http://stupidpythonideas.blogspot.com/2015/07/creating-new-sequence-type-is-easy.html > > The one about Swift-style map and filter views is, I think, the most > interesting here. The tl;dr is that views (lazy sequences) are nifty, and > there's nothing actually stopping Python for using them in more places, but > they do add complexity, and the benefits probably don't outweigh the costs. > > > > > On Wednesday, September 30, 2015 11:26 AM, Andrew Barnert < > abarnert at yahoo.com> wrote: > > > On Sep 30, 2015, at 11:11, M.-A. 
Lemburg wrote: > > > >> > >>> On 30.09.2015 19:19, Neil Girdhar wrote: > >>> I guess, I'm just asking for enumerate to go through the same > > change that > >>> range went through. Why wasn't it a problem for range? > >> > >> range() returns a list in Python 2 and a generator in Python 3. > > > > No it doesn't. It returns a (lazy) sequence. Not a generator, or any > other > > kind of iterator. > > > > I don't know why so many people seem to believe it returns a generator. > > (And, when you point out what it returns, most of them say, "Why was that > > changed from 2.x xrange, which returned a generator?" but xrange never > > returned a generator either--it returned a lazy almost-a-sequence from > the > > start.) > > > > There's no conceptual reason that Python couldn't have more lazy > > sequences, and tools to build your own lazy sequences more easily. > > > > However, things do get messy once you get into the details. For example, > zip can > > return a lazy sequence if given only sequences, but what if it's given > > iterators, or other iterables that aren't sequences; filter can return > > something that's sort of like a sequence in that it can be repeatedly > > iterated but it can't be randomly-accessed. You really need a broader > > concept that integrates iteration and indexing, as in the C++ standard > library. > > Swift provides the perfect example of how you could do something like > that > > without losing the natural features of Python indexing and iteration. > But it > > turns out to be complicated to explain, and to work with, and you end up > writing > > multiple implementations for each iterable-processing function. I don't > > think the benefit is worth the cost. > > > > Another alternative is just to wrap any iterable in a caching LazyList > type. 
> > This runs into complications because there are different choices that > make sense > > for different uses (obviously you have to handle negative indexing, and > > obviously you have to handle infinite lists, so... Oops!), so it makes > more > > sense to leave that up to the application to supply whatever lazy list > type it > > needs and use it explicitly. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Wed Sep 30 20:39:04 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 30 Sep 2015 11:39:04 -0700 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> <20150929133542.4d04f6dd@anarchist.wooz.org> <560B1E49.7050102@canterbury.ac.nz> Message-ID: <1B6D6E1C-6B5A-45C7-B59F-D671B857BADC@yahoo.com> On Sep 30, 2015, at 09:41, Jeff Hardy wrote: > > I actually really like 'otherwise', but it's certainly not brief: > > self.x = x if x is not None else [] > self.x = x otherwise [] > > That said, it's not used *that* often either, so maybe it's acceptable. The big problem with "otherwise", "orelse", or anything else that's a synonym of "or" is that it's a synonym of "or". There is nothing to tell the novice, or the sometime Python user who's just come back from 3 months with Ruby or Objective C or whatever, which one means a falsey check and which one means a None check. They both read the same in English, and the difference would be unique to Python rather than a general programming thing. So you'd end up with hundreds of articles and blog posts explaining, often poorly, the difference and when to use each--just as you see for == vs. ===, eq vs. eql, etc. in other languages. From mal at egenix.com Wed Sep 30 20:43:30 2015 From: mal at egenix.com (M.-A. 
Lemburg) Date: Wed, 30 Sep 2015 20:43:30 +0200 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: <147B2DEC-310B-498A-B468-03F0053F55B7@yahoo.com> References: <560C25B4.5050000@egenix.com> <147B2DEC-310B-498A-B468-03F0053F55B7@yahoo.com> Message-ID: <560C2D52.3080809@egenix.com> On 30.09.2015 20:26, Andrew Barnert via Python-ideas wrote: > On Sep 30, 2015, at 11:11, M.-A. Lemburg wrote: >> >>> On 30.09.2015 19:19, Neil Girdhar wrote: >>> I guess, I'm just asking for enumerate to go through the same change that >>> range went through. Why wasn't it a problem for range? >> >> range() returns a list in Python 2 and a generator in Python 3. > > No it doesn't. It returns a (lazy) sequence. Not a generator, or any other kind of iterator. You are right that it's not of a generator type and more like a lazy sequence. To be exact, it returns a range object and does implement the iter protocol via a range_iterator object. In Python 2 we have the xrange object which has similar properties, but not the same, e.g. you can't slice it. > I don't know why so many people seem to believe it returns a generator. (And, when you point out what it returns, most of them say, "Why was that changed from 2.x xrange, which returned a generator?" but xrange never returned a generator either--it returned a lazy almost-a-sequence from the start.) Perhaps because it behaves like one ? :-) Unlike an iterator, it doesn't iterate over a sequence, but instead generates the values on the fly. FWIW: I don't think many people use the lazy sequence features of range(), e.g. the slicing or index support. By far most uses are in for-loops. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 30 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... 
http://zope.egenix.com/ ________________________________________________________________________ 2015-09-25: Started a Python blog ... ... http://malemburg.com/ 2015-10-21: Python Meeting Duesseldorf ... 21 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mistersheik at gmail.com Wed Sep 30 20:46:04 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 30 Sep 2015 18:46:04 +0000 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: <560C2D52.3080809@egenix.com> References: <560C25B4.5050000@egenix.com> <147B2DEC-310B-498A-B468-03F0053F55B7@yahoo.com> <560C2D52.3080809@egenix.com> Message-ID: It doesn't behave like a generator because it doesn't implement send, throw, or close. It's a sequence because it implements: __getitem__, __len__ __contains__, __iter__, __reversed__, index, and count. On Wed, Sep 30, 2015 at 2:43 PM M.-A. Lemburg wrote: > > > On 30.09.2015 20:26, Andrew Barnert via Python-ideas wrote: > > On Sep 30, 2015, at 11:11, M.-A. Lemburg wrote: > >> > >>> On 30.09.2015 19:19, Neil Girdhar wrote: > >>> I guess, I'm just asking for enumerate to go through the same change > that > >>> range went through. Why wasn't it a problem for range? > >> > >> range() returns a list in Python 2 and a generator in Python 3. > > > > No it doesn't. It returns a (lazy) sequence. Not a generator, or any > other kind of iterator. > > You are right that it's not of a generator type > and more like a lazy sequence. To be exact, it returns > a range object and does implement the iter protocol via > a range_iterator object. > > In Python 2 we have the xrange object which has similar > properties, but not the same, e.g. you can't slice it. 
> > > I don't know why so many people seem to believe it returns a generator. > (And, when you point out what it returns, most of them say, "Why was that > changed from 2.x xrange, which returned a generator?" but xrange never > returned a generator either--it returned a lazy almost-a-sequence from the > start.) > > Perhaps because it behaves like one ? :-) > > Unlike an iterator, it doesn't iterate over a sequence, but instead > generates the values on the fly. > > FWIW: I don't think many people use the lazy sequence features > of range(), e.g. the slicing or index support. By far most > uses are in for-loops. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Experts (#1, Sep 30 2015) > >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ > >>> Python Database Interfaces ... http://products.egenix.com/ > >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ > ________________________________________________________________________ > 2015-09-25: Started a Python blog ... ... http://malemburg.com/ > 2015-10-21 : Python Meeting Duesseldorf > ... 21 days to go > > ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alexander.belopolsky at gmail.com Wed Sep 30 20:49:53 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 30 Sep 2015 14:49:53 -0400 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: References: <560C25B4.5050000@egenix.com> <147B2DEC-310B-498A-B468-03F0053F55B7@yahoo.com> <560C2D52.3080809@egenix.com> Message-ID: On Wed, Sep 30, 2015 at 2:46 PM, Neil Girdhar wrote: > It doesn't behave like a generator because it doesn't implement send, > throw, or close. It is not a generator because Python says it is not: >>> isinstance(range(0), collections.Generator) False > It's a sequence because it implements: __getitem__, __len__ > __contains__, __iter__, __reversed__, index, and count. Ditto >>> isinstance(range(0), collections.Sequence) True -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Wed Sep 30 21:19:05 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 30 Sep 2015 12:19:05 -0700 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: <560C2D52.3080809@egenix.com> References: <560C25B4.5050000@egenix.com> <147B2DEC-310B-498A-B468-03F0053F55B7@yahoo.com> <560C2D52.3080809@egenix.com> Message-ID: <08E63279-5DFF-4F7A-9B7B-B927D34BC4FA@yahoo.com> On Sep 30, 2015, at 11:43, M.-A. Lemburg wrote: > >> On 30.09.2015 20:26, Andrew Barnert via Python-ideas wrote: >>> On Sep 30, 2015, at 11:11, M.-A. Lemburg wrote: >>> >>>> On 30.09.2015 19:19, Neil Girdhar wrote: >>>> I guess, I'm just asking for enumerate to go through the same change that >>>> range went through. Why wasn't it a problem for range? >>> >>> range() returns a list in Python 2 and a generator in Python 3. >> >> No it doesn't. It returns a (lazy) sequence. Not a generator, or any other kind of iterator. > > You are right that it's not of a generator type > and more like a lazy sequence. 
To be exact, it returns > a range object and does implement the iter protocol via > a range_iterator object. To be exact, it returns an object which returns True for isinstance(r, Sequence), which offers correct implementations of the entire sequence protocol. In other words, it's not "more like a lazy sequence", it's _exactly_ a lazy sequence. In 2.3-2.5, xrange was a lazy "sequence-like object", and the docs explained how it didn't have all the methods of a sequence but otherwise was like one. When the collections ABCs were added, xrange (2.x)/range (3.x) started claiming to be a sequence, but the implementation was incomplete, so it was defective. This was fixed in 3.2 (which also made all of the sequence methods efficient?e.g., a range that fits into C longs can test an int for __contains__ in constant time). >> I don't know why so many people seem to believe it returns a generator. (And, when you point out what it returns, most of them say, "Why was that changed from 2.x xrange, which returned a generator?" but xrange never returned a generator either--it returned a lazy almost-a-sequence from the start.) > > Perhaps because it behaves like one ? :-) > > Unlike an iterator, it doesn't iterate over a sequence, but instead > generates the values on the fly. You're confusing things even worse here. A generator is an iterator. It's a perfect subtype relationship. A range does not behave like a generator, or like any other kind of iterator. It behaves like a sequence. Laziness is orthogonal to the iterator-vs.-sequenceness. Dictionary views are also lazy but not iterators, for example. And there's nothing stopping you from writing a generator with "yield from f.readlines()" (except that it would be stupid), which would be an iterator despite being not lazy in any useful sense. Maybe the problem is that we don't have enough words. 
I've tried to use "view" to refer to a lazy non-iterator iterable (dict views, range, NumPy slices), which seems to help within the context of a single long explanation for a single user's problem, but I'm not sure that's something we'd want enshrined in the glossary, since it's a general English word that probably has wider usefulness. > FWIW: I don't think many people use the lazy sequence features > of range(), e.g. the slicing or index support. By far most > uses are in for-loops. I've used range as a sequence (or at least a reusable iterable, a sized object, and a container). I've answered questions from people on StackOverflow who are doing so, and seen the highest-rep Python answerer on SO suggest such uses to other people. I don't think I'd ever use the index method (although I did see one SO user who was doing so, to wrap up some arithmetic in a way that avoids a possibly off-by-one error, and wanted to know why it was so slow in 3.1 but worked fine in 3.2...), but there's no reason range should be a defective "not-quite-sequence" instead of a sequence. What would be the point of that? From random832 at fastmail.com Wed Sep 30 21:25:54 2015 From: random832 at fastmail.com (Random832) Date: Wed, 30 Sep 2015 15:25:54 -0400 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: References: Message-ID: <1443641154.129845.397904033.0BBCF99E@webmail.messagingengine.com> On Wed, Sep 30, 2015, at 13:19, Neil Girdhar wrote: > I guess, I'm just asking for enumerate to go through the same change that > range went through. Why wasn't it a problem for range? Range has always returned a sequence. Anyway, why stop there? Why not have map return a sequence? Zip? Anything that is a 1:1 mapping (or 1+1:1 in zip's case) could in principle be changed to return a sequence when given one. Who decides what does and doesn't benefit from random access? Or sliceability. 
It wouldn't be hard, in principle, to write a general-purpose function for slicing an iterator (i.e. returning an iterator that yields the elements that slicing a list of the same length would have given), particularly if it's limited to positive values. From abarnert at yahoo.com Wed Sep 30 21:31:39 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 30 Sep 2015 12:31:39 -0700 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: <08E63279-5DFF-4F7A-9B7B-B927D34BC4FA@yahoo.com> References: <560C25B4.5050000@egenix.com> <147B2DEC-310B-498A-B468-03F0053F55B7@yahoo.com> <560C2D52.3080809@egenix.com> <08E63279-5DFF-4F7A-9B7B-B927D34BC4FA@yahoo.com> Message-ID: <0BF836B6-7451-46E8-8DCB-25270348254E@yahoo.com> On Sep 30, 2015, at 12:19, Andrew Barnert via Python-ideas wrote: > > Maybe the problem is that we don't have enough words. I've tried to use "view" to refer to a lazy non-iterator iterable (dict views, range, NumPy slices), which seems to help within the context of a single long explanation for a single user's problem, but I'm not sure that's something we'd want enshrined in the glossary, since it's a general English word that probably has wider usefulness. I've just remembered that I said the exact same thing last time this discussion came up (less than 4 months ago), and someone pointed out to me that the docs already define the word "view" in the glossary specifically for dict/mapping views, and use the term "lazy sequence" in that definition, and use the term "virtual sequence" elsewhere. It's worth noting that dict views are not actually sequences, so defining view in terms of lazy sequence is probably not a good idea... Anyway, we probably don't need to invent any new terms; maybe we just need to pick some wording, define it clearly, and use it consistently throughout the docs. 
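A quick interpreter check makes the terminology problem concrete: dict views are lazy and reusable, yet neither iterators nor sequences, so "lazy sequence" is a poor umbrella term for them. A minimal sketch:

```python
from collections.abc import Iterator, Sequence

d = {'a': 1, 'b': 2}
keys = d.keys()

# A dict view is reusable, so it is not an iterator...
assert not isinstance(keys, Iterator)
assert sorted(keys) == sorted(keys)

# ...but it is not a sequence either: it rejects positional indexing.
assert not isinstance(keys, Sequence)
try:
    keys[0]
    indexable = True
except TypeError:
    indexable = False
assert not indexable

# It is also live: it reflects later changes to the dict.
d['c'] = 3
assert sorted(keys) == ['a', 'b', 'c']
```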
From abarnert at yahoo.com Wed Sep 30 21:42:05 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 30 Sep 2015 12:42:05 -0700 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: <1443641154.129845.397904033.0BBCF99E@webmail.messagingengine.com> References: <1443641154.129845.397904033.0BBCF99E@webmail.messagingengine.com> Message-ID: <2432916E-9398-45F9-BBB9-A49696837282@yahoo.com> On Sep 30, 2015, at 12:25, Random832 wrote: > >> On Wed, Sep 30, 2015, at 13:19, Neil Girdhar wrote: >> I guess, I'm just asking for enumerate to go through the same change that >> range went through. Why wasn't it a problem for range? > > Range has always returned a sequence. > > Anyway, why stop there? Why not have map return a sequence? Even when it's called with a set, or an iterator? Yes, you _could_ do that by lazily adding values to a list as needed, but that could lead to some confusing behavior. For example, len(m) or m[-1] has to evaluate the rest of the input, which could take infinite time (well, it'll run out of memory first?). > Zip? > Anything that is a 1:1 mapping (or 1+1:1 in zip's case) could in > principle be changed to return a sequence when given one. Who decides > what does and doesn't benefit from random access? The end user, of course. Some applications will never pass an infinite, or even very long, iterable into map, so they'd want random access and size and reversibility. Others won't ever want those features, but would want to pass in infinite iterators. That's why I think the best answer is to let people write (or install from PyPI) LazyList classes that fit their use cases, instead of trying to come up with one that tries to do everything and is misleading as often as it's useful. 
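For the sequence-argument case, such a view is genuinely easy to sketch. The class name MapView and its design (recompute on every access, no caching) are illustrative assumptions rather than the actual Swift-style proposal, but they show what "map returning a sequence when given one" could look like:

```python
from collections.abc import Sequence


class MapView(Sequence):
    """A lazy, random-access map over an underlying sequence.
    Illustrative only: a full design would still have to decide
    what map should return for iterator or set arguments."""

    def __init__(self, func, seq):
        self._func, self._seq = func, seq

    def __len__(self):
        return len(self._seq)

    def __getitem__(self, i):
        if isinstance(i, slice):
            return MapView(self._func, self._seq[i])
        # Computed on demand, never cached.
        return self._func(self._seq[i])


v = MapView(lambda x: x * x, range(10))
assert len(v) == 10
assert v[3] == 9                   # random access, nothing else computed
assert list(v[2:5]) == [4, 9, 16]  # slicing returns another view
assert list(v) == list(v)          # reusable, unlike map()
assert isinstance(v, Sequence)
```

Because MapView subclasses the Sequence ABC, the mixin methods (__iter__, __contains__, __reversed__, index, count) come for free once __getitem__ and __len__ exist.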
It's not actually impossible to design something that does a lot more without being inconsistent or confusing, but it's a bigger change than it appears at first glance, and would add a lot more complexity to the language than I think is worth it for the benefits. Again, see http://stupidpythonideas.blogspot.com/2014/07/swift-style-map-and-filter-views.html for details. > Or sliceability. It wouldn't be hard, in principle, to write a > general-purpose function for slicing an iterator (i.e. returning an > iterator that yields the elements that slicing a list of the same length > would have given), particularly if it's limited to positive values. You mean itertools.islice? From mal at egenix.com Wed Sep 30 21:47:17 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 30 Sep 2015 21:47:17 +0200 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: <08E63279-5DFF-4F7A-9B7B-B927D34BC4FA@yahoo.com> References: <560C25B4.5050000@egenix.com> <147B2DEC-310B-498A-B468-03F0053F55B7@yahoo.com> <560C2D52.3080809@egenix.com> <08E63279-5DFF-4F7A-9B7B-B927D34BC4FA@yahoo.com> Message-ID: <560C3C45.1070107@egenix.com> On 30.09.2015 21:19, Andrew Barnert wrote: > On Sep 30, 2015, at 11:43, M.-A. Lemburg wrote: >> >>> On 30.09.2015 20:26, Andrew Barnert via Python-ideas wrote: >>>> On Sep 30, 2015, at 11:11, M.-A. Lemburg wrote: >>>> >>>>> On 30.09.2015 19:19, Neil Girdhar wrote: >>>>> I guess, I'm just asking for enumerate to go through the same change that >>>>> range went through. Why wasn't it a problem for range? >>>> >>>> range() returns a list in Python 2 and a generator in Python 3. >>> >>> No it doesn't. It returns a (lazy) sequence. Not a generator, or any other kind of iterator. >> >> You are right that it's not of a generator type >> and more like a lazy sequence. To be exact, it returns >> a range object and does implement the iter protocol via >> a range_iterator object. 
> > To be exact, it returns an object which returns True for isinstance(r, Sequence), which offers correct implementations of the entire sequence protocol. In other words, it's not "more like a lazy sequence", it's _exactly_ a lazy sequence. > > In 2.3-2.5, xrange was a lazy "sequence-like object", and the docs explained how it didn't have all the methods of a sequence but otherwise was like one. When the collections ABCs were added, xrange (2.x)/range (3.x) started claiming to be a sequence, but the implementation was incomplete, so it was defective. This was fixed in 3.2 (which also made all of the sequence methods efficient; e.g., a range that fits into C longs can test an int for __contains__ in constant time). > >>> I don't know why so many people seem to believe it returns a generator. (And, when you point out what it returns, most of them say, "Why was that changed from 2.x xrange, which returned a generator?" but xrange never returned a generator either--it returned a lazy almost-a-sequence from the start.) >> >> Perhaps because it behaves like one ? :-) >> >> Unlike an iterator, it doesn't iterate over a sequence, but instead >> generates the values on the fly. > > You're confusing things even worse here. I guess I used the wrong level of detail. I was trying to explain things in terms of concepts, not object types, isinstance() and ABCs. The reason was that the subject line makes a suggestion which simply doesn't fit the main concept behind enumerate: that of generating values on the fly instead of allocating them as a sequence. We just got sidetracked with range(), since Neil brought this up as an example of why changing enumerate() should be possible. 
Back on the topic: >>> arg = range(10) >>> e = enumerate(arg) >>> e <enumerate object at 0x...> >>> import collections >>> isinstance(e, collections.Sequence) False >>> isinstance(e, collections.Iterator) True The way I understand the proposal is that Neil wants the above to return: >>> isinstance(e, collections.Sequence) True >>> isinstance(e, collections.Iterator) False iff isinstance(arg, collections.Sequence) and because this only makes sense iff e doesn't actually create a list, enumerate(arg) would have to return a lazy/virtual/whatever-term-you-use-for-generated-on-the-fly sequence :-) Regardless of this breaking backwards compatibility, what's the benefit of such a change ? Just like range(), enumerate() is most commonly used in for-loops, so the added sequence-ishness doesn't buy you anything much (except the need for more words in the glossary :-)). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 30 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ 2015-09-25: Started a Python blog ... ... http://malemburg.com/ 2015-10-21: Python Meeting Duesseldorf ... 21 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From tjreedy at udel.edu Wed Sep 30 21:57:03 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 30 Sep 2015 15:57:03 -0400 Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence In-Reply-To: References: Message-ID: On 9/30/2015 1:28 PM, Chris Barker wrote: > But again, we could add indexing to enumerate, and have it do the ugly > inefficient thing when it's using an underlying non-indexable iterator, If the ugly inefficient thing is to call list(iterable), then that does not work with unbounded iterables. Or the input iterable might produce inputs at various times in the future. -- Terry Jan Reedy From duda.piotr at gmail.com Wed Sep 30 22:58:33 2015 From: duda.piotr at gmail.com (Piotr Duda) Date: Wed, 30 Sep 2015 22:58:33 +0200 Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts In-Reply-To: <560C12AD.90305@trueblade.com> References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net> <56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io> <85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net> <20150929133542.4d04f6dd@anarchist.wooz.org> <560B1E49.7050102@canterbury.ac.nz> <560C12AD.90305@trueblade.com> Message-ID: What about something like: z = x if is not None else [] -- Piotr Duda -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From gokoproject at gmail.com  Wed Sep 30 23:15:11 2015
From: gokoproject at gmail.com (John Wong)
Date: Wed, 30 Sep 2015 17:15:11 -0400
Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts
In-Reply-To: 
References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net>
	<56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io>
	<85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net>
	<20150929133542.4d04f6dd@anarchist.wooz.org> <560B1E49.7050102@canterbury.ac.nz>
	<560C12AD.90305@trueblade.com>
Message-ID: 

On Wed, Sep 30, 2015 at 4:58 PM, Piotr Duda wrote:

> What about something like:
> z = x if is not None else []

Pretty hard to read. z and x are short here, but in much real code the
expression has more characters, and it's actually better off with
today's spelling anyway.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From abarnert at yahoo.com  Wed Sep 30 23:33:44 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 30 Sep 2015 14:33:44 -0700
Subject: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence
In-Reply-To: <560C3C45.1070107@egenix.com>
References: <560C25B4.5050000@egenix.com>
	<147B2DEC-310B-498A-B468-03F0053F55B7@yahoo.com>
	<560C2D52.3080809@egenix.com>
	<08E63279-5DFF-4F7A-9B7B-B927D34BC4FA@yahoo.com>
	<560C3C45.1070107@egenix.com>
Message-ID: <9E2C1054-551B-41B1-A0D1-2E54A1FA8BF3@yahoo.com>

On Sep 30, 2015, at 12:47, M.-A. Lemburg wrote:
> 
>> On 30.09.2015 21:19, Andrew Barnert wrote:
>>> On Sep 30, 2015, at 11:43, M.-A. Lemburg wrote:
>>> 
>>>>> On 30.09.2015 20:26, Andrew Barnert via Python-ideas wrote:
>>>>>> On Sep 30, 2015, at 11:11, M.-A. Lemburg wrote:
>>>>>> 
>>>>>> On 30.09.2015 19:19, Neil Girdhar wrote:
>>>>>> I guess, I'm just asking for enumerate to go through the same change that
>>>>>> range went through. Why wasn't it a problem for range?
>>>>> 
>>>>> range() returns a list in Python 2 and a generator in Python 3.
>>>> 
>>>> No it doesn't.
>>>> It returns a (lazy) sequence. Not a generator, or any other kind
>>>> of iterator.
>>> 
>>> You are right that it's not of a generator type
>>> and more like a lazy sequence. To be exact, it returns
>>> a range object and does implement the iter protocol via
>>> a range_iterator object.
>> 
>> To be exact, it returns an object which returns True for
>> isinstance(r, Sequence), which offers correct implementations of the
>> entire sequence protocol. In other words, it's not "more like a lazy
>> sequence", it's _exactly_ a lazy sequence.
>> 
>> In 2.3-2.5, xrange was a lazy "sequence-like object", and the docs
>> explained how it didn't have all the methods of a sequence but
>> otherwise was like one. When the collections ABCs were added, xrange
>> (2.x)/range (3.x) started claiming to be a sequence, but the
>> implementation was incomplete, so it was defective. This was fixed
>> in 3.2 (which also made all of the sequence methods efficient --
>> e.g., a range that fits into C longs can test an int for
>> __contains__ in constant time).
>> 
>>>> I don't know why so many people seem to believe it returns a
>>>> generator. (And, when you point out what it returns, most of them
>>>> say, "Why was that changed from 2.x xrange, which returned a
>>>> generator?" but xrange never returned a generator either -- it
>>>> returned a lazy almost-a-sequence from the start.)
>>> 
>>> Perhaps because it behaves like one? :-)
>>> 
>>> Unlike an iterator, it doesn't iterate over a sequence, but instead
>>> generates the values on the fly.
>> 
>> You're confusing things even worse here.
> 
> I guess I used the wrong level of detail. I was trying to
> explain things in terms of concepts, not object types,
> isinstance() and ABCs.

But you're conflating the concept of "lazy" with the concept of
"iterator". While generators, and iterators in general, are always
technically lazy and nearly-always practically lazy, lazy things are
not always iterators. Range, dict views, memoryview/buffer objects,
NumPy slices, third-party lazy-list types, etc.
are not generators, nor are they like generators in any way, except
for being lazy. They're lazy sequences (well, except for the ones that
aren't sequences, but they're still lazy containers, or lazy
non-iterator iterables if you want to stick to terms in the glossary).

And I think experienced developers conflating the two orthogonal
concepts is part of what leads to novices getting confused. They think
that if they want laziness, they need a generator. That makes them
unable to even form the notion that what they really want is a
view/lazy container/virtual container even when that's what they want.
And it makes it hard to discuss issues like this thread clearly.

(The fact that we don't have a term for "non-iterator iterable", and
that experienced users and even the documentation sometimes use the
term "sequence" for that, only makes things worse. For example, a
dict_keys is not a sequence in any useful sense, but the glossary says
it is, because there is no word for what it wants to say.)

> Back on the topic:
> 
> The way I understand the proposal is that Neil wants the
> above to return:
> 
>>>> isinstance(e, collections.Sequence)
> True
>>>> isinstance(e, collections.Iterator)
> False
> 
> iff isinstance(arg, collections.Sequence)

That's one way to give him what he wants. But another option would be
to always return a lazy sequence -- the same kind you'd get if you
picked one of the LazyList classes off PyPI (which provide a sequence
interface by iterating and caching an iterable), and just wrote
"e = LazyList(enumerate(arg))". This is still only creating the values
on demand, and only consuming the iterator (if that's what it's given)
as needed. (Of course it does mean you can now demand multiple values
at once from that iterator, e.g., by calling e[10] or len(e) when arg
was an iterator.)

Or you could be even cleverer: enumerate always returns a lazy
sequence, which uses random access if given a sequence, cached
iteration if given any other iterable.
That gives you the best of both worlds, right? Either of these avoids
the problem that the type of enumerate depends on the type of its
input, and the more serious problem that you can't tell from
inspection whether what it returns is reusable or one-shot, but of
course they introduce other problems. I don't think any of the three
is worth doing.

The three most consistent ways of doing this, if you were designing a
language from scratch, seem to be:

1. Python: Always return an iterator; if people want sequence behavior
   (with whatever variety of laziness they desire), they can wrap it.

2. Haskell: Make everything in the language as lazy as possible, so
   you can just always return a list, and it will automatically be as
   lazy as possible.

3. Swift: Merge indexing and iteration, and bake in views as a
   fundamental concept, so you can always return a view, but whether
   its indices are random-access or not depends on whether its input's
   indices are.

I'm not sure that #1 is the best of the three, but it is exactly what
Python already has, and the other two would be very hard to get to
from here, so I think #1 is the best for Python 3.6 (or 4.0).

(The blog post I referenced earlier in the thread explores whether we
could get to #3, or get part-way there, from here; if you don't agree
that it would be harder than is worth doing, please read it and point
out where I went wrong. Because that could be pretty cool.)
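For concreteness, a minimal sketch of the caching wrapper described
above (this LazyList is a toy written here, not any particular PyPI
package; real implementations handle slices, threading, and edge cases
much more carefully):

```python
from collections.abc import Sequence


class LazyList(Sequence):
    """Toy caching lazy sequence: wraps any iterable, consumes it only
    as far as indexing requires, and caches everything it has seen."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = []

    def _fill_to(self, index):
        # Pull from the underlying iterator only as far as needed.
        while len(self._cache) <= index:
            try:
                self._cache.append(next(self._it))
            except StopIteration:
                raise IndexError(index) from None

    def __getitem__(self, index):
        if isinstance(index, slice) or index < 0:
            # Negative indices and slices need the full length,
            # so force full consumption of the iterator.
            self._cache.extend(self._it)
            return self._cache[index]
        self._fill_to(index)
        return self._cache[index]

    def __len__(self):
        # len() likewise forces full consumption.
        self._cache.extend(self._it)
        return len(self._cache)


e = LazyList(enumerate("abc"))
print(e[1])       # (1, 'b') -- only two pairs pulled so far
print(list(e))    # [(0, 'a'), (1, 'b'), (2, 'c')]
print(list(e))    # same again: unlike bare enumerate, it's reusable
```

Because Sequence's mixin methods (__iter__, __contains__, index, ...)
are all built on __getitem__ and __len__, this object passes
isinstance(e, Sequence) and behaves like a sequence everywhere, while
still deferring work until a value is actually demanded.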
From abarnert at yahoo.com  Wed Sep 30 23:40:15 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 30 Sep 2015 14:40:15 -0700
Subject: [Python-ideas] PEP 505 (None coalescing operators) thoughts
In-Reply-To: 
References: <56097AFB.1040906@oddbird.net> <5609985C.40603@oddbird.net>
	<56099C6F.90700@oddbird.net> <36AB4531-96BD-4D22-A957-B2199BA7912E@stufft.io>
	<85oagm2saa.fsf@benfinney.id.au> <5609AB62.5040503@oddbird.net>
	<20150929133542.4d04f6dd@anarchist.wooz.org> <560B1E49.7050102@canterbury.ac.nz>
	<560C12AD.90305@trueblade.com>
Message-ID: <653DC9A1-3C9B-4E66-AA97-4CA24F21F9E9@yahoo.com>

On Sep 30, 2015, at 13:58, Piotr Duda wrote:
> 
> What about something like:
> z = x if is not None else []

For something as simple as "x", this doesn't seem much better than
what we can already do:

    z = x if x is not None else []

For a complex expression that might be incorrect/expensive/dangerous
to call multiple times, it might be useful, but I think it would read
a lot better with an explicit pronoun:

    z = dangerous_thing(arg) if it is not None else []

In natural languages, "it" is already complex enough; adding in
subject elision makes parsing even harder. I think the same would be
true here.

Also, explicit "it" would be usable in other situations:

    z = dangerous_thing(arg) if it.value() > 3 else DummyValue(3)

And it gives you something to look up in the docs: help(it) can tell
me how to figure out what "it" refers to, but how would I find that
out with your version?

Anyway, I still don't like it even with the explicit pronoun, but
maybe that's just AppleScript flashbacks. :)
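To make the double-evaluation concern above concrete, here is a small
sketch in today's Python (dangerous_thing here is just a counting
stand-in I wrote for illustration, not a real API from the thread):

```python
calls = 0


def dangerous_thing(arg):
    """Stand-in for an expensive or side-effecting expression:
    it counts how many times it gets evaluated."""
    global calls
    calls += 1
    return None if arg < 0 else arg


# Repeating the expression in a conditional evaluates it twice:
z = dangerous_thing(5) if dangerous_thing(5) is not None else []
print(calls)  # 2

# Today's workaround is to bind the result to a temporary first:
calls = 0
tmp = dangerous_thing(5)
z = tmp if tmp is not None else []
print(calls)  # 1
```

This is exactly the duplication that both the elided-subject spelling
and the explicit-pronoun spelling are trying to remove; they differ
only in how the single evaluation is referred back to.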