From ethan at stoneleaf.us Wed Mar 1 00:34:34 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 28 Feb 2017 21:34:34 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <764b7a4e-04f1-bfda-8e65-f750a3281af6@gmail.com> References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <58B6229D.10203@stoneleaf.us> <764b7a4e-04f1-bfda-8e65-f750a3281af6@gmail.com> Message-ID: <58B65D6A.8090806@stoneleaf.us> On 02/28/2017 06:02 PM, Michel Desmoulin wrote: > Le 01/03/2017 à 02:23, Ethan Furman a écrit : >> On 02/28/2017 05:18 PM, Michel Desmoulin wrote: >>> I love this proposal but Guido rejected it. Fighting for it right now >>> would probably be detrimental to the current proposed feature which >>> could potentially be more easily accepted. >> >> PEP 463 has a better chance of being accepted than this one does, for >> reasons that D'Aprano succinctly summarized. > > [...] not really a good reason to reject things for Python > because it's a language with a very diverse user base. Some bankers, > some web dev, some geographers, some mathematicians, some students, some > 3D graphists, etc. And the language values obvious, readable, predictable > code for all. True, but this means that a feature needs to apply to more than one group of folks. While I sympathize (truly!) with the nightmare you have to deal with, I don't think (and I certainly hope not) that that quagmire is common enough to justify adding .get() to list/tuple. An idea has to be useful for more than a small group of code bases to warrant inclusion in the stdlib, and even more so for a built-in. > Most people on this list have a specialty; just because their specialty > doesn't see a use for the feature doesn't mean there is not one. > > So I provided in my last answer an explanation of what I would use it for. On the bright side, if enough use-cases of this type come up (pesky try/except for a simple situation), we may be able to get Guido to reconsider PEP 463. 
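For reference, the behaviour under discussion is easy to get today with a small helper function (just a sketch; the name seq_get is invented here and is not a stdlib API):

```python
def seq_get(sequence, index, default=None):
    """Return sequence[index], or default if index is out of range."""
    try:
        return sequence[index]
    except IndexError:
        return default

parts = "a,b".split(",")
print(seq_get(parts, 1))       # -> b
print(seq_get(parts, 5, ""))   # out of range: prints the empty default
```

Negative indices behave like plain indexing, so seq_get([1, 2, 3], -1) returns 3.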
I certainly think PEP 463 makes a lot more sense than adding list.get(). -- ~Ethan~ From levkivskyi at gmail.com Wed Mar 1 03:40:09 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Wed, 1 Mar 2017 09:40:09 +0100 Subject: [Python-ideas] Expose a child factory using MappingProxyType in builtins In-Reply-To: References: Message-ID: On 28 February 2017 at 23:19, Victor Stinner wrote: > 2017-02-28 13:17 GMT+01:00 Michel Desmoulin : > > We have the immutable frozenset for sets and tuples for lists. > > > > But we also have something to manipulate dict as immutable > datastructures: > > ... Sorry, I don't understand your proposition. > My interpretation of the idea is to reconsider https://www.python.org/dev/peps/pep-0416/ but put frozendict in collections, not in builtins. MappingProxyType could be a possible implementation (plus copying in constructor and hashing, as proposed above by Matt), but not necessarily. -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Wed Mar 1 04:31:25 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 1 Mar 2017 09:31:25 +0000 Subject: [Python-ideas] suggestion about the sort() function of the list instance In-Reply-To: <1b70bbd4.2499.15a877e9111.Coremail.qhlonline@163.com> References: <10473a54.3a6d.15a7d899712.Coremail.qhlonline@163.com> <20170301001338.GR5689@ando.pearwood.info> <1b70bbd4.2499.15a877e9111.Coremail.qhlonline@163.com> Message-ID: On 1 March 2017 at 01:31, qhlonline wrote: > My code example is not proper, Yes, may be this is better: > list.sort().revers( We can already do this - reversed(sorted(lst)) This is a long-established design decision in Python. It would need a *very* compelling use case to even think about changing it. 
Paul From wolfgang.maier at biologie.uni-freiburg.de Wed Mar 1 04:37:17 2017 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Wed, 1 Mar 2017 10:37:17 +0100 Subject: [Python-ideas] for/except/else Message-ID: I know what the regulars among you will be thinking (time machine, high bar for language syntax changes, etc.) so let me start by assuring you that I'm well aware of all of this, that I did research the topic before posting and that this is not the same as a previous suggestion using almost the same subject line. Now here's the proposal: allow an except (or except break) clause to follow for/while loops that will be executed if the loop was terminated by a break statement. The idea is certainly not new. In fact, Nick Coghlan, in his blog post http://python-notes.curiousefficiency.org/en/latest/python_concepts/break_else.html, uses it to provide a mental model for the meaning of the else following for/while, but, as far as I'm aware, he never suggested to make it legal Python syntax. Now while it's possible that Nick had a good reason not to do so, I think there would be three advantages to this: - as explained by Nick, the existence of "except break" would strengthen the analogy with try/except/else and help people understand what the existing else clause after a loop is good for. There has been much debate over the else clause in the past, most prominently, a long discussion on this list back in 2009 (I recommend interested people to start with Steven D'Aprano's Summary of it at https://mail.python.org/pipermail/python-ideas/2009-October/006155.html) that shows that for/else is misunderstood by/unknown to many Python programmers. - in some situations for/except/else would make code more readable by bringing logical alternatives closer together and to the same indentation level in the code. 
Consider a simple example (taken from the docs.python tutorial): for n in range(2, 10): for x in range(2, n): if n % x == 0: print(n, 'equals', x, '*', n//x) break else: # loop fell through without finding a factor print(n, 'is a prime number') There are two logical outcomes of the inner for loop here - a given number can be either prime or not. However, the two code branches dealing with them end up at different levels of indentation and in different places, one inside and one outside the loop block. This second issue can become much more annoying in more complex code where the loop may contain additional code after the break statement. Now compare this to: for n in range(2, 10): for x in range(2, n): if n % x == 0: break except break: print(n, 'equals', x, '*', n//x) else: # loop fell through without finding a factor print(n, 'is a prime number') IMO, this reflects the logic better. - it could provide an elegant solution for the "How to break out of two loops" issue. This is another topic that comes up rather regularly (python-list, stackoverflow) and there is again a very good blog post about it, this time from Ned Batchelder at https://nedbatchelder.com/blog/201608/breaking_out_of_two_loops.html. Stealing his example, here's code (at least) a newcomer may come up with before realizing it can't work: s = "a string to examine" for i in range(len(s)): for j in range(i+1, len(s)): if s[i] == s[j]: answer = (i, j) break # How to break twice??? with for/except/else this could be written as: s = "a string to examine" for i in range(len(s)): for j in range(i+1, len(s)): if s[i] == s[j]: break except break: answer = (i, j) break So much for the pros. Of course there are cons, too. The classical one for any syntax change, of course, is: - burden on developers who have to implement and maintain the new syntax. Specifically, this proposal would make parsing/compiling of loops more complicated. 
Others include: - using except will make people think of exceptions and that may cause new confusion; while that's true, I would argue that, in fact, break and exceptions are rather similar features in that they are gotos in disguise, so except will still be used to catch an interruption in normal control flow. - the new syntax will not help people understand for/else if except is not used; importantly, I'm *not* proposing to disallow the use of for/else without except (if that would ever happen it would be in the *very* distant future) so that would indeed mean that people would encounter for/else, not only in legacy, but also in newly written code. However, I would expect that they would also start seeing for/except increasingly (not least because it solves the "break out of two loops" issue) so they would be nudged towards thinking of the else after for/while more like the else in try/except/else just as Nick proposes it. Interestingly, there was another proposal on this list several years ago about allowing try/else without except, which I liked at the time and which would have made try/except/else work exactly as my proposed for/except/else. Here it is: https://mail.python.org/pipermail/python-ideas/2011-November/012875.html - as a result of previous discussions about for/else a section was added to PEP3099 saying: "The else clause in while and for loops will not change semantics, or be removed." However, the proposal here is not to change the else clause semantics, but add an additional except clause. 
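For comparison, here is what the prime example from above looks like today when written with an explicit flag, which is exactly the boilerplate the proposed except clause would remove (the results list is only there to make the output easy to inspect):

```python
# The prime example rewritten with an explicit flag instead of the
# proposed "except break" clause.
results = []
for n in range(2, 10):
    broke_out = False
    for x in range(2, n):
        if n % x == 0:
            broke_out = True
            break
    if broke_out:
        # This branch is what the proposal would put under "except break".
        results.append('%d equals %d * %d' % (n, x, n // x))
    else:
        # This branch is what stays under "else".
        results.append('%d is a prime number' % n)

print('\n'.join(results))
```

Note the flag and the trailing if/else sit at the same indentation level, which is the readability point the proposal makes.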
So that's it and while I'm well aware of the slim chances of this getting legal syntax, I would still be happy to get feedback from you :) Best, Wolfgang From stephanh42 at gmail.com Wed Mar 1 05:10:12 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Wed, 1 Mar 2017 11:10:12 +0100 Subject: [Python-ideas] suggestion about the sort() function of the list instance In-Reply-To: References: <10473a54.3a6d.15a7d899712.Coremail.qhlonline@163.com> <20170301001338.GR5689@ando.pearwood.info> <1b70bbd4.2499.15a877e9111.Coremail.qhlonline@163.com> Message-ID: It's even in the Programming FAQ: "In general in Python (and in all cases in the standard library) a method that mutates an object will return None to help avoid getting the two types of operations confused. So if you mistakenly write y.sort() thinking it will give you a sorted copy of y, you'll instead end up with None, which will likely cause your program to generate an easily diagnosed error." Stephan 2017-03-01 10:31 GMT+01:00 Paul Moore : > On 1 March 2017 at 01:31, qhlonline wrote: > > My code example is not proper, Yes, may be this is better: > > list.sort().revers( > > We can already do this - reversed(sorted(lst)) > > This is a long-established design decision in Python. It would need a > *very* compelling use case to even think about changing it. > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sf at fermigier.com Wed Mar 1 05:26:00 2017 From: sf at fermigier.com (=?UTF-8?Q?St=C3=A9fane_Fermigier?=) Date: Wed, 1 Mar 2017 11:26:00 +0100 Subject: [Python-ideas] suggestion about the sort() function of the list instance In-Reply-To: References: <10473a54.3a6d.15a7d899712.Coremail.qhlonline@163.com> <20170301001338.GR5689@ando.pearwood.info> <1b70bbd4.2499.15a877e9111.Coremail.qhlonline@163.com> Message-ID: Cf. https://martinfowler.com/bliki/CommandQuerySeparation.html But: >>> l = [1,2,3] >>> l.pop() 3 >>> l [1, 2] => Not so true. S. On Wed, Mar 1, 2017 at 11:10 AM, Stephan Houben wrote: > It's even in the Programming FAQ: > > "In general in Python (and in all cases in the standard library) a method > that mutates an object will return None to help avoid getting the two > types of operations confused. So if you mistakenly write y.sort() thinking > it will give you a sorted copy of y, you?ll instead end up with None, > which will likely cause your program to generate an easily diagnosed error." > > Stephan > > 2017-03-01 10:31 GMT+01:00 Paul Moore : > >> On 1 March 2017 at 01:31, qhlonline wrote: >> > My code example is not proper, Yes, may be this is better: >> > list.sort().revers( >> >> We can already do this - reversed(sorted(lst)) >> >> This is a long-established design decision in Python. It would need a >> *very* compelling use case to even think about changing it. 
>> Paul >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier - http://linkedin.com/in/sfermigier Founder & CEO, Abilian - Enterprise Social Software - http://www.abilian.com/ Chairman, Free&OSS Group / Systematic Cluster - http://www.gt-logiciel-libre.org/ Co-Chairman, National Council for Free & Open Source Software (CNLL) - http://cnll.fr/ Founder & Organiser, PyData Paris - http://pydata.fr/ --- "You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete." -- R. Buckminster Fuller -------------- next part -------------- An HTML attachment was scrubbed... URL: From berker.peksag at gmail.com Wed Mar 1 05:49:15 2017 From: berker.peksag at gmail.com (Berker Peksağ) Date: Wed, 1 Mar 2017 13:49:15 +0300 Subject: [Python-ideas] lazy use for optional import In-Reply-To: References: Message-ID: On Wed, Mar 1, 2017 at 2:31 AM, Nicolas Cellier wrote: > For example: > >> lazy import pylab as pl # do nothing for now >> >> # do stuff >> >> def plot(*args): >> pl.figure() # Will raise an ImportError at this point >> pl.plot(...) 
This can already be achieved without introducing a new keyword by using LazyLoader: https://docs.python.org/3/library/importlib.html#importlib.util.LazyLoader --Berker From steve at pearwood.info Wed Mar 1 06:56:04 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 1 Mar 2017 22:56:04 +1100 Subject: [Python-ideas] for/except/else In-Reply-To: References: Message-ID: <20170301115603.GY5689@ando.pearwood.info> On Wed, Mar 01, 2017 at 10:37:17AM +0100, Wolfgang Maier wrote: > Now here's the proposal: allow an except (or except break) clause to > follow for/while loops that will be executed if the loop was terminated > by a break statement. Let me see if I understand the proposal in full. You would allow: for i in (1, 2, 3): print(i) if i == 2: break except break: # or just except assert i == 2 print("a break was executed") else: print("never reached") # this is never reached print("for loop is done") as an alternative to something like: broke_out = False for i in (1, 2, 3): print(i) if i == 2: broke_out = True break else: print("never reached") # this is never reached if broke_out: assert i == 2 print("a break was executed") print("for loop is done") I must admit the suggestion seems a little bit neater than having to manage a flag myself, but on the other hand I can't remember the last time I've needed to manage a flag like that. And on the gripping hand, this is even simpler than both alternatives: for i in (1, 2, 3): print(i) if i == 2: assert i == 2 print("a break was executed") break else: print("never reached") # this is never reached print("for loop is done") There are some significant unanswered questions: - Does it matter which order the for...except...else are in? Obviously the for block must come first, but apart from that? - How is this implemented? Currently "break" is a simple unconditional GOTO which jumps past the for block. This will need to change to something significantly more complex. 
- There are other ways to exit a for-loop than just break. Which of them, if any, will also run the except block? -- Steve From rhodri at kynesim.co.uk Wed Mar 1 06:44:09 2017 From: rhodri at kynesim.co.uk (Rhodri James) Date: Wed, 1 Mar 2017 11:44:09 +0000 Subject: [Python-ideas] a bad feature in Python syntax In-Reply-To: <75139e7e.34e0.15a87cc97f4.Coremail.mlet_it_bew@126.com> References: <75139e7e.34e0.15a87cc97f4.Coremail.mlet_it_bew@126.com> Message-ID: On 01/03/17 02:56, ????? wrote: > I'm bited once: > >>> '' in {} == False > False > >>> ('' in {}) == False > True > > # '' in {} == False ==>> ('' in {}) and ({} == False) ==>> False! > > I think only compare operations should be chained. I think comparing against False (or True) is bad idea. I would certainly reject any code doing it that came past me for review. Use "not" instead. -- Rhodri James *-* Kynesim Ltd From rhodri at kynesim.co.uk Wed Mar 1 07:07:12 2017 From: rhodri at kynesim.co.uk (Rhodri James) Date: Wed, 1 Mar 2017 12:07:12 +0000 Subject: [Python-ideas] for/except/else In-Reply-To: References: Message-ID: Much snippage; apologies, Wolfgang! On 01/03/17 09:37, Wolfgang Maier wrote: > Now here's the proposal: allow an except (or except break) clause to > follow for/while loops that will be executed if the loop was terminated > by a break statement. [snip] > - in some situations for/except/else would make code more readable by > bringing logical alternatives closer together and to the same > indentation level in the code. Consider a simple example (taken from the > docs.python Tutorial: > > for n in range(2, 10): > for x in range(2, n): > if n % x == 0: > print(n, 'equals', x, '*', n//x) > break > else: > # loop fell through without finding a factor > print(n, 'is a prime number') > > There are two logical outcomes of the inner for loop here - a given > number can be either prime or not. 
However, the two code branches > dealing with them end up at different levels of indentation and in > different places, one inside and one outside the loop block. This second > issue can become much more annoying in more complex code where the loop > may contain additional code after the break statement. > > Now compare this to: > > for n in range(2, 10): > for x in range(2, n): > if n % x == 0: > break > except break: > print(n, 'equals', x, '*', n//x) > else: > # loop fell through without finding a factor > print(n, 'is a prime number') > > IMO, this reflects the logic better. It reads worse to me, I'm afraid. Moving the "print" disassociates it from the condition that caused it, making it that bit harder to understand. You'd have a more compelling case with a complex loop with multiple breaks all requiring identical processing. However my experience is that such cases are rare, and are usually attempts to do exception handling without actually using exceptions. I'm not terribly inclined to help people make more work for themselves. > - it could provide an elegant solution for the How to break out of two > loops issue. This is another topic that comes up rather regularly > (python-list, stackoverflow) and there is again a very good blog post > about it, this time from Ned Batchelder at > https://nedbatchelder.com/blog/201608/breaking_out_of_two_loops.html. > Stealing his example, here's code (at least) a newcomer may come up with > before realizing it can't work: > > s = "a string to examine" > for i in range(len(s)): > for j in range(i+1, len(s)): > if s[i] == s[j]: > answer = (i, j) > break # How to break twice??? > > with for/except/else this could be written as: > > s = "a string to examine" > for i in range(len(s)): > for j in range(i+1, len(s)): > if s[i] == s[j]: > break > except break: > answer = (i, j) > break That is a better use case. 
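For completeness, that function-based workaround looks like this (a sketch; the helper name first_repeat is invented here):

```python
def first_repeat(s):
    """Return (i, j) for the first repeated character in s, or None."""
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            if s[i] == s[j]:
                return (i, j)  # return exits both loops at once
    return None

print(first_repeat("a string to examine"))  # -> (0, 14)
```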
I must admit I normally handle this sort of thing by putting the loops in a function and returning out of the inner loop. -- Rhodri James *-* Kynesim Ltd From wolfgang.maier at biologie.uni-freiburg.de Wed Mar 1 07:16:11 2017 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Wed, 1 Mar 2017 13:16:11 +0100 Subject: [Python-ideas] for/except/else In-Reply-To: <5e58cb7c-e811-95bd-85ae-1d3e29290be2@biologie.uni-freiburg.de> References: <5e58cb7c-e811-95bd-85ae-1d3e29290be2@biologie.uni-freiburg.de> Message-ID: On 01.03.2017 12:56, Steven D'Aprano wrote: > On Wed, Mar 01, 2017 at 10:37:17AM +0100, Wolfgang Maier wrote: > >> Now here's the proposal: allow an except (or except break) clause to >> follow for/while loops that will be executed if the loop was terminated >> by a break statement. > > Let me see if I understand the proposal in full. You would allow: > > > for i in (1, 2, 3): > print(i) > if i == 2: > break > except break: # or just except > assert i == 2 > print("a break was executed") > else: > print("never reached") # this is never reached > print("for loop is done") > > > as an alternative to something like: > > > broke_out = False > for i in (1, 2, 3): > print(i) > if i == 2: > broke_out = True > break > else: > print("never reached") # this is never reached > if broke_out: > assert i == 2 > print("a break was executed") > print("for loop is done") > > correct. > I must admit the suggestion seems a little bit neater than having to > manage a flag myself, but on the other hand I can't remember the last > time I've needed to manage a flag like that. 
> > And on the gripping hand, this is even simpler than both alternatives: > > for i in (1, 2, 3): > print(i) > if i == 2: > assert i == 2 > print("a break was executed") > break > else: > print("never reached") # this is never reached > print("for loop is done") > Right, that's how you'd likely implement the behavior today, but see my argument about the two alternative code branches not ending up together at the same level of indentation. > > > There are some significant unanswered questions: > > - Does it matter which order the for...except...else are in? > Obviously the for block must come first, but apart from that? > Just like in try/except/else, the order would be for (or while)/except/else with the difference that both except and else would be optional. > - How is this implemented? Currently "break" is a simple > unconditional GOTO which jumps past the for block. This will > need to change to something significantly more complex. > Yeah, I know that's why I listed this under cons. > - There are other ways to exit a for-loop than just break. Which > of them, if any, will also run the except block? > None of them (though, honestly, I cannot think of anything but exceptions here; what do you have in mind?) > > From wolfgang.maier at biologie.uni-freiburg.de Wed Mar 1 07:33:41 2017 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Wed, 1 Mar 2017 13:33:41 +0100 Subject: [Python-ideas] for/except/else In-Reply-To: <20170301115603.GY5689@ando.pearwood.info> References: <20170301115603.GY5689@ando.pearwood.info> Message-ID: On 01.03.2017 12:56, Steven D'Aprano wrote: > > - How is this implemented? Currently "break" is a simple > unconditional GOTO which jumps past the for block. This will > need to change to something significantly more complex. > one way to implement this with unconditional GOTOs would be (in pseudocode): LOOP: on break GOTO EXCEPT ELSE: ... GOTO THEN EXCEPT: ... THEN: ... 
So at the byte-code level (but only there) the order of except and else would be reversed. Was that a reason why you were asking about the order of except and else in my proposal? Anyway, I'm sure there are people much more skilled at compiler programming than me here. From clint.hepner at gmail.com Wed Mar 1 08:20:30 2017 From: clint.hepner at gmail.com (Clint Hepner) Date: Wed, 1 Mar 2017 08:20:30 -0500 Subject: [Python-ideas] for/except/else In-Reply-To: References: Message-ID: > On 2017 Mar 1 , at 4:37 a, Wolfgang Maier wrote: > > I know what the regulars among you will be thinking (time machine, high bar for language syntax changes, etc.) so let me start by assuring you that I'm well aware of all of this, that I did research the topic before posting and that this is not the same as a previous suggestion using almost the same subject line. > > Now here's the proposal: allow an except (or except break) clause to follow for/while loops that will be executed if the loop was terminated by a break statement. > > The idea is certainly not new. In fact, Nick Coghlan, in his blog post > http://python-notes.curiousefficiency.org/en/latest/python_concepts/break_else.html, uses it to provide a mental model for the meaning of the else following for/while, but, as far as I'm aware, he never suggested to make it legal Python syntax. > > Now while it's possible that Nick had a good reason not to do so, I think there would be three advantages to this: > > - as explained by Nick, the existence of "except break" would strengthen the analogy with try/except/else and help people understand what the existing else clause after a loop is good for. 
> There has been much debate over the else clause in the past, most prominently, a long discussion on this list back in 2009 (I recommend interested people to start with Steven D'Aprano's Summary of it at https://mail.python.org/pipermail/python-ideas/2009-October/006155.html) that shows that for/else is misunderstood by/unknown to many Python programmers. > I'd like to see some examples where nested for loops couldn't easily be avoided in the first place. > for n in range(2, 10): > for x in range(2, n): > if n % x == 0: > print(n, 'equals', x, '*', n//x) > break > else: > # loop fell through without finding a factor > print(n, 'is a prime number') Replace the inner loop with a generator expression consumed by next (which, unlike any(), also hands back the factor x needed by the print call): for n in range(2,10): x = next((x for x in range(2,n) if n % x == 0), None) if x is not None: print('{} equals {} * {}'.format(n, x, n//x)) else: print('{} is prime'.format(n)) > > - it could provide an elegant solution for the How to break out of two loops issue. This is another topic that comes up rather regularly (python-list, stackoverflow) and there is again a very good blog post about it, this time from Ned Batchelder at https://nedbatchelder.com/blog/201608/breaking_out_of_two_loops.html. > Stealing his example, here's code (at least) a newcomer may come up with before realizing it can't work: > > s = "a string to examine" > for i in range(len(s)): > for j in range(i+1, len(s)): > if s[i] == s[j]: > answer = (i, j) > break # How to break twice??? 
Replace the inner loop with a call to str.find for i, c in enumerate(s): j = s.find(c, i+1) if j >= 0: answer = (i, j) break From steve at pearwood.info Wed Mar 1 08:25:12 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 2 Mar 2017 00:25:12 +1100 Subject: [Python-ideas] add a always turn on "assert" In-Reply-To: <3c613d9b.30cc.15a87c11743.Coremail.mlet_it_bew@126.com> References: <3c613d9b.30cc.15a87c11743.Coremail.mlet_it_bew@126.com> Message-ID: <20170301132512.GZ5689@ando.pearwood.info> On Wed, Mar 01, 2017 at 10:44:22AM +0800, ????? wrote: > "assert" is good, but it is used as a guard frequently. Using assert as a guard is an abuse of assert. We should not encourage it. assert has many uses already, but runtime guards is not one of them: http://import-that.dreamwidth.org/676.html > We can make such usage legal by adding a new syntax: > assert bool_expr, ExceptionType, True What is the purpose of the constant True? Why is there no way to set the exception message? Of course assert-as-guard is possible, but why should we do that? There is already a perfectly good way of doing guards which is more general and more powerful than anything you can do with a single assert statement. if condition: if something(): raise TypeError("message") elif another(): raise ValueError(argument) else: raise RuntimeError("error...") How can you write that with your syntax? assert condition and something(), TypeError, True assert condition and another(), ValueError, True assert condition, RuntimeError, True Your syntax has much more boilerplate, it repeats itself, and you can't specify the error message. > suggest reasons: > 1) even "__debug__" turn off, assert is working > assertion as guard. This goes against the purpose of assert. The reason assert exists is so that it can be disabled according to __debug__. > 2) to avoid boilerplate code > I write code like this: > if pred(....) or pred(....): > raise ValueError('pred(....) 
or pred(....)') > > Simplifed: > assert pred(...), ValueError, True > # the above line will be printed when error. > # I need not to copy the condition! What if the source code is not available? Then all you will see is a mystery ValueError, with no message, and no source. If the condition is long: # indent # indent # indent # indent assert (long_function_name(alpha, beta, gamma, delta) or another_function(a, b, c) == something(x, y , z) or some_condition), ValueError, True then all you will see if the assertion fails is something like: Traceback (most recent call last): or some_condition), ValueError, True ValueError It is better to be explicit about creating good, useful error messages, not to rely on Python printing the source code. > 3) future: "assert bool_expr, ET, False" > To aid static tool, like Proof System. What does that mean? What is the purpose of the False? -- Steve From steve at pearwood.info Wed Mar 1 08:50:26 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 2 Mar 2017 00:50:26 +1100 Subject: [Python-ideas] __repr__: to support pprint In-Reply-To: References: <320dd54e.260.15a86f38c41.Coremail.mlet_it_bew@126.com> <4e5598e6.1ea2.15a87727516.Coremail.qhlonline@163.com> Message-ID: <20170301135025.GA5689@ando.pearwood.info> On Wed, Mar 01, 2017 at 01:04:00PM +1100, Chris Angelico wrote: > On Wed, Mar 1, 2017 at 12:58 PM, Matthias welp wrote: > > You are free to experiment with overriding/extending __repr__ for your > > internal usage, but please note that it might break external libraries > > depending on obj.__repr__ or repr(obj), and print() might break when > > using built-in types as containers for your objects. > > Given that this started out with a post about pprint, maybe a more > viable approach would be a dedicated __pprint__ hook? That might be > easier to push through. (I'm not pushing for it though.) I like the idea of a __pprint__ dunder method in principle. But... what exactly is it supposed to do? 
In general, `self` cannot pretty-print itself, since it doesn't know if it is embedded inside another object, or whether the pretty-printer wants to constrain it to a certain width or number of lines. For example, if the printer wants to display a list using (say) three columns, aligned at the decimal points: [ 31.08713241722979, 983.3425750471824, -7234.474117265795, 0.7563959975516478, 21.08150956898832, 98.85759870687133, 219.76826846350912, -7.640051348488824, 0.5731518549129719, 32.961711789297816, 0.7563959975516478, 953487.1772710333 ] how would each float know how many leading spaces to use to align with the rest of the column? I don't think it could. Only the pretty printer itself would know how to align the float reprs, whether to truncate the displays, etc. It is possible that we could come up with a pretty-printing protocol, but that wouldn't be a trivial job. -- Steve From mathieu.beal at gmail.com Wed Mar 1 09:04:29 2017 From: mathieu.beal at gmail.com (Mathieu BEAL) Date: Wed, 1 Mar 2017 15:04:29 +0100 Subject: [Python-ideas] PEP 8 coding style included in grammar ? Message-ID: I was wondering why the PEP coding style (https://www.python.org/dev/peps/pep-0008/) is not natively included in python grammar ? For instance, both, *function definition* and *class definition*, are using the same 'NAME' token (see https://docs.python.org/3/reference/grammar.html). classdef: 'class' NAME ['(' [arglist] ')'] ':' suite funcdef: 'def' NAME parameters ['->' test] ':' suite Then, we are using libraries like pyflakes, flake8, pylint, to check the coding style. It seems useful to natively make a distinction between the NAME of classdef and the NAME of funcdef, with something like: classdef: 'class' NAME_CLASS ['(' [arglist] ')'] ':' suite funcdef: 'def' NAME_FUNC parameters ['->' test] ':' suite NAME_CLASS: - Class name should normally use the CapWords convention; NAME_FUNC: - Function name should be lowercase, with words separated by underscores as necessary to improve readability; - mixedCase is allowed; STYLE_CAP_WORDS = r'([A-Z]{1}[a-z]+(_[A-Z]{1}[a-z]+)*)' STYLE_UNDERSCORE_LOW_WORDS = r'([a-z]+(_[a-z]+)*)' STYLE_MIXED_CASE = r'([a-z]{1,}([A-Z]{1}[a-z]+)*)' NAME_FUNC = STYLE_LOW_WORDS | STYLE_UNDERSCORE_LOW_WORDS | STYLE_MIXED_CASE NAME_CLASS = STYLE_CAP_WORDS | NAME_FUNC # naming convention for functions may be used in cases where the interface is documented and used primarily as a callable (pep8 source). I didn't find any information about this point in a previous post or elsewhere. Is there any reason for such a choice? Or is it a dark old discussion we never talk about? Mathieu ------------- import re # Testing the first part of the class def, only. style_cap_words = r'^class ([A-Z]{1}[a-z]+(_?[A-Z]{1}[a-z]+)*)$' compiled_style_cap_words = re.compile(style_cap_words) status_samples = {True:['class Hello', 'class Hel_Lo', # ok if Hel and Lo are 2 words 'class HelLo', # same. ], False:['class HellO', 'class _He', 'class HEllo', 'class HE', 'class H', 'class h', 'class Hell_oo', 'class Hell_', 'class _hell', 'class H_E_L_L_O', 'class _H']} def is_matched(sample, pattern=compiled_style_cap_words): matched = pattern.match(sample) return matched is not None for status, samples in status_samples.items(): for sample in samples: is_correct = status == is_matched(sample) assert is_correct, '%s is not correct, required %s' % (sample, status) -------------- next part -------------- An HTML attachment was scrubbed... URL: From abrault at mapgears.com Wed Mar 1 09:09:21 2017 From: abrault at mapgears.com (Alexandre Brault) Date: Wed, 1 Mar 2017 09:09:21 -0500 Subject: [Python-ideas] PEP 8 coding style included in grammar ? In-Reply-To: References: Message-ID: <6477c334-0330-f0a2-87e0-c2f6fac120dc@mapgears.com> Long story short, it's because there can be good reasons to ignore PEP8 naming conventions. 
Linting tools can be taught to skip over an intentional PEP8 violation. A grammar rule can't Alex On 2017-03-01 09:04 AM, Mathieu BEAL wrote: > > I was wondering why the PEP coding style > (https://www.python.org/dev/peps/pep-0008/) is not natively included > in python grammar ? > > For instance, both, /function definition/ and /class definition/, are > using the same ?NAME? token. ((see, > https://docs.python.org/3/reference/grammar.html). > > classdef: 'class' NAME ['(' [arglist] ')'] ':' suite > > funcdef:'def' NAME parameters ['->' test] ':' suite > > > Then, we are using libraries like pyflake, flake8, pylint, to check > the coding style. > > > It seems useful to natively make distinction between NAME of classdef > and NAME of funcdef, with something like: > > classdef: 'class' NAME_CLASS ['(' [arglist] ')'] ':' suite > > funcdef:'def' NAME_FUNC parameters ['->' test] ':' suite > > > NAME_CLASS: > > ? Class name should normally use the CapWord convention; > > NAME_FUNC: > > ? Function name should be lowercase, with words separated by > underscore as necessary to improve readability, > > ? mixedCase is allowed; > > STYLE_CAP_WORDS = r?([A-Z]{1}[a-z]+(_[A-Z]{1}[a-z]+)*)? > > STYLE_UNDERSCORE_LOW_WORDS = r?([a-z]+(_[a-z]+)*)? > > STYLE_MIXED_CASE = r?([a-z]{1,}([A-Z]{1}[a-z]+)*?) > > NAME_FUNC = STYLE_LOW_WORDS > > | STYLE_UNDERSCORE_LOW_WORDS > > | STYLE_MIXED_CASE > > NAME_CLASS = STYLE_CAP_WORDS > > | NAME_FUNC # naming convention for functions may be > used in cases where the interface is documented and used primarily as > a callable (pep8 source). > > > > I didn't find any information about this point in a previous post or > elsewhere. Is there any reason for such a choice ? or is it a dark old > discussion we never talk about ? > > > Mathieu > > > ------------- > > import re > > > > # Testing the first part of the class def, only. 
> > style_cap_words = r'^class ([A-Z]{1}[a-z]+(_?[A-Z]{1}[a-z]+)*)$' > > compiled_style_cap_words = re.compile(style_cap_words) > > > > status_samples = {True:['class Hello', > > 'class Hel_Lo', # ok if Hel and Lo are 2 words > > 'class HelLo', # same. > > ], > > False:['class HellO', > > 'class _He', > > 'class HEllo', > > 'class HE', > > 'class H', > > 'class h', > > 'class Hell_oo', > > 'class Hell_', > > 'class _hell', > > 'class H_E_L_L_O', > > 'class _H']} > > > > def is_matched(sample, pattern=compiled_style_cap_words): > > matched = pattern.match(sample) > > return matched is not None > > > > for status, samples in status_samples.items(): > > for sample in samples: > > is_correct = status == is_matched(sample) > > assert is_correct, '%s is not correct, required %s' % (sample, > status) > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Wed Mar 1 09:35:35 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 1 Mar 2017 14:35:35 +0000 Subject: [Python-ideas] __repr__: to support pprint In-Reply-To: <20170301135025.GA5689@ando.pearwood.info> References: <320dd54e.260.15a86f38c41.Coremail.mlet_it_bew@126.com> <4e5598e6.1ea2.15a87727516.Coremail.qhlonline@163.com> <20170301135025.GA5689@ando.pearwood.info> Message-ID: On 1 March 2017 at 13:50, Steven D'Aprano wrote: > It is possible that we could come up with a pretty-printing protocol, > but that wouldn't be a trivial job. I'd be inclined to do this via simplegeneric. Let pprint do what it currently does, but allow users to register implementations for specific classes as they wish. I'm not sure how much work this would be - maybe a 3rd party "better pprint" module could prototype the approach to see if it's practical. 
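A dispatch-based pretty printer of the sort Paul describes could be sketched roughly as follows (an illustrative prototype only: the name `pp` and its output format are made up, and `functools.singledispatch` is used here in place of the older `simplegeneric`):

```python
from functools import singledispatch

@singledispatch
def pp(obj, indent=0):
    # Fallback: anything without a registered implementation
    # just gets its repr().
    return " " * indent + repr(obj)

@pp.register(list)
def _(obj, indent=0):
    # One element per line, so nested structures stay readable.
    pad = " " * indent
    lines = [pp(item, indent + 2) for item in obj]
    return pad + "[\n" + ",\n".join(lines) + "\n" + pad + "]"

@pp.register(dict)
def _(obj, indent=0):
    pad = " " * indent
    lines = ["%s%r: %s" % (" " * (indent + 2), k, pp(v).lstrip())
             for k, v in obj.items()]
    return pad + "{\n" + ",\n".join(lines) + "\n" + pad + "}"

print(pp([1, {"a": 2}, 3]))
```

A user class could then opt in with `pp.register(MyClass)` without touching its `__repr__`, which is the extensibility the thread is asking for.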
Paul From eric at trueblade.com Wed Mar 1 09:48:22 2017 From: eric at trueblade.com (Eric V. Smith) Date: Wed, 1 Mar 2017 09:48:22 -0500 Subject: [Python-ideas] __repr__: to support pprint In-Reply-To: References: <320dd54e.260.15a86f38c41.Coremail.mlet_it_bew@126.com> <4e5598e6.1ea2.15a87727516.Coremail.qhlonline@163.com> <20170301135025.GA5689@ando.pearwood.info> Message-ID: <37AB7F87-9186-49D9-95F8-945FF318D3F2@trueblade.com> > On Mar 1, 2017, at 9:35 AM, Paul Moore wrote: > >> On 1 March 2017 at 13:50, Steven D'Aprano wrote: >> It is possible that we could come up with a pretty-printing protocol, >> but that wouldn't be a trivial job. > > I'd be inclined to do this via simplegeneric. Let pprint do what it > currently does, but allow users to register implementations for > specific classes as they wish. If you mean functools.singledispatch, then I agree. It's even mentioned as a motivating case in PEP 443. > I'm not sure how much work this would be - maybe a 3rd party "better > pprint" module could prototype the approach to see if it's practical. > Paul That seems reasonable. Eric. From p.f.moore at gmail.com Wed Mar 1 10:02:10 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 1 Mar 2017 15:02:10 +0000 Subject: [Python-ideas] __repr__: to support pprint In-Reply-To: <37AB7F87-9186-49D9-95F8-945FF318D3F2@trueblade.com> References: <320dd54e.260.15a86f38c41.Coremail.mlet_it_bew@126.com> <4e5598e6.1ea2.15a87727516.Coremail.qhlonline@163.com> <20170301135025.GA5689@ando.pearwood.info> <37AB7F87-9186-49D9-95F8-945FF318D3F2@trueblade.com> Message-ID: On 1 March 2017 at 14:48, Eric V. Smith wrote: > > If you mean functools.singledispatch, then I agree. It's even mentioned as a motivating case in PEP 443. Sorry - that was a thinko on my part - yes, functools.singledispatch. 
Paul From ethan at stoneleaf.us Wed Mar 1 12:01:10 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 01 Mar 2017 09:01:10 -0800 Subject: [Python-ideas] for/except/else In-Reply-To: References: Message-ID: <58B6FE56.8090204@stoneleaf.us> On 03/01/2017 01:37 AM, Wolfgang Maier wrote: > Now here's the proposal: allow an except (or except break) clause to follow for/while loops that will be executed if the > loop was terminated by a break statement. I find the proposal interesting. More importantly, the proposal is well written and clear -- thank you! -- ~Ethan~ From cory at lukasa.co.uk Wed Mar 1 12:23:20 2017 From: cory at lukasa.co.uk (Cory Benfield) Date: Wed, 1 Mar 2017 17:23:20 +0000 Subject: [Python-ideas] suggestion about the sort() function of the list instance In-Reply-To: References: <10473a54.3a6d.15a7d899712.Coremail.qhlonline@163.com> <20170301001338.GR5689@ando.pearwood.info> <1b70bbd4.2499.15a877e9111.Coremail.qhlonline@163.com> Message-ID: <4F5D02DD-425F-484A-9BAD-7A0476853A16@lukasa.co.uk> > On 1 Mar 2017, at 10:26, Stéfane Fermigier wrote: > > Cf. https://martinfowler.com/bliki/CommandQuerySeparation.html > > But: > > >>> l = [1,2,3] > >>> l.pop() > 3 > >>> l > [1, 2] > > => Not so true. > > S. This is naturally a different circumstance: pop must return the element it popped, otherwise it would just be del. Surely you aren't suggesting that pop should return self? Cory -------------- next part -------------- An HTML attachment was scrubbed...
URL: From sf at fermigier.com Wed Mar 1 12:26:48 2017 From: sf at fermigier.com (Stéfane Fermigier) Date: Wed, 1 Mar 2017 18:26:48 +0100 Subject: [Python-ideas] suggestion about the sort() function of the list instance In-Reply-To: <4F5D02DD-425F-484A-9BAD-7A0476853A16@lukasa.co.uk> References: <10473a54.3a6d.15a7d899712.Coremail.qhlonline@163.com> <20170301001338.GR5689@ando.pearwood.info> <1b70bbd4.2499.15a877e9111.Coremail.qhlonline@163.com> <4F5D02DD-425F-484A-9BAD-7A0476853A16@lukasa.co.uk> Message-ID: Definitively not, just like M. Fowler: "Meyer likes to use command-query separation absolutely, but there are exceptions. Popping a stack is a good example of a query that modifies state. Meyer correctly says that you can avoid having this method, but it is a useful idiom. So I prefer to follow this principle when I can, but I'm prepared to break it to get my pop." What I wanted to point out is that the paragraph quoted by Stephan ("In general in Python (and in all cases in the standard library) a method that mutates an object will return None to help avoid getting the two types of operations confused. So if you mistakenly write y.sort() thinking it will give you a sorted copy of y, you'll instead end up with None, which will likely cause your program to generate an easily diagnosed error.") doesn't seem to be true in this case. S. On Wed, Mar 1, 2017 at 6:23 PM, Cory Benfield wrote: > > On 1 Mar 2017, at 10:26, Stéfane Fermigier wrote: > > Cf. https://martinfowler.com/bliki/CommandQuerySeparation.html > > But: > > >>> l = [1,2,3] > >>> l.pop() > 3 > >>> l > [1, 2] > > => Not so true. > > S. > > > This is naturally a different circumstance: pop must return the element it > popped, otherwise it would just be del. Surely you aren't suggesting that > pop should return self?
> > Cory > -- Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier - http://linkedin.com/in/sfermigier Founder & CEO, Abilian - Enterprise Social Software - http://www.abilian.com/ Chairman, Free&OSS Group / Systematic Cluster - http://www.gt-logiciel-libre.org/ Co-Chairman, National Council for Free & Open Source Software (CNLL) - http://cnll.fr/ Founder & Organiser, PyData Paris - http://pydata.fr/ --- "You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete." -- R. Buckminster Fuller -------------- next part -------------- An HTML attachment was scrubbed... URL: From prometheus235 at gmail.com Wed Mar 1 12:39:11 2017 From: prometheus235 at gmail.com (Nick Timkovich) Date: Wed, 1 Mar 2017 11:39:11 -0600 Subject: [Python-ideas] suggestion about the sort() function of the list instance In-Reply-To: References: <10473a54.3a6d.15a7d899712.Coremail.qhlonline@163.com> <20170301001338.GR5689@ando.pearwood.info> <1b70bbd4.2499.15a877e9111.Coremail.qhlonline@163.com> <4F5D02DD-425F-484A-9BAD-7A0476853A16@lukasa.co.uk> Message-ID: From my experience teaching Python to non-programmers, it's a huge hurdle/nightmare to teach functions/methods that modify objects in-place vs. return a value that must be reassigned. Behold Pandas's DataFrame's sort method, which has an optional `in_place` argument that defaults to *False*, which despite being a method that looks like mylist.sort(), works differently from the method on lists, but more like the sorted *function*, arrrgh! That was a fun session... I think for consistency, having object methods that can act in-place (semantics like mylist.pop, mylist.append are nice as Stéfane suggests) only act in-place, and functions return a new object for reassignment would help new users. Maybe I'm teaching it poorly, suggestions welcome. Nick On Wed, Mar 1, 2017 at 11:26 AM, Stéfane Fermigier wrote: > Definitively not, just like M.
Fowler: "Meyer likes to use command-query > separation absolutely, but there are exceptions. Popping a stack is a good > example of a query that modifies state. Meyer correctly says that you can > avoid having this method, but it is a useful idiom. So I prefer to follow > this principle when I can, but I'm prepared to break it to get my pop." > > What I wanted to point out is that the paragraph quoted by Stephan ("In > general in Python (and in all cases in the standard library) a method that > mutates an object will return None to help avoid getting the two types of > operations confused. So if you mistakenly write y.sort() thinking it will > give you a sorted copy of y, you?ll instead end up with None, which will > likely cause your program to generate an easily diagnosed error.") doesn't > seem to be true in this case. > > S. > > On Wed, Mar 1, 2017 at 6:23 PM, Cory Benfield wrote: > >> >> On 1 Mar 2017, at 10:26, St?fane Fermigier wrote: >> >> Cf. https://martinfowler.com/bliki/CommandQuerySeparation.html >> >> But: >> >> >>> l = [1,2,3] >> >>> l.pop() >> 3 >> >>> l >> [1, 2] >> >> => Not so true. >> >> S. >> >> >> This is naturally a different circumstance: pop must return the element >> it popped, otherwise it would just be del. Surely you aren?t suggesting >> that pop should return self? >> >> Cory >> > > > > -- > Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier > - http://linkedin.com/in/sfermigier > Founder & CEO, Abilian - Enterprise Social Software - > http://www.abilian.com/ > Chairman, Free&OSS Group / Systematic Cluster - > http://www.gt-logiciel-libre.org/ > Co-Chairman, National Council for Free & Open Source Software (CNLL) - > http://cnll.fr/ > Founder & Organiser, PyData Paris - http://pydata.fr/ > --- > ?You never change things by ?ghting the existing reality. To change > something, build a new model that makes the existing model obsolete.? ? > R. 
Buckminster Fuller > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephanh42 at gmail.com Wed Mar 1 12:59:03 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Wed, 1 Mar 2017 18:59:03 +0100 Subject: [Python-ideas] PEP 8 coding style included in grammar ? In-Reply-To: <6477c334-0330-f0a2-87e0-c2f6fac120dc@mapgears.com> References: <6477c334-0330-f0a2-87e0-c2f6fac120dc@mapgears.com> Message-ID: Of course, in Python 8 this naming convention will indeed be enforced. https://mail.python.org/pipermail/python-dev/2016-March/143603.html Op 1 mrt. 2017 15:16 schreef "Alexandre Brault" : > Long story short, it's because there can be good reasons to ignore PEP8 > naming conventions. Linting tools can be taught to skip over an intentional > PEP8 violation. A grammar rule can't > > > Alex > > On 2017-03-01 09:04 AM, Mathieu BEAL wrote: > > I was wondering why the PEP coding style (https://www.python.org/dev/ > peps/pep-0008/) is not natively included in python grammar ? > > For instance, both, *function definition* and *class definition*, are > using the same ?NAME? token. ((see, https://docs.python.org/3/ > reference/grammar.html). > > classdef: 'class' NAME ['(' [arglist] ')'] ':' suite > > funcdef: 'def' NAME parameters ['->' test] ':' suite > > > Then, we are using libraries like pyflake, flake8, pylint, to check the > coding style. > > > It seems useful to natively make distinction between NAME of classdef and > NAME of funcdef, with something like: > > classdef: 'class' NAME_CLASS ['(' [arglist] ')'] ':' suite > > funcdef: 'def' NAME_FUNC parameters ['->' test] ':' suite > > > NAME_CLASS: > > ? Class name should normally use the CapWord convention; > > NAME_FUNC: > > ? 
Function name should be lowercase, with words separated by > underscore as necessary to improve readability, > > ? mixedCase is allowed; > > STYLE_CAP_WORDS = r?([A-Z]{1}[a-z]+(_[A-Z]{1}[a-z]+)*)? > > STYLE_UNDERSCORE_LOW_WORDS = r?([a-z]+(_[a-z]+)*)? > > STYLE_MIXED_CASE = r?([a-z]{1,}([A-Z]{1}[a-z]+)*?) > > NAME_FUNC = STYLE_LOW_WORDS > > | STYLE_UNDERSCORE_LOW_WORDS > > | STYLE_MIXED_CASE > > NAME_CLASS = STYLE_CAP_WORDS > > | NAME_FUNC # naming convention for functions may be used > in cases where the interface is documented and used primarily as a callable > (pep8 source). > > > > I didn't find any information about this point in a previous post or > elsewhere. Is there any reason for such a choice ? or is it a dark old > discussion we never talk about ? > > > Mathieu > > > ------------- > > import re > > > > # Testing the first part of the class def, only. > > style_cap_words = r'^class ([A-Z]{1}[a-z]+(_?[A-Z]{1}[a-z]+)*)$' > > compiled_style_cap_words = re.compile(style_cap_words) > > > > status_samples = {True:['class Hello', > > 'class Hel_Lo', # ok if Hel and Lo are 2 words > > 'class HelLo', # same. 
> > ], > > False:['class HellO', > > 'class _He', > > 'class HEllo', > > 'class HE', > > 'class H', > > 'class h', > > 'class Hell_oo', > > 'class Hell_', > > 'class _hell', > > 'class H_E_L_L_O', > > 'class _H']} > > > > def is_matched(sample, pattern=compiled_style_cap_words): > > matched = pattern.match(sample) > > return matched is not None > > > > for status, samples in status_samples.items(): > > for sample in samples: > > is_correct = status == is_matched(sample) > > assert is_correct, '%s is not correct, required %s' % (sample, > status) > > > _______________________________________________ > Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at barrys-emacs.org Wed Mar 1 13:06:26 2017 From: barry at barrys-emacs.org (Barry) Date: Wed, 1 Mar 2017 18:06:26 +0000 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <6b254e05-b4b3-ed3d-0279-6276e77337e1@gmail.com> References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <1efd6852-d481-20ad-a485-2e552200215c@mail.de> <20170301000254.GQ5689@ando.pearwood.info> <6b254e05-b4b3-ed3d-0279-6276e77337e1@gmail.com> Message-ID: <35C39E33-5EDF-4F9B-8852-1459C7C26B7D@barrys-emacs.org> > On 1 Mar 2017, at 01:26, Michel Desmoulin wrote: > > - you can iterate on both Maybe, but do you want the keys, values or (key, value) items? Keys being the default. > - you can index both Maybe as you cannot in the general case know the index. Need keys(). > - you can size both Yes I think this duck cannot swim or quack.
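For concreteness, the `.get()` behaviour being debated for sequences can be sketched as a plain helper function (illustrative only; `seq_get` is not an existing API, and a built-in `list.get` would presumably behave the same way):

```python
def seq_get(seq, index, default=None):
    # dict.get()-style lookup for sequences: return seq[index]
    # if the index is in range, otherwise return the default.
    try:
        return seq[index]
    except IndexError:
        return default

print(seq_get([10, 20, 30], 1))        # 20
print(seq_get([10, 20, 30], 7))        # None
print(seq_get((1, 2), -5, "missing"))  # missing
```

Whether such a shim pulls its weight compared with an explicit try/except is exactly the point under debate in this thread.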
Barry From abedillon at gmail.com Wed Mar 1 14:13:44 2017 From: abedillon at gmail.com (Abe Dillon) Date: Wed, 1 Mar 2017 13:13:44 -0600 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <35C39E33-5EDF-4F9B-8852-1459C7C26B7D@barrys-emacs.org> References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <1efd6852-d481-20ad-a485-2e552200215c@mail.de> <20170301000254.GQ5689@ando.pearwood.info> <6b254e05-b4b3-ed3d-0279-6276e77337e1@gmail.com> <35C39E33-5EDF-4F9B-8852-1459C7C26B7D@barrys-emacs.org> Message-ID: Barry, you're taking the metaphor too far. Duck typing is about presenting a certain interface. If your function takes an object that has a get(key, default) method, the rest doesn't matter. That's the only way in which the object needs to resemble a duck in your function. I'd like to +1 this proposal. It should be trivial to implement. It won't break backward compatibility. It's intuitive. I can think of several places I would use it. I can't think of a good reason not to include it. On Wed, Mar 1, 2017 at 12:06 PM, Barry wrote: > > > On 1 Mar 2017, at 01:26, Michel Desmoulin > wrote: > > > > - you can iterate on both > Maybe, but do you want the keys, values or (key, value) items? Keys being > the default. > > - you can index both > Maybe as you cannot in the general case know the index. Need keys(). > > - you can size both > Yes > > I think this duck cannot swim or quack. > > Barry > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From storchaka at gmail.com Wed Mar 1 14:25:24 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 1 Mar 2017 21:25:24 +0200 Subject: [Python-ideas] Positional-only parameters In-Reply-To: References: Message-ID: On 28.02.17 23:17, Victor Stinner wrote: > My question is: would it make sense to implement this feature in > Python directly? If yes, what should be the syntax? Use "/" marker? > Use the @positional() decorator? I'm strongly +1 for supporting positional-only parameters. The main benefit to me is that this allows declaring functions that take arbitrary keyword arguments, like Formatter.format() or MutableMapping.update(). Now we can't use even the "self" parameter and need to use a trick with parsing *args manually. This harms clarity and performance. The problem with the "/" marker is that it looks ugly. There was an excuse for the "*" marker -- it came from omitting the name in "*args". The "*" prefix itself means an iterable unpacking, but "/" is used neither as prefix nor suffix. > Do you see concrete cases where it's a deliberate choice to deny > passing arguments as keywords? dict.__init__(), dict.update(), partial.__new__() and partial.__call__() are obvious examples. There are others. And there was a performance reason. Just making a function support keyword arguments added an overhead even to calls with only positional arguments. This was changed recently, but I didn't check whether some overhead is left. > Don't you like writing int(x="123") instead of int("123")? :-) (I know > that Serhiy Storshake hates the name of the "x" parameter of the int > constructor ;-)) I believe weird names like "x" were added when the support of the "base" keyword was added, due to the limitation of PyArg_ParseTupleAndKeywords(). All or nothing: either a builtin function didn't support keyword arguments, or it supported passing by keyword for all arguments. But now it is possible to support passing by keyword for only part of the parameters.
I want to propose to deprecate badly designed keyword names of builtins. > By the way, I read that "/" marker is unknown by almost all Python > developers, and [...] syntax should be preferred, but > inspect.signature() doesn't support this syntax. Maybe we should fix > signature() and use [...] format instead? [...] is not Python syntax too. And it is orthogonal to positional-only parameters. [...] doesn't mean that parameters are positional-only. They can be passed by keyword, but just don't have default value. On other side, mandatory parameters can be positional-only. From storchaka at gmail.com Wed Mar 1 14:38:24 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 1 Mar 2017 21:38:24 +0200 Subject: [Python-ideas] suggestion about the sort() function of the list instance In-Reply-To: References: <10473a54.3a6d.15a7d899712.Coremail.qhlonline@163.com> <20170301001338.GR5689@ando.pearwood.info> <1b70bbd4.2499.15a877e9111.Coremail.qhlonline@163.com> Message-ID: On 01.03.17 11:31, Paul Moore wrote: > On 1 March 2017 at 01:31, qhlonline wrote: >> My code example is not proper, Yes, may be this is better: >> list.sort().revers( > > We can already do this - reversed(sorted(lst)) Or just sorted(lst, reverse=True). From storchaka at gmail.com Wed Mar 1 14:49:39 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 1 Mar 2017 21:49:39 +0200 Subject: [Python-ideas] __repr__: to support pprint In-Reply-To: References: <320dd54e.260.15a86f38c41.Coremail.mlet_it_bew@126.com> <4e5598e6.1ea2.15a87727516.Coremail.qhlonline@163.com> <20170301135025.GA5689@ando.pearwood.info> Message-ID: On 01.03.17 16:35, Paul Moore wrote: > On 1 March 2017 at 13:50, Steven D'Aprano wrote: >> It is possible that we could come up with a pretty-printing protocol, >> but that wouldn't be a trivial job. > > I'd be inclined to do this via simplegeneric. Let pprint do what it > currently does, but allow users to register implementations for > specific classes as they wish. 
That is how pprint is implemented now. It registers implementations for specific classes and dispatches them by type (or rather by __repr__ implementation). But these functions have too complex and non-extensible signatures, and that is why this is an implementation detail. There is an open issue for a pretty-printing protocol, but it is far from any progress. From guido at python.org Wed Mar 1 14:53:45 2017 From: guido at python.org (Guido van Rossum) Date: Wed, 1 Mar 2017 11:53:45 -0800 Subject: [Python-ideas] Positional-only parameters In-Reply-To: References: Message-ID: On Wed, Mar 1, 2017 at 11:25 AM, Serhiy Storchaka wrote: > On 28.02.17 23:17, Victor Stinner wrote: > >> My question is: would it make sense to implement this feature in >> Python directly? If yes, what should be the syntax? Use "/" marker? >> Use the @positional() decorator? >> > > I'm strongly +1 for supporting positional-only parameters. The main > benefit to me is that this allows to declare functions that takes arbitrary > keyword arguments like Formatter.format() or MutableMapping.update(). Now > we can't use even the "self" parameter and need to use a trick with parsing > *args manually. This harms clearness and performance. > Agreed. > The problem with the "/" marker is that it looks ugly. There was an excuse > for the "*" marker -- it came from omitting the name in "*args". The "*" > prefix itself means an iterable unpacking, but "/" is not used neither as > prefix nor suffix. > It's in a sense a pun -- * and / are "opposites" in mathematics, and so are the usages here. > Do you see concrete cases where it's a deliberate choice to deny >> passing arguments as keywords? >> > > dict.__init__(), dict.update(), partial.__new__() and partial.__call__() > are obvious examples. There are others. > > And there was performance reason. Just making the function supporting > keyword arguments added an overhead even to calls with only positional > arguments.
This was changed recently, but I didn't checked whether some > overhead is left. > > Don't you like writing int(x="123") instead of int("123")? :-) (I know >> that Serhiy Storshake hates the name of the "x" parameter of the int >> constructor ;-)) >> > > I believe weird names like "x" was added when the support of "base" > keyword was added due to the limitation of PyArg_ParseTupleAndKeywords(). > All or nothing, either builtin function didn't support keyword arguments, > or it supported passing by keyword for all arguments. > > But now it is possible to support passing by keyword only the part of > parameters. I want to propose to deprecate badly designed keyword names of > builtins. > +1 > By the way, I read that "/" marker is unknown by almost all Python >> developers, and [...] syntax should be preferred, but >> inspect.signature() doesn't support this syntax. Maybe we should fix >> signature() and use [...] format instead? >> > > [...] is not Python syntax too. And it is orthogonal to positional-only > parameters. [...] doesn't mean that parameters are positional-only. They > can be passed by keyword, but just don't have default value. On other side, > mandatory parameters can be positional-only. FWIW in typeshed we've started using double leading underscore as a convention for positional-only parameters, e.g. here: https://github.com/python/typeshed/blob/master/stdlib/3/builtins.pyi#L936 FWIW I think 'self' should also be special-cased as positional-only. Nobody wants to write 'C.foo(self=C())'. :-) -- --Guido van Rossum (python.org/~guido ) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From greg.ewing at canterbury.ac.nz Wed Mar 1 14:41:21 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 02 Mar 2017 08:41:21 +1300 Subject: [Python-ideas] suggestion about the sort() function of the list instance In-Reply-To: References: <10473a54.3a6d.15a7d899712.Coremail.qhlonline@163.com> <20170301001338.GR5689@ando.pearwood.info> <1b70bbd4.2499.15a877e9111.Coremail.qhlonline@163.com> Message-ID: <58B723E1.6060501@canterbury.ac.nz> St?fane Fermigier wrote: > Cf. https://martinfowler.com/bliki/CommandQuerySeparation.html Python's convention is less extreme than this, since it only applies to methods that, under the conventions of some other languages, would return self to facilitate chaining. There's no rule against a mutating method returning some other value if that's convenient. -- Greg From tjreedy at udel.edu Wed Mar 1 15:52:59 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 1 Mar 2017 15:52:59 -0500 Subject: [Python-ideas] Positional-only parameters In-Reply-To: References: Message-ID: On 3/1/2017 2:53 PM, Guido van Rossum wrote: > On Wed, Mar 1, 2017 at 11:25 AM, Serhiy Storchaka > > wrote: > > On 28.02.17 23:17, Victor Stinner wrote: > > My question is: would it make sense to implement this feature in > Python directly? If yes, what should be the syntax? Use "/" marker? > Use the @positional() decorator? > > > I'm strongly +1 for supporting positional-only parameters. The main > benefit to me is that this allows to declare functions that takes > arbitrary keyword arguments like Formatter.format() or > MutableMapping.update(). Now we can't use even the "self" parameter > and need to use a trick with parsing *args manually. This harms > clearness and performance. > Agreed. + 1 also. When people write a Python equivalent of a built-in function for documentation or teaching purposes, they should be able to exactly mimic the API. > The problem with the "/" marker is that it looks ugly. 
There was an > excuse for the "*" marker -- it came from omitting the name in > "*args". The "*" prefix itself means an iterable unpacking, but "/" > is not used neither as prefix nor suffix. > > > It's in a sense a pun -- * and / are "opposites" in mathematics, and so > are the usages here. Besides which, '/' has traditionally been used in non-numeric contexts as a separator, and in unix paths. -- Terry Jan Reedy From mertz at gnosis.cx Wed Mar 1 16:03:32 2017 From: mertz at gnosis.cx (David Mertz) Date: Wed, 1 Mar 2017 13:03:32 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <1efd6852-d481-20ad-a485-2e552200215c@mail.de> <20170301000254.GQ5689@ando.pearwood.info> <6b254e05-b4b3-ed3d-0279-6276e77337e1@gmail.com> <35C39E33-5EDF-4F9B-8852-1459C7C26B7D@barrys-emacs.org> Message-ID: On Wed, Mar 1, 2017 at 11:13 AM, Abe Dillon wrote: > I'd like to +1 this proposal. It should be trivial to implement. It won't > break backward compatibility. It's intuitive. I can think of several places > I would use it. I can't think of a good reason not to include it. > I've yet to see in this thread a use case where list.get() would make sense. Specifically, I've yet to see a case where there is straightforward duck-typing substitutability between dicts and lists where you'd want that. I saw something that said "semi-structured data sources like JSON are often messy and you need lots of try/except blocks." But that's really not the same thing. Even if some node in a structure might variously be a list, dict, or scalar (or other types; sets?), I haven't seen any code where this hypothetical list.get() would improve that problem. In contrast, *iteration* is definitely a case where I often want to freely substitute lists, sets, dicts, and other collections.
They share that natural capability, so being able to type `for x in collection:` is a good generic win to have. As I've said, the huge difference is that the "keys" to a list have a clear and obvious relationship amongst themselves. They are always successive non-negative integers, with the minimum always being zero. That makes a whole lot of operations and assumptions very different from dictionaries whose keys are completely independent of each other. If `mylist[N]` works, `mylist[N-1]` cannot fail with an IndexError (obviously unless the list is mutated in between those operations); there's nothing remotely analogous for dictionaries, and that's the reason we have dict.get(). -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Mar 1 16:32:42 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 01 Mar 2017 13:32:42 -0800 Subject: [Python-ideas] Positional-only parameters In-Reply-To: References: Message-ID: <58B73DFA.3010801@stoneleaf.us> On 03/01/2017 11:53 AM, Guido van Rossum wrote: > FWIW in typeshed we've started using double leading underscore as a convention for positional-only parameters, e.g. here: > > https://github.com/python/typeshed/blob/master/stdlib/3/builtins.pyi#L936 I would much rather have a single '/' to denote where positional-only ends, than have multiple leading '__'. 
Newest-member-of-the-society-for-the-preservation-of-underscores-ly yrs, -- ~Ethan~ From tjreedy at udel.edu Wed Mar 1 16:42:34 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 1 Mar 2017 16:42:34 -0500 Subject: [Python-ideas] suggestion about the sort() function of the list instance In-Reply-To: References: <10473a54.3a6d.15a7d899712.Coremail.qhlonline@163.com> <20170301001338.GR5689@ando.pearwood.info> <1b70bbd4.2499.15a877e9111.Coremail.qhlonline@163.com> <4F5D02DD-425F-484A-9BAD-7A0476853A16@lukasa.co.uk> Message-ID: On 3/1/2017 12:26 PM, Stéfane Fermigier wrote: > What I wanted to point out is that the paragraph quoted by Stephan ("In > general in Python (and in all cases in the standard library) a method > that mutates an object will return None to help avoid getting the two > types of operations confused. So if you mistakenly write y.sort() > thinking it will give you a sorted copy of y, you'll instead end up with > None, which will likely cause your program to generate an easily > diagnosed error.") doesn't seem to be true in this case. What is true, AFAIK, is that stdlib collection mutation methods never return 'self'. There is usually no need to return self as one will already have a reference to the collection. A disadvantage (to some) of this policy is to not be able to chain calls so neatly. (But Python is not about writing everything as one-line expressions.) An advantage of not returning 'self' when mutating is the possibility of returning something from the collection without resorting to returning tuples. While most mutation methods return None, there are exceptions where the primary purpose of the mutation is to return something other than self and None. The two that come to mind are various mutable_collection.pop methods (sometimes with a suffix) and iterator.__next__ ('pop front'). Note that an iterator is a mutable virtual collection, which may or may not be dependent on a concrete collection object that is not mutated by the iteration.
In functional languages, iteration through a sequence 'alist' may be done somewhat as follows, with pair assignment.

a_it = alist
while a_it:
    a, a_it = first(a_it), rest(a_it)
    process(a)

To be time efficient, alist must be a linked list, so that all tail-slices (returned by rest) already exist. While this can be done in Python, I generally prefer Python's iteration protocol. The closest equivalent to the above is the admittedly clumsy

a_it = iter(alist)
while True:
    try:
        a = next(a_it)
    except StopIteration:
        break
    process(a)

But with the boilerplate code hidden, this becomes one of Python's gems.

for a in alist:
    process(a)

The above works for iterable linked lists but is MUCH more flexible and general. -- Terry Jan Reedy From python at mrabarnett.plus.com Wed Mar 1 16:57:41 2017 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 1 Mar 2017 21:57:41 +0000 Subject: [Python-ideas] Positional-only parameters In-Reply-To: <58B73DFA.3010801@stoneleaf.us> References: <58B73DFA.3010801@stoneleaf.us> Message-ID: On 2017-03-01 21:32, Ethan Furman wrote: > On 03/01/2017 11:53 AM, Guido van Rossum wrote: > >> FWIW in typeshed we've started using double leading underscore as a convention for positional-only parameters, e.g. here: >> >> https://github.com/python/typeshed/blob/master/stdlib/3/builtins.pyi#L936 > > I would much rather have a single '/' to denote where positional-only ends, than have multiple leading '__'. > +1 It's also not clear what would happen if some of the parameters had the leading underscores and others didn't.
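As a concrete illustration of the single '/' marker preferred above (a sketch only; at the time of this thread the syntax did not exist in Python, though PEP 570 later specified it for Python 3.8):

```python
# Parameters before "/" are positional-only; "count" can still be
# passed by keyword.  Illustrative function, not a stdlib API.
def replace(s, old, new, /, count=-1):
    return s.replace(old, new, count)

print(replace("aaa", "a", "b", count=2))  # -> bba
# replace(s="aaa", old="a", new="b") raises TypeError, since
# positional-only parameters cannot be passed by keyword.
```

This gives callers the same guarantee that typeshed's double-underscore convention only documents: the parameter names are not part of the calling convention.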
From barry at barrys-emacs.org Wed Mar 1 16:04:34 2017 From: barry at barrys-emacs.org (Barry) Date: Wed, 1 Mar 2017 21:04:34 +0000 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <1efd6852-d481-20ad-a485-2e552200215c@mail.de> <20170301000254.GQ5689@ando.pearwood.info> <6b254e05-b4b3-ed3d-0279-6276e77337e1@gmail.com> <35C39E33-5EDF-4F9B-8852-1459C7C26B7D@barrys-emacs.org> Message-ID: <32D34893-3A77-49FD-9695-F1BFD12E57B8@barrys-emacs.org> > On 1 Mar 2017, at 19:13, Abe Dillon wrote: > > Barry, you're taking the metaphor too far. Duck typing is about presenting a certain interface. If your function takes an object that has a get(key, default) method, the rest doesn't matter. That's the only way in which the object needs to resemble a duck in your function. In support of get() I found that list of ducks a poor argument for the reasons I stated. That's not to say that get() on lists can't have value for other reasons, but I find the duck typing a very weak argument. > > I'd like to +1 this proposal. It should be trivial to implement. It won't break backward compatibility. It's intuitive. I can think of several places I would use it. I can't think of a good reason not to include it. I do not think I have encountered any use cases where I would have used this myself. Maybe in command line processing, I have to dig in my repos and check. Might be able to short-cut a length check with a get with default None or "". Barry > >> On Wed, Mar 1, 2017 at 12:06 PM, Barry wrote: >> >> > On 1 Mar 2017, at 01:26, Michel Desmoulin wrote: >> > >> > - you can iterate on both >> Maybe, but do you want the keys, values or (key, value) items? Keys being the default. >> > - you can index both >> Maybe, as you cannot in the general case know the index. Need keys(). >> > - you can size both >> Yes >> >> I think this duck cannot swim or quack.
>> >> Barry >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From elazarg at gmail.com Wed Mar 1 17:16:47 2017 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Wed, 01 Mar 2017 22:16:47 +0000 Subject: [Python-ideas] Positional-only parameters In-Reply-To: References: Message-ID: On Wed, Mar 1, 2017 at 9:26 PM Serhiy Storchaka wrote: > On 28.02.17 23:17, Victor Stinner wrote: > > My question is: would it make sense to implement this feature in > > Python directly? If yes, what should be the syntax? Use "/" marker? > > Use the @positional() decorator? > > > The problem with the "/" marker is that it looks ugly. There was an > excuse for the "*" marker -- it came from omitting the name in "*args". > The "*" prefix itself means an iterable unpacking, but "/" is used > neither as prefix nor suffix. > I like the idea, but I wanted to note that since it has no meaning from the point of view of the defined function, it can be done with a magic decorator, so new syntax is not required:

@positional_only[:4]
def replace(self, old, new, count=-1):
    ...

It may ease googling and backporting, by defining positional_only[slice] to be the identity function. Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Wed Mar 1 19:04:15 2017 From: barry at python.org (Barry Warsaw) Date: Wed, 1 Mar 2017 19:04:15 -0500 Subject: [Python-ideas] PEP 8 coding style included in grammar ? References: Message-ID: <20170301190415.2b836651@subdivisions.wooz.org> On Mar 01, 2017, at 03:04 PM, Mathieu BEAL wrote: >I was wondering why the PEP coding style ( >https://www.python.org/dev/peps/pep-0008/) is not natively included in python >grammar ?
Well, the simple answer is that the grammar predates PEP 8 (or any PEP) by many years. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 801 bytes Desc: OpenPGP digital signature URL: From rosuav at gmail.com Wed Mar 1 19:27:43 2017 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 2 Mar 2017 11:27:43 +1100 Subject: [Python-ideas] PEP 8 coding style included in grammar ? In-Reply-To: <20170301190415.2b836651@subdivisions.wooz.org> References: <20170301190415.2b836651@subdivisions.wooz.org> Message-ID: On Thu, Mar 2, 2017 at 11:04 AM, Barry Warsaw wrote: > On Mar 01, 2017, at 03:04 PM, Mathieu BEAL wrote: > >>I was wondering why the PEP coding style ( >>https://www.python.org/dev/peps/pep-0008/) is not natively included in python >>grammar ? > > Well, the simple answer is that the grammar predates PEP 8 (or any PEP) by > many years. Also, the grammar has a much stricter backward compatibility guarantee than any style guide ever can (or should). After some discussion about this and related topics on python-list, I posted this collection: http://rosuav.blogspot.com/2016/04/falsehoods-programmers-believe-about.html ChrisA From chris.barker at noaa.gov Wed Mar 1 21:58:56 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 1 Mar 2017 18:58:56 -0800 Subject: [Python-ideas] lazy use for optional import In-Reply-To: References: Message-ID: Going through machinations to satisfy PEP 8 makes no sense -- it's a style *guide* -- that's it. -CHB On Tue, Feb 28, 2017 at 3:31 PM, Nicolas Cellier < contact at nicolas-cellier.net> wrote: > I have seen some interest in lazy functionality implementation. > > I wondered if it can be linked with optional import. > > PEP 8 authoritatively states: > > Imports are always put at the top of the file, just after any module > comments and docstrings, and before module globals and constants.
> > So, if we want to stick to PEP8 with non-mandatory import, we have to > catch the import errors, or jail the class or function using extra > functionality. > > Why not use the potential lazy keyword to have a nice way to deal with > it? > > For example:
>
>> lazy import pylab as pl  # do nothing for now
>>
>> # do stuff
>>
>> def plot(*args):
>>     pl.figure()  # Will raise an ImportError at this point
>>     pl.plot(...)
>>
> > That way, our library will raise an ImportError only on plot func usage > with an explicit traceback: if matplotlib is not installed, we will have > the line where it is used for the first time and we will have the name of > the faulty library. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Mar 1 22:16:11 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 1 Mar 2017 19:16:11 -0800 Subject: [Python-ideas] Positional-only parameters In-Reply-To: References: Message-ID: On Wed, Mar 1, 2017 at 2:16 PM, Elazar wrote: > I like the idea, but I wanted to note that since it has no meaning from > the point of view of the defined function, it can be done with a magic > decorator, so new syntax is not required: > > @positional_only[:4] > def replace(self, old, new, count=-1): > ... > I'm confused, what does the [:4] mean? if you want old and new to be positional only, wouldn't it be something like:

@positional_only(3)
def replace(self, old, new, count=-1):
    ...

i.e. the first three parameters are positional only.
and why indexing/slice syntax??? +1 on the idea -- still on the fence about syntax. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Mar 1 22:28:27 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 1 Mar 2017 19:28:27 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <6b254e05-b4b3-ed3d-0279-6276e77337e1@gmail.com> References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <1efd6852-d481-20ad-a485-2e552200215c@mail.de> <20170301000254.GQ5689@ando.pearwood.info> <6b254e05-b4b3-ed3d-0279-6276e77337e1@gmail.com> Message-ID: On Tue, Feb 28, 2017 at 5:26 PM, Michel Desmoulin wrote: > Duck typing is precisely about incomplete but good enough similar API. > yes, though ideally one API is a subset of the other -- if they have the same method, it should mean the same thing: > For the dict and list: > > - you can iterate on both > But you get different things -- dicts iterate on the keys, which would be the equivalent of lists iterating on the indexes -- no one wants that! - you can index both > the indexing is only kinda the same, though, and you certainly can't slice dicts... > - you can size both > huh? what does sizing mean? you mean get the length? OK, that's similar. > Hence I can see very well functions working with both. E.g.: helper to > extract x elements or a default value:
>
> def extract(data, *args, default=None):
>     for x in args:
>         try:
>             yield data[x]
>         except (KeyError, IndexError):
>             yield default
>
> Usage:
>
> a, b, c = extract(scores, "foo", "bar", "doh")
> x, y, z = extract(items, 2, 5, 8, default=0)
> really? when would you not know if your "keys" are indexes or arbitrary keys?
or your data a sequence or mapping? I actually have this helper function. > > With list.get and tuple.get, this would become:
>
> def extract(data, *args, default=None):
>     return (data.get(x, default) for x in args)
> as a helper function, then it's OK if it's a bit more verbose. If you were to argue that you wouldn't need the helper function at all, then that might make sense, but this still seems a dangerous and hopefully rare thing to do! -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Mar 1 22:41:54 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 1 Mar 2017 19:41:54 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <58e46182-1210-438f-b85a-02aa1bd9dc9e@gmail.com> References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <20170228235616.GP5689@ando.pearwood.info> <58e46182-1210-438f-b85a-02aa1bd9dc9e@gmail.com> Message-ID: On Tue, Feb 28, 2017 at 5:56 PM, Michel Desmoulin wrote: > Me, I have to deal with SOAP government systems, mongodb-based APIs built by > teenagers, geographer data set exports and FTP + CSV in marina systems > (which I happen to work on right now). > > 3rd party CSV, XML and JSON processing are just hundreds of lines of > try/except on indexing because they have many listings, data positions > are important and a lot of systems got it wrong, giving you inconsistent > output with missing data and terrible labeling. > I feel your pain -- data munging is often a major mess!
And your tool must manage the > various versions of the data format they send to you, some with > additional fields, or missing ones. Some named, others found by position. > If I were dealing with a mix of mappings and index-able data, and the index-able data were often poorly formed (items missing), I think I'd put it all in dicts -- some of which happened to have integers as keys. Or just put a None everywhere there should be a value in a sequence that is missing. If data is coming in from a "schema-less" system, then what CAN you do with a sequence that is inconsistent? How can you possibly know which are missing if the sequence is too short? If it is always the last N items then it's not hard to pad the sequence. If it's not -- then what you have is a mapping that happens to have integers as keys. Not trying to be harsh here -- I'm just not at all sure that adding a get() to sequences is the right solution to these problems. Maybe someone else will chime in with more "I'd really have a use for this" examples. -CHB This summer, I had to convert a data set provided by polls in Africa > through an Android form, generated from an XML schema, Actually, I'm surprised that the XML schema step didn't enforce that the data be well formed. Isn't that the whole point of an XML schema? -- but your point is well taken -- data are often not well formed. -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed...
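The padding approach suggested above, for records where only the trailing fields can be missing, can be sketched as a small helper (illustrative code; the helper name and record contents are invented for the example):

```python
def pad_record(seq, size, fill=None):
    # Pad a too-short record out to a fixed schema length; assumes it is
    # always the trailing fields that are missing.
    return list(seq) + [fill] * (size - len(seq))

print(pad_record(["alice", 42], 4))  # -> ['alice', 42, None, None]
```

After padding, plain indexing never raises IndexError for in-schema positions, which removes the motivation for a list.get() in this scenario.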
URL: From jae+python at jaerhard.com Wed Mar 1 22:23:08 2017 From: jae+python at jaerhard.com (=?iso-8859-1?B?SvxyZ2VuIEEu?= Erhard) Date: Thu, 2 Mar 2017 04:23:08 +0100 Subject: [Python-ideas] add __contains__ into the "type" object In-Reply-To: References: <1a6bb123.269.15a86f5d855.Coremail.mlet_it_bew@126.com> <20170228231250.GN5689@ando.pearwood.info> Message-ID: <20170302032308.g4hsqdmhacycaqgn@jaerhard.com> On Tue, Feb 28, 2017 at 03:35:31PM -0800, Jelle Zijlstra wrote: > 2017-02-28 15:12 GMT-08:00 Steven D'Aprano : > > On Wed, Mar 01, 2017 at 07:02:23AM +0800, ????? wrote: > >> > >> where we use types? > >> almost: > >> isinstance(obj, T); > >> # issubclass(S, T); > >> > >> Note that TYPE is SET; > > > > What does that mean? I don't understand. > > > > > >> if we add __contains__ and __le__ into "type", > >> then things become: > >> obj in T; > > > > But obj is **not** in T, since T is a type, not a container. > > > > But in type theory, types are sets in some sense. For example, the > bool type is the set {True, False}, and the int type is the infinite > set {..., -1, 0, 1, ...}. Similarly, typing.py has a Union type: > Union[A, B] is the union of the types A and B. Subclasses are subsets > of their parent classes, because their set of possible values is a > subset of the possible values of their parent class. > > The OP seems to be proposing that we reflect this identity between > types and sets in Python by spelling "isinstance(obj, T)" as "obj in > T" and "issubclass(S, T)" as "S <= T". This proposal has some solid > theory behind it and I don't think it would be hard to implement, but > it doesn't seem like a particularly useful change to me. It wouldn't > really enable anything we can't do now, and it may be confusing to > people reading code that "obj in list" does something completely > different from "obj in list()". So? Compare to "fn" vs "fn()" now. Yes, some people are confused. So what. You *do* have to learn things. 
And "enable anything we can't do now". That argument was used any number of times on this list, and even before this very list existed. Still, we got decorators (they don't enable anything we couldn't do without them, and we actually can still do what they do without using them). "isinstance" and "issubclass" are explicit! Yay!... and decorators are "implicit", and wouldn't you know it, they *do* confuse people. I'm +.05 (and no, that's not because I finally see an idea that I actually think has some merit). Maybe even +1, given that isinstance(obj, class) is rather bulky. Okay, make that a +1. Bye, J From pavol.lisy at gmail.com Wed Mar 1 23:38:37 2017 From: pavol.lisy at gmail.com (Pavol Lisy) Date: Thu, 2 Mar 2017 05:38:37 +0100 Subject: [Python-ideas] add __contains__ into the "type" object In-Reply-To: <20170228231250.GN5689@ando.pearwood.info> References: <1a6bb123.269.15a86f5d855.Coremail.mlet_it_bew@126.com> <20170228231250.GN5689@ando.pearwood.info> Message-ID: On 3/1/17, Steven D'Aprano wrote: > On Wed, Mar 01, 2017 at 07:02:23AM +0800, ????? wrote: >> >> where we use types? >> almost: >> isinstance(obj, T); >> # issubclass(S, T); >> >> Note that TYPE is SET; > > What does that mean? I don't understand. Maybe she/he wants to say that it is natural to see class as a collection (at least in set theory https://en.wikipedia.org/wiki/Class_(set_theory) ) From ncoghlan at gmail.com Thu Mar 2 00:46:57 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 2 Mar 2017 15:46:57 +1000 Subject: [Python-ideas] for/except/else In-Reply-To: References: Message-ID: On 1 March 2017 at 19:37, Wolfgang Maier < wolfgang.maier at biologie.uni-freiburg.de> wrote: > I know what the regulars among you will be thinking (time machine, high > bar for language syntax changes, etc.)
so let me start by assuring you that > I'm well aware of all of this, that I did research the topic before posting > and that this is not the same as a previous suggestion using almost the > same subject line. > > Now here's the proposal: allow an except (or except break) clause to > follow for/while loops that will be executed if the loop was terminated by > a break statement. > > The idea is certainly not new. In fact, Nick Coghlan, in his blog post > http://python-notes.curiousefficiency.org/en/latest/python_ > concepts/break_else.html, uses it to provide a mental model for the > meaning of the else following for/while, but, as far as I'm aware, he never > suggested to make it legal Python syntax. > > Now while it's possible that Nick had a good reason not to do so, I never really thought about it, as I only use the "else:" clause for search loops where there aren't any side effects in the "break" case (other than the search result being bound to the loop variable), so while I find "except break:" useful as an explanatory tool, I don't have any practical need for it. I think you've made as strong a case for the idea as could reasonably be made :) However, Steven raises a good point that this would complicate the handling of loops in the code generator a fair bit, as it would add up to two additional jump targets wherever the new clause was used. Currently, compiling loops only needs to track the start of the loop (for continue), and the first instruction after the loop (for break).
With this change, they'd also need to track:

- the start of the "except break" clause (for break when the clause is used)
- the start of the "else" clause (for the non-break case when both trailing clauses are present)

The design level argument against adding the clause is that it breaks the "one obvious way" principle, as the preferred form for search loops looks like this:

for item in iterable:
    if condition(item):
        break
else:
    # Else clause either raises an exception or sets a default value
    item = get_default_value()
# If we get here, we know "item" is a valid reference
operation(item)

And you can easily switch the `break` out for a suitable `return` if you move this into a helper function:

def find_item_of_interest(iterable):
    for item in iterable:
        if condition(item):
            return item
    # The early return means we can skip using "else"
    return get_default_value()

Given that basic structure as a foundation, you only switch to the "nested side effect" form if you have to:

for item in iterable:
    if condition(item):
        operation(item)
        break
else:
    # Else clause neither raises an exception nor sets a default value
    condition_was_never_true(iterable)

This form is generally less amenable to being extracted into a reusable helper function, since it couples the search loop directly to the operation performed on the bound item, whereas decoupling them gives you a lot more flexibility in the eventual code structure. The proposal in this thread then has the significant downside of only covering the "nested side effect" case:

for item in iterable:
    if condition(item):
        break
except break:
    operation(item)
else:
    condition_was_never_true(iterable)

While being even *less* amenable to being pushed down into a helper function (since converting the "break" to a "return" would bypass the "except break" clause). So while it is cool to see this written up as a concrete proposal (thank you!), I don't think it makes the grade as an actual potential syntax change. Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Mar 2 01:10:55 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 2 Mar 2017 16:10:55 +1000 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <1efd6852-d481-20ad-a485-2e552200215c@mail.de> <20170301000254.GQ5689@ando.pearwood.info> <6b254e05-b4b3-ed3d-0279-6276e77337e1@gmail.com> <35C39E33-5EDF-4F9B-8852-1459C7C26B7D@barrys-emacs.org> Message-ID: On 2 March 2017 at 07:03, David Mertz wrote: > On Wed, Mar 1, 2017 at 11:13 AM, Abe Dillon wrote: > >> I'd like to +1 this proposal. It should be trivial to implement. It won't >> break backward compatibility. It's intuitive. I can think of several places >> I would use it. I can't think of a good reason not to include it. >> > > I've yet to see in this thread a use case where list.get() would make > sense. Specifically, I've yet to see a case where there is straightforward > duck-typing substitutability between dicts and lists where you'd want that. > I've never wanted a `list.get`, but I have occasionally wished that:

1. operator.get/set/delitem were available as builtins (like get/set/delattr)
2. the builtin getitem accepted an optional "default" argument the way getattr does

That is, the desired common operation isn't specifically "obj.get(subscript)" or "obj.get(subscript, default)", it's:

_raise = object()

def getitem(container, subscript, default=_raise):
    try:
        return container[subscript]
    except LookupError:
        if default is _raise:
            raise
        return default

Mappings just happen to already offer that functionality as a method. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed...
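Nick's sketch, exercised against both a list and a dict (the helper is taken from his message above; the example calls and values are illustrative only):

```python
_raise = object()

def getitem(container, subscript, default=_raise):
    # Generic "subscript with default", analogous to
    # getattr(obj, name, default): works for any container whose
    # lookup failures are LookupError subclasses (IndexError, KeyError).
    try:
        return container[subscript]
    except LookupError:
        if default is _raise:
            raise
        return default

print(getitem([10, 20, 30], 1))             # -> 20
print(getitem([10, 20, 30], 5, "missing"))  # -> missing
print(getitem({"a": 1}, "b", 0))            # -> 0
```

When no default is given, the original IndexError or KeyError propagates unchanged, matching plain subscription.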
URL: From stephanh42 at gmail.com Thu Mar 2 02:41:17 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Thu, 2 Mar 2017 08:41:17 +0100 Subject: [Python-ideas] Positional-only parameters In-Reply-To: References: Message-ID: Hi all, I have a slight variant of the decorator proposal. Rather than specify a count, let the decorator implement the typeshed dunder convention:

@positional_only
def replace(self, __old, __new, count=-1):

(I imagine this decorator would also treat "self" as positional-only, so no need for __self.) Pros:

1. Consistent with the typeshed convention.
2. Avoids a count.
3. Strictly opt-in, so hopefully keeps those @#?! underscore preservationists from picketing my lawn (again!).

Stephan 2017-03-02 4:16 GMT+01:00 Chris Barker : > On Wed, Mar 1, 2017 at 2:16 PM, Elazar wrote: > >> I like the idea, but I wanted to note that since it has no meaning from >> the point of view of the defined function, it can be done with a magic >> decorator, so new syntax is not required: >> >> @positional_only[:4] >> def replace(self, old, new, count=-1): >> ... >> > > I'm confused, what does the [:4] mean? > > if you want old and new to be positional only, wouldn't it be something > like: > > @positional_only(3) > def replace(self, old, new, count=-1): > ... > > i.e. the first three parameters are positional only. > > and why indexing/slice syntax??? > > +1 on the idea -- still on the fence about syntax. > > -CHB > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed...
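One possible implementation of the decorator Stephan describes (purely illustrative: the decorator body, the example function and the error message are invented for this sketch; this is not typeshed or stdlib code). Parameters whose names start with a double underscore may only be passed positionally:

```python
import functools
import inspect

def positional_only(func):
    # Enforce the typeshed dunder convention at call time: any parameter
    # named "__xxx" is rejected when passed by keyword.
    sig = inspect.signature(func)
    blocked = {name for name in sig.parameters if name.startswith("__")}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bad = blocked & kwargs.keys()
        if bad:
            raise TypeError(
                f"{func.__name__}() got positional-only arguments "
                f"passed by keyword: {sorted(bad)}")
        return func(*args, **kwargs)
    return wrapper

@positional_only
def replace(s, __old, __new, count=-1):
    return s.replace(__old, __new, count)

print(replace("aaa", "a", "b"))  # -> bbb
```

Calling replace("aaa", __old="a", __new="b") raises TypeError, while positional calls work unchanged; the check costs one set intersection per call.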
URL: From stephanh42 at gmail.com Thu Mar 2 02:53:10 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Thu, 2 Mar 2017 08:53:10 +0100 Subject: [Python-ideas] add __contains__ into the "type" object In-Reply-To: References: <1a6bb123.269.15a86f5d855.Coremail.mlet_it_bew@126.com> <20170228231250.GN5689@ando.pearwood.info> Message-ID: A crucial difference between a set and a type is that you cannot explicitly iterate over the elements of a type, so while we could implement x in int to do something useful, we cannot make

for x in int:
    print(x)

Because if we could, we could implement Russell's paradox in Python:

R = set(x for x in object if x not in x)
print(R in R)

Bottom line: a set is not a type, even in mathematics. Stephan 2017-03-02 5:38 GMT+01:00 Pavol Lisy : > On 3/1/17, Steven D'Aprano wrote: > > On Wed, Mar 01, 2017 at 07:02:23AM +0800, ????? wrote: > >> > >> where we use types? > >> almost: > >> isinstance(obj, T); > >> # issubclass(S, T); > >> > >> Note that TYPE is SET; > > > > What does that mean? I don't understand. > > Maybe she/he wants to say that it is natural to see class as a > collection (at least in set theory > https://en.wikipedia.org/wiki/Class_(set_theory) ) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Thu Mar 2 03:03:29 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 2 Mar 2017 10:03:29 +0200 Subject: [Python-ideas] Optional parameters without default value Message-ID: Functions implemented in Python can have optional parameters with a default value. They can also accept an arbitrary number of positional and keyword arguments if they use var-positional or var-keyword parameters (*args and **kwargs).
But there is no way to declare an optional parameter that doesn't have a default value. Currently you need to use the sentinel idiom for implementing this:

_sentinel = object()

def get(store, key, default=_sentinel):
    if store.exists(key):
        return store.retrieve(key)
    if default is _sentinel:
        raise LookupError
    else:
        return default

There are drawbacks to this:

* The module's namespace is polluted with sentinel variables.

* You need to check for the sentinel so it isn't passed to another function by accident.

* Possible name conflicts between sentinels for different functions of the same module.

* Since the sentinel is accessible outside of the function, it is possible to pass it to the function.

* help() of the function shows reprs of default values. "foo(bar=<object object at 0xb713c698>)" looks ugly.

I propose to add a new syntax for optional parameters. If the argument corresponding to the optional parameter without a default value is not specified, the parameter takes no value. Just as the "*" prefix means "arbitrary number of positional parameters", the "?" prefix can mean "single optional parameter". Example:

def get(store, key, ?default):
    if store.exists(key):
        return store.retrieve(key)
    try:
        return default
    except NameError:
        raise LookupError

Alternative syntaxes:

* "=" not followed by an expression: "def get(store, key, default=)".

* The "del" keyword: "def get(store, key, del default)".

This feature is orthogonal to supporting positional-only parameters. Optional parameters without a default value can be positional-or-keyword, keyword-only or positional-only (if the latter is implemented). From mal at egenix.com Thu Mar 2 03:26:25 2017 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 2 Mar 2017 09:26:25 +0100 Subject: [Python-ideas] PEP 8 coding style included in grammar ?
In-Reply-To: <20170301190415.2b836651@subdivisions.wooz.org> References: <20170301190415.2b836651@subdivisions.wooz.org> Message-ID: <15c62e58-3a6f-42c3-99cb-1d9c8510066f@egenix.com> On 02.03.2017 01:04, Barry Warsaw wrote: > On Mar 01, 2017, at 03:04 PM, Mathieu BEAL wrote: > >> I was wondering why the PEP coding style ( >> https://www.python.org/dev/peps/pep-0008/) is not natively included in python >> grammar ? > > Well, the simple answer is that the grammar predates PEP 8 (or any PEP) by > many years. ... plus PEP 8 is a style guide, not a fixed set of rules. You are free to extend it, mix and match it, to suit your own needs. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Mar 02 2017) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From elazarg at gmail.com Thu Mar 2 03:35:31 2017 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Thu, 02 Mar 2017 08:35:31 +0000 Subject: [Python-ideas] add __contains__ into the "type" object In-Reply-To: References: <1a6bb123.269.15a86f5d855.Coremail.mlet_it_bew@126.com> <20170228231250.GN5689@ando.pearwood.info> Message-ID: This suggestion is really problematic IMHO. "isinstance" is a nominal check. I can't ask "isinstance(x, Callable[int, int])" because that would imply solving the halting problem. 
so "isinstance(x, Y)" does not mean "is it true that x is an element of the type Y" but rather "is it true that x was created by a constructor of some superclass of Y". It is not a type-theoretic question but a question of origin and intent. With regard to readability, this will be completely confusing for me. "in" is a question about inclusion in a collection, not some set-theoretic inclusion. Otherwise we should also accept "x in f" as an equivalent to "not not f(x)", as in set theory. Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Thu Mar 2 03:36:07 2017 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 2 Mar 2017 09:36:07 +0100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: Message-ID: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> On 02.03.2017 09:03, Serhiy Storchaka wrote: > Function implemented in Python can have optional parameters with default > value. It also can accept arbitrary number of positional and keyword > arguments if use var-positional or var-keyword parameters (*args and > **kwargs). But there is no way to declare an optional parameter that > don't have default value. Currently you need to use the sentinel idiom > for implementing this: > > _sentinel = object() > def get(store, key, default=_sentinel): > if store.exists(key): > return store.retrieve(key) > if default is _sentinel: > raise LookupError > else: > return default > > There are drawback of this: > > * Module's namespace is polluted with sentinel's variables. > > * You need to check for the sentinel before passing it to other function > by accident. > > * Possible name conflicts between sentinels for different functions of > the same module. > > * Since the sentinel is accessible outside of the function, it possible > to pass it to the function. > > * help() of the function shows reprs of default values. "foo(bar=<object object at 0xb713c698>)" looks ugly.
> > > I propose to add a new syntax for optional parameters. If the argument > corresponding to the optional parameter without default value is not > specified, the parameter takes no value. As well as the "*" prefix means > "arbitrary number of positional parameters", the prefix "?" can mean > "single optional parameter". > > Example: > > def get(store, key, ?default): > if store.exists(key): > return store.retrieve(key) > try: > return default > except NameError: > raise LookupError Why a new syntax ? Can't we just have a pre-defined sentinel singleton NoDefault and use that throughout the code (and also special case it in argument parsing/handling)? def get(store, key, default=NoDefault): if store.exists(key): return store.retrieve(key) ... I added a special singleton NotGiven to our mxTools long ago for this purpose. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Mar 02 2017) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From levkivskyi at gmail.com Thu Mar 2 03:45:09 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Thu, 2 Mar 2017 09:45:09 +0100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> Message-ID: On 2 March 2017 at 09:36, M.-A. 
Lemburg wrote: > On 02.03.2017 09:03, Serhiy Storchaka wrote: > > Function implemented in Python can have optional parameters with default > [...] > Why a new syntax ? Can't we just have a pre-defined sentinel > singleton NoDefault and use that throughout the code (and also > special case it in argument parsing/handling)? > I think for the same reason that we didn't add Undefined to PEP 484 and PEP 526: Having "another kind of None" will cause code everywhere to expect it. (Plus Guido didn't like it) -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Thu Mar 2 03:57:35 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 02 Mar 2017 00:57:35 -0800 Subject: [Python-ideas] Positional-only parameters In-Reply-To: References: Message-ID: <58B7DE7F.70505@stoneleaf.us> On 03/01/2017 11:41 PM, Stephan Houben wrote: > I have a slight variant of the decorator proposal. > Rather than specify a count, let the decorator implement the typeshed dunder convention: > > @positional_only > def replace(self, __old, __new, count=-1): > > (I imagine this decorator would also treat "self" as position_only, > so no need for __self.) > > Pros: > 1. Consistent with the typeshed convention. Only a pro if you like that convention. ;) > 2. Avoids a count. > 3. Strictly opt-in, so hopefully keeps those @#?! underscore preservationists from picketing my lawn (again!). Using a decorator is also strictly opt-in. Oh, and did someone say it was time for the protest? [----------------------] [----------------------] [ ] [ ] [ NO MORE UNDERSCORES! ] [ NO MORE UNDERSCORES! 
] [ ] [ ] [----------------------] [----------------------] | | | | | | | | | | | | | | | | |-| |-| -- ~Ethan~ From storchaka at gmail.com Thu Mar 2 04:06:55 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 2 Mar 2017 11:06:55 +0200 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> Message-ID: On 02.03.17 10:36, M.-A. Lemburg wrote: > Why a new syntax ? Can't we just have a pre-defined sentinel > singleton NoDefault and use that throughout the code (and also > special case it in argument parsing/handling)? > > def get(store, key, default=NoDefault): > if store.exists(key): > return store.retrieve(key) > ... This means adding a new syntax. NoDefault should be a keyword (we can reuse existing keyword couldn't be used in expression), and it should be accepted only in the specific context of declaring function parameter. From p.f.moore at gmail.com Thu Mar 2 04:08:51 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 2 Mar 2017 09:08:51 +0000 Subject: [Python-ideas] suggestion about the sort() function of the list instance In-Reply-To: References: <10473a54.3a6d.15a7d899712.Coremail.qhlonline@163.com> <20170301001338.GR5689@ando.pearwood.info> <1b70bbd4.2499.15a877e9111.Coremail.qhlonline@163.com> Message-ID: On 1 March 2017 at 19:38, Serhiy Storchaka wrote: > On 01.03.17 11:31, Paul Moore wrote: >> >> On 1 March 2017 at 01:31, qhlonline wrote: >>> >>> My code example is not proper, Yes, may be this is better: >>> list.sort().revers( >> >> >> We can already do this - reversed(sorted(lst)) > > > Or just sorted(lst, reverse=True). Indeed. Which illustrates nicely why learning Python's approach to doing things rather than translating idioms from other languages will get cleaner code. Paul From mal at egenix.com Thu Mar 2 04:59:17 2017 From: mal at egenix.com (M.-A. 
Lemburg) Date: Thu, 2 Mar 2017 10:59:17 +0100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> Message-ID: <1d5cd531-288c-bb42-7dea-b9d5309f0779@egenix.com> On 02.03.2017 09:45, Ivan Levkivskyi wrote: > On 2 March 2017 at 09:36, M.-A. Lemburg wrote: > >> On 02.03.2017 09:03, Serhiy Storchaka wrote: >>> Function implemented in Python can have optional parameters with default >> [...] >> > Why a new syntax ? Can't we just have a pre-defined sentinel >> singleton NoDefault and use that throughout the code (and also >> special case it in argument parsing/handling)? >> > > I think for the sane reason that we didn't add Undefined to PEP 484 > and PEP 526: > Having "another kind of None" will cause code everywhere to expect it. But that's exactly the point :-) Code should be made aware of such a special value and act accordingly. I had introduced NotGiven in our code to be able to differentiate between having a parameter provided to a method/function or not, and I needed a new singleton, because None was in fact a permitted value for the parameters, but I still had to detect whether this parameter was passed in or not. Example: >>> import mx.Tools >>> mx.Tools.NotGiven NotGiven >>> def f(x=mx.Tools.NotGiven): pass ... >>> help(f) Help on function f in module __main__: f(x=NotGiven) Because it's a singleton, you can use "is" for test which is very fast. BTW: NotGiven was named after NotImplemented, another singleton we have in Python. I had introduced this a long time ago to implement better coercion logic: http://web.archive.org/web/20011222024710/http://www.lemburg.com/files/python/CoercionProposal.html -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Mar 02 2017) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... 
http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From mal at egenix.com Thu Mar 2 05:04:53 2017 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 2 Mar 2017 11:04:53 +0100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> Message-ID: <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> On 02.03.2017 10:06, Serhiy Storchaka wrote: > On 02.03.17 10:36, M.-A. Lemburg wrote: >> Why a new syntax ? Can't we just have a pre-defined sentinel >> singleton NoDefault and use that throughout the code (and also >> special case it in argument parsing/handling)? >> >> def get(store, key, default=NoDefault): >> if store.exists(key): >> return store.retrieve(key) >> ... > > This means adding a new syntax. NoDefault should be a keyword (we can > reuse existing keyword couldn't be used in expression), and it should be > accepted only in the specific context of declaring function parameter. This is not new syntax, nor is it a keyword. It's only a new singleton and it is well usable outside of function declarations as well, e.g. for class attributes which are not yet initialized (and which can accept None as value). The only special casing would be in function call parameter parsing to signal errors when the parameter is used as keyword parameter. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Mar 02 2017) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... 
http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From stephanh42 at gmail.com Thu Mar 2 05:22:00 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Thu, 2 Mar 2017 11:22:00 +0100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> Message-ID: In cases like this I would recommend creating the sentinel yourself: NoDefault = object() def get(store, key, default=NoDefault): if default is NoDefault: # do something You can arrange to not export NoDefault so that the client code cannot even access the sentinel value. This is strictly preferable over having yet another global value meaning "no value", since that just moves the goal posts: clients will complain they cannot pass in a default=NoDefault and get back NoDefault. Stephan 2017-03-02 11:04 GMT+01:00 M.-A. Lemburg : > On 02.03.2017 10:06, Serhiy Storchaka wrote: > > On 02.03.17 10:36, M.-A. Lemburg wrote: > >> Why a new syntax ? Can't we just have a pre-defined sentinel > >> singleton NoDefault and use that throughout the code (and also > >> special case it in argument parsing/handling)? > >> > >> def get(store, key, default=NoDefault): > >> if store.exists(key): > >> return store.retrieve(key) > >> ... > > > > This means adding a new syntax. NoDefault should be a keyword (we can > > reuse existing keyword couldn't be used in expression), and it should be > > accepted only in the specific context of declaring function parameter. 
> > This is not new syntax, nor is it a keyword. It's only a > new singleton and it is well usable outside of function > declarations as well, e.g. for class attributes which are > not yet initialized (and which can accept None as value). > > The only special casing would be in function call > parameter parsing to signal errors when the parameter > is used as keyword parameter. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Experts (#1, Mar 02 2017) > >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ > >>> Python Database Interfaces ... http://products.egenix.com/ > >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ > ________________________________________________________________________ > > ::: We implement business ideas - efficiently in both time and costs ::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > http://www.malemburg.com/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wolfgang.maier at biologie.uni-freiburg.de Thu Mar 2 06:06:15 2017 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Thu, 2 Mar 2017 12:06:15 +0100 Subject: [Python-ideas] for/except/else In-Reply-To: References: Message-ID: On 02.03.2017 06:46, Nick Coghlan wrote: > On 1 March 2017 at 19:37, Wolfgang Maier > > > wrote: > > Now here's the proposal: allow an except (or except break) clause to > follow for/while loops that will be executed if the loop was > terminated by a break statement. 
> > Now while it's possible that Nick had a good reason not to do so, > > > I never really thought about it, as I only use the "else:" clause for > search loops where there aren't any side effects in the "break" case > (other than the search result being bound to the loop variable), so > while I find "except break:" useful as an explanatory tool, I don't have > any practical need for it. > > I think you've made as strong a case for the idea as could reasonably be > made :) > > However, Steven raises a good point that this would complicate the > handling of loops in the code generator a fair bit, as it would add up > to two additional jump targets in cases wherever the new clause was used. > > Currently, compiling loops only needs to track the start of the loop > (for continue), and the first instruction after the loop (for break). > With this change, they'd also need to track: > > - the start of the "except break" clause (for break when the clause is used) > - the start of the "else" clause (for the non-break case when both > trailing clauses are present) > I think you could get away with only one additional jump target as I showed in my previous reply to Steven. The heavier burden would be on the parser, which would have to distinguish the existing and the two new loop variants (loop with except clause, loop with except and else clause) but, anyway, that's probably not really the point. What weighs heavier, I think, is your design argument. 
> The design level argument against adding the clause is that it breaks > the "one obvious way" principle, as the preferred form for search loops > look like this: > > for item in iterable: > if condition(item): > break > else: > # Else clause either raises an exception or sets a default value > item = get_default_value() > > # If we get here, we know "item" is a valid reference > operation(item) > > And you can easily switch the `break` out for a suitable `return` if you > move this into a helper function: > > def find_item_of_interest(iterable): > for item in iterable: > if condition(item): > return item > # The early return means we can skip using "else" > return get_default_value() > > Given that basic structure as a foundation, you only switch to the > "nested side effect" form if you have to: > > for item in iterable: > if condition(item): > operation(item) > break > else: > # Else clause neither raises an exception nor sets a default value > condition_was_never_true(iterable) > > This form is generally less amenable to being extracted into a reusable > helper function, since it couples the search loop directly to the > operation performed on the bound item, whereas decoupling them gives you > a lot more flexibility in the eventual code structure. > > The proposal in this thread then has the significant downside of only > covering the "nested side effect" case: > > for item in iterable: > if condition(item): > break > except break: > operation(item) > else: > condition_was_never_true(iterable) > > While being even *less* amenable to being pushed down into a helper > function (since converting the "break" to a "return" would bypass the > "except break" clause). I'm actually not quite buying this last argument. If you wanted to refactor this to "return" instead of "break", you could simply put the return into the except break block. In many real-world situations with multiple breaks from a loop this could actually make things easier instead of worse. 
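For reference, the clause being debated can already be emulated in today's Python with an explicit flag; a minimal sketch (the helper and its names are illustrative, not from the thread), where the flag plays the role of the proposed "except break" clause:

```python
def process_first_match(items, condition, operation, default_action):
    # 'found' records whether the loop was left via break; testing it
    # afterwards stands in for the proposed "except break" clause.
    found = False
    for item in items:
        if condition(item):
            found = True
            break
    if found:
        # would be the body of "except break:"
        return operation(item)
    # would be the body of the existing "else:" clause
    return default_action(items)
```

A return placed in the `if found:` branch behaves like the return-inside-except-break refactoring described above, so the flag version carries over to helper functions as well.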
Personally, the "nested side effect" form makes me uncomfortable every time I use it because the side effects on breaking or not breaking the loop don't end up at the same indentation level and not necessarily together. However, I'm gathering from the discussion so far that not too many people are thinking like me about this point, so maybe I should simply adjust my mind-set. All that said, this is a very nice abstract view on things! I really learned quite a bit from this, thank you :) As always though, reality can be expected to be quite a bit more complicated than theory so I decided to check the stdlib for real uses of break. This is quite a tedious task since break is used in many different ways and I couldn't come up with a good automated way of classifying them. So what I did is just go through stdlib code (in reverse alphabetical order) containing the break keyword and put it into categories manually. I only got up to socket.py before losing my enthusiasm, but here's what I found: - overall I looked at 114 code blocks that contain one or more breaks - 84 of these are trivial use cases that simply break out of a while True block or terminate a while/for loop prematurely (no use for any follow-up clause there) - 8 more are causing a side-effect before a single break, and it would be pointless to put this into an except break clause - 3 more cause different, non-redundant side-effects before different breaks from the same loop and, obviously, an except break clause would not help them either => So the vast majority of breaks does *not* need an except break *nor* an else clause, but that's just as expected. 
Of the remaining 19 non-trivial cases - 9 are variations of your classical search idiom above, i.e., there's an else clause there and nothing more is needed - 6 are variations of your "nested side-effects" form presented above with debatable (see above) benefit from except break - 2 do not use an else clause currently, but have multiple breaks that do partly redundant things that could be combined in a single except break clause - 1 is an example of breaking out of two loops; from sre_parse._parse_sub: [...] # check if all items share a common prefix while True: prefix = None for item in items: if not item: break if prefix is None: prefix = item[0] elif item[0] != prefix: break else: # all subitems start with a common "prefix". # move it out of the branch for item in items: del item[0] subpatternappend(prefix) continue # check next one break [...] This could have been written as: [...] # check if all items share a common prefix while True: prefix = None for item in items: if not item: break if prefix is None: prefix = item[0] elif item[0] != prefix: break except break: break # all subitems start with a common "prefix". # move it out of the branch for item in items: del item[0] subpatternappend(prefix) [...] - finally, 1 is a complicated break dance to achieve sth that clearly would have been easier with except break; from typing.py: [...] def __subclasscheck__(self, cls): if cls is Any: return True if isinstance(cls, GenericMeta): # For a class C(Generic[T]) where T is co-variant, # C[X] is a subclass of C[Y] iff X is a subclass of Y. origin = self.__origin__ if origin is not None and origin is cls.__origin__: assert len(self.__args__) == len(origin.__parameters__) assert len(cls.__args__) == len(origin.__parameters__) for p_self, p_cls, p_origin in zip(self.__args__, cls.__args__, origin.__parameters__): if isinstance(p_origin, TypeVar): if p_origin.__covariant__: # Covariant -- p_cls must be a subclass of p_self. 
if not issubclass(p_cls, p_self): break elif p_origin.__contravariant__: # Contravariant. I think it's the opposite. :-) if not issubclass(p_self, p_cls): break else: # Invariant -- p_cls and p_self must equal. if p_self != p_cls: break else: # If the origin's parameter is not a typevar, # insist on invariance. if p_self != p_cls: break else: return True # If we break out of the loop, the superclass gets a chance. if super().__subclasscheck__(cls): return True if self.__extra__ is None or isinstance(cls, GenericMeta): return False return issubclass(cls, self.__extra__) [...] which could be rewritten as: [...] def __subclasscheck__(self, cls): if cls is Any: return True if isinstance(cls, GenericMeta): # For a class C(Generic[T]) where T is co-variant, # C[X] is a subclass of C[Y] iff X is a subclass of Y. origin = self.__origin__ if origin is not None and origin is cls.__origin__: assert len(self.__args__) == len(origin.__parameters__) assert len(cls.__args__) == len(origin.__parameters__) for p_self, p_cls, p_origin in zip(self.__args__, cls.__args__, origin.__parameters__): if isinstance(p_origin, TypeVar): if p_origin.__covariant__: # Covariant -- p_cls must be a subclass of p_self. if not issubclass(p_cls, p_self): break elif p_origin.__contravariant__: # Contravariant. I think it's the opposite. :-) if not issubclass(p_self, p_cls): break else: # Invariant -- p_cls and p_self must equal. if p_self != p_cls: break else: # If the origin's parameter is not a typevar, # insist on invariance. if p_self != p_cls: break except break: # If we break out of the loop, the superclass gets a chance. if super().__subclasscheck__(cls): return True if self.__extra__ is None or isinstance(cls, GenericMeta): return False return issubclass(cls, self.__extra__) return True [...] My summary: I do see use-cases for the except break clause, but, admittedly, they are relatively rare and may be not worth the hassle of introducing new syntax. 
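The semantics surveyed above can also be approximated today by raising a private exception in place of break; a sketch modeled on the quoted sre_parse common-prefix loop (the exception and helper names are illustrative, not from the thread):

```python
class _Break(Exception):
    """Private stand-in for the proposed 'except break' clause."""

def common_prefix(items):
    # Find the element that every non-empty item starts with,
    # or bail out via _Break as the quoted sre_parse loop does via break.
    prefix = None
    try:
        for item in items:
            if not item:
                raise _Break
            if prefix is None:
                prefix = item[0]
            elif item[0] != prefix:
                raise _Break
    except _Break:
        # this handler corresponds to the proposed "except break:" clause
        return None
    # falling through corresponds to the loop's "else:" clause
    return prefix
```

The try/except version keeps the break-handling code in one place, at the cost of a module-private exception class.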
From elazarg at gmail.com Thu Mar 2 06:09:32 2017 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Thu, 02 Mar 2017 11:09:32 +0000 Subject: [Python-ideas] Positional-only parameters In-Reply-To: <58B7DE7F.70505@stoneleaf.us> References: <58B7DE7F.70505@stoneleaf.us> Message-ID: Here's a proof-of-concept for the decorator. It does not address the issue of passing aliases to positional arguments to **kwargs - I guess this requires changes in the CPython's core. (Sorry about the coloring, that's how it's pasted) from inspect import signature, Parameter from functools import wraps def positional_only(n): def wrap(f): s = signature(f) params = list(s.parameters.values()) for i in range(n): if params[i].kind != Parameter.POSITIONAL_OR_KEYWORD: raise TypeError('{} has less than {} positional arguments'.format(f.__name__, n)) params[i] = params[i].replace(kind=Parameter.POSITIONAL_ONLY) f.__signature__ = s.replace(parameters=params) @wraps(f) def inner(*args, **kwargs): if len(args) < n: raise TypeError('{} takes at least {} positional arguments'.format(f.__name__, n)) return f(*args, **kwargs) return inner return wrap @positional_only(2) def f(a, b, c): print(a, b, c) help(f) # f(a, b, /, c, **kwargs) f(1, 2, c=2) # f(1, b=2, c=3) # TypeError: f takes at least 2 positional arguments @positional_only(3) def g(a, b, *, c): print(a, b, c) # TypeError: g has less than 3 positional arguments Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Thu Mar 2 06:15:57 2017 From: mal at egenix.com (M.-A. 
Lemburg) Date: Thu, 2 Mar 2017 12:15:57 +0100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> Message-ID: On 02.03.2017 11:22, Stephan Houben wrote: > In cases like this I would recommend creating the sentinel yourself: > > NoDefault = object() > > def get(store, key, default=NoDefault): > if default is NoDefault: > # do something > > You can arrange to not export NoDefault so that the client code cannot even > access > the sentinel value. Yes, I know... I've been using the mxTools NotGiven since 1998. > This is strictly preferable over having yet another global > value meaning "no value", since that just moves the goal posts: > clients will complain they cannot pass in a default=NoDefault and get back > NoDefault. Not really. NoDefault would mean: no value provided, not that you don't want a value. As a result, passing NoDefault would not be allowed, since then you'd be providing a value :-) > Stephan > > > 2017-03-02 11:04 GMT+01:00 M.-A. Lemburg : > >> On 02.03.2017 10:06, Serhiy Storchaka wrote: >>> On 02.03.17 10:36, M.-A. Lemburg wrote: >>>> Why a new syntax ? Can't we just have a pre-defined sentinel >>>> singleton NoDefault and use that throughout the code (and also >>>> special case it in argument parsing/handling)? >>>> >>>> def get(store, key, default=NoDefault): >>>> if store.exists(key): >>>> return store.retrieve(key) >>>> ... >>> >>> This means adding a new syntax. NoDefault should be a keyword (we can >>> reuse existing keyword couldn't be used in expression), and it should be >>> accepted only in the specific context of declaring function parameter. >> >> This is not new syntax, nor is it a keyword. It's only a >> new singleton and it is well usable outside of function >> declarations as well, e.g. for class attributes which are >> not yet initialized (and which can accept None as value). 
>> >> The only special casing would be in function call >> parameter parsing to signal errors when the parameter >> is used as keyword parameter. >> >> -- >> Marc-Andre Lemburg >> eGenix.com >> >> Professional Python Services directly from the Experts (#1, Mar 02 2017) >>>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>>>> Python Database Interfaces ... http://products.egenix.com/ >>>>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ >> ________________________________________________________________________ >> >> ::: We implement business ideas - efficiently in both time and costs ::: >> >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg >> Registered at Amtsgericht Duesseldorf: HRB 46611 >> http://www.egenix.com/company/contact/ >> http://www.malemburg.com/ >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Mar 02 2017) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From stephanh42 at gmail.com Thu Mar 2 06:31:58 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Thu, 2 Mar 2017 12:31:58 +0100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> Message-ID: I am not sure if I fully understand the proposal then. NoDefault would be special syntax so that this would be disallowed: f(NoDefault) but this would be allowed: def f(x=NoDefault): ... and also this: x is NoDefault So this would seem to require an exhaustive list of syntactic contexts in which NoDefault is allowed. I mean, can I do: x = NoDefault ? I observe that I can always get to the underlying NoDefault object in this way: (lambda x=NoDefault:x)() So what happens if I do: f((lambda x=NoDefault:x)()) ? Stephan 2017-03-02 12:15 GMT+01:00 M.-A. Lemburg : > On 02.03.2017 11:22, Stephan Houben wrote: > > In cases like this I would recommend creating the sentinel yourself: > > > > NoDefault = object() > > > > def get(store, key, default=NoDefault): > > if default is NoDefault: > > # do something > > > > You can arrange to not export NoDefault so that the client code cannot > even > > access > > the sentinel value. > > Yes, I know... I've been using the mxTools NotGiven since 1998. > > > This is strictly preferable over having yet another global > > value meaning "no value", since that just moves the goal posts: > > clients will complain they cannot pass in a default=NoDefault and get > back > > NoDefault. > > Not really. NoDefault would mean: no value provided, not that > you don't want a value. As a result, passing NoDefault would > not be allowed, since then you'd be providing a value :-) > > > Stephan > > > > > > 2017-03-02 11:04 GMT+01:00 M.-A. 
Lemburg : > > > >> On 02.03.2017 10:06, Serhiy Storchaka wrote: > >>> On 02.03.17 10:36, M.-A. Lemburg wrote: > >>>> Why a new syntax ? Can't we just have a pre-defined sentinel > >>>> singleton NoDefault and use that throughout the code (and also > >>>> special case it in argument parsing/handling)? > >>>> > >>>> def get(store, key, default=NoDefault): > >>>> if store.exists(key): > >>>> return store.retrieve(key) > >>>> ... > >>> > >>> This means adding a new syntax. NoDefault should be a keyword (we can > >>> reuse existing keyword couldn't be used in expression), and it should > be > >>> accepted only in the specific context of declaring function parameter. > >> > >> This is not new syntax, nor is it a keyword. It's only a > >> new singleton and it is well usable outside of function > >> declarations as well, e.g. for class attributes which are > >> not yet initialized (and which can accept None as value). > >> > >> The only special casing would be in function call > >> parameter parsing to signal errors when the parameter > >> is used as keyword parameter. > >> > >> -- > >> Marc-Andre Lemburg > >> eGenix.com > >> > >> Professional Python Services directly from the Experts (#1, Mar 02 2017) > >>>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ > >>>>> Python Database Interfaces ... http://products.egenix.com/ > >>>>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ > >> ____________________________________________________________ > ____________ > >> > >> ::: We implement business ideas - efficiently in both time and costs ::: > >> > >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg > >> Registered at Amtsgericht Duesseldorf: HRB 46611 > >> http://www.egenix.com/company/contact/ > >> http://www.malemburg.com/ > >> > >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > >> https://mail.python.org/mailman/listinfo/python-ideas > >> Code of Conduct: http://python.org/psf/codeofconduct/ > >> > > > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Experts (#1, Mar 02 2017) > >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ > >>> Python Database Interfaces ... http://products.egenix.com/ > >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ > ________________________________________________________________________ > > ::: We implement business ideas - efficiently in both time and costs ::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > http://www.malemburg.com/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Thu Mar 2 07:08:42 2017 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 2 Mar 2017 13:08:42 +0100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> Message-ID: On 02.03.2017 12:31, Stephan Houben wrote: > I am not sure if I fully understand the proposal then. 
>
> NoDefault would be special syntax so that this would be disallowed:
>
> f(NoDefault)
>
> but this would be allowed:
> def f(x=NoDefault):
> ...
>
> and also this:
>
> x is NoDefault
>
> So this would seem to require an exhaustive list of syntactic contexts
> in which NoDefault is allowed. I mean, can I do:
>
> x = NoDefault
>
> ?
>
> I observe that I can always get to the underlying NoDefault object in this
> way:
>
> (lambda x=NoDefault:x)()
>
> So what happens if I do:
>
> f((lambda x=NoDefault:x)())
>
> ?

Sorry for the confusion. NoDefault would be usable just like
any other singleton.

There would only be one case where it would cause an exception,
namely when you declare a parameter as having NoDefault as value.
This would trigger special logic in the argument parsing code to
disallow using that parameter as keyword parameter.

Example:

def f(x=NoDefault):
    # x is an optional positional parameter
    if x is NoDefault:
        # x was not passed in as parameter
        ...
    else:
        # x was provided as parameter
        ...

These would all work fine:

f()
f(1)
f(None)

This would trigger an exception in the argument parsing code:

f(x=NoDefault)

e.g. TypeError('x is a positional only parameter')

This would not trigger an exception:

f(NoDefault)

since x is not being used as keyword parameter and the
function f may want to pass the optional positional parameter
down to other functions with optional positional parameters
as well.

Is this clearer now ?

Note: The name of the singleton could be something else
as well, e.g. NoKeywordParameter :-)

> Stephan
>
>
> 2017-03-02 12:15 GMT+01:00 M.-A. Lemburg :
> >> On 02.03.2017 11:22, Stephan Houben wrote:
> >>> In cases like this I would recommend creating the sentinel yourself:
> >>>
> >>> NoDefault = object()
> >>>
> >>> def get(store, key, default=NoDefault):
> >>>     if default is NoDefault:
> >>>         # do something
> >>>
> >>> You can arrange to not export NoDefault so that the client code cannot
> >> even
> >>> access
> >>> the sentinel value.
> >>
> >> Yes, I know...
I've been using the mxTools NotGiven since 1998. >> >>> This is strictly preferable over having yet another global >>> value meaning "no value", since that just moves the goal posts: >>> clients will complain they cannot pass in a default=NoDefault and get >> back >>> NoDefault. >> >> Not really. NoDefault would mean: no value provided, not that >> you don't want a value. As a result, passing NoDefault would >> not be allowed, since then you'd be providing a value :-) >> >>> Stephan >>> >>> >>> 2017-03-02 11:04 GMT+01:00 M.-A. Lemburg : >>> >>>> On 02.03.2017 10:06, Serhiy Storchaka wrote: >>>>> On 02.03.17 10:36, M.-A. Lemburg wrote: >>>>>> Why a new syntax ? Can't we just have a pre-defined sentinel >>>>>> singleton NoDefault and use that throughout the code (and also >>>>>> special case it in argument parsing/handling)? >>>>>> >>>>>> def get(store, key, default=NoDefault): >>>>>> if store.exists(key): >>>>>> return store.retrieve(key) >>>>>> ... >>>>> >>>>> This means adding a new syntax. NoDefault should be a keyword (we can >>>>> reuse existing keyword couldn't be used in expression), and it should >> be >>>>> accepted only in the specific context of declaring function parameter. >>>> >>>> This is not new syntax, nor is it a keyword. It's only a >>>> new singleton and it is well usable outside of function >>>> declarations as well, e.g. for class attributes which are >>>> not yet initialized (and which can accept None as value). >>>> >>>> The only special casing would be in function call >>>> parameter parsing to signal errors when the parameter >>>> is used as keyword parameter. >>>> >>>> -- >>>> Marc-Andre Lemburg >>>> eGenix.com >>>> >>>> Professional Python Services directly from the Experts (#1, Mar 02 2017) >>>>>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>>>>>> Python Database Interfaces ... http://products.egenix.com/ >>>>>>> Plone/Zope Database Interfaces ... 
http://zope.egenix.com/ >>>> ____________________________________________________________ >> ____________ >>>> >>>> ::: We implement business ideas - efficiently in both time and costs ::: >>>> >>>> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >>>> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg >>>> Registered at Amtsgericht Duesseldorf: HRB 46611 >>>> http://www.egenix.com/company/contact/ >>>> http://www.malemburg.com/ >>>> >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>>> >>> >>> >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >> >> -- >> Marc-Andre Lemburg >> eGenix.com >> >> Professional Python Services directly from the Experts (#1, Mar 02 2017) >>>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>>>> Python Database Interfaces ... http://products.egenix.com/ >>>>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ >> ________________________________________________________________________ >> >> ::: We implement business ideas - efficiently in both time and costs ::: >> >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg >> Registered at Amtsgericht Duesseldorf: HRB 46611 >> http://www.egenix.com/company/contact/ >> http://www.malemburg.com/ >> >> > -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Mar 02 2017) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... 
http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From p.f.moore at gmail.com Thu Mar 2 07:20:23 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 2 Mar 2017 12:20:23 +0000 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> Message-ID: On 2 March 2017 at 11:31, Stephan Houben wrote: > NoDefault would be special syntax so that this would be disallowed: > > f(NoDefault) I think the key point of confusion here is whether the language needs to enforce this or it's just convention. MAL is saying that f(NoDefault) is disallowed - but not that the language will somehow notice what you've done and refuse to let you, just that you mustn't do it or your program will be wrong. Stephan seems to be saying that you'd get a SyntaxError (or a RuntimeError? I'm not sure when you'd expect this to be detected - consider f(*[NoDefault])) if you wrote that. Philosophically, Python has always tended in the past towards a "consenting adults" rule - so we don't reject code like this but expect people to use the constructs given in the way they were intended. The OP's proposal is about making it more convenient to specify that parameters are "positional only", by avoiding the need to create custom sentinels (or agree on a common conventional value). That seems to me to be a reasonable proposal - typical sentinel handling code is fairly verbose. 
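For concreteness, the kind of custom-sentinel boilerplate being referred to typically looks something like this (a minimal illustrative sketch; the names are invented, not taken from any existing library):

```python
# Module-private sentinel: a unique object no caller can accidentally equal.
_MISSING = object()

def get(store, key, default=_MISSING):
    """Return store[key]; fall back to default only if one was supplied."""
    try:
        return store[key]
    except KeyError:
        if default is _MISSING:  # identity test: caller gave no default
            raise                # re-raise the original KeyError
        return default
```

Every function that wants "optional, but with no sensible default" semantics has to repeat both the sentinel and the identity test, which is where the verbosity comes from.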
OTOH, creating a language mandated sentinel does nothing to improve readability (all we gain is omission of the creation of a custom sentinel) and has the downside that we add yet another special value to the language, that provides a subtly different meaning of "not present" than the ones we have. So I guess I'm +0.5 on the proposed "positional only parameters" syntax, and -1 on any form of new language-defined sentinel value. Paul From stephanh42 at gmail.com Thu Mar 2 07:22:13 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Thu, 2 Mar 2017 13:22:13 +0100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> Message-ID: OK, I get it, I think. I presume it is really the object identity which matters, not the syntax, so: y = NoDefault f(x=y) would be equally an error. Would this also apply if we provide or capture the keyword arguments using ** ? I.e. f(**{"x": NoDict}) (lambda **kw: kw)(x=NoDict) In that case I see a problem with this idiom: newdict = dict(**olddict) This would now start throwing errors in case any of the values of olddict was NoDefault. Stephan 2017-03-02 13:08 GMT+01:00 M.-A. Lemburg : > On 02.03.2017 12:31, Stephan Houben wrote: > > I am not sure if I fully understand the proposal then. > > > > NoDefault would be special syntax so that this would be disallowed: > > > > f(NoDefault) > > > > but this would be allowed: > > def f(x=NoDefault): > > ... > > > > and also this: > > > > x is NoDefault > > > > So this would seem to require an exhaustive list of syntactic contexts > > in which NoDefault is allowed. I mean, can I do: > > > > x = NoDefault > > > > ? > > > > I observe that I can always get to the underlying NoDefault object in > this > > way: > > > > (lambda x=NoDefault:x)() > > > > So what happens if I do: > > > > f((lambda x=NoDefault:x)()) > > > > ? > > Sorry for the confusion. 
NoDefault would be usable just like > any other singleton. > > There would only be one case where it would cause an exception, > namely when you declare a parameter as having NoDefault as value. > This would trigger special logic in the argument parsing code to > disallow using that parameter as keyword parameter. > > Example: > > def f(x=NoDefault): > # x is an optional positional parameter > if x is NoDefault: > # x was not passed in as parameter > ... > else: > # x was provided as parameter > ... > > These would all work fine: > > f() > f(1) > f(None) > > This would trigger an exception in the argument parsing code: > > f(x=NoDefault) > > e.g. TypeError('x is a positional only parameter') > > This would not trigger an exception: > > f(NoDefault) > > since x is not being used as keyword parameter and the > function f may want to pass the optional positional parameter > down to other functions with optional positional paramters > as well. > > Is this clearer now ? > > Note: The name of the singleton could be something else > as well, e.g. NoKeywordParameter :-) > > > Stephan > > > > > > 2017-03-02 12:15 GMT+01:00 M.-A. Lemburg : > > > >> On 02.03.2017 11:22, Stephan Houben wrote: > >>> In cases like this I would recommend creating the sentinel yourself: > >>> > >>> NoDefault = object() > >>> > >>> def get(store, key, default=NoDefault): > >>> if default is NoDefault: > >>> # do something > >>> > >>> You can arrange to not export NoDefault so that the client code cannot > >> even > >>> access > >>> the sentinel value. > >> > >> Yes, I know... I've been using the mxTools NotGiven since 1998. > >> > >>> This is strictly preferable over having yet another global > >>> value meaning "no value", since that just moves the goal posts: > >>> clients will complain they cannot pass in a default=NoDefault and get > >> back > >>> NoDefault. > >> > >> Not really. NoDefault would mean: no value provided, not that > >> you don't want a value. 
As a result, passing NoDefault would > >> not be allowed, since then you'd be providing a value :-) > >> > >>> Stephan > >>> > >>> > >>> 2017-03-02 11:04 GMT+01:00 M.-A. Lemburg : > >>> > >>>> On 02.03.2017 10:06, Serhiy Storchaka wrote: > >>>>> On 02.03.17 10:36, M.-A. Lemburg wrote: > >>>>>> Why a new syntax ? Can't we just have a pre-defined sentinel > >>>>>> singleton NoDefault and use that throughout the code (and also > >>>>>> special case it in argument parsing/handling)? > >>>>>> > >>>>>> def get(store, key, default=NoDefault): > >>>>>> if store.exists(key): > >>>>>> return store.retrieve(key) > >>>>>> ... > >>>>> > >>>>> This means adding a new syntax. NoDefault should be a keyword (we can > >>>>> reuse existing keyword couldn't be used in expression), and it should > >> be > >>>>> accepted only in the specific context of declaring function > parameter. > >>>> > >>>> This is not new syntax, nor is it a keyword. It's only a > >>>> new singleton and it is well usable outside of function > >>>> declarations as well, e.g. for class attributes which are > >>>> not yet initialized (and which can accept None as value). > >>>> > >>>> The only special casing would be in function call > >>>> parameter parsing to signal errors when the parameter > >>>> is used as keyword parameter. > >>>> > >>>> -- > >>>> Marc-Andre Lemburg > >>>> eGenix.com > >>>> > >>>> Professional Python Services directly from the Experts (#1, Mar 02 > 2017) > >>>>>>> Python Projects, Coaching and Consulting ... > http://www.egenix.com/ > >>>>>>> Python Database Interfaces ... > http://products.egenix.com/ > >>>>>>> Plone/Zope Database Interfaces ... > http://zope.egenix.com/ > >>>> ____________________________________________________________ > >> ____________ > >>>> > >>>> ::: We implement business ideas - efficiently in both time and costs > ::: > >>>> > >>>> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > >>>> D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg > >>>> Registered at Amtsgericht Duesseldorf: HRB 46611 > >>>> http://www.egenix.com/company/contact/ > >>>> http://www.malemburg.com/ > >>>> > >>>> _______________________________________________ > >>>> Python-ideas mailing list > >>>> Python-ideas at python.org > >>>> https://mail.python.org/mailman/listinfo/python-ideas > >>>> Code of Conduct: http://python.org/psf/codeofconduct/ > >>>> > >>> > >>> > >>> > >>> _______________________________________________ > >>> Python-ideas mailing list > >>> Python-ideas at python.org > >>> https://mail.python.org/mailman/listinfo/python-ideas > >>> Code of Conduct: http://python.org/psf/codeofconduct/ > >>> > >> > >> -- > >> Marc-Andre Lemburg > >> eGenix.com > >> > >> Professional Python Services directly from the Experts (#1, Mar 02 2017) > >>>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ > >>>>> Python Database Interfaces ... http://products.egenix.com/ > >>>>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ > >> ____________________________________________________________ > ____________ > >> > >> ::: We implement business ideas - efficiently in both time and costs ::: > >> > >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > >> Registered at Amtsgericht Duesseldorf: HRB 46611 > >> http://www.egenix.com/company/contact/ > >> http://www.malemburg.com/ > >> > >> > > > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Experts (#1, Mar 02 2017) > >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ > >>> Python Database Interfaces ... http://products.egenix.com/ > >>> Plone/Zope Database Interfaces ... 
http://zope.egenix.com/ > ________________________________________________________________________ > > ::: We implement business ideas - efficiently in both time and costs ::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > http://www.malemburg.com/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Thu Mar 2 07:26:04 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Thu, 2 Mar 2017 13:26:04 +0100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> Message-ID: On 2 March 2017 at 13:20, Paul Moore wrote: > On 2 March 2017 at 11:31, Stephan Houben wrote: > > NoDefault would be special syntax so that this would be disallowed: > > > > f(NoDefault) > > [...] > > So I guess I'm +0.5 on the proposed "positional only parameters" > syntax, and -1 on any form of new language-defined sentinel value. > > This is also my opinion. -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Thu Mar 2 07:30:15 2017 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 2 Mar 2017 13:30:15 +0100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> Message-ID: <776e3327-16cd-76fd-dc6c-dd21b4f4b9c5@egenix.com> On 02.03.2017 13:22, Stephan Houben wrote: > OK, I get it, I think. > > I presume it is really the object identity which matters, not the syntax, > so: > > y = NoDefault > f(x=y) > > would be equally an error. Yes. > Would this also apply if we provide or capture the keyword arguments using > ** ? > > I.e. 
> f(**{"x": NoDict})

I think you meant NoDefault here.

> (lambda **kw: kw)(x=NoDict)
>
> In that case I see a problem with this idiom:
>
> newdict = dict(**olddict)
>
> This would now start throwing errors in case any of the values of olddict
> was NoDefault.

Continuing the example, this case would throw an error as well:

kwargs = {'x': NoDefault}
f(**kwargs)

e.g. TypeError('x is a positional only parameter')

However, only because f "declared" x as optional positional
parameter. If you'd pass the same dict to a function g as in:

def g(x):
    pass

g(**kwargs)

it would not raise an exception, since Python functions always
allow passing in keyword parameters for positional parameters
(unlike C functions, which only allow this if configured that way).

> Stephan
>
>
>
> 2017-03-02 13:08 GMT+01:00 M.-A. Lemburg :
> >> On 02.03.2017 12:31, Stephan Houben wrote:
>>> I am not sure if I fully understand the proposal then.
>>>
>>> NoDefault would be special syntax so that this would be disallowed:
>>>
>>> f(NoDefault)
>>>
>>> but this would be allowed:
>>> def f(x=NoDefault):
>>> ...
>>>
>>> and also this:
>>>
>>> x is NoDefault
>>>
>>> So this would seem to require an exhaustive list of syntactic contexts
>>> in which NoDefault is allowed. I mean, can I do:
>>>
>>> x = NoDefault
>>>
>>> ?
>>>
>>> I observe that I can always get to the underlying NoDefault object in
>> this
>>> way:
>>>
>>> (lambda x=NoDefault:x)()
>>>
>>> So what happens if I do:
>>>
>>> f((lambda x=NoDefault:x)())
>>>
>>> ?
>>
>> Sorry for the confusion. NoDefault would be usable just like
>> any other singleton.
>>
>> There would only be one case where it would cause an exception,
>> namely when you declare a parameter as having NoDefault as value.
>> This would trigger special logic in the argument parsing code to
>> disallow using that parameter as keyword parameter.
>> >> Example: >> >> def f(x=NoDefault): >> # x is an optional positional parameter >> if x is NoDefault: >> # x was not passed in as parameter >> ... >> else: >> # x was provided as parameter >> ... >> >> These would all work fine: >> >> f() >> f(1) >> f(None) >> >> This would trigger an exception in the argument parsing code: >> >> f(x=NoDefault) >> >> e.g. TypeError('x is a positional only parameter') >> >> This would not trigger an exception: >> >> f(NoDefault) >> >> since x is not being used as keyword parameter and the >> function f may want to pass the optional positional parameter >> down to other functions with optional positional paramters >> as well. >> >> Is this clearer now ? >> >> Note: The name of the singleton could be something else >> as well, e.g. NoKeywordParameter :-) >> >>> Stephan >>> >>> >>> 2017-03-02 12:15 GMT+01:00 M.-A. Lemburg : >>> >>>> On 02.03.2017 11:22, Stephan Houben wrote: >>>>> In cases like this I would recommend creating the sentinel yourself: >>>>> >>>>> NoDefault = object() >>>>> >>>>> def get(store, key, default=NoDefault): >>>>> if default is NoDefault: >>>>> # do something >>>>> >>>>> You can arrange to not export NoDefault so that the client code cannot >>>> even >>>>> access >>>>> the sentinel value. >>>> >>>> Yes, I know... I've been using the mxTools NotGiven since 1998. >>>> >>>>> This is strictly preferable over having yet another global >>>>> value meaning "no value", since that just moves the goal posts: >>>>> clients will complain they cannot pass in a default=NoDefault and get >>>> back >>>>> NoDefault. >>>> >>>> Not really. NoDefault would mean: no value provided, not that >>>> you don't want a value. As a result, passing NoDefault would >>>> not be allowed, since then you'd be providing a value :-) >>>> >>>>> Stephan >>>>> >>>>> >>>>> 2017-03-02 11:04 GMT+01:00 M.-A. Lemburg : >>>>> >>>>>> On 02.03.2017 10:06, Serhiy Storchaka wrote: >>>>>>> On 02.03.17 10:36, M.-A. Lemburg wrote: >>>>>>>> Why a new syntax ? 
Can't we just have a pre-defined sentinel >>>>>>>> singleton NoDefault and use that throughout the code (and also >>>>>>>> special case it in argument parsing/handling)? >>>>>>>> >>>>>>>> def get(store, key, default=NoDefault): >>>>>>>> if store.exists(key): >>>>>>>> return store.retrieve(key) >>>>>>>> ... >>>>>>> >>>>>>> This means adding a new syntax. NoDefault should be a keyword (we can >>>>>>> reuse existing keyword couldn't be used in expression), and it should >>>> be >>>>>>> accepted only in the specific context of declaring function >> parameter. >>>>>> >>>>>> This is not new syntax, nor is it a keyword. It's only a >>>>>> new singleton and it is well usable outside of function >>>>>> declarations as well, e.g. for class attributes which are >>>>>> not yet initialized (and which can accept None as value). >>>>>> >>>>>> The only special casing would be in function call >>>>>> parameter parsing to signal errors when the parameter >>>>>> is used as keyword parameter. >>>>>> >>>>>> -- >>>>>> Marc-Andre Lemburg >>>>>> eGenix.com >>>>>> >>>>>> Professional Python Services directly from the Experts (#1, Mar 02 >> 2017) >>>>>>>>> Python Projects, Coaching and Consulting ... >> http://www.egenix.com/ >>>>>>>>> Python Database Interfaces ... >> http://products.egenix.com/ >>>>>>>>> Plone/Zope Database Interfaces ... >> http://zope.egenix.com/ >>>>>> ____________________________________________________________ >>>> ____________ >>>>>> >>>>>> ::: We implement business ideas - efficiently in both time and costs >> ::: >>>>>> >>>>>> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >>>>>> D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg >>>>>> Registered at Amtsgericht Duesseldorf: HRB 46611 >>>>>> http://www.egenix.com/company/contact/ >>>>>> http://www.malemburg.com/ >>>>>> >>>>>> _______________________________________________ >>>>>> Python-ideas mailing list >>>>>> Python-ideas at python.org >>>>>> https://mail.python.org/mailman/listinfo/python-ideas >>>>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Python-ideas mailing list >>>>> Python-ideas at python.org >>>>> https://mail.python.org/mailman/listinfo/python-ideas >>>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>>>> >>>> >>>> -- >>>> Marc-Andre Lemburg >>>> eGenix.com >>>> >>>> Professional Python Services directly from the Experts (#1, Mar 02 2017) >>>>>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>>>>>> Python Database Interfaces ... http://products.egenix.com/ >>>>>>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ >>>> ____________________________________________________________ >> ____________ >>>> >>>> ::: We implement business ideas - efficiently in both time and costs ::: >>>> >>>> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >>>> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg >>>> Registered at Amtsgericht Duesseldorf: HRB 46611 >>>> http://www.egenix.com/company/contact/ >>>> http://www.malemburg.com/ >>>> >>>> >>> >> >> -- >> Marc-Andre Lemburg >> eGenix.com >> >> Professional Python Services directly from the Experts (#1, Mar 02 2017) >>>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>>>> Python Database Interfaces ... http://products.egenix.com/ >>>>> Plone/Zope Database Interfaces ... 
http://zope.egenix.com/ >> ________________________________________________________________________ >> >> ::: We implement business ideas - efficiently in both time and costs ::: >> >> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 >> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg >> Registered at Amtsgericht Duesseldorf: HRB 46611 >> http://www.egenix.com/company/contact/ >> http://www.malemburg.com/ >> >> > -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Mar 02 2017) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From rosuav at gmail.com Thu Mar 2 07:35:53 2017 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 2 Mar 2017 23:35:53 +1100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> Message-ID: On Thu, Mar 2, 2017 at 11:22 PM, Stephan Houben wrote: > Would this also apply if we provide or capture the keyword arguments using > ** ? > > I.e. > f(**{"x": NoDict}) > > (lambda **kw: kw)(x=NoDict) > > In that case I see a problem with this idiom: > > newdict = dict(**olddict) > > This would now start throwing errors in case any of the values of olddict > was NoDefault. > You shouldn't be returning NoDefault anywhere, though, so the only problem is that the error is being reported in the wrong place. 
If this were to become syntax, enhanced linters could track down
exactly where NoDefault came from, and report the error, because
that's really where the bug is.

ChrisA

From clint.hepner at gmail.com Thu Mar 2 07:57:06 2017
From: clint.hepner at gmail.com (Clint Hepner)
Date: Thu, 2 Mar 2017 07:57:06 -0500
Subject: [Python-ideas] add __contains__ into the "type" object
In-Reply-To:
References: <1a6bb123.269.15a86f5d855.Coremail.mlet_it_bew@126.com> <20170228231250.GN5689@ando.pearwood.info>
Message-ID: <66FC1EFC-F51C-4EC7-B144-360D659CB4E9@gmail.com>

> On 2017 Mar 2 , at 2:53 a, Stephan Houben wrote:
>
> A crucial difference between a set and a type is that you cannot
> explicitly iterate over the elements of a type, so while we could implement
>
> x in int
>
> to do something useful, we cannot make
>
> for x in int:
>     print(x)
>

__contains__ was introduced to provide a more efficient test than to
simply iterate over the elements one by one. I don't see why something
has to be iterable in order to implement __contains__, though.

class PositiveInts(int):
    def __contains__(self, x):
        return x > 0

N = PositiveInts()

> Because if we could, we could implement Russell's paradox in Python:
>
> R = set(x for x in object if x not in x)
>
> print(R in R)

object is not the equivalent of the paradoxical set of all sets. It's
closer to the set of all valid Python values. That includes all valid
Python set values, but a Python set is not a mathematical set; it's a
*finite* collection of *hashable* values.
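The point above, that `__contains__` does not require iterability, can be made concrete with a small sketch (the class below is invented purely for illustration):

```python
class Evens:
    """Acts like the infinite set of even integers for membership tests."""
    def __contains__(self, x):
        # `in` dispatches here; no iteration protocol is needed or defined.
        return isinstance(x, int) and x % 2 == 0

evens = Evens()

print(4 in evens)   # True
print(3 in evens)   # False
# iter(evens) raises TypeError: the class defines neither __iter__ nor
# __getitem__, so the "container" is conceptually infinite yet usable
# with the `in` operator.
```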
From steve at pearwood.info Thu Mar 2 07:58:36 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 2 Mar 2017 23:58:36 +1100 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <58e46182-1210-438f-b85a-02aa1bd9dc9e@gmail.com> References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <20170228235616.GP5689@ando.pearwood.info> <58e46182-1210-438f-b85a-02aa1bd9dc9e@gmail.com> Message-ID: <20170302125835.GB5689@ando.pearwood.info> On Wed, Mar 01, 2017 at 02:56:44AM +0100, Michel Desmoulin wrote: > > first_item = (alist[0:1] or ["ham"])[0] > > Come on, I've been doing Python for more than a decade and never saw > anybody doing that. Even reading it in a code would make me scratch my > head for a moment with a "what is it doing that for?". These days, it might be better to write it as: first_item = alist[0] if len(alist) else "ham" but I remember the days when slicing was normal and using the `or` trick was standard operating procedure. > You are trying to hard to provide a counter argument here. In context, all I'm saying is that you don't *have* to catch IndexError. There are alternatives. That is all. > Me, I have to deal SOAP government systems, mongodb based API built by > teenagers, geographer data set exports and FTP + CSV in marina systems > (which I happen to work on right now). > > 3rd party CSV, XML and JSON processing are just a hundred of lines of > try/except on indexing because they have many listings, data positions > is important and a lot of system got it wrong, giving you inconsistent > output with missing data and terrible labeling. This is all very well and good, and I feel your pain for having to deal with garbage data, but I don't see how this helps you. You talk about missing data, but lists cannot contain missing data from the middle. 
There's no such thing as a list like:

[0, 1, 2, 3, , , , , , , 10, 11, 12]

where alist[3] and alist[10] will succeed but alist[4] etc will raise
IndexError. So I'm still trying to understand what this proposal gets
you that wouldn't be better solved using (say) itertools.zip_longest
or a pre-processing step to clean up your data.

> And because life is unfair, the data you can extract is often a mix of
> heterogeneous mappings and lists / tuples. And your tool must manage the
> various versions of the data format they send to you, some with
> additional fields, or missing ones. Some named, other found by position.

Maybe I'm underestimating just how awful your data is, but I'm having
difficulty thinking of a scenario where you don't know what kind of
object you are processing and have to write completely type-agnostic
code:

for key_or_index in sequence_of_keys_or_indexes:
    result = sequence_or_mapping[key_or_index]

I'm sure that there is lots of code where you iterate over dicts:

for key in keys:
    result = mapping.get(key, default)

and likewise code where you process lists:

for i in indexes:
    try:
        result = sequence[i]
    except IndexError:
        result = default

# could be re-written using a helper function:
for i in indexes:
    result = get(sequence, i, default)

but I've never come across a data-processing situation where I didn't
know which was which. That second version with the helper function
would be *marginally* nicer written as a method call. I grant you that!

> This summer, I had to convert a data set provided by polls in africa
> through an android form, generated from an XML schema, send as json
> using Ajax, then stored in mongodb... to an excel spread sheet (and also
> an HTML table and some JS graphs for good measure).
>
> Needingless to say I dealt with a looooot of IndexError. Grepping the
> project gives me:
>
> grep -R IndexError | wc -l
> 33
>
> In contrast I have 32 KeyError (most of them to allow lazy default
> value), and 3 decorators.
So 33 is "a looooot", but 32 KeyErrors is "in contrast" and presumably a
little.

> Apparently IndexError is an important error because if I grep the
> virtualenv of the project:
>
> grep -R IndexError | wc -l
> 733
>
> Ok, it's a pretty large project with 154 dependencies, but it's still
> almost 7 IndexErrors per package on average. So it's not a rare use case.

You don't know that every one of those can be replaced by list.get().
Some of them might be raising IndexError; some of them may be documenting
that a function or method raises IndexError; etc.

> I also see it regularly in my classes. Students try it because they
> learned it works with dict.

It makes sense. I never said it didn't. But I wonder whether it gives
*enough* benefit to be worthwhile.

-- 
Steve

From storchaka at gmail.com Thu Mar 2 08:08:15 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 2 Mar 2017 15:08:15 +0200
Subject: [Python-ideas] Optional parameters without default value
In-Reply-To: <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com>
References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com>
Message-ID:

On 02.03.17 12:04, M.-A. Lemburg wrote:
> This is not new syntax, nor is it a keyword. It's only a
> new singleton and it is well usable outside of function
> declarations as well, e.g. for class attributes which are
> not yet initialized (and which can accept None as value).

If it is not a keyword, it could be used in expressions, e.g. assigned
to a variable or passed to a function. It could be monkey-patched or
hidden by accident (as True and False in Python 2).
From storchaka at gmail.com Thu Mar 2 08:11:41 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 2 Mar 2017 15:11:41 +0200
Subject: [Python-ideas] Optional parameters without default value
In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com>
Message-ID:

On 02.03.17 14:20, Paul Moore wrote:
> So I guess I'm +0.5 on the proposed "positional only parameters"
> syntax, and -1 on any form of new language-defined sentinel value.

My proposition is not about "positional-only parameters".

From steve at pearwood.info Thu Mar 2 08:23:18 2017
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 3 Mar 2017 00:23:18 +1100
Subject: [Python-ideas] Positional-only parameters
In-Reply-To: References: Message-ID: <20170302132317.GC5689@ando.pearwood.info>

On Tue, Feb 28, 2017 at 10:17:31PM +0100, Victor Stinner wrote:

> My question is: would it make sense to implement this feature
> [positional only parameters] in Python directly?

+0 on positional-only parameters.

> If yes, what should be the syntax? Use "/" marker?

I think that / makes a nice pun with * and is easy to remember. I
dislike the proposed double-leading-only underscore convention, as that
makes the names look like *private* parameters the caller shouldn't
provide at all. And it leads to confusion:

def function(__a, b, _c, *, __d): ...

So __a is positional-only, b could be positional or keyword, _c is
private, and __d is keyword-only but just happens to start with two
underscores. Yuck.

I think that [...] is completely unacceptable. It is a very common
convention to use square brackets to flag optional arguments when
writing function signatures in documentation, e.g.:

Help on class zip in module builtins:

class zip(object)
 | zip(iter1 [,iter2 [...]]) --> zip object

It would be confusing to have [...] have syntactic meaning different to
that convention.

> Use the @positional() decorator?
I suppose a positional() decorator would be useful for backporting, but
I wouldn't want it to be the One Obvious Way to get positional
arguments.

> By the way, I read that "/" marker is unknown by almost all Python
> developers,

Of course it is not well known -- it is only new, and not legal syntax
yet! Unless they are heavily using Argument Clinic they probably won't
recognise it.

> and [...] syntax should be preferred, but
> inspect.signature() doesn't support this syntax. Maybe we should fix
> signature() and use [...] format instead?

-1

> Replace "replace(self, old, new, count=-1, /)" with "replace(self,
> old, new[, count=-1])" (or maybe even not document the default
> value?).

That isn't right. It would have to be:

replace([self, old, new, count=-1])

if all of the arguments are positional-only. But that makes it look like
they are all optional! A very strong -1 to this.

> Python 3.5 help (docstring) uses "S.replace(old, new[, count])".

Should be:

S.replace(old, new[, count], /)

which shows that all three arguments are positional only, but only count
is optional.

-- 
Steve

From steve at pearwood.info Thu Mar 2 08:44:16 2017
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 3 Mar 2017 00:44:16 +1100
Subject: [Python-ideas] add __contains__ into the "type" object
In-Reply-To: <20170302032308.g4hsqdmhacycaqgn@jaerhard.com>
References: <1a6bb123.269.15a86f5d855.Coremail.mlet_it_bew@126.com> <20170228231250.GN5689@ando.pearwood.info> <20170302032308.g4hsqdmhacycaqgn@jaerhard.com>
Message-ID: <20170302134416.GD5689@ando.pearwood.info>

On Thu, Mar 02, 2017 at 04:23:08AM +0100, Jürgen A. Erhard wrote:

> > The OP seems to be proposing that we reflect this identity between
> > types and sets in Python by spelling "isinstance(obj, T)" as "obj in
> > T" and "issubclass(S, T)" as "S <= T". This proposal has some solid
> > theory behind it and I don't think it would be hard to implement, but
> > it doesn't seem like a particularly useful change to me.
It wouldn't > > really enable anything we can't do now, and it may be confusing to > > people reading code that "obj in list" does something completely > > different from "obj in list()". > > So? Compare to "fn" vs "fn()" now. Yes, some people are confused. > So what. You *do* have to learn things. I don't understand what this comparison is supposed to show. `fn` is a name. `fn()` does a function call on whatever object is bound to `fn`. How is that relevant to the question of adding a completely separate set-like interface to isinstance and issubclass? > And "enable anything we can't do now". That argument was used any > number of times on this list, and even before this very list even > existed. It has been used many times. That is because it is a GOOD argument. We don't just add every single random feature that people can think of. ("Hey, wouldn't it be AWESOME if Class*str returned a list of class methods that contained the string in their name???") The new functionality, spelling or syntax has to add some benefit to make up for the extra cost: - the cost to develop this new feature or syntax; - the cost to write new tests for this feature; - the cost to document it; - the cost to documentation in books and the web that are now obsolete or incorrect; - the cost for people to learn this feature; - the cost for people to decide whether to use the old or the new spelling; and so forth. > Still, we got decorators (they don't enable anything we > couldn't do without them, and we actually can still do what they do > without using them). Indeed, but you are missing that decorator syntax brings some VERY important benefits that are a clear win over the old way: @decorate def spam(): ... versus def spam(): ... 
spam = decorate(spam) The decorator syntax avoids writing the function name three times, but more importantly, it puts the decoration right at the start of the function, next to the signature, where it is more obvious and easier to see, instead of way down the end, which might be many lines or even pages away. That's a very big win that makes decorator syntax the One Obvious Way to decorate functions and classes. Compare to the OP's suggestion: 23 in int This doesn't even make sense unless you have been exposed to a very small subset of theoretical computer science which treats classes as sets and instances as elements of those sets. To everyone else, especially those with a background in "ordinary" OOP, it looks like nonsense. (Personally, I'm a bit dubious about conflating is-a and element-of operations in this way, it feels like a category mistake to me, but for the sake of the argument I'll accept it.) So the benefit applies only to a tiny subset of users, but the cost is carried by everyone. -- Steve From rosuav at gmail.com Thu Mar 2 08:50:46 2017 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 3 Mar 2017 00:50:46 +1100 Subject: [Python-ideas] add __contains__ into the "type" object In-Reply-To: <20170302134416.GD5689@ando.pearwood.info> References: <1a6bb123.269.15a86f5d855.Coremail.mlet_it_bew@126.com> <20170228231250.GN5689@ando.pearwood.info> <20170302032308.g4hsqdmhacycaqgn@jaerhard.com> <20170302134416.GD5689@ando.pearwood.info> Message-ID: On Fri, Mar 3, 2017 at 12:44 AM, Steven D'Aprano wrote: > Compare to the OP's suggestion: > > 23 in int > > This doesn't even make sense unless you have been exposed to a very > small subset of theoretical computer science which treats classes as > sets and instances as elements of those sets. To everyone else, > especially those with a background in "ordinary" OOP, it looks like > nonsense. 
> (Personally, I'm a bit dubious about conflating is-a and element-of
> operations in this way, it feels like a category mistake to me, but for
> the sake of the argument I'll accept it.)

I've seen languages in which types can be the RHO (right-hand operand)
of 'is', so this would look like:

23 is int

Obviously that depends on types not themselves being first-class
objects, but it makes a lot more sense than a containment check.

But I'm trying to think how frequently I do *any* type checking in
production code. It's not often. It doesn't need syntax. isinstance(23,
int) works fine.

ChrisA

From p.f.moore at gmail.com Thu Mar 2 09:11:43 2017
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 2 Mar 2017 14:11:43 +0000
Subject: [Python-ideas] Optional parameters without default value
In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com>
Message-ID:

On 2 March 2017 at 13:11, Serhiy Storchaka wrote:
> On 02.03.17 14:20, Paul Moore wrote:
>>
>> So I guess I'm +0.5 on the proposed "positional only parameters"
>> syntax, and -1 on any form of new language-defined sentinel value.
>
>
> My proposition is not about "positional-only parameters".

Bah, sorry. I'm getting muddled between two different threads. I'm not
having a good day, it seems :-(

On the proposed feature, I don't like any of the proposed syntaxes (I'd
rate "default=" with no value as the least bad, but I don't like it
much; "default?" as opposed to "?default" is a possible option). I'm not
convinced that the version using the new syntax is any easier to read or
understand - the sentinel pattern is pretty well-understood by now, and
a built-in replacement would need to result in noticeably simpler code
(which this proposal doesn't seem to).

Agreed that the help() output is ugly. It would of course be possible to
give the sentinel a nicer repr, if you wanted:

>>> class Sentinel(object):
...     def __repr__(self): return "<no value>"
...
>>> _sentinel = Sentinel()
>>> def get(store, key, default=_sentinel):
...     pass
...
>>> help(get)
Help on function get in module __main__:

get(store, key, default=<no value>)

Whether it's worth doing this depends on the application, of course
(just like it's possible to hide the name of the sentinel if it matters
sufficiently).

Paul

From steve at pearwood.info Thu Mar 2 09:15:21 2017
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 3 Mar 2017 01:15:21 +1100
Subject: [Python-ideas] Optional parameters without default value
In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com>
Message-ID: <20170302141521.GE5689@ando.pearwood.info>

On Thu, Mar 02, 2017 at 01:08:42PM +0100, M.-A. Lemburg wrote:

> Sorry for the confusion. NoDefault would be usable just like
> any other singleton.

But that is exactly the trouble! We already have a singleton to indicate
"no default" -- that is spelled None. Occasionally, we need to allow
None as a legitimate value, not just as a sentinel indicating a missing
value. So the current practice is to create your own sentinel.

That just pushes the problem back one more level: what happens when you
have a function where the new NoDefault singleton is a legitimate value?
You need a *third* sentinel value. And a fourth, and so on...

> There would only be one case where it would cause an exception,
> namely when you declare a parameter as having NoDefault as value.
> This would trigger special logic in the argument parsing code to
> disallow using that parameter as keyword parameter.

Did you miss Serhiy's comment?

Optional parameters without default value can be positional-or-keyword,
keyword-only or positional-only (if the latter is implemented).

It doesn't matter whether the parameter is positional or keyword, or how
you call the function:

f(NoDefault)
f(x=NoDefault)

the effect should be the same.

But I think that's the wrong solution.
If it were a good solution, we should just say that None is the "No Default" value and prohibit passing None as an argument. But of course we can't do that, even if we were willing to break backwards compatibility. There are good reasons for passing None as a legitimate value, and there will be good reasons for passing NoDefault as a legitimate value too. The problem with having a special value that means "no value" is that it actually is a value. Serhiy has a good idea here: cut the gordian knot by *avoiding having a value at all*. The parameter name remains unbound. If you don't pass a value for the optional parameter, and then try to access the parameter, you get a NameError exception! That's really clever and I like it a lot. -- Steve From stephanh42 at gmail.com Thu Mar 2 09:15:44 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Thu, 2 Mar 2017 15:15:44 +0100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> Message-ID: Hi Chris, I do not think such a magic linter can be written. It seems an obvious instance of the Halting Problem to me. If it is possible, then the problem is moot anyway since we can just write it and teach it to treat some arbitrary sentinel NoDefault = object() as the magic value and there will be no need to patch the Python core. Regarding your remark that "you shouldn't return NoDefault anyway": well, *I* will obviously not do so ;-) , but some substantial fraction of programmers will use it as a handy extra sentinel when they can get away with it. And then libraries which deal with processing arbitrary parameters, return values or member values are going to have to deal with the issue if they are going to become "NoDefault-clean" or if they are going to document their NoDefault-uncleanliness. 
For examples from the standard library, would these work:

multiprocessing.Process(target=foo, args=(NoDefault,))
asyncio.AbstractEventLoop.call_later(delay, callback, NoDefault)
functools.partial(f, NoDefault)
shelve.open("spam")["x"] = NoDefault

They should, shouldn't they? I never passed in NoDefault by keyword
argument.

This generator:

def bad_idea():
    yield NoDefault

can we guarantee it works with all stuff in itertools?

And that is just the standard library. Now for all the stuff on PyPI...

Stephan

2017-03-02 13:35 GMT+01:00 Chris Angelico :
>
> On Thu, Mar 2, 2017 at 11:22 PM, Stephan Houben wrote:
> > Would this also apply if we provide or capture the keyword arguments using
> > ** ?
> >
> > I.e.
> > f(**{"x": NoDefault})
> >
> > (lambda **kw: kw)(x=NoDefault)
> >
> > In that case I see a problem with this idiom:
> >
> > newdict = dict(**olddict)
> >
> > This would now start throwing errors in case any of the values of olddict
> > was NoDefault.
> >
>
> You shouldn't be returning NoDefault anywhere, though, so the only
> problem is that the error is being reported in the wrong place. If
> this were to become syntax, enhanced linters could track down exactly
> where NoDefault came from, and report the error, because that's really
> where the bug is.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From rosuav at gmail.com Thu Mar 2 09:18:34 2017
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 3 Mar 2017 01:18:34 +1100
Subject: [Python-ideas] Optional parameters without default value
In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com>
Message-ID:

On Fri, Mar 3, 2017 at 1:15 AM, Stephan Houben wrote:
> I do not think such a magic linter can be written.
> It seems an obvious instance of the Halting Problem to me.
Yeah it can :) Static analysis is pretty impressive these days. Check
out tools like Coverity, which can analyse your source code and tell
you that, at this point in the code, it's possible for x to be >100
and y to have only 100 bytes of buffer, and then you index past a
buffer. You could do the same to track down the origin of an object in
Python.

However, I think this is far from an ideal solution to the problem.

ChrisA

From steve at pearwood.info Thu Mar 2 09:24:14 2017
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 3 Mar 2017 01:24:14 +1100
Subject: [Python-ideas] Optional parameters without default value
In-Reply-To: References: Message-ID: <20170302142414.GF5689@ando.pearwood.info>

On Thu, Mar 02, 2017 at 10:03:29AM +0200, Serhiy Storchaka wrote:

> I propose to add a new syntax for optional parameters. If the argument
> corresponding to the optional parameter without default value is not
> specified, the parameter takes no value. As well as the "*" prefix means
> "arbitrary number of positional parameters", the prefix "?" can mean
> "single optional parameter".

I like this! If the caller doesn't provide a value, the parameter
remains unbound and any attempt to look it up will give a NameError or
UnboundLocalError.

The only question is, how often do we need functions with an optional
parameter that doesn't have a default? I've done it a few times, and
used the sentinel trick, but I'm not sure this is common enough to need
support from the compiler.

It is a really clever solution though.

> Alternative syntaxes:
>
> * "=" not followed by an expression: "def get(store, key, default=)".

Too easy to happen by accident if you accidentally forget to add the
default, or delete it.

> * The "del" keyword: "def get(store, key, del default)".

This feature has nothing to do with del.
-- Steve From stephanh42 at gmail.com Thu Mar 2 09:33:51 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Thu, 2 Mar 2017 15:33:51 +0100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> Message-ID: So let's turn the question around: Since Coverity is user-extensible (and supports Python), can you write a Coverity rule which detects wrong use of some given NoDefault sentinel with a useful level of reliability? Actually I feel this should be feasible. (And if so, mission accomplished?) Stephan 2017-03-02 15:18 GMT+01:00 Chris Angelico : > On Fri, Mar 3, 2017 at 1:15 AM, Stephan Houben wrote: >> I do not think such a magic linter can be written. >> It seems an obvious instance of the Halting Problem to me. > > Yeah it can :) Static analysis is pretty impressive these days. Check > out tools like Coverity, which can analyse your source code and tell > you that, at this point in the code, it's possible for x to be >100 > and y to have only 100 bytes of buffer, and then you index past a > buffer. You could do the same to track down the origin of an object in > Python. > > However, I think this is far from an ideal solution to the problem. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From p.f.moore at gmail.com Thu Mar 2 09:33:57 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 2 Mar 2017 14:33:57 +0000 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: <20170302142414.GF5689@ando.pearwood.info> References: <20170302142414.GF5689@ando.pearwood.info> Message-ID: On 2 March 2017 at 14:24, Steven D'Aprano wrote: > I like this! 
If the caller doesn't provide a value, the parameter > remains unbound and any attempt to look it up will give a NameError or > UnboundLocalError. Hmm. But those exceptions currently indicate with almost 100% certainty, a programming error (usually a mis-spelled name or a control flow error). The proposal makes them a normal runtime behaviour, in certain circumstances. What would happen if you mis-spelled the name of the optional parameter? You'd get a NameError from using the wrong name, rather than from the user not supplying a value. I don't think re-using NameError is a good idea here. Paul From ryan at ryanhiebert.com Thu Mar 2 09:44:25 2017 From: ryan at ryanhiebert.com (Ryan Hiebert) Date: Thu, 2 Mar 2017 08:44:25 -0600 Subject: [Python-ideas] add __contains__ into the "type" object In-Reply-To: References: <1a6bb123.269.15a86f5d855.Coremail.mlet_it_bew@126.com> <20170228231250.GN5689@ando.pearwood.info> <20170302032308.g4hsqdmhacycaqgn@jaerhard.com> <20170302134416.GD5689@ando.pearwood.info> Message-ID: <88E6A55B-86D9-4B6B-9D6B-EA586A14BA9C@ryanhiebert.com> By itself, I don't see using the ``in`` syntax to check for ``instanceof`` as a big benefit, given the overhead of learning that new concept. However, given in the light of a bigger concept, I think it may make more sense. If we accept that it may be desirable to work with types as set-like objects, apart from (in most cases) iteration, then some other shorthands become reasonable too. Other set operators, such as:: list <= object # issubclass(list, object) Plus the other obvious comparisons. Other set operators can be used for typing:: list | set # same or similar to typing.Union[list, set] mixin1 & mixin2 # Represents classes that inherit from mixin1 and mixin2 When we bring typing into it, it would be cool if those resultant values also were able to do instance and subclass checks. They currently raise an error, but I think it would be possible to do if this were all built into ``type``. 
And, of course, if we're bringing typing to ``type``, we can replace things like ``typing.List[int]`` with ``list[int]``, and putting those directly into signatures. I think it would be somewhat odd to bring in *some* of the set operators to types, but leave off ``__contains__`` and inequality comparisons. This is not a proposal of all of this, just pointing out that there are further applications to this concept. At least in the case of the typing examples, I think that there are some simplifications to be had by thinking this way. Reasons against also exist, of course, but like everything it's a trade-off to consider. I have a repo that starts some of this, though it's currently broken due to lack of love in the middle of a refactor. I haven't tried to add any typing stuff, but that seems like an obvious extension. I wouldn't expect type checkers to work with a 3rd party module like this, but it could help demonstrate the ideas. https://github.com/ryanhiebert/typeset If anyone is interested enough to help flesh out this proof of concept, I'd be grateful for some collaboration, or being let know of other work like it. From mal at egenix.com Thu Mar 2 10:04:39 2017 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 2 Mar 2017 16:04:39 +0100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> Message-ID: On 02.03.2017 14:08, Serhiy Storchaka wrote: > On 02.03.17 12:04, M.-A. Lemburg wrote: >> This is not new syntax, nor is it a keyword. It's only a >> new singleton and it is well usable outside of function >> declarations as well, e.g. for class attributes which are >> not yet initialized (and which can accept None as value). > > If it is not a keyword, it could be used in expressions, e.g. assigned > to a variable or passed to a function. It could be monkey-patched or > hidden by accident (as True and False in Python 2). 
Yes, sure. My proposal was just to address the problems of changing Python syntax and making it possible to define positional only arguments in Python functions/methods in a backwards compatible way. The same could be had by adding a C function proxy to Python which then takes care of the error handling, since we already have the logic for C functions via PyArg_ParseTuple*(). A decorator could then apply the proxy as needed or ignore this for older Python versions without breaking compatibility (or a PyPI extension could provide the same proxy logic for older versions). FWIW: I don't think the use case for positional only arguments to Python functions is strong enough to warrant breaking backwards compatibility by introducing new syntax. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Mar 02 2017) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From victor.stinner at gmail.com Thu Mar 2 10:36:55 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 2 Mar 2017 16:36:55 +0100 Subject: [Python-ideas] Positional-only parameters In-Reply-To: References: Message-ID: 2017-03-01 21:52 GMT+01:00 Terry Reedy : > + 1 also. When people write a Python equivalent of a built-in function for > documentation or teaching purposes, they should be able to exactly mimic the > API. 
Yeah, Serhiy said basically the same thing: it's doable, but complex without builtin support for positional-only arguments. I dislike subtle differences between C and Python, and positional-only is a major difference since it has a very visible effect on the API. After having used PHP for years, I really enjoyed Python keyword arguments and default values. I was very happy to not have to "reimplement" the "keyword arguments with default values" feature in each function (*). Basically, I would like the same thing for positional-only arguments :-) (*) Example (found on the Internet) of PHP code pattern for keywords, enjoy ;-) function doSomething($arguments = array()) { // set defaults $arguments = array_merge(array( "argument" => "default value", ), $arguments); var_dump($arguments); } Victor From victor.stinner at gmail.com Thu Mar 2 10:47:00 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 2 Mar 2017 16:47:00 +0100 Subject: [Python-ideas] Positional-only parameters In-Reply-To: <20170302132317.GC5689@ando.pearwood.info> References: <20170302132317.GC5689@ando.pearwood.info> Message-ID: 2017-03-02 14:23 GMT+01:00 Steven D'Aprano : >> Replace "replace(self, old, new, count=-1, /)" with "replace(self, >> old, new[, count=-1])" (or maybe even not document the default >> value?). > > That isn't right. It would have to be: > > replace([self, old, new, count=-1]) > > if all of the arguments are positional-only. But that makes it look like > they are all optional! A very strong -1 to this. > >> Python 3.5 help (docstring) uses "S.replace(old, new[, count])". > > Should be: > > S.replace(old, new[, count], /) > > which shows that all three arguments are positional only, but only count > is optional. Oh, I didn't notice the weird count parameter: positional-only, but no default value? I would prefer to avoid weird parameters and use a syntax which can be written in Python, like: def replace(self, old, new, /, count=-1): ... 
When a function has more than 3 parameters, I like the ability to pass
arguments by keyword for readability:

"xxx".replace("x", "y", count=2)

It's more explicit than:

"xxx".replace("x", "y", 2)

By the way, I proposed once to convert open() parameters after filename
and mode to keyword-only arguments, but Guido didn't want to break the
backward compatibility ;-)

open(filename, mode, *, buffering=-1, ...)

Victor

From jsbueno at python.org.br Thu Mar 2 11:13:53 2017
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Thu, 2 Mar 2017 13:13:53 -0300
Subject: [Python-ideas] Optional parameters without default value
In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com>
Message-ID:

Is it just me who finds that having the un-assigned parameter raise
NameError (or other exception) is much more cumbersome than having a
sentinel-value?

I definitely don't find it clever - for one, a common default parameter
- sentinel or not - can be replaced in a single line of code by an
expression using "if" or "or", while the exception-raising variant
requires a whole try-except block.

So, while I like the idea of simplifying the "sentinel idiom", I don't
find any suggestion here useful so far.

Maybe something in the stdlib that would allow something along:

from paramlib import NoDefault, passed

def myfunc(a, b, count=NoDefault):
    if not passed(count):
        ...
    else:
        ...

That would simplify a bit the sentinel pattern, not pollute the
namespace, and not need any new syntax (and still allow "if" expressions
without a full try/except block)

On 2 March 2017 at 12:04, M.-A. Lemburg wrote:
> On 02.03.2017 14:08, Serhiy Storchaka wrote:
>> On 02.03.17 12:04, M.-A. Lemburg wrote:
>>> This is not new syntax, nor is it a keyword. It's only a
>>> new singleton and it is well usable outside of function
>>> declarations as well, e.g. for class attributes which are
>>> not yet initialized (and which can accept None as value).
>> >> If it is not a keyword, it could be used in expressions, e.g. assigned >> to a variable or passed to a function. It could be monkey-patched or >> hidden by accident (as True and False in Python 2). > > Yes, sure. > > My proposal was just to address the problems of > changing Python syntax and making it possible to define > positional only arguments in Python functions/methods in > a backwards compatible way. > > The same could be had by adding a C function proxy to Python > which then takes care of the error handling, since we already > have the logic for C functions via PyArg_ParseTuple*(). > A decorator could then apply the proxy as needed or ignore this > for older Python versions without breaking compatibility > (or a PyPI extension could provide the same proxy logic for > older versions). > > FWIW: I don't think the use case for positional only arguments > to Python functions is strong enough to warrant breaking > backwards compatibility by introducing new syntax. > > -- > Marc-Andre Lemburg > eGenix.com > > Professional Python Services directly from the Experts (#1, Mar 02 2017) >>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>>> Python Database Interfaces ... http://products.egenix.com/ >>>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ > ________________________________________________________________________ > > ::: We implement business ideas - efficiently in both time and costs ::: > > eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 > D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg > Registered at Amtsgericht Duesseldorf: HRB 46611 > http://www.egenix.com/company/contact/ > http://www.malemburg.com/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From ethan at stoneleaf.us Thu Mar 2 11:57:21 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 02 Mar 2017 08:57:21 -0800 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> Message-ID: <58B84EF1.4020405@stoneleaf.us> On 03/02/2017 08:13 AM, Joao S. O. Bueno wrote: > Is it just me that find that having the un-assigned parameter raise > NameError (or other exception) much more cumbersome than > havign a sentinel-value? No. While clever, the hassle of figuring out if you have a parameter clearly outweighs the benefit of avoiding a sentinel value. It would be a different story if we had exception-catching expressions. ;) -- ~Ethan~ From ethan at stoneleaf.us Thu Mar 2 12:08:26 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 02 Mar 2017 09:08:26 -0800 Subject: [Python-ideas] add __contains__ into the "type" object In-Reply-To: <1a6bb123.269.15a86f5d855.Coremail.mlet_it_bew@126.com> References: <1a6bb123.269.15a86f5d855.Coremail.mlet_it_bew@126.com> Message-ID: <58B8518A.5040408@stoneleaf.us> -1. It is already possible to specify what inst in cls means by using a metaclass. 
For example: class Color(enum.Enum): RED = 1 GREEN = 2 BLUE = 3 some_var = Color.GREEN some_var in Color # True some_var in enum.Enum # False Containment != isinstance() -- ~Ethan~ From brett at python.org Thu Mar 2 13:05:21 2017 From: brett at python.org (Brett Cannon) Date: Thu, 02 Mar 2017 18:05:21 +0000 Subject: [Python-ideas] for/except/else In-Reply-To: References: Message-ID: On Thu, 2 Mar 2017 at 03:07 Wolfgang Maier < wolfgang.maier at biologie.uni-freiburg.de> wrote: [SNIP] > As always though, reality can be expected to be quite a bit more > complicated than theory so I decided to check the stdlib for real uses > of break. This is quite a tedious task since break is used in many > different ways and I couldn't come up with a good automated way of > classifying them. So what I did is just go through stdlib code (in > reverse alphabetical order) containing the break keyword and put it into > categories manually. I only got up to socket.py before losing my > enthusiasm, but here's what I found: > > - overall I looked at 114 code blocks that contain one or more breaks > I wanted to say thanks for taking the time to go through the stdlib and doing such a thorough analysis of the impact of your suggestion! It always helps to have real-world numbers to know whether an idea will be useful (or not). > > - 84 of these are trivial use cases that simply break out of a while > True block or terminate a while/for loop prematurely (no use for any > follow-up clause there) > > - 8 more are causing a side-effect before a single break, and it would > be pointless to put this into an except break clause > > - 3 more cause different, non-redundant side-effects before different > breaks from the same loop and, obviously, an except break clause would > not help them either > > => So the vast majority of breaks does *not* need an except break *nor* > an else clause, but that's just as expected. 
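For reference, the "classical search idiom" tallied in these numbers is the existing for/else construct, which needs no new syntax at all:

```python
def find_first_even(items):
    # Classic search idiom: the "else" clause runs only if the
    # loop finished without hitting "break".
    for item in items:
        if item % 2 == 0:
            result = item
            break
    else:
        result = None  # no even number found
    return result
```

The proposed `except break` clause would cover the complementary case: code that should run only when the loop *was* terminated by `break`.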
> > > Of the remaining 19 non-trivial cases > > - 9 are variations of your classical search idiom above, i.e., there's > an else clause there and nothing more is needed > > - 6 are variations of your "nested side-effects" form presented above > with debatable (see above) benefit from except break > > - 2 do not use an else clause currently, but have multiple breaks that > do partly redundant things that could be combined in a single except > break clause > > - 1 is an example of breaking out of two loops; from sre_parse._parse_sub: > > [...] > # check if all items share a common prefix > while True: > prefix = None > for item in items: > if not item: > break > if prefix is None: > prefix = item[0] > elif item[0] != prefix: > break > else: > # all subitems start with a common "prefix". > # move it out of the branch > for item in items: > del item[0] > subpatternappend(prefix) > continue # check next one > break > [...] > > This could have been written as: > > [...] > # check if all items share a common prefix > while True: > prefix = None > for item in items: > if not item: > break > if prefix is None: > prefix = item[0] > elif item[0] != prefix: > break > except break: > break > > # all subitems start with a common "prefix". > # move it out of the branch > for item in items: > del item[0] > subpatternappend(prefix) > [...] > > > - finally, 1 is a complicated break dance to achieve sth that clearly > would have been easier with except break; from typing.py: > > [...] > def __subclasscheck__(self, cls): > if cls is Any: > return True > if isinstance(cls, GenericMeta): > # For a class C(Generic[T]) where T is co-variant, > # C[X] is a subclass of C[Y] iff X is a subclass of Y. 
> origin = self.__origin__ > if origin is not None and origin is cls.__origin__: > assert len(self.__args__) == len(origin.__parameters__) > assert len(cls.__args__) == len(origin.__parameters__) > for p_self, p_cls, p_origin in zip(self.__args__, > cls.__args__, > origin.__parameters__): > if isinstance(p_origin, TypeVar): > if p_origin.__covariant__: > # Covariant -- p_cls must be a subclass of > p_self. > if not issubclass(p_cls, p_self): > break > elif p_origin.__contravariant__: > # Contravariant. I think it's the > opposite. :-) > if not issubclass(p_self, p_cls): > break > else: > # Invariant -- p_cls and p_self must equal. > if p_self != p_cls: > break > else: > # If the origin's parameter is not a typevar, > # insist on invariance. > if p_self != p_cls: > break > else: > return True > # If we break out of the loop, the superclass gets a > chance. > if super().__subclasscheck__(cls): > return True > if self.__extra__ is None or isinstance(cls, GenericMeta): > return False > return issubclass(cls, self.__extra__) > [...] > > which could be rewritten as: > > [...] > def __subclasscheck__(self, cls): > if cls is Any: > return True > if isinstance(cls, GenericMeta): > # For a class C(Generic[T]) where T is co-variant, > # C[X] is a subclass of C[Y] iff X is a subclass of Y. > origin = self.__origin__ > if origin is not None and origin is cls.__origin__: > assert len(self.__args__) == len(origin.__parameters__) > assert len(cls.__args__) == len(origin.__parameters__) > for p_self, p_cls, p_origin in zip(self.__args__, > cls.__args__, > origin.__parameters__): > if isinstance(p_origin, TypeVar): > if p_origin.__covariant__: > # Covariant -- p_cls must be a subclass of > p_self. > if not issubclass(p_cls, p_self): > break > elif p_origin.__contravariant__: > # Contravariant. I think it's the > opposite. :-) > if not issubclass(p_self, p_cls): > break > else: > # Invariant -- p_cls and p_self must equal. 
> if p_self != p_cls: > break > else: > # If the origin's parameter is not a typevar, > # insist on invariance. > if p_self != p_cls: > break > except break: > # If we break out of the loop, the superclass gets > a chance. > if super().__subclasscheck__(cls): > return True > if self.__extra__ is None or isinstance(cls, > GenericMeta): > return False > return issubclass(cls, self.__extra__) > > return True > [...] > > > My summary: I do see use-cases for the except break clause, but, > admittedly, they are relatively rare and may be not worth the hassle of > introducing new syntax. > IOW out of 114 cases, 4 may benefit from an 'except' block? If I'm reading those numbers correctly then ~3.5% of cases would benefit which isn't high enough to add the syntax and related complexity IMO. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Thu Mar 2 13:37:18 2017 From: brett at python.org (Brett Cannon) Date: Thu, 02 Mar 2017 18:37:18 +0000 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: <58B84EF1.4020405@stoneleaf.us> References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> <58B84EF1.4020405@stoneleaf.us> Message-ID: On Thu, 2 Mar 2017 at 08:58 Ethan Furman wrote: > On 03/02/2017 08:13 AM, Joao S. O. Bueno wrote: > > > Is it just me that find that having the un-assigned parameter raise > > NameError (or other exception) much more cumbersome than > > havign a sentinel-value? > > No. While clever, the hassle of figuring out if you have a parameter > clearly outweighs the benefit of avoiding a > sentinel value. > > It would be a different story if we had exception-catching expressions. ;) > I don't like the NameError solution either. What I would like to know is how common is this problem? That will help frame whether this warrants syntax or just providing a sentinel in some module in the stdlib that people can use (e.g. 
functools.NotGiven; I do prefer MAL's naming of the sentinel). Sticking it into a module would help minimize people from using it in places where None is entirely acceptable and not confusing the whole community when people suddenly start peppering their code with NotGiven instead of None for default values. And if this is really common enough to warrant syntax, then I would want: def foo(a, b, opt?): ... to represent that 'opt' is optional and if not provided by the user then it is given the value of NotGiven (or None if we are just after a syntactic shortcut to say "this argument is optional"). So to me, there's actually two things being discussed. Do we need another sentinel to handle the "None is valid" case, and do we want syntax to more clearly delineate optional arguments? -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Thu Mar 2 13:16:32 2017 From: brett at python.org (Brett Cannon) Date: Thu, 02 Mar 2017 18:16:32 +0000 Subject: [Python-ideas] Positional-only parameters In-Reply-To: References: Message-ID: It seems all the core devs who have commented on this are in the positive (Victor, Yury, Ethan, Yury, Guido, Terry, and Steven; MAL didn't explicitly vote). So to me that suggests there's enough support to warrant writing a PEP. Are you up for writing it, Victor, or is someone else going to write it? On Tue, 28 Feb 2017 at 13:18 Victor Stinner wrote: > Hi, > > For technical reasons, many functions of the Python standard libraries > implemented in C have positional-only parameters. Example: > ------- > $ ./python > Python 3.7.0a0 (default, Feb 25 2017, 04:30:32) > >>> help(str.replace) > replace(self, old, new, count=-1, /) # <== notice "/" at the end > ... > >>> "a".replace("x", "y") # ok > 'a' > > >>> "a".replace(old="x", new="y") # ERR! 
> TypeError: replace() takes at least 2 arguments (0 given) > ------- > > When converting the methods of the builtin str type to the internal > "Argument Clinic" tool (tool to generate the function signature, > function docstring and the code to parse arguments in C), I asked if > we should add support for keyword arguments in str.replace(). The > answer was quick: no! It's a deliberate design choice. > > Quote of Yury Selivanov's message: > """ > I think Guido explicitly stated that he doesn't like the idea to > always allow keyword arguments for all methods. I.e. `str.find('aaa')` > just reads better than `str.find(needle='aaa')`. Essentially, the idea > is that for most of the builtins that accept one or two arguments, > positional-only parameters are better. > """ > http://bugs.python.org/issue29286#msg285578 > > I just noticed a module on PyPI to implement this behaviour on Python > functions: > > https://pypi.python.org/pypi/positional > > My question is: would it make sense to implement this feature in > Python directly? If yes, what should be the syntax? Use "/" marker? > Use the @positional() decorator? > > Do you see concrete cases where it's a deliberate choice to deny > passing arguments as keywords? > > Don't you like writing int(x="123") instead of int("123")? :-) (I know > that Serhiy Storshake hates the name of the "x" parameter of the int > constructor ;-)) > > By the way, I read that "/" marker is unknown by almost all Python > developers, and [...] syntax should be preferred, but > inspect.signature() doesn't support this syntax. Maybe we should fix > signature() and use [...] format instead? > > Replace "replace(self, old, new, count=-1, /)" with "replace(self, > old, new[, count=-1])" (or maybe even not document the default > value?). > > Python 3.5 help (docstring) uses "S.replace(old, new[, count])". 
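Historical note: the `/` marker discussed in this thread did eventually become real def-statement syntax, in Python 3.8 via PEP 570, so pure-Python functions can now declare positional-only parameters directly:

```python
def replace(s, old, new, count=-1, /):
    # Everything before "/" is positional-only (PEP 570, Python 3.8+).
    # Calling replace(s, old="x", new="y") raises TypeError.
    if count < 0:
        return s.replace(old, new)
    return s.replace(old, new, count)
```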
> > Victor > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsbueno at python.org.br Thu Mar 2 14:01:56 2017 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Thu, 2 Mar 2017 16:01:56 -0300 Subject: [Python-ideas] for/except/else In-Reply-To: References: Message-ID: On 1 March 2017 at 06:37, Wolfgang Maier wrote: > Now here's the proposal: allow an except (or except break) clause to follow > for/while loops that will be executed if the loop was terminated by a break > statement. After rethinking over some code I've written in the past, yes, I agree this change could be a nice one. The simple fact that people are commenting that they could chage the code to be an inner function in order to break from nested "for" loops should be a hint this syntax is useful. (I myself have done that with exceptions in some cases). js -><- From srkunze at mail.de Thu Mar 2 14:56:10 2017 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 2 Mar 2017 20:56:10 +0100 Subject: [Python-ideas] Positional-only parameters In-Reply-To: References: Message-ID: <45729ca3-f684-89a4-538f-fbcec6a0b16b@mail.de> Isn't https://www.python.org/dev/peps/pep-0457/ the PEP you are looking for? On 02.03.2017 19:16, Brett Cannon wrote: > It seems all the core devs who have commented on this are in the > positive (Victor, Yury, Ethan, Yury, Guido, Terry, and Steven; MAL > didn't explicitly vote). So to me that suggests there's enough support > to warrant writing a PEP. Are you up for writing it, Victor, or is > someone else going to write it? > > On Tue, 28 Feb 2017 at 13:18 Victor Stinner > wrote: > > Hi, > > For technical reasons, many functions of the Python standard libraries > implemented in C have positional-only parameters. 
Example: > ------- > $ ./python > Python 3.7.0a0 (default, Feb 25 2017, 04:30:32) > >>> help(str.replace) > replace(self, old, new, count=-1, /) # <== notice "/" at the end > ... > >>> "a".replace("x", "y") # ok > 'a' > > >>> "a".replace(old="x", new="y") # ERR! > TypeError: replace() takes at least 2 arguments (0 given) > ------- > > When converting the methods of the builtin str type to the internal > "Argument Clinic" tool (tool to generate the function signature, > function docstring and the code to parse arguments in C), I asked if > we should add support for keyword arguments in str.replace(). The > answer was quick: no! It's a deliberate design choice. > > Quote of Yury Selivanov's message: > """ > I think Guido explicitly stated that he doesn't like the idea to > always allow keyword arguments for all methods. I.e. `str.find('aaa')` > just reads better than `str.find(needle='aaa')`. Essentially, the idea > is that for most of the builtins that accept one or two arguments, > positional-only parameters are better. > """ > http://bugs.python.org/issue29286#msg285578 > > I just noticed a module on PyPI to implement this behaviour on > Python functions: > > https://pypi.python.org/pypi/positional > > My question is: would it make sense to implement this feature in > Python directly? If yes, what should be the syntax? Use "/" marker? > Use the @positional() decorator? > > Do you see concrete cases where it's a deliberate choice to deny > passing arguments as keywords? > > Don't you like writing int(x="123") instead of int("123")? :-) (I know > that Serhiy Storshake hates the name of the "x" parameter of the int > constructor ;-)) > > By the way, I read that "/" marker is unknown by almost all Python > developers, and [...] syntax should be preferred, but > inspect.signature() doesn't support this syntax. Maybe we should fix > signature() and use [...] format instead? 
> > Replace "replace(self, old, new, count=-1, /)" with "replace(self, > old, new[, count=-1])" (or maybe even not document the default > value?). > > Python 3.5 help (docstring) uses "S.replace(old, new[, count])". > > Victor > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Thu Mar 2 15:07:52 2017 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 2 Mar 2017 20:07:52 +0000 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: Message-ID: <7cc404b1-9b4d-b5da-c4f6-787d84fb6648@mrabarnett.plus.com> On 2017-03-02 08:03, Serhiy Storchaka wrote: > Function implemented in Python can have optional parameters with default > value. It also can accept arbitrary number of positional and keyword > arguments if use var-positional or var-keyword parameters (*args and > **kwargs). But there is no way to declare an optional parameter that > don't have default value. Currently you need to use the sentinel idiom > for implementing this: > > _sentinel = object() > def get(store, key, default=_sentinel): > if store.exists(key): > return store.retrieve(key) > if default is _sentinel: > raise LookupError > else: > return default > > There are drawback of this: > > * Module's namespace is polluted with sentinel's variables. > > * You need to check for the sentinel before passing it to other function > by accident. > > * Possible name conflicts between sentinels for different functions of > the same module. 
> > * Since the sentinel is accessible outside of the function, it possible > to pass it to the function. > > * help() of the function shows reprs of default values. "foo(bar= object at 0xb713c698>)" looks ugly. > > > I propose to add a new syntax for optional parameters. If the argument > corresponding to the optional parameter without default value is not > specified, the parameter takes no value. As well as the "*" prefix means > "arbitrary number of positional parameters", the prefix "?" can mean > "single optional parameter". > > Example: > > def get(store, key, ?default): > if store.exists(key): > return store.retrieve(key) > try: > return default > except NameError: > raise LookupError > > Alternative syntaxes: > > * "=" not followed by an expression: "def get(store, key, default=)". > > * The "del" keyword: "def get(store, key, del default)". > > This feature is orthogonal to supporting positional-only parameters. > Optional parameters without default value can be positional-or-keyword, > keyword-only or positional-only (if the latter is implemented). > Could you use 'pass' as the pseudo-sentinel? Maybe also allow " is pass"/" is not pass" as tests for absence/presence. From abedillon at gmail.com Thu Mar 2 17:58:54 2017 From: abedillon at gmail.com (Abe Dillon) Date: Thu, 2 Mar 2017 16:58:54 -0600 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: <7cc404b1-9b4d-b5da-c4f6-787d84fb6648@mrabarnett.plus.com> References: <7cc404b1-9b4d-b5da-c4f6-787d84fb6648@mrabarnett.plus.com> Message-ID: I honestly don't understand the reasoning behind using anything more complex than a built-in sentinel value. Just plop "NotGiven" or whatever in the built-ins and say "it's like None, but for the specific case of optional parameters with no default value". Why prohibit people from passing it to functions? That would just be an explicit way of saying: "I'm not giving you a value for this parameter". 
Anything more than that is just paranoia that people won't know how to use it in an expected manner. I'm -0.5 on this proposal. It seems like it would add more confusion for dubious benefit. On Thu, Mar 2, 2017 at 2:07 PM, MRAB wrote: > On 2017-03-02 08:03, Serhiy Storchaka wrote: > >> Function implemented in Python can have optional parameters with default >> value. It also can accept arbitrary number of positional and keyword >> arguments if use var-positional or var-keyword parameters (*args and >> **kwargs). But there is no way to declare an optional parameter that >> don't have default value. Currently you need to use the sentinel idiom >> for implementing this: >> >> _sentinel = object() >> def get(store, key, default=_sentinel): >> if store.exists(key): >> return store.retrieve(key) >> if default is _sentinel: >> raise LookupError >> else: >> return default >> >> There are drawback of this: >> >> * Module's namespace is polluted with sentinel's variables. >> >> * You need to check for the sentinel before passing it to other function >> by accident. >> >> * Possible name conflicts between sentinels for different functions of >> the same module. >> >> * Since the sentinel is accessible outside of the function, it possible >> to pass it to the function. >> >> * help() of the function shows reprs of default values. "foo(bar=> object at 0xb713c698>)" looks ugly. >> >> >> I propose to add a new syntax for optional parameters. If the argument >> corresponding to the optional parameter without default value is not >> specified, the parameter takes no value. As well as the "*" prefix means >> "arbitrary number of positional parameters", the prefix "?" can mean >> "single optional parameter". >> >> Example: >> >> def get(store, key, ?default): >> if store.exists(key): >> return store.retrieve(key) >> try: >> return default >> except NameError: >> raise LookupError >> >> Alternative syntaxes: >> >> * "=" not followed by an expression: "def get(store, key, default=)". 
>> >> * The "del" keyword: "def get(store, key, del default)". >> >> This feature is orthogonal to supporting positional-only parameters. >> Optional parameters without default value can be positional-or-keyword, >> keyword-only or positional-only (if the latter is implemented). >> >> Could you use 'pass' as the pseudo-sentinel? > > Maybe also allow " is pass"/" is not pass" as tests for > absence/presence. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Thu Mar 2 18:00:24 2017 From: barry at python.org (Barry Warsaw) Date: Thu, 2 Mar 2017 18:00:24 -0500 Subject: [Python-ideas] Optional parameters without default value References: Message-ID: <20170302180024.4091f497@subdivisions.wooz.org> On Mar 02, 2017, at 10:03 AM, Serhiy Storchaka wrote: >Currently you need to use the sentinel idiom for implementing this: > >_sentinel = object() >def get(store, key, default=_sentinel): > if store.exists(key): > return store.retrieve(key) > if default is _sentinel: > raise LookupError > else: > return default The reason for using a special sentinel here is to ensure that there's no way (other than deliberate subterfuge) for code using this API to pass that sentinel in. Normally, None is just fine, but for some cases None is a possible legitimate value, so it won't do as a sentinel. Thus you make one up that you know will mean "wasn't given". A classic example is dict.get(): missing = object() if d.get('key', missing) is missing: its_definitely_not_in_the_dictionary() i.e. because it's possible for d['key'] == None. I don't think this use case is common enough for special syntax, or a keyword. I'm -0 for adding a new built-in because while it might serve your purposes, it's easier to commit subterfuge. 
>>> get(store, key, default=NoDefault) # Whoop! or if d.get('key', NoDefault) is NoDefault: hopefully_its_not_in_the_dictionary() Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 801 bytes Desc: OpenPGP digital signature URL: From barry at python.org Thu Mar 2 18:02:17 2017 From: barry at python.org (Barry Warsaw) Date: Thu, 2 Mar 2017 18:02:17 -0500 Subject: [Python-ideas] Optional parameters without default value References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> <58B84EF1.4020405@stoneleaf.us> Message-ID: <20170302180217.03cf6df5@subdivisions.wooz.org> On Mar 02, 2017, at 06:37 PM, Brett Cannon wrote: >So to me, there's actually two things being discussed. Do we need another >sentinel to handle the "None is valid" case, and do we want syntax to more >clearly delineate optional arguments? No, and no (IMHO). -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 801 bytes Desc: OpenPGP digital signature URL: From abedillon at gmail.com Thu Mar 2 18:10:07 2017 From: abedillon at gmail.com (Abe Dillon) Date: Thu, 2 Mar 2017 17:10:07 -0600 Subject: [Python-ideas] lazy use for optional import In-Reply-To: References: Message-ID: I really think the whole "lazy" idea is misguided. If it's possible for the interpreter to determine automatically when it needs to force evaluation of a lazy expression or statement, then why not make *all* expressions and statements lazy by default? I think it's pretty clear when to force evaluation: 1) when the result is used in a control flow statement/expression 2) when the result is output (file, network, or other I/O) and 3) evaluate all pending lazy code before releasing the GIL. At that point, why not make lazy evaluation an implicit feature of the language, like the garbage collector. 
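The kind of deferred evaluation being debated here can already be emulated with an explicit thunk object; a minimal sketch, with no interpreter support assumed (the `Lazy` class is illustrative, not stdlib):

```python
class Lazy:
    """Defer a zero-argument callable until .value is first read."""
    _UNSET = object()

    def __init__(self, func):
        self._func = func
        self._result = Lazy._UNSET

    @property
    def value(self):
        if self._result is Lazy._UNSET:
            self._result = self._func()  # forced exactly once
            self._func = None            # allow the callable to be freed
        return self._result

calls = []
def expensive():
    calls.append(1)
    return 42

lazy = Lazy(expensive)
untouched = len(calls)   # nothing has run yet
forced = lazy.value      # evaluation forced here
again = lazy.value       # cached; not re-evaluated
```

What the proposals in this thread ask for is exactly this behaviour, minus the explicit `.value` forcing step — which is where the hard language-design questions start.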
On Wed, Mar 1, 2017 at 8:58 PM, Chris Barker wrote: > Going through machinations to satisfy PEP 8 makes no sense -- it's s style > *guide* -- that's it. > > -CHB > > > On Tue, Feb 28, 2017 at 3:31 PM, Nicolas Cellier < > contact at nicolas-cellier.net> wrote: > >> I have seen some interest into lazy functionality implementation. >> >> I wondered if it can be linked with optional import. >> >> PEP 8 authoritatively states: >> >> Imports are always put at the top of the file, just after any module >> comments and docstrings, and before module globals and constants. >> >> So, if we want to stick to PEP8 with non mandatory import, we have to >> catch the import errors, or jail the class or function using extra >> functionnality. >> >> Why not using the potential lazy keyword to have a nice way to deal with >> it? >> >> For example: >> >> lazy import pylab as pl # do nothing for now >>> >>> # do stuff >>> >>> def plot(*args): >>> pl.figure() # Will raise an ImportError at this point >>> pl.plot(...) >>> >> >> That way, our library will raise an ImportError only on plot func usage >> with an explicit traceback : if matplotlib is not installed, we will have >> the line where it is used for the first time and we will have the name of >> the faulty library. >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > > -- > > Christopher Barker, Ph.D. 
> Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abedillon at gmail.com Thu Mar 2 18:42:31 2017 From: abedillon at gmail.com (Abe Dillon) Date: Thu, 2 Mar 2017 17:42:31 -0600 Subject: [Python-ideas] Delayed Execution via Keyword In-Reply-To: References: <159901d28b6b$0aef9280$20ceb780$@hotmail.com> Message-ID: I'm going to repeat here what I posted in the thread on lazy imports. If it's possible for the interpreter to determine when it needs to force evaluation of a lazy expression or statement, then why not use them everywhere? If that's the case, then why not make everything lazy by default? Why not make it a service of the language to lazify your code (analogous to garbage collection) so a human doesn't have to worry about screwing it up? There are, AFAIK, three things that *must* force evaluation of lazy expressions or statements: 1) Before the GIL is released, all pending lazy code must be evaluated since the current thread can't know what variables another thread will try to access (unless there's a way to explicitly label variables as "shared", then it will only force evaluation of those). 2) Branching statements force evaluation of anything required to evaluate the conditional clause. 3) I/O forces evaluation of any involved lazy expressions. On Mon, Feb 20, 2017 at 7:07 PM, Joshua Morton wrote: > This comes from a bit of a misunderstanding of how an interpreter figures > out what needs to be compiled. Most (all?) 
JIT compilers run code in an > interpreted manner, and then compile subsections down to efficient machine > code when they notice that the same code path is taken repeatedly, so in > pypy something like > > x = 0 > for i in range(100000): > x += 1 > > would, get, after 10-20 runs through the loop, turned into assembly that > looked like what you'd write in pure C, instead of the very indirection and > pointer heavy code that such a loop would be if you could take it and > convert it to cpython actually executes, for example. So the "hot" code is > still run. > > All that said, this is a bit of an off topic discussion and probably > shouldn't be on list. > > What you really do want is functional purity, which is a different concept > and one that python as a language can't easily provide no matter what. > > --Josh > > On Mon, Feb 20, 2017 at 7:53 PM Abe Dillon wrote: > >> On Fri, Feb 17, 2017, Steven D'Aprano wrote: >> >> JIT compilation delays *compiling* the code to run-time. This is a >> proposal for delaying *running* the code until such time as some other >> piece of code actually needs the result. >> >> >> My thought was that if a compiler is capable of determining what needs to >> be compiled just in time, then an interpreter might be able to determine >> what expressions need to be evaluated just when their results are actually >> used. >> >> So if you had code that looked like: >> >> >>> log.debug("data: %s", expensive()) >> >> The interpreter could skip evaluating the expensive function if the >> result is never used. It would only evaluate it "just in time". This would >> almost certainly require just in time compilation as well, otherwise the >> byte code that calls the "log.debug" function would be unaware of the byte >> code that implements the function. >> >> This is probably a pipe-dream, though; because the interpreter would have >> to be aware of side effects. 
>> >> >> >> On Mon, Feb 20, 2017 at 5:18 AM, wrote: >> >> >> >> > -----Original Message----- >> > From: Python-ideas [mailto:python-ideas-bounces+tritium- >> > list=sdamon.com at python.org] On Behalf Of Michel Desmoulin >> > Sent: Monday, February 20, 2017 3:30 AM >> > To: python-ideas at python.org >> > Subject: Re: [Python-ideas] Delayed Execution via Keyword >> > >> > I wrote a blog post about this, and someone asked me if it meant >> > allowing lazy imports to make optional imports easier. >> > >> > Someting like: >> > >> > lazy import foo >> > lazy from foo import bar >> > >> > So now if I don't use the imports, the module is not loaded, which could >> > also significantly speed up applications starting time with a lot of >> > imports. >> >> Would that not also make a failure to import an error at the time of >> executing the imported piece of code rather than at the place of import? >> And how would optional imports work if they are not loaded until use? >> Right >> now, optional imports are done by wrapping the import statement in a >> try/except, would you not need to do that handling everywhere the imported >> object is used instead? 
>> >> (I haven't been following the entire thread, and I don't know if this is a >> forest/tress argument) >> >> > _______________________________________________ >> > Python-ideas mailing list >> > Python-ideas at python.org >> > https://mail.python.org/mailman/listinfo/python-ideas >> > Code of Conduct: http://python.org/psf/codeofconduct/ >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu Mar 2 18:52:07 2017 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 3 Mar 2017 10:52:07 +1100 Subject: [Python-ideas] lazy use for optional import In-Reply-To: References: Message-ID: On Fri, Mar 3, 2017 at 10:10 AM, Abe Dillon wrote: > I really think the whole "lazy" idea is misguided. If it's possible for the > interpreter to determine automatically when it needs to force evaluation of > a lazy expression or statement, then why not make *all* expressions and > statements lazy by default? I think it's pretty clear when to force > evaluation: 1) when the result is used in a control flow > statement/expression 2) when the result is output (file, network, or other > I/O) and 3) evaluate all pending lazy code before releasing the GIL. At > that point, why not make lazy evaluation an implicit feature of the > language, like the garbage collector. 4) When the evaluation will have side effects. 
Making everything lazy is fine when you can guarantee that there's no visible effect (modulo performance) of evaluating things in a different order, or not evaluating some of them at all. In Python, that can't be guaranteed, so universal laziness is way too dangerous. Think of all the problems people have with getting their heads around multithreading, and consider that this is basically going to take your code and turn it into a bunch of threads, and join on those threads only when there's a reason to. Debugging becomes highly non-deterministic, because adding a quick little 'print' call to see what's going on might force evaluation a little sooner, which means X happens before Y, but without that print, Y happens before X... aeons of fun. No, if laziness is added to Python, it *must* be under programmer control. ChrisA From joejev at gmail.com Thu Mar 2 18:53:51 2017 From: joejev at gmail.com (Joseph Jevnik) Date: Thu, 2 Mar 2017 18:53:51 -0500 Subject: [Python-ideas] Delayed Execution via Keyword In-Reply-To: References: <159901d28b6b$0aef9280$20ceb780$@hotmail.com> Message-ID: Other things that scrutinize an expression are iteration or branching (with the current evaluation model). If `xs` is a thunk, then `for x in xs` must scrutinize `xs`. At first this doesn't seem required; however, in general `next` imposes a data dependency on the next call to `next`. For example: x0 = next(xs) x1 = next(xs) print(x1) print(x0) If `next` doesn't force computation then evaluating `x1` before `x0` will bind `x1` to `xs[0]` which is not what the eager version of the code does. To preserve the current semantics of the language you cannot defer arbitrary expressions because they may have observable side-effects. Automatically translating would require knowing ahead of time if a function can have observable side effects, but that is not possible in Python. 
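The ordering hazard described above can be made concrete with a toy thunk (a hypothetical wrapper, not a language feature): once both next() calls are deferred, the order of forcing, not the order of binding, decides which value each name receives.

```python
class Thunk:
    """A toy deferred expression: the wrapped zero-argument
    function runs only when force() is first called."""
    def __init__(self, func):
        self.func = func
        self.forced = False
        self.value = None

    def force(self):
        if not self.forced:
            self.value = self.func()
            self.forced = True
        return self.value

xs = iter([10, 20])

# Eagerly, x0 would be bound to 10 and x1 to 20.
x0 = Thunk(lambda: next(xs))
x1 = Thunk(lambda: next(xs))

# Forcing in a different order than the bindings swaps the side effects:
print(x1.force())  # prints 10 -- eager code would have printed 20 here
print(x0.force())  # prints 20
```

This is exactly why iteration (or any side-effecting call) has to scrutinize its operands under the current evaluation model.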
Because it is impossible to tell in the general case, we must rely on the user to tell us when it is safe to defer an expression. On Thu, Mar 2, 2017 at 6:42 PM, Abe Dillon wrote: > I'm going to repeat here what I posted in the thread on lazy imports. > If it's possible for the interpreter to determine when it needs to force > evaluation of a lazy expression or statement, then why not use them > everywhere? If that's the case, then why not make everything lazy by > default? Why not make it a service of the language to lazify your code > (analogous to garbage collection) so a human doesn't have to worry about > screwing it up? > > There are, AFAIK, three things that *must* force evaluation of lazy > expressions or statements: > > 1) Before the GIL is released, all pending lazy code must be evaluated > since the current thread can't know what variables another thread will try > to access (unless there's a way to explicitly label variables as "shared", > then it will only force evaluation of those). > > 2) Branching statements force evaluation of anything required to evaluate > the conditional clause. > > 3) I/O forces evaluation of any involved lazy expressions. > > > On Mon, Feb 20, 2017 at 7:07 PM, Joshua Morton > wrote: > >> This comes from a bit of a misunderstanding of how an interpreter figures >> out what needs to be compiled. Most (all?) JIT compilers run code in an >> interpreted manner, and then compile subsections down to efficient machine >> code when they notice that the same code path is taken repeatedly, so in >> pypy something like >> >> x = 0 >> for i in range(100000): >> x += 1 >> >> would, get, after 10-20 runs through the loop, turned into assembly that >> looked like what you'd write in pure C, instead of the very indirection and >> pointer heavy code that such a loop would be if you could take it and >> convert it to cpython actually executes, for example. So the "hot" code is >> still run. 
>> >> All that said, this is a bit of an off topic discussion and probably >> shouldn't be on list. >> >> What you really do want is functional purity, which is a different >> concept and one that python as a language can't easily provide no matter >> what. >> >> --Josh >> >> On Mon, Feb 20, 2017 at 7:53 PM Abe Dillon wrote: >> >>> On Fri, Feb 17, 2017, Steven D'Aprano wrote: >>> >>> JIT compilation delays *compiling* the code to run-time. This is a >>> proposal for delaying *running* the code until such time as some other >>> piece of code actually needs the result. >>> >>> >>> My thought was that if a compiler is capable of determining what needs >>> to be compiled just in time, then an interpreter might be able to determine >>> what expressions need to be evaluated just when their results are actually >>> used. >>> >>> So if you had code that looked like: >>> >>> >>> log.debug("data: %s", expensive()) >>> >>> The interpreter could skip evaluating the expensive function if the >>> result is never used. It would only evaluate it "just in time". This would >>> almost certainly require just in time compilation as well, otherwise the >>> byte code that calls the "log.debug" function would be unaware of the byte >>> code that implements the function. >>> >>> This is probably a pipe-dream, though; because the interpreter would >>> have to be aware of side effects. >>> >>> >>> >>> On Mon, Feb 20, 2017 at 5:18 AM, wrote: >>> >>> >>> >>> > -----Original Message----- >>> > From: Python-ideas [mailto:python-ideas-bounces+tritium- >>> > list=sdamon.com at python.org] On Behalf Of Michel Desmoulin >>> > Sent: Monday, February 20, 2017 3:30 AM >>> > To: python-ideas at python.org >>> > Subject: Re: [Python-ideas] Delayed Execution via Keyword >>> > >>> > I wrote a blog post about this, and someone asked me if it meant >>> > allowing lazy imports to make optional imports easier. 
>>> > >>> > Someting like: >>> > >>> > lazy import foo >>> > lazy from foo import bar >>> > >>> > So now if I don't use the imports, the module is not loaded, which >>> could >>> > also significantly speed up applications starting time with a lot of >>> > imports. >>> >>> Would that not also make a failure to import an error at the time of >>> executing the imported piece of code rather than at the place of import? >>> And how would optional imports work if they are not loaded until use? >>> Right >>> now, optional imports are done by wrapping the import statement in a >>> try/except, would you not need to do that handling everywhere the >>> imported >>> object is used instead? >>> >>> (I haven't been following the entire thread, and I don't know if this is >>> a >>> forest/tress argument) >>> >>> > _______________________________________________ >>> > Python-ideas mailing list >>> > Python-ideas at python.org >>> > https://mail.python.org/mailman/listinfo/python-ideas >>> > Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
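On the lazy-import part of this subthread: CPython already ships an opt-in building block, importlib.util.LazyLoader, which postpones executing a module's body until its first attribute access, with exactly the caveat raised above that an ImportError then surfaces at first use rather than at the import statement. A sketch following the recipe in the importlib documentation:

```python
import importlib.util
import sys

def lazy_import(name):
    """Return a module object whose loading is deferred until first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # does NOT run the module body yet
    return module

json = lazy_import("json")       # nothing executed yet
data = json.loads('{"a": 1}')    # first attribute access triggers the real import
```

A hypothetical `lazy import foo` statement would essentially be sugar for this, which is why the error-at-first-use question applies to both.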
URL: From victor.stinner at gmail.com Thu Mar 2 19:53:53 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 3 Mar 2017 01:53:53 +0100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: Message-ID: I dislike the try/except NameError test to check if the parameter is set. Catching NameError is slow. What is the root issue? Function signature in help(func)? If yes, the solution can be a special value understood by inspect.signature(). Should it be possible to explicitly pass the special value? func(optional=special_value)? I think it's ok to allow it. The expected behaviour is to behave as func(). Victor Le 2 mars 2017 9:04 AM, "Serhiy Storchaka" a écrit : Functions implemented in Python can have optional parameters with default values. They also can accept an arbitrary number of positional and keyword arguments if they use var-positional or var-keyword parameters (*args and **kwargs). But there is no way to declare an optional parameter that doesn't have a default value. Currently you need to use the sentinel idiom for implementing this: _sentinel = object() def get(store, key, default=_sentinel): if store.exists(key): return store.retrieve(key) if default is _sentinel: raise LookupError else: return default There are drawbacks to this: * Module's namespace is polluted with sentinel variables. * You need to check for the sentinel before accidentally passing it to another function. * Possible name conflicts between sentinels for different functions of the same module. * Since the sentinel is accessible outside of the function, it is possible to pass it to the function. * help() of the function shows reprs of default values. "foo(bar=)" looks ugly. I propose to add a new syntax for optional parameters. If the argument corresponding to the optional parameter without default value is not specified, the parameter takes no value. Just as the "*" prefix means "arbitrary number of positional parameters", the prefix "?"
can mean "single optional parameter". Example: def get(store, key, ?default): if store.exists(key): return store.retrieve(key) try: return default except NameError: raise LookupError Alternative syntaxes: * "=" not followed by an expression: "def get(store, key, default=)". * The "del" keyword: "def get(store, key, del default)". This feature is orthogonal to supporting positional-only parameters. Optional parameters without default value can be positional-or-keyword, keyword-only or positional-only (if the latter is implemented). _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Thu Mar 2 19:58:29 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 3 Mar 2017 01:58:29 +0100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> Message-ID: In my code, I commonly use a NOT_SET singleton used as default value. I like this name for the test: if arg is NOT_SET: ... ;-) I use that when I want to behave differently when None is passed. And yes, I have such code. Victor Le 2 mars 2017 9:36 AM, "M.-A. Lemburg" a ?crit : On 02.03.2017 09:03, Serhiy Storchaka wrote: > Function implemented in Python can have optional parameters with default > value. It also can accept arbitrary number of positional and keyword > arguments if use var-positional or var-keyword parameters (*args and > **kwargs). But there is no way to declare an optional parameter that > don't have default value. 
Currently you need to use the sentinel idiom > for implementing this: > > _sentinel = object() > def get(store, key, default=_sentinel): > if store.exists(key): > return store.retrieve(key) > if default is _sentinel: > raise LookupError > else: > return default > > There are drawback of this: > > * Module's namespace is polluted with sentinel's variables. > > * You need to check for the sentinel before passing it to other function > by accident. > > * Possible name conflicts between sentinels for different functions of > the same module. > > * Since the sentinel is accessible outside of the function, it possible > to pass it to the function. > > * help() of the function shows reprs of default values. "foo(bar= object at 0xb713c698>)" looks ugly. > > > I propose to add a new syntax for optional parameters. If the argument > corresponding to the optional parameter without default value is not > specified, the parameter takes no value. As well as the "*" prefix means > "arbitrary number of positional parameters", the prefix "?" can mean > "single optional parameter". > > Example: > > def get(store, key, ?default): > if store.exists(key): > return store.retrieve(key) > try: > return default > except NameError: > raise LookupError Why a new syntax ? Can't we just have a pre-defined sentinel singleton NoDefault and use that throughout the code (and also special case it in argument parsing/handling)? def get(store, key, default=NoDefault): if store.exists(key): return store.retrieve(key) ... I added a special singleton NotGiven to our mxTools long ago for this purpose. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Mar 02 2017) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... 
http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Thu Mar 2 20:01:00 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 3 Mar 2017 02:01:00 +0100 Subject: [Python-ideas] Positional-only parameters In-Reply-To: References: Message-ID: I am thinking of writing a PEP, yes. I need time to think about it, find all corner cases. Maybe also include something for "optional parameter without default value". Don't expect it soon, I have some pending work to finish before :-) Victor Le 2 mars 2017 7:16 PM, "Brett Cannon" a écrit : > It seems all the core devs who have commented on this are in the positive > (Victor, Yury, Ethan, Yury, Guido, Terry, and Steven; MAL didn't explicitly > vote). So to me that suggests there's enough support to warrant writing a > PEP. Are you up for writing it, Victor, or is someone else going to write > it? > > On Tue, 28 Feb 2017 at 13:18 Victor Stinner > wrote: > >> Hi, >> >> For technical reasons, many functions of the Python standard libraries >> implemented in C have positional-only parameters. Example: >> ------- >> $ ./python >> Python 3.7.0a0 (default, Feb 25 2017, 04:30:32) >> >>> help(str.replace) >> replace(self, old, new, count=-1, /) # <== notice "/" at the end >> ...
>> >>> "a".replace("x", "y") # ok >> 'a' >> >> >>> "a".replace(old="x", new="y") # ERR! >> TypeError: replace() takes at least 2 arguments (0 given) >> ------- >> >> When converting the methods of the builtin str type to the internal >> "Argument Clinic" tool (tool to generate the function signature, >> function docstring and the code to parse arguments in C), I asked if >> we should add support for keyword arguments in str.replace(). The >> answer was quick: no! It's a deliberate design choice. >> >> Quote of Yury Selivanov's message: >> """ >> I think Guido explicitly stated that he doesn't like the idea to >> always allow keyword arguments for all methods. I.e. `str.find('aaa')` >> just reads better than `str.find(needle='aaa')`. Essentially, the idea >> is that for most of the builtins that accept one or two arguments, >> positional-only parameters are better. >> """ >> http://bugs.python.org/issue29286#msg285578 >> >> I just noticed a module on PyPI to implement this behaviour on Python >> functions: >> >> https://pypi.python.org/pypi/positional >> >> My question is: would it make sense to implement this feature in >> Python directly? If yes, what should be the syntax? Use "/" marker? >> Use the @positional() decorator? >> >> Do you see concrete cases where it's a deliberate choice to deny >> passing arguments as keywords? >> >> Don't you like writing int(x="123") instead of int("123")? :-) (I know >> that Serhiy Storshake hates the name of the "x" parameter of the int >> constructor ;-)) >> >> By the way, I read that "/" marker is unknown by almost all Python >> developers, and [...] syntax should be preferred, but >> inspect.signature() doesn't support this syntax. Maybe we should fix >> signature() and use [...] format instead? >> >> Replace "replace(self, old, new, count=-1, /)" with "replace(self, >> old, new[, count=-1])" (or maybe even not document the default >> value?). >> >> Python 3.5 help (docstring) uses "S.replace(old, new[, count])". 
>> >> Victor >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abedillon at gmail.com Thu Mar 2 20:26:15 2017 From: abedillon at gmail.com (Abe Dillon) Date: Thu, 2 Mar 2017 19:26:15 -0600 Subject: [Python-ideas] Delayed Execution via Keyword In-Reply-To: References: <159901d28b6b$0aef9280$20ceb780$@hotmail.com> Message-ID: I don't think you have to make a special case for iteration. When the interpreter hits: >>> print(x1) print falls under I/O, so it forces evaluation of x1, so we back-track to where x1 is evaluated: >>> x1 = next(xs) And in the next call, we find that we must evaluate the state of the iterator, so we have to back-track to: >>> x0 = next(xs) Evaluate that, then move forward. You essentially keep a graph of pending/unevaluated expressions linked by their dependencies and evaluate branches of the graph as needed. You need to evaluate state to navigate conditional branches, and whenever state is passed outside of the interpreter's scope (like I/O or multi-threading). I think problems might crop up in parts of the language that are pure c-code. For instance; I don't know if the state variables in a list iterator are actually visible to the Interpreter or if it's implemented in C that is inscrutable to the interpreter. On Mar 2, 2017 5:54 PM, "Joseph Jevnik" wrote: Other things that scrutinize an expression are iteration or branching (with the current evaluation model). If `xs` is a thunk, then `for x in xs` must scrutinize `xs`. At first this doesn't seem required; however, in general `next` imposes a data dependency on the next call to `next`. 
For example: x0 = next(xs) x1 = next(xs) print(x1) print(x0) If `next` doesn't force computation then evaluating `x1` before `x0` will bind `x1` to `xs[0]` which is not what the eager version of the code does. To preserve the current semantics of the language you cannot defer arbitrary expressions because they may have observable side-effects. Automatically translating would require knowing ahead of time if a function can have observable side effects, but that is not possible in Python. Because it is impossible to tell in the general case, we must rely on the user to tell us when it is safe to defer an expression. On Thu, Mar 2, 2017 at 6:42 PM, Abe Dillon wrote: > I'm going to repeat here what I posted in the thread on lazy imports. > If it's possible for the interpreter to determine when it needs to force > evaluation of a lazy expression or statement, then why not use them > everywhere? If that's the case, then why not make everything lazy by > default? Why not make it a service of the language to lazify your code > (analogous to garbage collection) so a human doesn't have to worry about > screwing it up? > > There are, AFAIK, three things that *must* force evaluation of lazy > expressions or statements: > > 1) Before the GIL is released, all pending lazy code must be evaluated > since the current thread can't know what variables another thread will try > to access (unless there's a way to explicitly label variables as "shared", > then it will only force evaluation of those). > > 2) Branching statements force evaluation of anything required to evaluate > the conditional clause. > > 3) I/O forces evaluation of any involved lazy expressions. > > > On Mon, Feb 20, 2017 at 7:07 PM, Joshua Morton > wrote: > >> This comes from a bit of a misunderstanding of how an interpreter figures >> out what needs to be compiled. Most (all?) 
JIT compilers run code in an >> interpreted manner, and then compile subsections down to efficient machine >> code when they notice that the same code path is taken repeatedly, so in >> pypy something like >> >> x = 0 >> for i in range(100000): >> x += 1 >> >> would, get, after 10-20 runs through the loop, turned into assembly that >> looked like what you'd write in pure C, instead of the very indirection and >> pointer heavy code that such a loop would be if you could take it and >> convert it to cpython actually executes, for example. So the "hot" code is >> still run. >> >> All that said, this is a bit of an off topic discussion and probably >> shouldn't be on list. >> >> What you really do want is functional purity, which is a different >> concept and one that python as a language can't easily provide no matter >> what. >> >> --Josh >> >> On Mon, Feb 20, 2017 at 7:53 PM Abe Dillon wrote: >> >>> On Fri, Feb 17, 2017, Steven D'Aprano wrote: >>> >>> JIT compilation delays *compiling* the code to run-time. This is a >>> proposal for delaying *running* the code until such time as some other >>> piece of code actually needs the result. >>> >>> >>> My thought was that if a compiler is capable of determining what needs >>> to be compiled just in time, then an interpreter might be able to determine >>> what expressions need to be evaluated just when their results are actually >>> used. >>> >>> So if you had code that looked like: >>> >>> >>> log.debug("data: %s", expensive()) >>> >>> The interpreter could skip evaluating the expensive function if the >>> result is never used. It would only evaluate it "just in time". This would >>> almost certainly require just in time compilation as well, otherwise the >>> byte code that calls the "log.debug" function would be unaware of the byte >>> code that implements the function. >>> >>> This is probably a pipe-dream, though; because the interpreter would >>> have to be aware of side effects. 
>>> >>> >>> >>> On Mon, Feb 20, 2017 at 5:18 AM, wrote: >>> >>> >>> >>> > -----Original Message----- >>> > From: Python-ideas [mailto:python-ideas-bounces+tritium- >>> > list=sdamon.com at python.org] On Behalf Of Michel Desmoulin >>> > Sent: Monday, February 20, 2017 3:30 AM >>> > To: python-ideas at python.org >>> > Subject: Re: [Python-ideas] Delayed Execution via Keyword >>> > >>> > I wrote a blog post about this, and someone asked me if it meant >>> > allowing lazy imports to make optional imports easier. >>> > >>> > Someting like: >>> > >>> > lazy import foo >>> > lazy from foo import bar >>> > >>> > So now if I don't use the imports, the module is not loaded, which >>> could >>> > also significantly speed up applications starting time with a lot of >>> > imports. >>> >>> Would that not also make a failure to import an error at the time of >>> executing the imported piece of code rather than at the place of import? >>> And how would optional imports work if they are not loaded until use? >>> Right >>> now, optional imports are done by wrapping the import statement in a >>> try/except, would you not need to do that handling everywhere the >>> imported >>> object is used instead? 
>>> >>> (I haven't been following the entire thread, and I don't know if this is >>> a >>> forest/tress argument) >>> >>> > _______________________________________________ >>> > Python-ideas mailing list >>> > Python-ideas at python.org >>> > https://mail.python.org/mailman/listinfo/python-ideas >>> > Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joejev at gmail.com Thu Mar 2 20:30:14 2017 From: joejev at gmail.com (Joseph Jevnik) Date: Thu, 2 Mar 2017 20:30:14 -0500 Subject: [Python-ideas] Delayed Execution via Keyword In-Reply-To: References: <159901d28b6b$0aef9280$20ceb780$@hotmail.com> Message-ID: without special casing iteration how do you know that `x1 = next(xs)` depends on the value of `x0`? If you assume every operation depends on every other operation then you have implemented an eager evaluation model. On Thu, Mar 2, 2017 at 8:26 PM, Abe Dillon wrote: > I don't think you have to make a special case for iteration. 
> > When the interpreter hits: > >>> print(x1) > > print falls under I/O, so it forces evaluation of x1, so we back-track to > where x1 is evaluated: > >>> x1 = next(xs) > > And in the next call, we find that we must evaluate the state of the > iterator, so we have to back-track to: > >>> x0 = next(xs) > > Evaluate that, then move forward. > > You essentially keep a graph of pending/unevaluated expressions linked by > their dependencies and evaluate branches of the graph as needed. You need > to evaluate state to navigate conditional branches, and whenever state is > passed outside of the interpreter's scope (like I/O or multi-threading). I > think problems might crop up in parts of the language that are pure c-code. > For instance; I don't know if the state variables in a list iterator are > actually visible to the Interpreter or if it's implemented in C that is > inscrutable to the interpreter. > > > On Mar 2, 2017 5:54 PM, "Joseph Jevnik" wrote: > > Other things that scrutinize an expression are iteration or branching > (with the current evaluation model). If `xs` is a thunk, then `for x in xs` > must scrutinize `xs`. At first this doesn't seem required; however, in > general `next` imposes a data dependency on the next call to `next`. For > example: > > x0 = next(xs) > x1 = next(xs) > > print(x1) > print(x0) > > If `next` doesn't force computation then evaluating `x1` before `x0` will > bind `x1` to `xs[0]` which is not what the eager version of the code does. > > To preserve the current semantics of the language you cannot defer > arbitrary expressions because they may have observable side-effects. > Automatically translating would require knowing ahead of time if a function > can have observable side effects, but that is not possible in Python. > Because it is impossible to tell in the general case, we must rely on the > user to tell us when it is safe to defer an expression. 
> > On Thu, Mar 2, 2017 at 6:42 PM, Abe Dillon wrote: > >> I'm going to repeat here what I posted in the thread on lazy imports. >> If it's possible for the interpreter to determine when it needs to force >> evaluation of a lazy expression or statement, then why not use them >> everywhere? If that's the case, then why not make everything lazy by >> default? Why not make it a service of the language to lazify your code >> (analogous to garbage collection) so a human doesn't have to worry about >> screwing it up? >> >> There are, AFAIK, three things that *must* force evaluation of lazy >> expressions or statements: >> >> 1) Before the GIL is released, all pending lazy code must be evaluated >> since the current thread can't know what variables another thread will try >> to access (unless there's a way to explicitly label variables as "shared", >> then it will only force evaluation of those). >> >> 2) Branching statements force evaluation of anything required to evaluate >> the conditional clause. >> >> 3) I/O forces evaluation of any involved lazy expressions. >> >> >> On Mon, Feb 20, 2017 at 7:07 PM, Joshua Morton > > wrote: >> >>> This comes from a bit of a misunderstanding of how an interpreter >>> figures out what needs to be compiled. Most (all?) JIT compilers run code >>> in an interpreted manner, and then compile subsections down to efficient >>> machine code when they notice that the same code path is taken repeatedly, >>> so in pypy something like >>> >>> x = 0 >>> for i in range(100000): >>> x += 1 >>> >>> would, get, after 10-20 runs through the loop, turned into assembly that >>> looked like what you'd write in pure C, instead of the very indirection and >>> pointer heavy code that such a loop would be if you could take it and >>> convert it to cpython actually executes, for example. So the "hot" code is >>> still run. >>> >>> All that said, this is a bit of an off topic discussion and probably >>> shouldn't be on list. 
>>> >>> What you really do want is functional purity, which is a different >>> concept and one that python as a language can't easily provide no matter >>> what. >>> >>> --Josh >>> >>> On Mon, Feb 20, 2017 at 7:53 PM Abe Dillon wrote: >>> >>>> On Fri, Feb 17, 2017, Steven D'Aprano wrote: >>>> >>>> JIT compilation delays *compiling* the code to run-time. This is a >>>> proposal for delaying *running* the code until such time as some other >>>> piece of code actually needs the result. >>>> >>>> >>>> My thought was that if a compiler is capable of determining what needs >>>> to be compiled just in time, then an interpreter might be able to determine >>>> what expressions need to be evaluated just when their results are actually >>>> used. >>>> >>>> So if you had code that looked like: >>>> >>>> >>> log.debug("data: %s", expensive()) >>>> >>>> The interpreter could skip evaluating the expensive function if the >>>> result is never used. It would only evaluate it "just in time". This would >>>> almost certainly require just in time compilation as well, otherwise the >>>> byte code that calls the "log.debug" function would be unaware of the byte >>>> code that implements the function. >>>> >>>> This is probably a pipe-dream, though; because the interpreter would >>>> have to be aware of side effects. >>>> >>>> >>>> >>>> On Mon, Feb 20, 2017 at 5:18 AM, wrote: >>>> >>>> >>>> >>>> > -----Original Message----- >>>> > From: Python-ideas [mailto:python-ideas-bounces+tritium- >>>> > list=sdamon.com at python.org] On Behalf Of Michel Desmoulin >>>> > Sent: Monday, February 20, 2017 3:30 AM >>>> > To: python-ideas at python.org >>>> > Subject: Re: [Python-ideas] Delayed Execution via Keyword >>>> > >>>> > I wrote a blog post about this, and someone asked me if it meant >>>> > allowing lazy imports to make optional imports easier. 
>>>> > >>>> > Someting like: >>>> > >>>> > lazy import foo >>>> > lazy from foo import bar >>>> > >>>> > So now if I don't use the imports, the module is not loaded, which >>>> could >>>> > also significantly speed up applications starting time with a lot of >>>> > imports. >>>> >>>> Would that not also make a failure to import an error at the time of >>>> executing the imported piece of code rather than at the place of import? >>>> And how would optional imports work if they are not loaded until use? >>>> Right >>>> now, optional imports are done by wrapping the import statement in a >>>> try/except, would you not need to do that handling everywhere the >>>> imported >>>> object is used instead? >>>> >>>> (I haven't been following the entire thread, and I don't know if this >>>> is a >>>> forest/tress argument) >>>> >>>> > _______________________________________________ >>>> > Python-ideas mailing list >>>> > Python-ideas at python.org >>>> > https://mail.python.org/mailman/listinfo/python-ideas >>>> > Code of Conduct: http://python.org/psf/codeofconduct/ >>>> >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>>> >>>> >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abedillon at gmail.com Thu Mar 2 21:10:23 2017 From: abedillon at gmail.com (Abe Dillon) Date: Thu, 2 Mar 2017 20:10:23 -0600 Subject: [Python-ideas] Delayed Execution via Keyword In-Reply-To: References: <159901d28b6b$0aef9280$20ceb780$@hotmail.com> Message-ID: > > without special casing iteration how do you know that `x1 = next(xs)` > depends on the value of `x0`? `x1 = next(xs)` doesn't depend on the value of `x0`, it depends on the state of xs. In order to evaluate `next(xs)` you have to jump into the function call and evaluate the relevant expressions within, which will, presumably, mean evaluating the value of some place-holder variable or something, which will trigger evaluation of preceding, pending expressions that modify the value of that place-holder variable, which includes `x0 = next(xs)`. You do have a point, though; if the pending-execution graph has to be fine-grained enough to capture all that, then it's a dubious claim that juggling such a construct would save any time over simply executing the code as you go. This, I believe, goes beyond iterators and gets at the heart of what Josh said: What you really do want is functional purity, which is a different concept > and one that python as a language can't easily provide no matter what. `next` is not a pure function, because it has side-effects: it changes state variables. Even if those side-effects can be tracked by the interpreter, they present a challenge. In the example: >>> log.warning(expensive_function()) where we want to avoid executing expensive_function(), it's likely that the function iterates over some large amount of data. According to the `x1 = next(xs)` example, that means building a huge pending-execution graph in case that function does need to be evaluated, so you can track the iterator state changes all the way back to the first iteration before executing.
I don't know. Maybe, like Joshua Morton's JIT example, you could automatically identify loop patterns and collapse them somehow. I guess special casing iteration would help with that, though it's difficult to see what that would look like. On Thu, Mar 2, 2017 at 7:30 PM, Joseph Jevnik wrote: > without special casing iteration how do you know that `x1 = next(xs)` > depends on the value of `x0`? If you assume every operation depends on > every other operation then you have implemented an eager evaluation model. > > On Thu, Mar 2, 2017 at 8:26 PM, Abe Dillon wrote: > >> I don't think you have to make a special case for iteration. >> >> When the interpreter hits: >> >>> print(x1) >> >> print falls under I/O, so it forces evaluation of x1, so we back-track to >> where x1 is evaluated: >> >>> x1 = next(xs) >> >> And in the next call, we find that we must evaluate the state of the >> iterator, so we have to back-track to: >> >>> x0 = next(xs) >> >> Evaluate that, then move forward. >> >> You essentially keep a graph of pending/unevaluated expressions linked by >> their dependencies and evaluate branches of the graph as needed. You need >> to evaluate state to navigate conditional branches, and whenever state is >> passed outside of the interpreter's scope (like I/O or multi-threading). I >> think problems might crop up in parts of the language that are pure c-code. >> For instance; I don't know if the state variables in a list iterator are >> actually visible to the Interpreter or if it's implemented in C that is >> inscrutable to the interpreter. >> >> >> On Mar 2, 2017 5:54 PM, "Joseph Jevnik" wrote: >> >> Other things that scrutinize an expression are iteration or branching >> (with the current evaluation model). If `xs` is a thunk, then `for x in xs` >> must scrutinize `xs`. At first this doesn't seem required; however, in >> general `next` imposes a data dependency on the next call to `next`. 
For >> example: >> >> x0 = next(xs) >> x1 = next(xs) >> >> print(x1) >> print(x0) >> >> If `next` doesn't force computation then evaluating `x1` before `x0` will >> bind `x1` to `xs[0]` which is not what the eager version of the code does. >> >> To preserve the current semantics of the language you cannot defer >> arbitrary expressions because they may have observable side-effects. >> Automatically translating would require knowing ahead of time if a function >> can have observable side effects, but that is not possible in Python. >> Because it is impossible to tell in the general case, we must rely on the >> user to tell us when it is safe to defer an expression. >> >> On Thu, Mar 2, 2017 at 6:42 PM, Abe Dillon wrote: >> >>> I'm going to repeat here what I posted in the thread on lazy imports. >>> If it's possible for the interpreter to determine when it needs to force >>> evaluation of a lazy expression or statement, then why not use them >>> everywhere? If that's the case, then why not make everything lazy by >>> default? Why not make it a service of the language to lazify your code >>> (analogous to garbage collection) so a human doesn't have to worry about >>> screwing it up? >>> >>> There are, AFAIK, three things that *must* force evaluation of lazy >>> expressions or statements: >>> >>> 1) Before the GIL is released, all pending lazy code must be evaluated >>> since the current thread can't know what variables another thread will try >>> to access (unless there's a way to explicitly label variables as "shared", >>> then it will only force evaluation of those). >>> >>> 2) Branching statements force evaluation of anything required to >>> evaluate the conditional clause. >>> >>> 3) I/O forces evaluation of any involved lazy expressions. >>> >>> >>> On Mon, Feb 20, 2017 at 7:07 PM, Joshua Morton < >>> joshua.morton13 at gmail.com> wrote: >>> >>>> This comes from a bit of a misunderstanding of how an interpreter >>>> figures out what needs to be compiled. 
Most (all?) JIT compilers run code >>>> in an interpreted manner, and then compile subsections down to efficient >>>> machine code when they notice that the same code path is taken repeatedly, >>>> so in pypy something like >>>> >>>> x = 0 >>>> for i in range(100000): >>>> x += 1 >>>> >>>> would, get, after 10-20 runs through the loop, turned into assembly >>>> that looked like what you'd write in pure C, instead of the very >>>> indirection and pointer heavy code that such a loop would be if you could >>>> take it and convert it to cpython actually executes, for example. So the >>>> "hot" code is still run. >>>> >>>> All that said, this is a bit of an off topic discussion and probably >>>> shouldn't be on list. >>>> >>>> What you really do want is functional purity, which is a different >>>> concept and one that python as a language can't easily provide no matter >>>> what. >>>> >>>> --Josh >>>> >>>> On Mon, Feb 20, 2017 at 7:53 PM Abe Dillon wrote: >>>> >>>>> On Fri, Feb 17, 2017, Steven D'Aprano wrote: >>>>> >>>>> JIT compilation delays *compiling* the code to run-time. This is a >>>>> proposal for delaying *running* the code until such time as some other >>>>> piece of code actually needs the result. >>>>> >>>>> >>>>> My thought was that if a compiler is capable of determining what needs >>>>> to be compiled just in time, then an interpreter might be able to determine >>>>> what expressions need to be evaluated just when their results are actually >>>>> used. >>>>> >>>>> So if you had code that looked like: >>>>> >>>>> >>> log.debug("data: %s", expensive()) >>>>> >>>>> The interpreter could skip evaluating the expensive function if the >>>>> result is never used. It would only evaluate it "just in time". This would >>>>> almost certainly require just in time compilation as well, otherwise the >>>>> byte code that calls the "log.debug" function would be unaware of the byte >>>>> code that implements the function. 
>>>>> >>>>> This is probably a pipe-dream, though; because the interpreter would >>>>> have to be aware of side effects. >>>>> >>>>> >>>>> >>>>> On Mon, Feb 20, 2017 at 5:18 AM, wrote: >>>>> >>>>> >>>>> >>>>> > -----Original Message----- >>>>> > From: Python-ideas [mailto:python-ideas-bounces+tritium- >>>>> > list=sdamon.com at python.org] On Behalf Of Michel Desmoulin >>>>> > Sent: Monday, February 20, 2017 3:30 AM >>>>> > To: python-ideas at python.org >>>>> > Subject: Re: [Python-ideas] Delayed Execution via Keyword >>>>> > >>>>> > I wrote a blog post about this, and someone asked me if it meant >>>>> > allowing lazy imports to make optional imports easier. >>>>> > >>>>> > Someting like: >>>>> > >>>>> > lazy import foo >>>>> > lazy from foo import bar >>>>> > >>>>> > So now if I don't use the imports, the module is not loaded, which >>>>> could >>>>> > also significantly speed up applications starting time with a lot of >>>>> > imports. >>>>> >>>>> Would that not also make a failure to import an error at the time of >>>>> executing the imported piece of code rather than at the place of >>>>> import? >>>>> And how would optional imports work if they are not loaded until use? >>>>> Right >>>>> now, optional imports are done by wrapping the import statement in a >>>>> try/except, would you not need to do that handling everywhere the >>>>> imported >>>>> object is used instead? 
>>>>> >>>>> (I haven't been following the entire thread, and I don't know if this >>>>> is a >>>>> forest/tress argument) >>>>> >>>>> > _______________________________________________ >>>>> > Python-ideas mailing list >>>>> > Python-ideas at python.org >>>>> > https://mail.python.org/mailman/listinfo/python-ideas >>>>> > Code of Conduct: http://python.org/psf/codeofconduct/ >>>>> >>>>> _______________________________________________ >>>>> Python-ideas mailing list >>>>> Python-ideas at python.org >>>>> https://mail.python.org/mailman/listinfo/python-ideas >>>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>>>> >>>>> >>>>> _______________________________________________ >>>>> Python-ideas mailing list >>>>> Python-ideas at python.org >>>>> https://mail.python.org/mailman/listinfo/python-ideas >>>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>>> >>>> >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abedillon at gmail.com Thu Mar 2 21:24:15 2017 From: abedillon at gmail.com (Abe Dillon) Date: Thu, 2 Mar 2017 20:24:15 -0600 Subject: [Python-ideas] Delayed Execution via Keyword In-Reply-To: References: <159901d28b6b$0aef9280$20ceb780$@hotmail.com> Message-ID: Another problem I thought of was how this might complicate stack tracebacks. If you execute the following code: [1] a = ["hello", 1] [2] b = "1" + 1 [3] a = "".join(a) [4] print(a) The interpreter would build a graph until it hit line 4 and was forced to evaluate `a`. It would track `a` back to the branch: [1]->[3]-> and raise an error from line [3] when you would expect line [2] to raise an error first. 
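To make that failure-reordering concrete, the deferred model can be simulated with explicit zero-argument thunks (purely illustrative: `force` and the thunk names are invented here, not proposed semantics):

```python
# Illustrative only: simulate deferred evaluation with explicit thunks.
# Nothing runs until force() is called, so errors surface in forcing
# order rather than in source order.

def force(thunk):
    return thunk()

a = lambda: ["hello", 1]         # line [1]: deferred, no error yet
b = lambda: "1" + 1              # line [2]: TypeError, but only if forced
a_joined = lambda: "".join(a())  # line [3]: TypeError when forced

error = None
try:
    force(a_joined)              # line [4]: forces [3] -> [1]; [2] never runs
except TypeError as exc:
    error = exc

print(type(error).__name__)     # TypeError
```

Forcing `a_joined` walks its dependency back through `a`, so the error surfaces at the join; the eager program would have failed at the addition first, which is exactly the traceback mismatch described above.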
I suppose it may be possible to catch any exceptions and force full evaluation of nodes up to that point to find any preceding errors, but that sounds like a hairy proposition... On Thu, Mar 2, 2017 at 8:10 PM, Abe Dillon wrote: > without special casing iteration how do you know that `x1 = next(xs)` >> depends on the value of `x0`? > > `x1 = next(xs)` doesn't depend on the value of `x0`, it depends on the > state of xs. In order to evaluate `next(xs)` you have to jump into the > function call and evaluate the relevant expressions within, which will, > presumably, mean evaluating the value of some place-holder variable or > something, which will trigger evaluation of preceding, pending expressions > that modify the value of that place-holder variable, which includes `x0 = > next(xs)`. > > You do have a point, though; if the pending-execution graph has to be fine > enough scale to capture all that, then it's a dubious claim that juggling > such a construct would save any time over simply executing the code as you > go. This, I believe, goes beyond iterators and gets at the heart of what > Josh said: > > What you really do want is functional purity, which is a different concept >> and one that python as a language can't easily provide no matter what. > > > `next` is not a pure function, because it has side-effects: it changes > state variables. Even if those side-effects can be tracked by the > interpreter, they present a challenge. In the example: > >>> log.warning(expensive_function()) > > Where we want to avoid executing expensive_function(). It's likely that > the function iterates over some large amount of data. According to the `x1 > = next(xs)` example, that means building a huge pending-execution graph in > case that function does need to be evaluated, so you can track the iterator > state changes all the way back to the first iteration before executing.
> > Perhaps there's some clever trick I'm not thinking of to keep the graph > small and only expand it as needed. I don't know. Maybe, like Joshua > Morton's JIT example, you could automatically identify loop patterns and > collapse them somehow. I guess special casing iteration would help with > that, though it's difficult to see what that would look like. > > > > On Thu, Mar 2, 2017 at 7:30 PM, Joseph Jevnik wrote: > >> without special casing iteration how do you know that `x1 = next(xs)` >> depends on the value of `x0`? If you assume every operation depends on >> every other operation then you have implemented an eager evaluation model. >> >> On Thu, Mar 2, 2017 at 8:26 PM, Abe Dillon wrote: >> >>> I don't think you have to make a special case for iteration. >>> >>> When the interpreter hits: >>> >>> print(x1) >>> >>> print falls under I/O, so it forces evaluation of x1, so we back-track >>> to where x1 is evaluated: >>> >>> x1 = next(xs) >>> >>> And in the next call, we find that we must evaluate the state of the >>> iterator, so we have to back-track to: >>> >>> x0 = next(xs) >>> >>> Evaluate that, then move forward. >>> >>> You essentially keep a graph of pending/unevaluated expressions linked >>> by their dependencies and evaluate branches of the graph as needed. You >>> need to evaluate state to navigate conditional branches, and whenever state >>> is passed outside of the interpreter's scope (like I/O or multi-threading). >>> I think problems might crop up in parts of the language that are pure >>> c-code. For instance; I don't know if the state variables in a list >>> iterator are actually visible to the Interpreter or if it's implemented in >>> C that is inscrutable to the interpreter. >>> >>> >>> On Mar 2, 2017 5:54 PM, "Joseph Jevnik" wrote: >>> >>> Other things that scrutinize an expression are iteration or branching >>> (with the current evaluation model). If `xs` is a thunk, then `for x in xs` >>> must scrutinize `xs`. 
At first this doesn't seem required; however, in >>> general `next` imposes a data dependency on the next call to `next`. For >>> example: >>> >>> x0 = next(xs) >>> x1 = next(xs) >>> >>> print(x1) >>> print(x0) >>> >>> If `next` doesn't force computation then evaluating `x1` before `x0` >>> will bind `x1` to `xs[0]` which is not what the eager version of the code >>> does. >>> >>> To preserve the current semantics of the language you cannot defer >>> arbitrary expressions because they may have observable side-effects. >>> Automatically translating would require knowing ahead of time if a function >>> can have observable side effects, but that is not possible in Python. >>> Because it is impossible to tell in the general case, we must rely on the >>> user to tell us when it is safe to defer an expression. >>> >>> On Thu, Mar 2, 2017 at 6:42 PM, Abe Dillon wrote: >>> >>>> I'm going to repeat here what I posted in the thread on lazy imports. >>>> If it's possible for the interpreter to determine when it needs to >>>> force evaluation of a lazy expression or statement, then why not use them >>>> everywhere? If that's the case, then why not make everything lazy by >>>> default? Why not make it a service of the language to lazify your code >>>> (analogous to garbage collection) so a human doesn't have to worry about >>>> screwing it up? >>>> >>>> There are, AFAIK, three things that *must* force evaluation of lazy >>>> expressions or statements: >>>> >>>> 1) Before the GIL is released, all pending lazy code must be evaluated >>>> since the current thread can't know what variables another thread will try >>>> to access (unless there's a way to explicitly label variables as "shared", >>>> then it will only force evaluation of those). >>>> >>>> 2) Branching statements force evaluation of anything required to >>>> evaluate the conditional clause. >>>> >>>> 3) I/O forces evaluation of any involved lazy expressions. 
>>>> >>>> >>>> On Mon, Feb 20, 2017 at 7:07 PM, Joshua Morton < >>>> joshua.morton13 at gmail.com> wrote: >>>> >>>>> This comes from a bit of a misunderstanding of how an interpreter >>>>> figures out what needs to be compiled. Most (all?) JIT compilers run code >>>>> in an interpreted manner, and then compile subsections down to efficient >>>>> machine code when they notice that the same code path is taken repeatedly, >>>>> so in pypy something like >>>>> >>>>> x = 0 >>>>> for i in range(100000): >>>>> x += 1 >>>>> >>>>> would, get, after 10-20 runs through the loop, turned into assembly >>>>> that looked like what you'd write in pure C, instead of the very >>>>> indirection and pointer heavy code that such a loop would be if you could >>>>> take it and convert it to cpython actually executes, for example. So the >>>>> "hot" code is still run. >>>>> >>>>> All that said, this is a bit of an off topic discussion and probably >>>>> shouldn't be on list. >>>>> >>>>> What you really do want is functional purity, which is a different >>>>> concept and one that python as a language can't easily provide no matter >>>>> what. >>>>> >>>>> --Josh >>>>> >>>>> On Mon, Feb 20, 2017 at 7:53 PM Abe Dillon >>>>> wrote: >>>>> >>>>>> On Fri, Feb 17, 2017, Steven D'Aprano wrote: >>>>>> >>>>>> JIT compilation delays *compiling* the code to run-time. This is a >>>>>> proposal for delaying *running* the code until such time as some other >>>>>> piece of code actually needs the result. >>>>>> >>>>>> >>>>>> My thought was that if a compiler is capable of determining what >>>>>> needs to be compiled just in time, then an interpreter might be able to >>>>>> determine what expressions need to be evaluated just when their results are >>>>>> actually used. >>>>>> >>>>>> So if you had code that looked like: >>>>>> >>>>>> >>> log.debug("data: %s", expensive()) >>>>>> >>>>>> The interpreter could skip evaluating the expensive function if the >>>>>> result is never used. 
It would only evaluate it "just in time". This would >>>>>> almost certainly require just in time compilation as well, otherwise the >>>>>> byte code that calls the "log.debug" function would be unaware of the byte >>>>>> code that implements the function. >>>>>> >>>>>> This is probably a pipe-dream, though; because the interpreter would >>>>>> have to be aware of side effects. >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Feb 20, 2017 at 5:18 AM, wrote: >>>>>> >>>>>> >>>>>> >>>>>> > -----Original Message----- >>>>>> > From: Python-ideas [mailto:python-ideas-bounces+tritium- >>>>>> > list=sdamon.com at python.org] On Behalf Of Michel Desmoulin >>>>>> > Sent: Monday, February 20, 2017 3:30 AM >>>>>> > To: python-ideas at python.org >>>>>> > Subject: Re: [Python-ideas] Delayed Execution via Keyword >>>>>> > >>>>>> > I wrote a blog post about this, and someone asked me if it meant >>>>>> > allowing lazy imports to make optional imports easier. >>>>>> > >>>>>> > Someting like: >>>>>> > >>>>>> > lazy import foo >>>>>> > lazy from foo import bar >>>>>> > >>>>>> > So now if I don't use the imports, the module is not loaded, which >>>>>> could >>>>>> > also significantly speed up applications starting time with a lot of >>>>>> > imports. >>>>>> >>>>>> Would that not also make a failure to import an error at the time of >>>>>> executing the imported piece of code rather than at the place of >>>>>> import? >>>>>> And how would optional imports work if they are not loaded until >>>>>> use? Right >>>>>> now, optional imports are done by wrapping the import statement in a >>>>>> try/except, would you not need to do that handling everywhere the >>>>>> imported >>>>>> object is used instead? 
>>>>>> >>>>>> (I haven't been following the entire thread, and I don't know if this >>>>>> is a >>>>>> forest/tress argument) >>>>>> >>>>>> > _______________________________________________ >>>>>> > Python-ideas mailing list >>>>>> > Python-ideas at python.org >>>>>> > https://mail.python.org/mailman/listinfo/python-ideas >>>>>> > Code of Conduct: http://python.org/psf/codeofconduct/ >>>>>> >>>>>> _______________________________________________ >>>>>> Python-ideas mailing list >>>>>> Python-ideas at python.org >>>>>> https://mail.python.org/mailman/listinfo/python-ideas >>>>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Python-ideas mailing list >>>>>> Python-ideas at python.org >>>>>> https://mail.python.org/mailman/listinfo/python-ideas >>>>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>>> >>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Thu Mar 2 22:36:29 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 3 Mar 2017 13:36:29 +1000 Subject: [Python-ideas] for/except/else In-Reply-To: References: Message-ID: On 2 March 2017 at 21:06, Wolfgang Maier < wolfgang.maier at biologie.uni-freiburg.de> wrote: > On 02.03.2017 06:46, Nick Coghlan wrote: > >> The proposal in this thread then has the significant downside of only >> covering the "nested side effect" case: >> >> for item in iterable: >> if condition(item): >> break >> except break: >> operation(item) >> else: >> condition_was_never_true(iterable) >> >> While being even *less* amenable to being pushed down into a helper >> function (since converting the "break" to a "return" would bypass the >> "except break" clause). >> > > I'm actually not quite buying this last argument. If you wanted to > refactor this to "return" instead of "break", you could simply put the > return into the except break block. In many real-world situations with > multiple breaks from a loop this could actually make things easier instead > of worse. > Fair point - so that would be even with the "single nested side effect" case, but simpler when you had multiple break conditions (and weren't already combining them with "and"). > Personally, the "nested side effect" form makes me uncomfortable every > time I use it because the side effects on breaking or not breaking the loop > don't end up at the same indentation level and not necessarily together. > However, I'm gathering from the discussion so far that not too many people > are thinking like me about this point, so maybe I should simply adjust my > mind-set. > This is why I consider the "search only" form of the loop, where the else clause either sets a default value, or else prevents execution of the code after the loop body (via raise, return, or continue), to be the preferred form: there aren't any meaningful side effects hidden away next to the break statement.
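A minimal sketch of that search-only shape (the helper name and default handling are invented for illustration):

```python
# Sketch of the "search only" for/else form: the else clause supplies
# the default, so no side effects sit next to the break statement.

def first_match(iterable, condition, default=None):
    for item in iterable:
        if condition(item):
            result = item
            break
    else:               # only runs if the loop never hit break
        result = default
    return result

print(first_match([1, 3, 4, 5], lambda x: x % 2 == 0))  # 4
print(first_match([1, 3, 5], lambda x: x % 2 == 0))     # None
```

Since the else clause runs only when the loop finishes without hitting break, the default assignment and the search result stay visibly separated.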
If I can't do that, I'm more likely to switch to a classic flag variable that gets checked post-loop execution than I am to push the side effect inside the loop body:

search_result = _not_found = object()
for item in iterable:
    if condition(item):
        search_result = item
        break
if search_result is _not_found:
    # Handle the "not found" case
else:
    # Handle the "found" case

> All that said, this is a very nice abstract view on things! I really > learned quite a bit from this, thank you :) > > As always though, reality can be expected to be quite a bit more > complicated than theory so I decided to check the stdlib for real uses of > break. This is quite a tedious task since break is used in many different > ways and I couldn't come up with a good automated way of classifying them. > So what I did is just go through stdlib code (in reverse alphabetical > order) containing the break keyword and put it into categories manually. I > only got up to socket.py before losing my enthusiasm, but here's what I > found: > > - overall I looked at 114 code blocks that contain one or more breaks

Thanks for doing that research :)

> Of the remaining 19 non-trivial cases > > - 9 are variations of your classical search idiom above, i.e., there's an > else clause there and nothing more is needed > > - 6 are variations of your "nested side-effects" form presented above with > debatable (see above) benefit from except break > > - 2 do not use an else clause currently, but have multiple breaks that do > partly redundant things that could be combined in a single except break > clause

Those 8 cases could also be reviewed to see whether a flag variable might be clearer than relying on nested side effects or code repetition.

> - 1 is an example of breaking out of two loops; from sre_parse._parse_sub: > > [...]
> # check if all items share a common prefix
> while True:
>     prefix = None
>     for item in items:
>         if not item:
>             break
>         if prefix is None:
>             prefix = item[0]
>         elif item[0] != prefix:
>             break
>     else:
>         # all subitems start with a common "prefix".
>         # move it out of the branch
>         for item in items:
>             del item[0]
>         subpatternappend(prefix)
>         continue # check next one
>     break
> [...]

This is a case where a flag variable may be easier to read than loop state manipulations:

may_have_common_prefix = True
while may_have_common_prefix:
    prefix = None
    for item in items:
        if not item:
            may_have_common_prefix = False
            break
        if prefix is None:
            prefix = item[0]
        elif item[0] != prefix:
            may_have_common_prefix = False
            break
    else:
        # all subitems start with a common "prefix".
        # move it out of the branch
        for item in items:
            del item[0]
        subpatternappend(prefix)

Although the whole thing could likely be cleaned up even more via itertools.zip_longest:

for first_uncommon_idx, aligned_entries in enumerate(itertools.zip_longest(*items)):
    if not all_true_and_same(aligned_entries):
        break
else:
    # Everything was common, so clear all entries
    first_uncommon_idx = None
for item in items:
    del item[:first_uncommon_idx]

(Batching the deletes like that may even be slightly faster than deleting common entries one at a time)

Given the following helper function:

def all_true_and_same(entries):
    itr = iter(entries)
    try:
        first_entry = next(itr)
    except StopIteration:
        return False
    if not first_entry:
        return False
    for entry in itr:
        if not entry or entry != first_entry:
            return False
    return True

> > - finally, 1 is a complicated break dance to achieve sth that clearly > would have been easier with except break; from typing.py: > > [...]

> def __subclasscheck__(self, cls):
>     if cls is Any:
>         return True
>     if isinstance(cls, GenericMeta):
>         # For a class C(Generic[T]) where T is co-variant,
>         # C[X] is a subclass of C[Y] iff X is a subclass of Y.
>         origin = self.__origin__
>         if origin is not None and origin is cls.__origin__:
>             assert len(self.__args__) == len(origin.__parameters__)
>             assert len(cls.__args__) == len(origin.__parameters__)
>             for p_self, p_cls, p_origin in zip(self.__args__, cls.__args__,
>                                                origin.__parameters__):
>                 if isinstance(p_origin, TypeVar):
>                     if p_origin.__covariant__:
>                         # Covariant -- p_cls must be a subclass of p_self.
>                         if not issubclass(p_cls, p_self):
>                             break
>                     elif p_origin.__contravariant__:
>                         # Contravariant. I think it's the opposite. :-)
>                         if not issubclass(p_self, p_cls):
>                             break
>                     else:
>                         # Invariant -- p_cls and p_self must equal.
>                         if p_self != p_cls:
>                             break
>                 else:
>                     # If the origin's parameter is not a typevar,
>                     # insist on invariance.
>                     if p_self != p_cls:
>                         break
>             else:
>                 return True
>         # If we break out of the loop, the superclass gets a chance.
>     if super().__subclasscheck__(cls):
>         return True
>     if self.__extra__ is None or isinstance(cls, GenericMeta):
>         return False
>     return issubclass(cls, self.__extra__)
> [...]

I think this is another case that is asking for the inner loop to be factored out to a named function, not for reasons of re-use, but for reasons of making the code more readable and self-documenting :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From python-ideas at mgmiller.net Fri Mar 3 00:13:12 2017 From: python-ideas at mgmiller.net (Mike Miller) Date: Thu, 2 Mar 2017 21:13:12 -0800 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: <20170302180217.03cf6df5@subdivisions.wooz.org> References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> <58B84EF1.4020405@stoneleaf.us> <20170302180217.03cf6df5@subdivisions.wooz.org> Message-ID: Agreed, I've rarely found a need for a "second None" or sentinel either, but once every few years I do.
So, this use case doesn't seem to be common enough to devote special syntax or a keyword to from my perspective.

But, I'll let you know my secret. I don't make my own sentinel, but rather use another singleton that is built-in already. And if you squint just right, it even makes sense. It is a built-in singleton so rarely known that you will almost never encounter code with it, so you'll have it all to yourself. Even on a Python mailing list, in a thread about sentinels/singletons, it will not be mentioned. Some may "consider it unnatural." It is?

…

… (hint)

… (wait for it)

…

>>> Ellipsis
Ellipsis

Don't think I've ever needed a "third None" but if I did I'd probably try an enum instead.

-Mike

On 2017-03-02 15:02, Barry Warsaw wrote:
> On Mar 02, 2017, at 06:37 PM, Brett Cannon wrote:
>
>> So to me, there's actually two things being discussed. Do we need another
>> sentinel to handle the "None is valid" case, and do we want syntax to more
>> clearly delineate optional arguments?
>
> No, and no (IMHO).
>
> -Barry
>

From bussonniermatthias at gmail.com Fri Mar 3 00:46:03 2017
From: bussonniermatthias at gmail.com (Matthias Bussonnier)
Date: Thu, 2 Mar 2017 21:46:03 -0800
Subject: [Python-ideas] Optional parameters without default value
In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> <58B84EF1.4020405@stoneleaf.us> <20170302180217.03cf6df5@subdivisions.wooz.org>
Message-ID:

On Thu, Mar 2, 2017 at 9:13 PM, Mike Miller wrote:
>
> It is a built-in singleton so rarely known that you will almost never
> encounter code with it, so you'll have it all to yourself. Even on a Python
> mailing list, in a thread about sentinels/singletons, it will not be
> mentioned. Some may "consider it unnatural." It is?
>
> …
>
> … (hint)
>
> … (wait for it)
>
> …
>
<3 the suspense! I got it before the end though! I'll tell you my secret as well.
I have my own (https://pypi.python.org/pypi/undefined). I've set it up to raise on __eq__ or __bool__ to enforce checking for identity.

-- M

From pavol.lisy at gmail.com Fri Mar 3 02:14:30 2017
From: pavol.lisy at gmail.com (Pavol Lisy)
Date: Fri, 3 Mar 2017 08:14:30 +0100
Subject: [Python-ideas] for/except/else
In-Reply-To: References: Message-ID:

On 3/1/17, Wolfgang Maier wrote:
> - as explained by Nick, the existence of "except break" would strengthen
> the analogy with try/except/else and help people understand what the
> existing else clause after a loop is good for.

I was thinking about this analogy:

1. try/else (without except) is a SyntaxError. And seems useless.

2. try/break/except is backward compatible:

for i in L:
    try:
        break
    except Something:
        pass
except break:  # current code does not have this, so the break is applied to the for-block

3. for/raise/except (which is a natural application of this analogy) could reduce indentation, but in my personal view that doesn't improve readability (but I could be wrong)

It could help enhance "break" possibilities and so "simplify" a double break in nested loops:

for ...:
    broken = False
    for ...:
        if condition1():
            # I like to "double break" here
            raise SomeError()
        if condition2():
            break
    except SomeError:
        break
    except break:
        broken = True

4. for/finally may be useful

From srkunze at mail.de Thu Mar 2 15:29:19 2017
From: srkunze at mail.de (Sven R. Kunze)
Date: Thu, 2 Mar 2017 21:29:19 +0100
Subject: [Python-ideas] get() method for list and tuples
In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <20170228235616.GP5689@ando.pearwood.info> <58e46182-1210-438f-b85a-02aa1bd9dc9e@gmail.com>
Message-ID:

On 02.03.2017 04:41, Chris Barker wrote:
> Maybe someone else will chime in with more "I'd really have a use for
> this" examples.

It also makes refactoring easier.

Regards,
Sven
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From wolfgang.maier at biologie.uni-freiburg.de Fri Mar 3 04:05:12 2017 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Fri, 3 Mar 2017 10:05:12 +0100 Subject: [Python-ideas] for/except/else In-Reply-To: References: Message-ID: On 03/02/2017 07:05 PM, Brett Cannon wrote: > > - overall I looked at 114 code blocks that contain one or more breaks > > > I wanted to say thanks for taking the time to go through the stdlib and > doing such a thorough analysis of the impact of your suggestion! It > always helps to have real-world numbers to know whether an idea will be > useful (or not). > > > > - 84 of these are trivial use cases that simply break out of a while > True block or terminate a while/for loop prematurely (no use for any > follow-up clause there) > > - 8 more are causing a side-effect before a single break, and it would > be pointless to put this into an except break clause > > - 3 more cause different, non-redundant side-effects before different > breaks from the same loop and, obviously, an except break clause would > not help them either > > => So the vast majority of breaks does *not* need an except break *nor* > an else clause, but that's just as expected. > > > Of the remaining 19 non-trivial cases > > - 9 are variations of your classical search idiom above, i.e., there's > an else clause there and nothing more is needed > > - 6 are variations of your "nested side-effects" form presented above > with debatable (see above) benefit from except break > > - 2 do not use an else clause currently, but have multiple breaks that > do partly redundant things that could be combined in a single except > break clause > > - 1 is an example of breaking out of two loops; from > sre_parse._parse_sub: > [...] 
> - finally, 1 is a complicated break dance to achieve something that clearly
> would have been easier with except break; from typing.py:
>
> My summary: I do see use-cases for the except break clause, but,
> admittedly, they are relatively rare and may not be worth the hassle of
> introducing new syntax.
>
>
> IOW out of 114 cases, 4 may benefit from an 'except' block? If I'm
> reading those numbers correctly then ~3.5% of cases would benefit, which
> isn't high enough to add the syntax and related complexity IMO.

Hmm, I'm not sure how much sense it makes to express this as a percentage, since the total you're comparing to is rather arbitrary. The 114 cases include *any* for/while loop I could find that contains at least a single break. More than 90 of these loops do not use an "else" clause either, showing that even this currently supported syntax is used rarely. I found only 19 cases that are complex enough to be candidates for an except clause (17 of these use the else clause). For 9 of these 19 (the ones using the classical search idiom) an except clause would not be applicable, but it could be used in the 10 remaining cases (though all of them could also make use of a flag or could be refactored instead). So depending on what you want to emphasize, you could also say that the proposal could affect as many as 10/19 or 52.6% of cases.
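For reference, the "classical search idiom" counted above is the one case that for/else already handles directly. A minimal runnable sketch (the function and its data are illustrative, not from the stdlib code surveyed):

```python
def find_first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n < 0:
            break  # found what we were searching for
    else:
        # the loop ran to completion without hitting break
        return None
    return n
```

The else clause fires only when the loop exhausts its iterable, which is exactly the "search failed" branch of the idiom.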
From wolfgang.maier at biologie.uni-freiburg.de Fri Mar 3 03:47:52 2017 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Fri, 3 Mar 2017 09:47:52 +0100 Subject: [Python-ideas] for/except/else In-Reply-To: References: Message-ID: On 03/03/2017 04:36 AM, Nick Coghlan wrote: > On 2 March 2017 at 21:06, Wolfgang Maier > > wrote: > >> - overall I looked at 114 code blocks that contain one or more breaks > > > Thanks for doing that research :) > > >> Of the remaining 19 non-trivial cases >> >> - 9 are variations of your classical search idiom above, i.e., >> there's an else clause there and nothing more is needed >> >> - 6 are variations of your "nested side-effects" form presented >> above with debatable (see above) benefit from except break >> >> - 2 do not use an else clause currently, but have multiple breaks >> that do partly redundant things that could be combined in a single >> except break clause > > > Those 8 cases could also be reviewed to see whether a flag variable > might be clearer than relying on nested side effects or code repetition. > [...] > > This is a case where a flag variable may be easier to read than loop > state manipulations: > > may_have_common_prefix = True > while may_have_common_prefix: > prefix = None > for item in items: > if not item: > may_have_common_prefix = False > break > if prefix is None: > prefix = item[0] > elif item[0] != prefix: > may_have_common_prefix = False > break > else: > # all subitems start with a common "prefix". 
> # move it out of the branch > for item in items: > del item[0] > subpatternappend(prefix) > > Although the whole thing could likely be cleaned up even more via > itertools.zip_longest: > > for first_uncommon_idx, aligned_entries in > enumerate(itertools.zip_longest(*items)): > if not all_true_and_same(aligned_entries): > break > else: > # Everything was common, so clear all entries > first_uncommon_idx = None > for item in items: > del item[:first_uncommon_idx] > > (Batching the deletes like that may even be slightly faster than > deleting common entries one at a time) > > Given the following helper function: > > def all_true_and_same(entries): > itr = iter(entries) > try: > first_entry = next(itr) > except StopIteration: > return False > if not first_entry: > return False > for entry in itr: > if not entry or entry != first_entry: > return False > return True > >> - finally, 1 is a complicated break dance to achieve sth that >> clearly would have been easier with except break; from typing.py: > [...] > > I think is another case that is asking for the inner loop to be factored > out to a named function, not for reasons of re-use, but for reasons of > making the code more readable and self-documenting :) > It's true that using a flag or factoring out redundant code is always a possibility. Having the except clause would clearly not let people do anything they couldn't have done before. On the other hand, the same is true for the else clause - it's only advantage here is that it's existing already - because a single flag could always distinguish between a break having occurred or not: brk = False for item in iterable: if some_condition: brk = True break if brk: do_stuff_upon_breaking_out() else: do_alternative_stuff() is a general pattern that would always work without except *and* else. However, the fact that else exists generates a regrettable asymmetry in that there is direct language support for detecting one outcome, but not the other. 
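A self-contained, runnable version of that flag pattern, showing both outcomes being detected (the function and predicate names are mine, for illustration only):

```python
def search(iterable, predicate):
    """Report which way the loop ended: via break or by running out."""
    broke_out = False
    for item in iterable:
        if predicate(item):
            broke_out = True
            break  # a hypothetical 'except break' clause would catch this
    if broke_out:
        return ("broke out", item)
    else:
        # this branch is what 'for ... else' already expresses directly
        return ("completed", None)
```

Only the "completed" branch has direct language support today; the "broke out" branch needs the flag.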
Stressing the analogy to try/except/else one more time, it's as if "else" wasn't available for try blocks. You could always use a flag to substitute for it: dealt_with_exception = False try: do_stuff() except: deal_with_exception() dealt_with_exception = True if dealt_with_exception: do_stuff_you_would_do_in_an_else_block() So IMO the real difference here is that the except clause after for would require adding it to the language, while the else clauses are there already. With that we're back at the high bar for adding new syntax :( A somewhat similar case that comes to mind here is PEP 315 -- Enhanced While Loop, which got rejected for two reasons, the first one being pretty much the same as the argument here, i.e., that instead of the proposed do .. while it's always possible to factor out or duplicate a line of code. However, the second reason was that it required the new "do" keyword, something not necessary for the current suggestion. From contact at brice.xyz Fri Mar 3 04:14:31 2017 From: contact at brice.xyz (Brice PARENT) Date: Fri, 3 Mar 2017 10:14:31 +0100 Subject: [Python-ideas] For/in/as syntax Message-ID: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> Object: Creation of a ForLoopIterationObject object (name open to suggestions) using a "For/in/as" syntax. Disclaimer: I've read PEP-3136 which was refused. The present proposal includes a solution to what it tried to solve (multi-level breaking out of loops), but is a more general proposal and it offers many unrelated advantages and simplifications. It also gives another clean way of making a kind of for/except/else syntax of another active conversation. What it tries to solve: During the iteration process, we often need to create temporary variables (like counters or states) or tests that don't correspond to the logic of the algo, but are required to make it work. 
This proposal aims at removing these unnecessary and hard to understand parts and help write simpler, cleaner, more maintainable and more logical code. So, more PEP-8 compliant code. How does it solve it: By instanciating a new object when starting a loop, which contains some data about the iteration process and some methods to act on it. The list of available properties and methods are yet to be defined, but here are a subset of what could be really helpful. A word about compatibility and understandability before: "as" is already a keyword, so it is already reserved and easy to parse. It couldn't be taken for its other uses (context managers, import statements and exceptions) as they all start with a specific and different keyword. It would have a really similar meaning, so be easy to understand. It doesn't imply any incompatibility with previous code, as current syntax would of course still be supported, and it doesn't change anything anywhere else (no change in indentation levels for example, no deprecation). Syntax: for element in elements as elements_loop: assert type(elements_loop) is ForLoopIterationObject Properties and methods (non exhaustive list, really open to discussion and suggestions): for element in elements as elements_loop: elements_loop.length: int # len ? count ? elements_loop.counter: int # Add a counter0, as in Django? a counter1 ? 
elements_loop.completed: bool elements_loop.break() elements_loop.continue() elements_loop.skip(count=1) Examples of every presented element (I didn't execute the code, it's just to explain the proposal): ################################################## # forloop.counter and forloop.length for num, dog in enumerate(dogs): print(num, dog) print("That's a total of {} dogs !".format(len(dogs))) # would be equivalent to for dog in dogs as dogs_loop: print(dogs_loop.counter, dog) print("That's a total of {} dogs !".format(dogs_loop.length)) # -> cleaner, and probably faster as it won't have to call len() or enumerate() #(but I could easily be wrong on that) ################################################## # forloop.length when we broke out of the list small_and_medium_dogs_count = 0 for dog in generate_dogs_list_by_size(): if dog.size >= medium_dog_size: break small_and_medium_dogs_count += 1 print("That's a total of {} small to medium dogs !".format( small_and_medium_dogs_count)) # would be equivalent to for dog in generate_dogs_list_by_size() as dogs_loop: if dog.size >= medium_dog_size: break print("That's a total of {} small to medium dogs !".format( dogs_loop.length - 1)) # -> easier to read, less temporary variables ################################################## # forloop.break(), to break out of nested loops (or explicitly out of current #loop) - a little like pep-3136's first proposal has_dog_named_rex = False for owner in owners: for dog in dogs: if dog.name == "Rex": has_dog_named_rex = True break if has_dog_named_rex: break # would be equivalent to for owner in owners as owners_loop: for dog in dogs: # syntax without "as" is off course still supported if dog.name == "Rex": owners_loop.break() # -> way easier to read and understand, less temporary variables ################################################## # forloop.continue(), to call "continue" from any nested loops (or explicitly #in active loop) has_dog_named_rex = False for owner in owners: for 
dog in owner.dogs: if dog.name == "Rex": has_dog_named_rex = True break if has_dog_named_rex: continue # do something # would be equivalent to for owner in owners as owners_loop: for dog in owner.dogs: if dog.name == "Rex": owners_loop.continue() # do something # -> way easier to read and understand, less temporary variables ################################################## # forloop.completed, to know if the list was entirely consumed or "break" was #called (or exception caught) - might help with the for/except/elseproposal broken_out = False for dog in dogs: if dog.name == "Rex": broken_out = True break if broken_out: print("We didn't consume all the dogs list") # would be equivalent to for dog in dogs as dogs_loop: if dog.name == "Rex": break if not dogs_loop.completed: print("We didn't consume all the dogs list") # -> less temporary variables, every line of code is relevant, easy to #understand ################################################## # forloop.skip # In the example, we want to skip 2 elements starting from item #2 skip = 0 for num, dog in enumerate(dogs): if skip: skip -= 1 continue if num == 2: skip = 2 # would be equivalent to for dog in dogs as dogs_loop: if dogs_loop.counter == 2: dogs_loop.skip(2) # -> way easier to read and understand, less temporary variables # Notes : # - Does a call to forloop.skip() implies a forloop.continue() call or does # the code continue its execution until the end of the loop, which will # then be skipped? Implying the .continue() call seems less ambiguous to # me. Or the method should be called skip_next_iteration, or something # like that. # - Does a call to forloop.skip(2) adds 2 to forloop.length or not? # -> kwargs may be added to allow both behaviours for both questions. 
# We could allow the argument to be a function that accepts a single argument #and return a boolean, like dogs_loop.skip(lambda k: k % 3 == 0) # Execute the code on multiples of 3 only ################################################## Thoughts : - It would allow to pass forloop.break and forloop.continue as callback to other functions. Not sure yet if it's a good or a bad thing (readability against what it could offer). - I haven't yet used much the asynchronous functionalities, so I couldn't yet think about the implications of such a new syntax to this (and what about a lazy keyword in here?) - I suppose it's a huge work to create such a syntax. And I have no idea how complicated it could be to create methods (like break() and continue()) doing what keywords were doing until now. - I'm not sure if that would make sense in list comprehensions, despite being confusing. - Would enable to support callback events like forloop.on_break. But would there be a need for that? - Would this have a major impact on the loops execution times? - would a "while condition as condition_loop:" be of any use too? Sorry for the very long message, I hope it will get your interest. And I also hope my English was clear enough. Brice Parent From srkunze at mail.de Fri Mar 3 04:16:36 2017 From: srkunze at mail.de (Sven R. Kunze) Date: Fri, 3 Mar 2017 10:16:36 +0100 Subject: [Python-ideas] for/except/else In-Reply-To: References: Message-ID: <16e59a1f-4975-a18f-b93f-1a0502dad3a3@mail.de> On 03.03.2017 09:47, Wolfgang Maier wrote: > However, the fact that else exists generates a regrettable asymmetry > in that there is direct language support for detecting one outcome, > but not the other. > > Stressing the analogy to try/except/else one more time, it's as if > "else" wasn't available for try blocks. 
You could always use a flag to > substitute for it: > > dealt_with_exception = False > try: > do_stuff() > except: > deal_with_exception() > dealt_with_exception = True > if dealt_with_exception: > do_stuff_you_would_do_in_an_else_block() Even worse when we think about the "finally" clause. Regards, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Thu Mar 2 15:36:02 2017 From: srkunze at mail.de (Sven R. Kunze) Date: Thu, 2 Mar 2017 21:36:02 +0100 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <58B65D6A.8090806@stoneleaf.us> References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <58B6229D.10203@stoneleaf.us> <764b7a4e-04f1-bfda-8e65-f750a3281af6@gmail.com> <58B65D6A.8090806@stoneleaf.us> Message-ID: <817ddde2-0e7a-31e8-6c15-1c5d626c6c14@mail.de> On 01.03.2017 06:34, Ethan Furman wrote: > On the bright side, if enough use-cases of this type come up (pesky > try/except for a simple situation), we may be able to get Guido to > reconsider PEP 463. I certainly think PEP 463 makes a lot more sense > that adding list.get(). It then would make sense to remove .get() on dicts. ;-) Regards, Sven ... and to remove parameter "default" of max(). and to remove parameter "default" of getattr(). ... From guettliml at thomas-guettler.de Fri Mar 3 06:44:54 2017 From: guettliml at thomas-guettler.de (=?UTF-8?Q?Thomas_G=c3=bcttler?=) Date: Fri, 3 Mar 2017 12:44:54 +0100 Subject: [Python-ideas] Smoothing transition: 'unicode' and 'basestring' as aliases for 'str'? Message-ID: <31701b30-03a5-3a8d-32e2-2dd59a26c09c@thomas-guettler.de> I found this in an old post: > Maybe too late now but there should have been 'unicode', > 'basestring' as aliases for 'str'. I guess it is too late to think about it again ... 
Regards,
Thomas Güttler

--
Thomas Guettler http://www.thomas-guettler.de/

From victor.stinner at gmail.com Fri Mar 3 08:06:56 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 3 Mar 2017 14:06:56 +0100
Subject: [Python-ideas] Optional parameters without default value
In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> <58B84EF1.4020405@stoneleaf.us> <20170302180217.03cf6df5@subdivisions.wooz.org>
Message-ID:

2017-03-03 6:13 GMT+01:00 Mike Miller :
> Agreed, I've rarely found a need for a "second None" or sentinel either, but
> once every few years I do. So, this use case doesn't seem to be common
> enough to devote special syntax or a keyword to from my perspective.

The question here is how to have official support for this feature in inspect.signature().

If we go the special value (singleton) way, Ellipsis doesn't work either, since a few modules use Ellipsis for legitimate use cases. A recent user: the typing module, for "Callable[[arg, ...], result]".

Victor

From srkunze at mail.de Fri Mar 3 08:19:24 2017
From: srkunze at mail.de (Sven R. Kunze)
Date: Fri, 3 Mar 2017 14:19:24 +0100
Subject: [Python-ideas] Optional parameters without default value
In-Reply-To: References: <27f4e4ae-efa5-e858-a825-d4d0514a388e@egenix.com> <05ad4367-52cd-c032-c401-feb7a87c46df@egenix.com> <58B84EF1.4020405@stoneleaf.us> <20170302180217.03cf6df5@subdivisions.wooz.org>
Message-ID: <7fb6afbb-eeba-6836-be8d-1985350548e1@mail.de>

On 03.03.2017 14:06, Victor Stinner wrote:
> 2017-03-03 6:13 GMT+01:00 Mike Miller :
>> Agreed, I've rarely found a need for a "second None" or sentinel either, but
>> once every few years I do. So, this use case doesn't seem to be common
>> enough to devote special syntax or a keyword to from my perspective.
> The question here is how to have official support for this feature
> in inspect.signature().
> > If we go the special value (singleton) way, Ellipsis doesn't work
> either, since a few modules use Ellipsis for legitimate use cases. A recent
> user: the typing module, for "Callable[[arg, ...], result]".

Exactly. So it should be obvious that introducing "official" support just leads to an endless chain of "but I would need yet another None", because we already use [None, Undefined, NotUsed, Ellipsis, , pypi project] in our projects.

Having every project roll its own "NotDefined" makes them completely incompatible with each other. So, it's not possible to plug return values into other functions, and one gets bitten.

Not providing an "official" solution solves the matter.

Best,
Sven
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From victor.stinner at gmail.com Fri Mar 3 09:29:24 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 3 Mar 2017 15:29:24 +0100
Subject: [Python-ideas] Optional parameters without default value
In-Reply-To: References: Message-ID:

Since yet another sentinel singleton sounds like a dead end, I suggest to use [arg=value] syntax and require a default value in the prototype, as currently required for *optional keyword* arguments.

"[...]" syntax for optional parameters is commonly used in Python documentation (but the exact syntax for multiple optional arguments is different from what I propose, see below). I have already seen this syntax for optional parameters in at least the C and PHP languages.

Python prototypes from the standard library and their generated signatures:

def bool([x=False])
=> bool([x])

def bytearray([source=None], [encoding=None], errors="strict")
=> bytearray([source], [encoding], [errors])

# in practice, default value of 'default' parameter (and maybe also 'key'?)
# will more likely be a custom sentinel def sum(iterable, *args, [key=None], [default=None]) => sum(iterable, *args, [key], [default]) # "/" is an hypothetical marker for positional-only arguments def str.replace(old, new, [count=-1], /) => str.replace(old, new, [count], /) def pickle.dump(obj, file, [protocol=3], *, fix_imports=True) => pickle.dump(obj, file, [protocol], *, fix_imports=True) An alternative for generated signature of multiple optional arguments is "bytearray([source[, encoding[, errors]]])", but I'm not a big fan of nested [...], IMHO it's harder to read. And I like the idea of having a signature closer to the actual Python code. Invalid syntaxes raising SyntaxError: * no default value: "def func([x]): ..." * optional arguments before *args: "def func(arg, [opt=None], *args):" In practice, calling a function without an optional argument or pass the (private?) sentinel as the optional argument should have the same behaviour. Each module is free to decide how the sentinel is exposed or not. For example, the inspect module has 2 sentinels: _empty is exposed as Signature.empty and Parameter.empty, whereas _void is private. If I understood correctly, Argument Clinic already supports optional positional arguments, and so doesn't need to be changed. I'm not sure that it's useful for optional keyword-only arguments: def func(*, [arg=None]) => func(*, [arg]) The only difference with optional keyword-only arguments with a default value is the signature: def func(*, arg=None) => func(*, arg=None) See also the discussion on converting the bisect functions to Argument Clinic and issues with the generated signature: http://bugs.python.org/issue28754 Victor From rainventions at gmail.com Fri Mar 3 09:37:09 2017 From: rainventions at gmail.com (Ryan Birmingham) Date: Fri, 3 Mar 2017 09:37:09 -0500 Subject: [Python-ideas] Smoothing transition: 'unicode' and 'basestring' as aliases for 'str'? 
In-Reply-To: <31701b30-03a5-3a8d-32e2-2dd59a26c09c@thomas-guettler.de> References: <31701b30-03a5-3a8d-32e2-2dd59a26c09c@thomas-guettler.de> Message-ID: The thread is here in the archive (https://mail.python.org/ pipermail/python-ideas/2016-June/040761.html) if anyone's wondering context, as I was. In short, someone wanted an alias from string to basestring. This is addressed in the "What's new in Python 3.0" ( https://docs.python.org/3/whatsnew/3.0.html) page: > > - The built-in basestring abstract type was removed. Use str > instead. The str > and bytes > types don?t > have functionality enough in common to warrant a shared base class. The > 2to3 tool (see below) replaces every occurrence of basestring with str > . > > Personally, I have no issue with leaving an alias like this in 2to3, since adding it to the language feels more like forced backwards compatibility to me. That said, there are more related subtleties on the "What's new in Python 3.0" page, some of which seem less intuitive, so I understand where a desire like this would come from. Would more specific and succinct documentation on this change alone help? -Ryan Birmingham On 3 March 2017 at 06:44, Thomas G?ttler wrote: > I found this in an old post: > > > Maybe too late now but there should have been 'unicode', > > 'basestring' as aliases for 'str'. > > I guess it is too late to think about it again ... > > Regards, > Thomas G?ttler > > > -- > Thomas Guettler http://www.thomas-guettler.de/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yselivanov.ml at gmail.com Fri Mar 3 10:24:00 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 3 Mar 2017 10:24:00 -0500 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: Message-ID: <0c47b0a5-9db7-071a-290e-a7971571d3f9@gmail.com> TBH I think that optional parameters isn't a problem requiring new syntax. We probably do need syntax for positional-only arguments (since we already have them in a way), but optional parameters can be solved easily without a new syntax. Syntax like: 1. def a(?foo), 2. def a(foo=pass), 3. def a([foo]), will complicate the language too much IMO. Yury On 2017-03-03 9:29 AM, Victor Stinner wrote: > Since yet another sentinel singleton sounds like a dead end, I suggest > to use [arg=value] syntax and require a default value in the > prototype, as currently required for *optional keyword* arguments. > > "[...]" syntax for optional parameter is commonly used in Python > documentation (but the exact syntax for multiple optional arguments is > different than what I propose, see below). I already saw this syntax > for optional parameters in C and PHP languages at least. > > Python prototype of the standard library and their generated signature: > > def bool([x=False]) > => bool([x]) > > def bytearray([source=None], [encoding=None], errors="strict") > => bytearray([source], [encoding], [errors]) > > # in practice, default value of 'default' parameter (and maybe also 'key'?) 
> # will more likely be a custom sentinel > def sum(iterable, *args, [key=None], [default=None]) > => sum(iterable, *args, [key], [default]) > > # "/" is an hypothetical marker for positional-only arguments > def str.replace(old, new, [count=-1], /) > => str.replace(old, new, [count], /) > > def pickle.dump(obj, file, [protocol=3], *, fix_imports=True) > => pickle.dump(obj, file, [protocol], *, fix_imports=True) > > An alternative for generated signature of multiple optional arguments > is "bytearray([source[, encoding[, errors]]])", but I'm not a big fan > of nested [...], IMHO it's harder to read. And I like the idea of > having a signature closer to the actual Python code. > > > Invalid syntaxes raising SyntaxError: > > * no default value: "def func([x]): ..." > * optional arguments before *args: "def func(arg, [opt=None], *args):" > > In practice, calling a function without an optional argument or pass > the (private?) sentinel as the optional argument should have the same > behaviour. Each module is free to decide how the sentinel is exposed > or not. For example, the inspect module has 2 sentinels: _empty is > exposed as Signature.empty and Parameter.empty, whereas _void is > private. > > If I understood correctly, Argument Clinic already supports optional > positional arguments, and so doesn't need to be changed. 
> > > I'm not sure that it's useful for optional keyword-only arguments: > > def func(*, [arg=None]) > => func(*, [arg]) > > The only difference with optional keyword-only arguments with a > default value is the signature: > > def func(*, arg=None) > => func(*, arg=None) > > See also the discussion on converting the bisect functions to Argument > Clinic and issues with the generated signature: > http://bugs.python.org/issue28754 > > Victor > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From jsbueno at python.org.br Fri Mar 3 10:43:16 2017 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Fri, 3 Mar 2017 12:43:16 -0300 Subject: [Python-ideas] Smoothing transition: 'unicode' and 'basestring' as aliases for 'str'? In-Reply-To: References: <31701b30-03a5-3a8d-32e2-2dd59a26c09c@thomas-guettler.de> Message-ID: I see no reason to introduce clutter like this at this point in time - code needing to run in both Py 2 nd 3, if not using something like "six" could do: compat.py try: unicode except NameError: unicode = basestring = str elsewhere: from compat import unicode, basestring Or rather: try: unicode else: str = basestring = unicode and from compat import str # therefore having Python3 valid and clear code from here. On 3 March 2017 at 11:37, Ryan Birmingham wrote: > The thread is here in the archive > (https://mail.python.org/pipermail/python-ideas/2016-June/040761.html) if > anyone's wondering context, as I was. > > In short, someone wanted an alias from string to basestring. > This is addressed in the "What's new in Python 3.0" > (https://docs.python.org/3/whatsnew/3.0.html) page: >> >> The built-in basestring abstract type was removed. Use str instead. The >> strand bytes types don?t have functionality enough in common to warrant a >> shared base class. 
The 2to3 tool (see below) replaces every occurrence of >> basestring with str. > > Personally, I have no issue with leaving an alias like this in 2to3, since > adding it to the language feels more like forced backwards compatibility to > me. > > That said, there are more related subtleties on the "What's new in Python > 3.0" page, some of which seem less intuitive, so I understand where a desire > like this would come from. Would more specific and succinct documentation on > this change alone help? > > -Ryan Birmingham > > On 3 March 2017 at 06:44, Thomas Güttler > wrote: >> >> I found this in an old post: >> >> > Maybe too late now but there should have been 'unicode', >> > 'basestring' as aliases for 'str'. >> >> I guess it is too late to think about it again ... >> >> Regards, >> Thomas Güttler >> >> >> -- >> Thomas Guettler http://www.thomas-guettler.de/ >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From edk141 at gmail.com Fri Mar 3 11:09:25 2017 From: edk141 at gmail.com (Ed Kellett) Date: Fri, 03 Mar 2017 16:09:25 +0000 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> Message-ID: On Tue, 28 Feb 2017 at 17:19 David Mertz wrote: > On Tue, Feb 28, 2017 at 7:16 AM, Michel Desmoulin < > desmoulinmichel at gmail.com> wrote: > > Le 28/02/2017 à 15:45, Steven D'Aprano a écrit : > > No you don't. You can use slicing. 
> > alist = [1, 2, 3] > > print(alist[99:100]) # get the item at position 99 > > No this gives you a list of one item or an empty list. > > dict.get('key', default_value) let you get a SCALAR value, OR a default > value if it doesn't exist. > > > x = (alist[pos:pos+1] or [default_val])[0] > > > How so ? "get the element x or a default value if it doesn't exist" seem > at the contrary, a very robust approach. > > > Yes, and easily written as above. What significant advantage would it > have to spell the above as: > I think code like that is convoluted and confusing and I'm surprised to see anyone at all advocating it. IMO, the sane thing to compare this with is a conditional expression. There aren't any spellings of that that aren't ugly either: >>> stuff[x] if len(stuff) > x else default >>> stuff[x] if stuff[x:x+1] else default As for a reasonable use of list.get (or tuple.get), I often end up with lists of arguments and would like to take values from the list if they exist or take a default if not. This looks particularly horrible if the index isn't a variable (so most of the time): something = args[1] if len(args) > 1 else "cheese" something_else = args[2] if len(args) > 2 else "eggs" (you could make it more horrible by using the slicing trick, but I don't see much point in demonstrating that.) I don't often want to use dicts and lists in the same code in this way, but I think the crucial point about the comparison with dicts is that code like this is simpler and clearer if you do something horrible like this, just to get .get(): >>> argdict = dict(enumerate(args)) Ed P.S. all the talk of PEP 463 seems misplaced. That it solves (FSVO solve) this problem doesn't mean it should supersede this discussion. Personally, I don't think I'd use except-expressions, but I would use list.get. -------------- next part -------------- An HTML attachment was scrubbed... 
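[Ed.: the behaviour Ed is asking for can be spelled today as a small helper; a sketch, where the name `seq_get` is hypothetical and not a real list method:]

```python
def seq_get(seq, index, default=None):
    """Return seq[index], or default when index is out of range (dict.get-style)."""
    try:
        return seq[index]
    except IndexError:
        return default

# The args example from the message above, without len() checks:
args = ("program", "spam")
something = seq_get(args, 1, "cheese")      # index present  -> "spam"
something_else = seq_get(args, 2, "eggs")   # index missing  -> "eggs"
assert (something, something_else) == ("spam", "eggs")
```

Unlike the slicing trick, this also accepts negative indices and scalar values that happen to be falsy.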
URL: From bussonniermatthias at gmail.com Fri Mar 3 11:21:17 2017 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Fri, 3 Mar 2017 08:21:17 -0800 Subject: [Python-ideas] For/in/as syntax In-Reply-To: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> References: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> Message-ID: Hi Brice, On Fri, Mar 3, 2017 at 1:14 AM, Brice PARENT wrote: > > A word about compatibility and understandability before: > "as" is already a keyword, so it is already reserved and easy to parse. It > couldn't be taken for its other uses (context managers, import statements > and > exceptions) as they all start with a specific and different keyword. It > would > have a really similar meaning, so be easy to understand. It doesn't imply > any > incompatibility with previous code, as current syntax would of course still > be > supported, and it doesn't change anything anywhere else (no change in > indentation levels for example, no deprecation). That still would make it usable only on Python 3.7+ It seems to me you can already do that now with a helper object/function: In [1]: class Loop: ...: ...: def __init__(self, iterable): ...: self._it = iter(iterable) ...: self._break = False ...: ...: def __next__(self): ...: if self._break: ...: raise StopIteration ...: return (self, next(self._it)) ...: ...: def __iter__(self): ...: return self ...: ...: def brk(self): ...: self._break = True ...: ...: In [2]: for loop,i in Loop(range(10)): ...: if i==5: ...: loop.brk() ...: print(i) 1 2 3 4 5 > > Syntax: > for element in elements as elements_loop: > assert type(elements_loop) is ForLoopIterationObject > > Properties and methods (non exhaustive list, really open to discussion and > suggestions): > for element in elements as elements_loop: > elements_loop.length: int # len ? count ? > elements_loop.counter: int # Add a counter0, as in Django? a counter1 ? 
> elements_loop.completed: bool > elements_loop.break() > elements_loop.continue() > elements_loop.skip(count=1) I did not implement these, but they are as feasible. (break is keyword though, so you need to rename it) > > Examples of every presented element (I didn't execute the code, it's just to > explain the proposal): > > ################################################## > # forloop.counter and forloop.length > > for num, dog in enumerate(dogs): > print(num, dog) > > print("That's a total of {} dogs !".format(len(dogs))) > > # would be equivalent to > > for dog in dogs as dogs_loop: > print(dogs_loop.counter, dog) I find it less readable than enumerate. > > print("That's a total of {} dogs !".format(dogs_loop.length)) > > # -> cleaner, and probably faster as it won't have to call len() or > enumerate() > #(but I could easily be wrong on that) Though I can see how accumulating the length while iterating may make sens. > > > ################################################## > # forloop.length when we broke out of the list > > small_and_medium_dogs_count = 0 > for dog in generate_dogs_list_by_size(): > if dog.size >= medium_dog_size: > break > > small_and_medium_dogs_count += 1 > > print("That's a total of {} small to medium dogs !".format( > small_and_medium_dogs_count)) > > # would be equivalent to > > for dog in generate_dogs_list_by_size() as dogs_loop: > if dog.size >= medium_dog_size: > break > > print("That's a total of {} small to medium dogs !".format( > dogs_loop.length - 1)) > > # -> easier to read, less temporary variables > > ################################################## > # forloop.break(), to break out of nested loops (or explicitly out of > current > #loop) - a little like pep-3136's first proposal > > has_dog_named_rex = False > for owner in owners: > for dog in dogs: > if dog.name == "Rex": > has_dog_named_rex = True > break > > if has_dog_named_rex: > break > > # would be equivalent to > > for owner in owners as owners_loop: > for dog in 
dogs: # syntax without "as" is of course still supported > if dog.name == "Rex": > owners_loop.break() See my above proposal, you would still need to break the inner loop here as well. So I'm guessing you're missing a `break` after owner_loop.break()? > > # -> way easier to read and understand, less temporary variables > > ################################################## > # forloop.continue(), to call "continue" from any nested loops (or > explicitly > #in active loop) > > has_dog_named_rex = False > for owner in owners: > for dog in owner.dogs: > if dog.name == "Rex": > has_dog_named_rex = True > break > > if has_dog_named_rex: > continue > > # do something > > # would be equivalent to > > for owner in owners as owners_loop: > for dog in owner.dogs: > if dog.name == "Rex": > owners_loop.continue() You are missing a break here as well (you need to break out of owner.dogs), but fair. I think though that having to `.continue()` the outer loop before breaking the inner (you have to, right?) can make things complicated. 
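[Ed.: for reference, the way multi-level breaking is usually written today, without new syntax and without the extra flag variables, is a private control-flow exception; a sketch with illustrative names:]

```python
class _Found(Exception):
    """Private control-flow exception: leave both loops in one step."""

owners = [{"name": "Ann", "dogs": ["Fido"]},
          {"name": "Bob", "dogs": ["Rex", "Bella"]}]

owner_of_rex = None
try:
    for owner in owners:
        for dog in owner["dogs"]:
            if dog == "Rex":
                owner_of_rex = owner
                raise _Found        # no separate inner break needed
except _Found:
    pass

assert owner_of_rex["name"] == "Bob"
```

The raise exits every loop level at once, which is the behaviour the proposed owners_loop.break() would give.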
> > # do something > > # -> way easier to read and understand, less temporary variables > > ################################################## > # forloop.completed, to know if the list was entirely consumed or "break" > was > #called (or exception caught) - might help with the for/except/elseproposal > > broken_out = False > for dog in dogs: > if dog.name == "Rex": > broken_out = True > break > > if broken_out: > print("We didn't consume all the dogs list") > > # would be equivalent to > > for dog in dogs as dogs_loop: > if dog.name == "Rex": > break > > if not dogs_loop.completed: > print("We didn't consume all the dogs list") > > # -> less temporary variables, every line of code is relevant, easy to > #understand > > ################################################## > # forloop.skip > # In the example, we want to skip 2 elements starting from item #2 > > skip = 0 > for num, dog in enumerate(dogs): > if skip: > skip -= 1 > continue > > if num == 2: > skip = 2 > > # would be equivalent to > > for dog in dogs as dogs_loop: > if dogs_loop.counter == 2: > dogs_loop.skip(2) > > # -> way easier to read and understand, less temporary variables > > # Notes : > # - Does a call to forloop.skip() implies a forloop.continue() call or > does > # the code continue its execution until the end of the loop, which will > # then be skipped? Implying the .continue() call seems less ambiguous > to > # me. Or the method should be called skip_next_iteration, or something > # like that. > # - Does a call to forloop.skip(2) adds 2 to forloop.length or not? > # -> kwargs may be added to allow both behaviours for both questions. > > # We could allow the argument to be a function that accepts a single > argument > #and return a boolean, like > dogs_loop.skip(lambda k: k % 3 == 0) # Execute the code on multiples of 3 > only > > > ################################################## > > Thoughts : > - It would allow to pass forloop.break and forloop.continue as callback to > other functions. 
Not sure yet if it's a good or a bad thing (readability > against what it could offer). > - I haven't yet used much the asynchronous functionalities, so I couldn't > yet > think about the implications of such a new syntax to this (and what about > a > lazy keyword in here?) > - I suppose it's a huge work to create such a syntax. And I have no idea how > complicated it could be to create methods (like break() and continue()) > doing > what keywords were doing until now. > - I'm not sure if that would make sense in list comprehensions, despite > being > confusing. > - Would enable to support callback events like forloop.on_break. But would > there be a need for that? > - Would this have a major impact on the loops execution times? > - would a "while condition as condition_loop:" be of any use too? > I think most of what you ask can be achieve by a Generator with some custom function, and does not require new syntax. It's good to be able to do that as an external library then you can experiment evolve it to get real world usage. And you can use it now ! Cheers, -- M From matt at getpattern.com Fri Mar 3 11:32:34 2017 From: matt at getpattern.com (Matt Gilson) Date: Fri, 3 Mar 2017 08:32:34 -0800 Subject: [Python-ideas] For/in/as syntax In-Reply-To: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> References: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> Message-ID: Thanks for the idea and prior research. I'm not convinced that this warrants new syntax. Most of what you propose (skipping, counting, exposing a length if available, tracking if completed) could be solved already by creating your own wrapper around an iterable: elements_loop = ForLoopIterationObject(elements) for element in elements_loop: ... Some of your proposal simply can't be done perfectly (e.g. length) since iterables can have infinite length. There is `__length_hint__` IIRC which might (or might not) give a rough estimate. 
You also couldn't do `elements_loop.continue()` or `elements_loop.break()` in a python-based wrapper without special interpreter magic (at least now as far as I can imagine...). I realize that is the point of this proposal as it would allow breaking/continuing nested loops, however, that proposal was already rejected because Guido didn't think that the use-cases were compelling enough to add new syntax/complexity to the python language and most of the time you can address that case by adding a function here or there. In this proposal, you're adding significantly _more_ complexity than was proposed in PEP-3136. Also, you're creating an object for every loop -- and that object needs to do stuff when iterating (e.g. increment a counter). It's very unlikely that this could be anything but a performance drain compared to existing solutions (only use `enumerate` when you need it). I suppose that the interpreter could be smart enough to only create a ForLoopIterationObject when it actually needs it (e.g. when there is an `as` statement), but that shifts a lot more complexity onto the language implementors -- again for perhaps little benefit. On Fri, Mar 3, 2017 at 1:14 AM, Brice PARENT wrote: > Object: Creation of a ForLoopIterationObject object (name open to > suggestions) > using a "For/in/as" syntax. > > Disclaimer: > I've read PEP-3136 which was refused. The present proposal includes a > solution > to what it tried to solve (multi-level breaking out of loops), but is a > more > general proposal and it offers many unrelated advantages and > simplifications. > It also gives another clean way of making a kind of for/except/else syntax > of > another active conversation. > > What it tries to solve: > During the iteration process, we often need to create temporary variables > (like > counters or states) or tests that don't correspond to the logic of the > algo, > but are required to make it work. 
This proposal aims at removing these > unnecessary and hard to understand parts and help write simpler, cleaner, > more > maintainable and more logical code. > So, more PEP-8 compliant code. > > How does it solve it: > By instanciating a new object when starting a loop, which contains some > data > about the iteration process and some methods to act on it. The list of > available properties and methods are yet to be defined, but here are a > subset > of what could be really helpful. > > A word about compatibility and understandability before: > "as" is already a keyword, so it is already reserved and easy to parse. It > couldn't be taken for its other uses (context managers, import statements > and > exceptions) as they all start with a specific and different keyword. It > would > have a really similar meaning, so be easy to understand. It doesn't imply > any > incompatibility with previous code, as current syntax would of course > still be > supported, and it doesn't change anything anywhere else (no change in > indentation levels for example, no deprecation). > > Syntax: > for element in elements as elements_loop: > assert type(elements_loop) is ForLoopIterationObject > > Properties and methods (non exhaustive list, really open to discussion and > suggestions): > for element in elements as elements_loop: > elements_loop.length: int # len ? count ? > elements_loop.counter: int # Add a counter0, as in Django? a counter1 > ? 
> elements_loop.completed: bool > elements_loop.break() > elements_loop.continue() > elements_loop.skip(count=1) > > Examples of every presented element (I didn't execute the code, it's just > to > explain the proposal): > > ################################################## > # forloop.counter and forloop.length > > for num, dog in enumerate(dogs): > print(num, dog) > > print("That's a total of {} dogs !".format(len(dogs))) > > # would be equivalent to > > for dog in dogs as dogs_loop: > print(dogs_loop.counter, dog) > > print("That's a total of {} dogs !".format(dogs_loop.length)) > > # -> cleaner, and probably faster as it won't have to call len() or > enumerate() > #(but I could easily be wrong on that) > > > ################################################## > # forloop.length when we broke out of the list > > small_and_medium_dogs_count = 0 > for dog in generate_dogs_list_by_size(): > if dog.size >= medium_dog_size: > break > > small_and_medium_dogs_count += 1 > > print("That's a total of {} small to medium dogs !".format( > small_and_medium_dogs_count)) > > # would be equivalent to > > for dog in generate_dogs_list_by_size() as dogs_loop: > if dog.size >= medium_dog_size: > break > > print("That's a total of {} small to medium dogs !".format( > dogs_loop.length - 1)) > > # -> easier to read, less temporary variables > > ################################################## > # forloop.break(), to break out of nested loops (or explicitly out of > current > #loop) - a little like pep-3136's first proposal > > has_dog_named_rex = False > for owner in owners: > for dog in dogs: > if dog.name == "Rex": > has_dog_named_rex = True > break > > if has_dog_named_rex: > break > > # would be equivalent to > > for owner in owners as owners_loop: > for dog in dogs: # syntax without "as" is off course still supported > if dog.name == "Rex": > owners_loop.break() > > # -> way easier to read and understand, less temporary variables > > 
################################################## > # forloop.continue(), to call "continue" from any nested loops (or > explicitly > #in active loop) > > has_dog_named_rex = False > for owner in owners: > for dog in owner.dogs: > if dog.name == "Rex": > has_dog_named_rex = True > break > > if has_dog_named_rex: > continue > > # do something > > # would be equivalent to > > for owner in owners as owners_loop: > for dog in owner.dogs: > if dog.name == "Rex": > owners_loop.continue() > > # do something > > # -> way easier to read and understand, less temporary variables > > ################################################## > # forloop.completed, to know if the list was entirely consumed or "break" > was > #called (or exception caught) - might help with the for/except/elseproposal > > broken_out = False > for dog in dogs: > if dog.name == "Rex": > broken_out = True > break > > if broken_out: > print("We didn't consume all the dogs list") > > # would be equivalent to > > for dog in dogs as dogs_loop: > if dog.name == "Rex": > break > > if not dogs_loop.completed: > print("We didn't consume all the dogs list") > > # -> less temporary variables, every line of code is relevant, easy to > #understand > > ################################################## > # forloop.skip > # In the example, we want to skip 2 elements starting from item #2 > > skip = 0 > for num, dog in enumerate(dogs): > if skip: > skip -= 1 > continue > > if num == 2: > skip = 2 > > # would be equivalent to > > for dog in dogs as dogs_loop: > if dogs_loop.counter == 2: > dogs_loop.skip(2) > > # -> way easier to read and understand, less temporary variables > > # Notes : > # - Does a call to forloop.skip() implies a forloop.continue() call or > does > # the code continue its execution until the end of the loop, which > will > # then be skipped? Implying the .continue() call seems less ambiguous > to > # me. Or the method should be called skip_next_iteration, or something > # like that. 
> # - Does a call to forloop.skip(2) adds 2 to forloop.length or not? > # -> kwargs may be added to allow both behaviours for both questions. > > # We could allow the argument to be a function that accepts a single > argument > #and return a boolean, like > dogs_loop.skip(lambda k: k % 3 == 0) # Execute the code on multiples of 3 > only > > > ################################################## > > Thoughts : > - It would allow to pass forloop.break and forloop.continue as callback to > other functions. Not sure yet if it's a good or a bad thing (readability > against what it could offer). > - I haven't yet used much the asynchronous functionalities, so I couldn't > yet > think about the implications of such a new syntax to this (and what > about a > lazy keyword in here?) > - I suppose it's a huge work to create such a syntax. And I have no idea > how > complicated it could be to create methods (like break() and continue()) > doing > what keywords were doing until now. > - I'm not sure if that would make sense in list comprehensions, despite > being > confusing. > - Would enable to support callback events like forloop.on_break. But would > there be a need for that? > - Would this have a major impact on the loops execution times? > - would a "while condition as condition_loop:" be of any use too? > > Sorry for the very long message, I hope it will get your interest. And I > also > hope my English was clear enough. > > Brice Parent > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- [image: pattern-sig.png] Matt Gilson // SOFTWARE ENGINEER E: matt at getpattern.com // P: 603.892.7736 We?re looking for beta testers. Go here to sign up! -------------- next part -------------- An HTML attachment was scrubbed... 
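[Ed.: as a concrete illustration of the wrapper Matt describes, here is a minimal pure-Python stand-in covering the trackable parts (a counter, a completed flag, a length accumulated while iterating). The class name and attributes are illustrative; `break()`/`continue()` are exactly the parts that cannot be done without interpreter support:]

```python
class LoopInfo:
    """Iterate while tracking counter/completed; an illustrative stand-in."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.counter = 0          # items produced so far (accumulated length)
        self.completed = False    # True once the iterable was fully consumed

    def __iter__(self):
        return self

    def __next__(self):
        try:
            value = next(self._it)
        except StopIteration:
            self.completed = True
            raise
        self.counter += 1
        return value

dogs_loop = LoopInfo(d for d in ("Rex", "Fido"))   # works on unsized iterables
for dog in dogs_loop:
    pass
assert dogs_loop.counter == 2 and dogs_loop.completed

partial = LoopInfo(range(10))
for n in partial:
    if n == 3:
        break                     # broke out early
assert partial.counter == 4 and not partial.completed
```

As Matt notes, the per-iteration bookkeeping is exactly the overhead that makes an always-on version of this a hard sell.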
URL: From ethan at stoneleaf.us Fri Mar 3 11:51:12 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 03 Mar 2017 08:51:12 -0800 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: Message-ID: <58B99F00.6010504@stoneleaf.us> On 03/03/2017 06:29 AM, Victor Stinner wrote: > An alternative for generated signature of multiple optional arguments > is "bytearray([source[, encoding[, errors]]])", but I'm not a big fan > of nested [...], But that's not the same thing. bytearray([source,] [encoding,] [errors]) says that each argument can be passed without passing any others. bytearray([source [, encoding [,errors]]]) says that in order to pass encoding, source must also be specified. At least, that's what it says to me. -- ~Ethan~ From ethan at stoneleaf.us Fri Mar 3 12:02:46 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 03 Mar 2017 09:02:46 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> Message-ID: <58B9A1B6.80501@stoneleaf.us> On 03/03/2017 08:09 AM, Ed Kellett wrote: > P.S. all the talk of PEP 463 seems misplaced. That it solves (FSVO solve) this problem doesn't mean it should supersede > this discussion. The advantage of PEP 463 is that issues like this would be less pressing, and it's much more general purpose. Personally, I don't think `get` belongs on list/tuple for reasons already stated. 
-- ~Ethan~ From ethan at stoneleaf.us Fri Mar 3 12:06:32 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 03 Mar 2017 09:06:32 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <817ddde2-0e7a-31e8-6c15-1c5d626c6c14@mail.de> References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <58B6229D.10203@stoneleaf.us> <764b7a4e-04f1-bfda-8e65-f750a3281af6@gmail.com> <58B65D6A.8090806@stoneleaf.us> <817ddde2-0e7a-31e8-6c15-1c5d626c6c14@mail.de> Message-ID: <58B9A298.20106@stoneleaf.us> On 03/02/2017 12:36 PM, Sven R. Kunze wrote: > On 01.03.2017 06:34, Ethan Furman wrote: >> On the bright side, if enough use-cases of this type come up (pesky try/except for a simple situation), we may be able >> to get Guido to reconsider PEP 463. I certainly think PEP 463 makes a lot more sense than adding list.get(). > > It then would make sense to remove .get() on dicts. ;-) > > and to remove parameter "default" of max(). > and to remove parameter "default" of getattr(). Backwards compatibility, and performance, says no. ;) try/except expressions are not a silver bullet any more than try/except blocks. But they can still be very useful. 
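[Ed.: the stdlib precedents Sven lists really do share one shape; for comparison, each of these standard APIs replaces a four-line try/except block:]

```python
d = {"a": 1}

assert d.get("missing", 0) == 0             # dict.get(key, default)
assert getattr(d, "missing", None) is None  # getattr(obj, name, default)
assert max([], default=42) == 42            # max(iterable, default=...) (3.4+)
assert next(iter([]), "done") == "done"     # next(iterator, default)

# The same lookup spelled without the pattern:
try:
    value = d["missing"]
except KeyError:
    value = 0
assert value == 0
```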
-- ~Ethan~ From ethan at stoneleaf.us Fri Mar 3 12:15:47 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 03 Mar 2017 09:15:47 -0800 Subject: [Python-ideas] For/in/as syntax In-Reply-To: References: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> Message-ID: <58B9A4C3.8030103@stoneleaf.us> On 03/03/2017 08:21 AM, Matthias Bussonnier wrote: >> ################################################## >> # forloop.break(), to break out of nested loops (or explicitly out of >> current >> #loop) - a little like pep-3136's first proposal >> >> has_dog_named_rex = False >> for owner in owners: >> for dog in dogs: >> if dog.name == "Rex": >> has_dog_named_rex = True >> break >> >> if has_dog_named_rex: >> break >> >> # would be equivalent to >> >> for owner in owners as owners_loop: >> for dog in dogs: # syntax without "as" is off course still supported >> if dog.name == "Rex": >> owners_loop.break() > > See my above proposal, you would still need to break the inner loop > here as well. So i'm guessing you miss a `break` afrer owner_loop.break() ? No, he's not -- part of implementing this change includes not needing to specify the inner breaks. -- ~Ethan~ From ethan at stoneleaf.us Fri Mar 3 12:18:58 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 03 Mar 2017 09:18:58 -0800 Subject: [Python-ideas] For/in/as syntax In-Reply-To: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> References: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> Message-ID: <58B9A582.8000308@stoneleaf.us> On 03/03/2017 01:14 AM, Brice PARENT wrote: > Sorry for the very long message, I hope it will get your interest. And I also > hope my English was clear enough. Long messages that explain the idea are welcome! I think it looks interesting. 
-- ~Ethan~ From contact at brice.xyz Fri Mar 3 13:00:54 2017 From: contact at brice.xyz (Brice PARENT) Date: Fri, 3 Mar 2017 19:00:54 +0100 Subject: [Python-ideas] For/in/as syntax In-Reply-To: References: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> Message-ID: <9588100b-bb90-7472-77bc-ead02b3a02a2@brice.xyz> Thanks Matthias for taking the time to give your opinion about it. Just to set the focus where I may have failed to point it: the main purpose of this proposal is the creation of the object itself, an object representing the loop. What we can do with it is still a sub-level of this proposal, as the presence of this object is what allows all these simplifications, and offers a lot of possibilities. Le 03/03/17 à 17:21, Matthias Bussonnier a écrit : > >> A word about compatibility and understandability before: >> "as" is already a keyword, so it is already reserved and easy to parse. It >> couldn't be taken for its other uses (context managers, import statements >> and >> exceptions) as they all start with a specific and different keyword. It >> would >> have a really similar meaning, so be easy to understand. It doesn't imply >> any >> incompatibility with previous code, as current syntax would of course still >> be >> supported, and it doesn't change anything anywhere else (no change in >> indentation levels for example, no deprecation). > That still would make it usable only on Python 3.7+ It seems to me you can already > do that now with a helper object/function: Yes, what I meant is that it wouldn't break any code from previous versions. But as with any new keyword or syntax evolution, newer code wouldn't run on older versions. 
> In [1]: class Loop: > ...: > ...: def __init__(self, iterable): > ...: self._it = iter(iterable) > ...: self._break = False > ...: > ...: def __next__(self): > ...: if self._break: > ...: raise StopIteration > ...: return (self, next(self._it)) > ...: > ...: def __iter__(self): > ...: return self > ...: > ...: def brk(self): > ...: self._break = True > ...: > ...: > > In [2]: for loop,i in Loop(range(10)): > ...: if i==5: > ...: loop.brk() > ...: print(i) > 1 > 2 > 3 > 4 > 5 A 2 level breaking would have to be used this way : for outerloop, i in Loop(range(4)): for innerloop, j in Loop(range(3)): if i==2 and j==1: outerloop.brk() break # this print(i, j) if outerloop.broken: break # or continue. For the code following this line not to be executed break() and continue() methods (which I named this way to reflect the behaviour of the statements, but I'm not sure whether it could or should be kept this way) are only an improvement on the keywords in nested loops or complex situations where explicitly exiting a certain loop helps readability by removing a bunch of conditions and assignments. It's also allowing to pass this function as a callback to something else, but I'm not sure it would be that useful. > >> Syntax: >> for element in elements as elements_loop: >> assert type(elements_loop) is ForLoopIterationObject >> >> Properties and methods (non exhaustive list, really open to discussion and >> suggestions): >> for element in elements as elements_loop: >> elements_loop.length: int # len ? count ? >> elements_loop.counter: int # Add a counter0, as in Django? a counter1 ? >> elements_loop.completed: bool >> elements_loop.break() >> elements_loop.continue() >> elements_loop.skip(count=1) > I did not implement these, but they are as feasible. 
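[Ed.: note that Matthias's Loop helper as posted stores a private `_break` flag but exposes no public `broken` attribute, which Brice's example reads. A variant supporting exactly this two-level pattern might look like the following (illustrative):]

```python
class Loop:
    """Variant of the earlier helper exposing a public `broken` flag."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.broken = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.broken:
            raise StopIteration
        return self, next(self._it)

    def brk(self):
        self.broken = True

pairs = []
for outerloop, i in Loop(range(4)):
    for innerloop, j in Loop(range(3)):
        if i == 2 and j == 1:
            outerloop.brk()
            break                  # still needed to leave the inner loop
        pairs.append((i, j))
    if outerloop.broken:
        break                      # the flag then stops the outer loop

assert pairs == [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0)]
```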
(break is keyword > though, so you need to rename it) Length, counter and completed (and .skip() in some ways) are just helping the readability while allowing a more concise code, as they don't force to create temporary variables. But they belong here as a part of the whole thing. If the object had to be integrated, for .break(), .continue(), .skip() and any other useful method, those little improvements would be nice to have and help to keep your code look more like the design in your head. .break() and .continue() help as they modify the flow. They roughly do this : LOOP1 LOOP2 if something: LOOP1.break() # similar to calling break repeatedly once for every loop until LOOP1 included, so going straight to "something_after_every_loop" something_else yet_another_thing something_after_every_loop The same goes for .continue(), calling break in every inner loop, and continue in the one the method is called from. > >> Examples of every presented element (I didn't execute the code, it's just to >> explain the proposal): >> >> ################################################## >> # forloop.counter and forloop.length >> >> for num, dog in enumerate(dogs): >> print(num, dog) >> >> print("That's a total of {} dogs !".format(len(dogs))) >> >> # would be equivalent to >> >> for dog in dogs as dogs_loop: >> print(dogs_loop.counter, dog) > I find it less readable than enumerate. Well, I find both readable, but the second only requires to focus on the loop itself, not on how enumerate and unpacking work. The second syntax however allows the exact same syntax when you work with dicts, lists or tuples. Here, dogs may be of any type of iterable, while in first version, you'd need to implement a counter to have a unified loop syntax (I'm not saying we should always be able to replace those types, just that in that case it makes sense to me). 
Brice From edk141 at gmail.com Fri Mar 3 13:29:19 2017 From: edk141 at gmail.com (Ed Kellett) Date: Fri, 03 Mar 2017 18:29:19 +0000 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <58B9A1B6.80501@stoneleaf.us> References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> Message-ID: On Fri, 3 Mar 2017 at 17:03 Ethan Furman wrote: > On 03/03/2017 08:09 AM, Ed Kellett wrote: > > > P.S. all the talk of PEP 463 seems misplaced. That it solves (FSVO > solve) this problem doesn't mean it should supersede > > this discussion. > > The advantage of PEP 463 is that issues like this would be less pressing, > and it's much more general purpose. > PEP 463 won't solve this problem for me because its solution is as ugly as the thing it's replacing. Conceptually, I'd even argue that it's uglier. Also, if you want a general-purpose solution to everything, propose with-expressions, but that's another discussion. The existence of general-purpose things doesn't mean specific issues aren't worth talking about. > Personally, I don't think `get` belongs on list/tuple for reasons already > stated. > The reasons already stated boil down to "lists aren't dicts so they shouldn't share methods", which seems ill-advised at best, and "I wouldn't use this". I'm not convinced that the latter is generally true; I've often looked for something like a list.get, been frustrated, and used one (chosen pretty much at random) of the ugly hacks presented in this thread. I'd be surprised if I'm the only one. I guess I don't have any hope of convincing people who think there's no need to ever do this, but I have a couple of questions for the people who think the existing solutions are fine: - Which of the existing things (slice + [default], conditional on a slice, conditional on a len() call) do you think is the obvious way to do it? 
- Are there any examples where list.get would be applicable and not the obviously best way to do it? Ed -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Fri Mar 3 13:38:46 2017 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 4 Mar 2017 05:38:46 +1100 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> Message-ID: On Sat, Mar 4, 2017 at 5:29 AM, Ed Kellett wrote: > I've often looked for something like a list.get, been frustrated, and used > one (chosen pretty much at random) of the ugly hacks presented in this > thread. I'd be surprised if I'm the only one. Can you show us some real-world code that would benefit from list.get()? That would make the discussion more productive, I think. ChrisA From srkunze at mail.de Fri Mar 3 13:48:15 2017 From: srkunze at mail.de (Sven R. Kunze) Date: Fri, 3 Mar 2017 19:48:15 +0100 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <58B9A298.20106@stoneleaf.us> References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <58B6229D.10203@stoneleaf.us> <764b7a4e-04f1-bfda-8e65-f750a3281af6@gmail.com> <58B65D6A.8090806@stoneleaf.us> <817ddde2-0e7a-31e8-6c15-1c5d626c6c14@mail.de> <58B9A298.20106@stoneleaf.us> Message-ID: On 03.03.2017 18:06, Ethan Furman wrote: > On 03/02/2017 12:36 PM, Sven R. Kunze wrote: >> On 01.03.2017 06:34, Ethan Furman wrote: > >>> On the bright side, if enough use-cases of this type come up (pesky >>> try/except for a simple situation), we may be able >>> to get Guido to reconsider PEP 463. I certainly think PEP 463 makes >>> a lot more sense that adding list.get(). >> >> It then would make sense to remove .get() on dicts. ;-) >> >> and to remove parameter "default" of max(). >> and to remove parameter "default" of getattr(). > > Backwards compatibility, and performance, says no. 
;) > > try/except expressions are not a silver bullet any more than > try/except blocks. But they can still be very useful. Totally true. I think both proposals have their merit. IIRC, Guido rightfully declared that try/except expressions aren't a good idea. It's better to find more concrete patterns instead of it. And I still agree with him. The "default parameter" pattern is such a pattern, and it's vastly used in the stdlib. Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From bussonniermatthias at gmail.com Fri Mar 3 13:50:19 2017 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Fri, 3 Mar 2017 10:50:19 -0800 Subject: [Python-ideas] For/in/as syntax In-Reply-To: <9588100b-bb90-7472-77bc-ead02b3a02a2@brice.xyz> References: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> <9588100b-bb90-7472-77bc-ead02b3a02a2@brice.xyz> Message-ID: Hi Brice, On Fri, Mar 3, 2017 at 10:00 AM, Brice PARENT wrote: > Thanks Matthias for taking the time to give your opinion about it. > > Just to set the focus where I may have failed to point it: > the main purpose of this proposal is the creation of the object itself, an > object representing the loop. What we can do with it is still a sub-level of > this proposal, as the presence of this object is what allows all these > simplifications, and offers a lot of possibilities. > A 2 level breaking would have to be used this way :
>
> for outerloop, i in Loop(range(4)):
>     for innerloop, j in Loop(range(3)):
>         if i==2 and j==1:
>             outerloop.brk()
>             break # this

Thanks, I think it does make sens, I'm going to guess, outerloop.brk(inners=True) might also be helpful if you have more inners loops. I think that implicitely breaking inner ones might not always be the right thing to do so having a way to not break inner ones does make sens. The other possibility would be to allow the Loop object to catch raised exceptions and potentially continue running loops.
Then you could "just" use raise to break multiple loops, but that might be a weird Loop/ContextManager hybrid. -- M From rosuav at gmail.com Fri Mar 3 13:52:39 2017 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 4 Mar 2017 05:52:39 +1100 Subject: [Python-ideas] For/in/as syntax In-Reply-To: References: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> <9588100b-bb90-7472-77bc-ead02b3a02a2@brice.xyz> Message-ID: On Sat, Mar 4, 2017 at 5:50 AM, Matthias Bussonnier wrote: > Thanks, I think it does make sens, I'm going to guess, > outerloop.brk(inners=True) might also be helpful if you have more > inners loops. I think that implicitely breaking inner ones might > not always be the right thing to do so having a way to not break > inner ones does make sens. > *scratches head* How do you break an outer loop without breaking the inner loop? What happens? ChrisA From srkunze at mail.de Fri Mar 3 14:01:32 2017 From: srkunze at mail.de (Sven R. Kunze) Date: Fri, 3 Mar 2017 20:01:32 +0100 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> Message-ID: On 03.03.2017 19:29, Ed Kellett wrote: > The reasons already stated boil down to "lists aren't dicts so they > shouldn't share methods", which seems ill-advised at best, and "I > wouldn't use this". I wonder if those arguing against it also think dicts should not have item access: a[0] dict or list? Why should it matter? a.get(0, 'oops') Doesn't look so different to me. > I'm not convinced that the latter is generally true; I've often looked > for something like a list.get, been frustrated, and used one (chosen > pretty much at random) of the ugly hacks presented in this thread. I'd > be surprised if I'm the only one. You are not the only one. I share your sentiment. 
> I guess I don't have any hope of convincing people who think there's
> no need to ever do this, but I have a couple of questions for the
> people who think the existing solutions are fine:
>
> - Which of the existing things (slice + [default], conditional on a
> slice, conditional on a len() call) do you think is the obvious way to
> do it?

None of them are. Try/except is the most obvious way. But it's tedious.

> - Are there any examples where list.get would be applicable and not
> the obviously best way to do it?

I don't think so. I already have given many examples/ideas of when I would love to have had this ability. Let me re-state those and more:

- refactoring (dicts <-> lists and their comprehension counterparts)
- error-free accessing list comprehensions
- duck typing
- increased consistency of item access between dicts and lists
- the one obvious way to do it
- easier to teach

Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From bussonniermatthias at gmail.com Fri Mar 3 13:56:46 2017 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Fri, 3 Mar 2017 10:56:46 -0800 Subject: [Python-ideas] For/in/as syntax In-Reply-To: References: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> <9588100b-bb90-7472-77bc-ead02b3a02a2@brice.xyz> Message-ID: > *scratches head* How do you break an outer loop without breaking the > inner loop? What happens? Finish running the inner, then breaking the outer. Instead of breaking inner and outer.

for i in outer:
    bk = False
    for j in inner:
        if cond:
            bk = True
    if bk:
        break

vs

for i in outer:
    bk = False
    for j in inner:
        if cond:
            bk = True
            break # this.
    if bk:
        break

Sorry if I'm mis expressing myself. -- M On Fri, Mar 3, 2017 at 10:52 AM, Chris Angelico wrote: > On Sat, Mar 4, 2017 at 5:50 AM, Matthias Bussonnier > wrote: >> Thanks, I think it does make sens, I'm going to guess, >> outerloop.brk(inners=True) might also be helpful if you have more >> inners loops.
I think that implicitely breaking inner ones might >> not always be the right thing to do so having a way to not break >> inner ones does make sens. >> > > *scratches head* How do you break an outer loop without breaking the > inner loop? What happens? > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From rosuav at gmail.com Fri Mar 3 14:07:53 2017 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 4 Mar 2017 06:07:53 +1100 Subject: [Python-ideas] For/in/as syntax In-Reply-To: References: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> <9588100b-bb90-7472-77bc-ead02b3a02a2@brice.xyz> Message-ID: On Sat, Mar 4, 2017 at 5:56 AM, Matthias Bussonnier wrote: >> *scratches head* How do you break an outer loop without breaking the >> inner loop? What happens? > > Finish running the inner, then breaking the outer. Instead of breaking > inner and outer.
>
> for i in outer:
>     bk = False
>     for j in inner:
>         if cond:
>             bk = True
>     if bk:
>         break
>
> vs
>
> for i in outer:
>     bk = False
>     for j in inner:
>         if cond:
>             bk = True
>             break # this.
>     if bk:
>         break
>
> Sorry if I'm mis expressing myself.

Oh, I see what you mean. So "breaking the outer loop" really means "flag the outer loop such that, on next iteration, it will terminate". The trouble is that that's not exactly what "break" means. Consider:

for i in outer:
    print("Top of outer")
    for j in inner:
        print("Top of inner")
        if cond1: break
        if cond2: break_outer(inner=False)
        if cond3: break_outer(inner=True)
        print("Bottom of inner")
    print("Bottom of outer")

cond1 would print "bottom of outer" and keep looping. cond2 and cond3 should presumably _not_ print "bottom of outer". Should they print "bottom of inner", though? Seems like an odd use, though. If I want a multi-step break, I want to break every step, not just the last one.
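[Editor's note: the two semantics under discussion can be compared side by side with the plain flag-based loops Matthias sketched, made runnable here with trace lists (an editorial illustration, not code from the thread). The first function is "flag the outer loop and let the inner finish"; the second is an ordinary break in both loops.]

```python
def finish_inner_then_break_outer():
    """Outer is only flagged; the inner loop runs to completion first."""
    trace = []
    for i in range(3):
        bk = False
        for j in range(3):
            trace.append((i, j))
            if i == 1 and j == 1:
                bk = True          # flag only: inner keeps going
        if bk:
            break
    return trace

def break_both():
    """Inner breaks immediately and the outer loop stops as well."""
    trace = []
    for i in range(3):
        bk = False
        for j in range(3):
            trace.append((i, j))
            if i == 1 and j == 1:
                bk = True
                break              # inner stops here too
        if bk:
            break
    return trace

print(finish_inner_then_break_outer())
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(break_both())
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]
```

The extra `(1, 2)` in the first trace is exactly the "finish the inner loop" behaviour that makes the flagged version different from a true multi-level break.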
ChrisA From srkunze at mail.de Fri Mar 3 14:10:06 2017 From: srkunze at mail.de (Sven R. Kunze) Date: Fri, 3 Mar 2017 20:10:06 +0100 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: <0c47b0a5-9db7-071a-290e-a7971571d3f9@gmail.com> References: <0c47b0a5-9db7-071a-290e-a7971571d3f9@gmail.com> Message-ID: <62e70db1-3c1a-237a-73f4-ea3250189fdf@mail.de> On 03.03.2017 16:24, Yury Selivanov wrote: > TBH I think that optional parameters isn't a problem requiring new > syntax. We probably do need syntax for positional-only arguments > (since we already have them in a way), but optional parameters > can be solved easily without a new syntax. > > Syntax like: > > 1. def a(?foo), > 2. def a(foo=pass), > 3. def a([foo]), > > will complicate the language too much IMO. > > Yury I never really encountered a real-world use where I would have needed this kind of parameter declaration ability. It's like the ++ operator of C which comes in pre- and postfix notation. It's really cool to teach the nuances of it. And to create exam questions using it. And to confuse students. But completely unnecessary in real-life code. Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From abrault at mapgears.com Fri Mar 3 14:02:29 2017 From: abrault at mapgears.com (Alexandre Brault) Date: Fri, 3 Mar 2017 14:02:29 -0500 Subject: [Python-ideas] For/in/as syntax In-Reply-To: References: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> <9588100b-bb90-7472-77bc-ead02b3a02a2@brice.xyz> Message-ID: <387f686d-ae24-e8ed-d884-f899d16b5d6c@mapgears.com> On 2017-03-03 01:52 PM, Chris Angelico wrote: > On Sat, Mar 4, 2017 at 5:50 AM, Matthias Bussonnier > wrote: >> Thanks, I think it does make sens, I'm going to guess, >> outerloop.brk(inners=True) might also be helpful if you have more >> inners loops. I think that implicitely breaking inner ones might >> not always be the right thing to do so having a way to not break >> inner ones does make sens. 
>> > *scratches head* How do you break an outer loop without breaking the > inner loop? What happens? > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ I believe what Matthias is hoping for is an equivalent of Java's named break feature. Breaking out of an outer loop implicitly breaks out of all inner loops Alex From ethan at stoneleaf.us Fri Mar 3 14:15:10 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 03 Mar 2017 11:15:10 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <58B6229D.10203@stoneleaf.us> <764b7a4e-04f1-bfda-8e65-f750a3281af6@gmail.com> <58B65D6A.8090806@stoneleaf.us> <817ddde2-0e7a-31e8-6c15-1c5d626c6c14@mail.de> <58B9A298.20106@stoneleaf.us> Message-ID: <58B9C0BE.6060005@stoneleaf.us> On 03/03/2017 10:48 AM, Sven R. Kunze wrote: > On 03.03.2017 18:06, Ethan Furman wrote: >> On 03/02/2017 12:36 PM, Sven R. Kunze wrote: >>> It then would make sense to remove .get() on dicts. ;-) >>> >>> and to remove parameter "default" of max(). >>> and to remove parameter "default" of getattr(). >> >> Backwards compatibility, and performance, says no. ;) >> >> try/except expressions are not a silver bullet any more than try/except blocks. But they can still be very useful. > > Totally true. I think both proposals have their merit. > > IIRC, Guido rightfully declared that try/except expressions aren't a good idea. It's better to find more concrete > patterns instead of it. And I still agree with him. > > The "default parameter" pattern is such a pattern, and it's vastly used in the stdlib. 
$ grep "def get(" *.py */*.py */*/*.py
queue.py: def get(self, block=True, timeout=None):
pickle.py: def get(self, i):
shelve.py: def get(self, key, default=None):
doctest.py: def get(self):
mailbox.py: def get(self, key, default=None):
weakref.py: def get(self, key, default=None):
weakref.py: def get(self, key, default=None):
sre_parse.py: def get(self):
webbrowser.py: def get(using=None):
tkinter/ttk.py: def get(self, x=None, y=None):
configparser.py: def get(self, section, option, *, raw=False, vars=None, fallback=_UNSET):
configparser.py: def get(self, option, fallback=None, *, raw=False, vars=None, _impl=None, **kwargs):
email/message.py: def get(self, name, failobj=None):
asyncio/queues.py: def get(self):
logging/config.py: def get(self, key, default=None):
idlelib/pyparse.py: def get(self, key, default=None):
wsgiref/headers.py: def get(self,name,default=None):
xml/dom/minidom.py: def get(self, name, value=None):
_collections_abc.py: def get(self, key, default=None):
tkinter/__init__.py: def get(self):
tkinter/__init__.py: def get(self):
tkinter/__init__.py: def get(self):
tkinter/__init__.py: def get(self):
tkinter/__init__.py: def get(self):
tkinter/__init__.py: def get(self):
tkinter/__init__.py: def get(self, first, last=None):
tkinter/__init__.py: def get(self):
tkinter/__init__.py: def get(self):
tkinter/__init__.py: def get(self, index1, index2=None):
tkinter/__init__.py: def get(self, x, y):
tkinter/__init__.py: def get(self):
xml/sax/xmlreader.py: def get(self, name, alternative=None):
collections/__init__.py: def get(self, key, default=None):
idlelib/scrolledlist.py: def get(self, index):
idlelib/searchengine.py: def get(root):
multiprocessing/pool.py: def get(self, timeout=None):
xml/etree/ElementTree.py: def get(self, key, default=None):
multiprocessing/queues.py: def get(self, block=True, timeout=None):
multiprocessing/queues.py: def get(self):
multiprocessing/managers.py: def get(self):
multiprocessing/managers.py: def get(self):
idlelib/idle_test/mock_tk.py: def get(self):
idlelib/idle_test/mock_tk.py: def get(self, index1, index2=None):

I wouldn't consider 10 out of 43 "vastly" (11 out of 46 if one includes dict, list, and tuple). The numbers are even worse if one considers the "get_something_or_other" methods which do not have a default parameter. -- ~Ethan~ From klahnakoski at mozilla.com Fri Mar 3 14:15:49 2017 From: klahnakoski at mozilla.com (Kyle Lahnakoski) Date: Fri, 3 Mar 2017 14:15:49 -0500 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <764b7a4e-04f1-bfda-8e65-f750a3281af6@gmail.com> References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <58B6229D.10203@stoneleaf.us> <764b7a4e-04f1-bfda-8e65-f750a3281af6@gmail.com> Message-ID: <66c9aaad-ba3c-4ec9-36e5-d031950f5dc8@mozilla.com> I must mention a get() method for lists and tuples would be very useful for me too. It is so useful that I spent too much time making my own module to handle this case, plus many of the other dealing-with-None subjects found on this list. Michel is correct to point out that this is a domain-specific problem; a domain where you deal with many varied, and schema-free, data formats. I deal with JSON emitted from multiple systems with often-changing schemas. In my experience, this is not a result of bad or buggy programming; rather, it is about representing facts and annotating them with a multitude of optional properties and descriptive structures. Now, in the specific case of list.get(), I would be disappointed if it were used to extract parameters from an arg list: parameters should be named; packing them into an ordered list loses that important information. But it happens[1], and list.get() would help. For the args scenario, I do like Ed's solution: dict(enumerate(args)).
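[Editor's note: the dict(enumerate(args)) trick mentioned above looks like this in practice; `parse_args` and its defaults are a small illustrative sketch, not code from the thread.]

```python
def parse_args(args):
    # Turn the positional list into an int-keyed mapping so dict.get works.
    lookup = dict(enumerate(args))
    host = lookup.get(0, "localhost")  # defaults stand in for missing slots
    port = lookup.get(1, 8080)
    return host, port

print(parse_args(["example.org"]))  # ('example.org', 8080)
print(parse_args([]))               # ('localhost', 8080)
```

This is exactly the behaviour a hypothetical list.get() would give, at the cost of building a throwaway dict.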
In conclusion, I may have talked myself out of liking list.get(): Python has a fundamentally different philosophy about None that conflicts with what I need for my domain [2] where I am transforming and interpreting data. Using a set of classes that make a different set of assumptions about None is not arduous, it keeps the definitions separate, and I still get all wonderfulness of Python. [1] also happens when reading csv files: Missing values indicate default, or a variable number of columns indicates that the missing rightmost columns are all null. [2] For Python, None is a missing value, or a special case. For data transformation, None means "the operation you performed does not apply to this datatype" which avoids exceptions, which gives you an algebra over data (with [], dot and slice as operators), which allows you to build complex list comprehensions (data transformation queries) without the exception catching logic. Database query languages do this. On 2017-02-28 21:02, Michel Desmoulin wrote: > > Le 01/03/2017 à 02:23, Ethan Furman a écrit : >> On 02/28/2017 05:18 PM, Michel Desmoulin wrote: >> >>> I love this proposal but Guido rejected it. Fighting for it right now >>> would probably be detrimental to the current proposed feature which >>> could potentially be more easily accepted. >> PEP 463 has a better chance of being accepted than this one does, for >> reasons that D'Aprano succinctly summarized. >> >> -- >> ~Ethan~ > The debate is not even over and you are already declaring a winner. > That's not really fair. Give the idea a chance and read until the end. > > D'Aprano's argument is mostly "I don't encounter IndexError really often > and when I do I have this twisted one liner to get away it". > > Well, that's not really a good reason to reject things for Python > because it's a language with a very diverse user base. Some bankers, > some web dev, some geographers, some mathematicians, some students, some > 3D graphists, etc.
And the language value obvious, readable, predictable > code for all. > > Most people on this list have a specialty, because their speciality > don't see a use for the feature doesn't mean there is not one. > > So I provided on my last answer an explanation of what I would use it for. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From ethan at stoneleaf.us Fri Mar 3 14:35:03 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 03 Mar 2017 11:35:03 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> Message-ID: <58B9C567.8020507@stoneleaf.us> On 03/03/2017 11:01 AM, Sven R. Kunze wrote: > On 03.03.2017 19:29, Ed Kellett wrote: >> The reasons already stated boil down to "lists aren't dicts so they shouldn't share methods", which seems ill-advised >> at best, and "I wouldn't use this". > > I wonder if those arguing against it also think dicts should not have item access: dicts don't have item access -- they have key access. :wink: > a[0] > > dict or list? Why should it matter? Because they are different data types with different purposes. >> - Which of the existing things (slice + [default], conditional on a slice, conditional on a len() call) do you think >> is the obvious way to do it? > > None of them are. Try/except is the most obvious way. But it's tedious. [my_value] = some_list[offset:offset+1] or [default_value] No, it's not terribly pretty, but accessing invalid locations on a list on purpose shouldn't be that common. >> - Are there any examples where list.get would be applicable and not the obviously best way to do it? > > I don't think so. 
I already have given many examples/ideas of when I would love to have had this ability. Let me > re-state those and more: > > - refactoring (dicts <-> lists and their comprehension counterparts) dict and list comprehensions are not the same, and adding .get to list won't make them the same. > - easier to teach Having `print` be a statement instead of a function made it easier to teach but that didn't make it a good idea. For me to think (list/tuple).get() was needed would be if lots of folk either cast their lists to dicts or made their own list-dict class to solve that problem. -- ~Ethan~ From ethan at stoneleaf.us Fri Mar 3 14:43:03 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 03 Mar 2017 11:43:03 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> Message-ID: <58B9C747.5000409@stoneleaf.us> On 03/03/2017 10:29 AM, Ed Kellett wrote: > - Which of the existing things (slice + [default], conditional on a slice, conditional on a len() call) do you think is > the obvious way to do it? [my_value] = some_list[offset:offset+1] or [default_value] -- ~Ethan~ From srkunze at mail.de Fri Mar 3 15:02:46 2017 From: srkunze at mail.de (Sven R. Kunze) Date: Fri, 3 Mar 2017 21:02:46 +0100 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <58B9C567.8020507@stoneleaf.us> References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> Message-ID: On 03.03.2017 20:35, Ethan Furman wrote: > On 03/03/2017 11:01 AM, Sven R. Kunze wrote: >> On 03.03.2017 19:29, Ed Kellett wrote: > >>> The reasons already stated boil down to "lists aren't dicts so they >>> shouldn't share methods", which seems ill-advised >>> at best, and "I wouldn't use this". 
>> >> I wonder if those arguing against it also think dicts should not have >> item access: > > dicts don't have item access -- they have key access. :wink: Python doesn't make a difference here. :wink: https://docs.python.org/3/reference/datamodel.html#object.__getitem__ > >> a[0] >> >> dict or list? Why should it matter? > > Because they are different data types with different purposes. > >>> - Which of the existing things (slice + [default], conditional on a >>> slice, conditional on a len() call) do you think >>> is the obvious way to do it? >> >> None of them are. Try/except is the most obvious way. But it's tedious. > > [my_value] = some_list[offset:offset+1] or [default_value] > > No, it's not terribly pretty, but accessing invalid locations on a > list on purpose shouldn't be that common. When generating data series / running a simulation, at the beginning there is no data in many lists. Recently, had those issues. dicts went fine, lists just sucked with all those try/except blocks. > >>> - Are there any examples where list.get would be applicable and not >>> the obviously best way to do it? >> >> I don't think so. I already have given many examples/ideas of when I >> would love to have had this ability. Let me >> re-state those and more: >> >> - refactoring (dicts <-> lists and their comprehension counterparts) > > dict and list comprehensions are not the same, and adding .get to list > won't make them the same. Never said they are the same. I said refactoring is easier. >> - easier to teach > > Having `print` be a statement instead of a function made it easier to > teach but that didn't make it a good idea. Many people disagree with you on this. > For me to think (list/tuple).get() was needed would be if lots of folk > either cast their lists to dicts or made their own list-dict class to > solve that problem. The easier solution would be to provide list.get ;-) Regards, Sven -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.barker at noaa.gov Fri Mar 3 15:02:22 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 3 Mar 2017 12:02:22 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> Message-ID: On Fri, Mar 3, 2017 at 11:01 AM, Sven R. Kunze wrote: > I wonder if those arguing against it also think dicts should not have item > access: > > a[0] > > dict or list? Why should it matter? > Because a mapping that happens to have an integer key is a fundamentally different thing than a sequence. It seems to me that in some of the examples given, the use case really calls for a mapping-with-integers-as-keys, rather than a sequence -- in which case, you could use that, and have the get() :-) And then you could iterate through it the same way, too! You could get closer to sequence behavior by using an OrderedDict -- or maybe even write a SortedDict that would keep the keys in order regardless of when they were added. > a.get(0, 'oops') > Doesn't look so different to me. > But you are going to have issues with other things anyway, notably:

for thing in a:  # is thing a key or a value?????

- Which of the existing things (slice + [default], conditional on a slice, > conditional on a len() call) do you think is the obvious way to do it? > > I think conditional on a len() call is the way to go -- it captures the concept well -- sequences have a certain number of items -- how many this one has is the question at hand. Mappings, however, also have a certain number of items, but the number does not indicate which ones are "missing". I guess that's the key point for me -- in all the examples I've seen posed (like parsing args) the sequence may contain from n to n+m items, but a get() doesn't really solve your problem, because the default is probably different depending on how MANY of the possible items are "missing".
So get() is only helpful if: 1) there is only one possible missing item 2) all the missing items have the same default -- unless that default is something like None, in which case you are simply replacing one way to express missing with another. I can't see that being common. So I would expect to see a Sequence .get() be used to slightly clean up the code that adds a bunch of Nones to sequences to make them all the same length -- which is actually pretty easy to do anyway:

seq = seq + [None] * (full_len - len(seq))

-CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Mar 3 15:06:13 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 3 Mar 2017 12:06:13 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <66c9aaad-ba3c-4ec9-36e5-d031950f5dc8@mozilla.com> References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <58B6229D.10203@stoneleaf.us> <764b7a4e-04f1-bfda-8e65-f750a3281af6@gmail.com> <66c9aaad-ba3c-4ec9-36e5-d031950f5dc8@mozilla.com> Message-ID: On Fri, Mar 3, 2017 at 11:15 AM, Kyle Lahnakoski wrote: > Python > has a fundamentally different philosophy about None that conflicts with > what I need for my domain [2] where I am transforming and interpreting > data. Using a set of classes that make a different set of assumptions > about None is not arduous, it keeps the definitions separate, and I > still get all wonderfulness of Python. > Not really related to this thread, but you may want to use your own "sentinel" singletons, rather than special-case code to deal with None, i.e. BadData, MissingData, UnSpecified. M.A. Lemberg has been talking about that on this list (in this thread? I've lost track...) -CHB -- Christopher Barker, Ph.D.
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Mar 3 15:09:54 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 3 Mar 2017 12:09:54 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> Message-ID: On Fri, Mar 3, 2017 at 12:02 PM, Sven R. Kunze wrote: > For me to think (list/tuple).get() was needed would be if lots of folk > either cast their lists to dicts or made their own list-dict class to solve > that problem. > > > The easier solution would be to provide list.get ;-) > Exactly -- I think that was the point -- if there is a lot of custom code out there essentially adding a get() to a list -- then that would indicate that is is broadly useful. For my part, I think casting a list to a dict is often the RIGHT way to address these issues. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.barker at noaa.gov Fri Mar 3 15:16:48 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 3 Mar 2017 12:16:48 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> Message-ID: About JSON and schema-less data: I need to deal with this fairly often as well, but: JSON has a data model that includes both mappings and sequences: Sequences (arrays, lists, etc) are the "right" thing to use when an object has zero or more of something. Usually, these somethings are all the same. So you may need to answer the question: how many somethings are there? but rarely: if there are less than this many somethings, then I should use a default value. Mappings (objects, dicts) are the "right" thing to do when an object has a bunch of somethings, and each of them may be different and nameable. In this case, the if this name is in there, use its associated object, otherwise use a default" is a pretty common action. so if your JSON is well formed (and I agree, being schema-less does not mean it is poorly formed) then it should already be using the appropriate data structures, and you are good to go. That being said, maybe a concrete example would persuade the skeptics among us -- though I understand it may be hard to find one that is both non-trivial and simple and small enough to post to a mailing list... -CHB On Fri, Mar 3, 2017 at 12:09 PM, Chris Barker wrote: > On Fri, Mar 3, 2017 at 12:02 PM, Sven R. Kunze wrote: > >> For me to think (list/tuple).get() was needed would be if lots of folk >> either cast their lists to dicts or made their own list-dict class to solve >> that problem. 
>> >> >> The easier solution would be to provide list.get ;-) >> > > Exactly -- I think that was the point -- if there is a lot of custom code > out there essentially adding a get() to a list -- then that would indicate > that it is broadly useful. > > For my part, I think casting a list to a dict is often the RIGHT way to > address these issues. > > -CHB > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Mar 3 15:19:11 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 3 Mar 2017 12:19:11 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <58B6229D.10203@stoneleaf.us> <764b7a4e-04f1-bfda-8e65-f750a3281af6@gmail.com> <66c9aaad-ba3c-4ec9-36e5-d031950f5dc8@mozilla.com> Message-ID: On Fri, Mar 3, 2017 at 12:06 PM, Chris Barker wrote: > M.-A. Lemburg has been talking about that on this list (in this thread? > I've lost track...) > it was in the "Optional parameters without default value" thread. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Fri Mar 3 15:21:43 2017 From: srkunze at mail.de (Sven R.
Kunze) Date: Fri, 3 Mar 2017 21:21:43 +0100 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> Message-ID: On 03.03.2017 21:09, Chris Barker wrote: > On Fri, Mar 3, 2017 at 12:02 PM, Sven R. Kunze > wrote: > >> For me to think (list/tuple).get() was needed would be if lots of >> folk either cast their lists to dicts or made their own list-dict >> class to solve that problem. > > The easier solution would be to provide list.get ;-) > > > Exactly -- I think that was the point -- if there is a lot of custom > code out there essentially adding a get() to a list -- then that would > indicate that it is broadly useful. > > For my part, I think casting a list to a dict is often the RIGHT way > to address these issues. You can't be serious about this. Especially because it would negate your response to Ed "conditional on a len() call is the way to go". Now you tell people to use "convert to dict". For my part, __getitem__ is the technical argument for this proposal. My experience and those of other contributors to this thread make for the "broadly useful" argument. Regards, Sven -------------- next part -------------- An HTML attachment was scrubbed... URL: From srkunze at mail.de Fri Mar 3 15:33:50 2017 From: srkunze at mail.de (Sven R. Kunze) Date: Fri, 3 Mar 2017 21:33:50 +0100 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> Message-ID: Thanks Chris for your idea. Right now, I could not think of an example "non-trivial and simple and small enough" especially in the context of JSON. But maybe the other proponents have. The part of data series from simulations (so proper data structures available).
So, for data lists which aren't filled yet, or haven't been filled to a certain amount yet, I need some special pieces from them, like the first, a sample, the 300th, etc. This was the case recently. There was also the case of a refactoring going on in some project, where things changed from dicts to lists (upgrade of a third party lib, I think). As a consequence, I needed to blow up certain functions from n one-liners [a.get] to n four-liners [try/except]. If I had known that this would be relevant to this discussion, I would have written it down, but it's just the negative memory/experience. Regards, Sven On 03.03.2017 21:16, Chris Barker wrote: > About JSON and schema-less data: > > I need to deal with this fairly often as well, but: > > JSON has a data model that includes both mappings and sequences: > > Sequences (arrays, lists, etc) are the "right" thing to use when an > object has zero or more of something. Usually, these somethings are > all the same. So you may need to answer the question: how many > somethings are there? but rarely: if there are less than this many > somethings, then I should use a default value. > > Mappings (objects, dicts) are the "right" thing to do when an object > has a bunch of somethings, and each of them may be different and > nameable. In this case, the "if this name is in there, use its > associated object, otherwise use a default" is a pretty common action. > > so if your JSON is well formed (and I agree, being schema-less does > not mean it is poorly formed) then it should already be using the > appropriate data structures, and you are good to go. > > That being said, maybe a concrete example would persuade the skeptics > among us -- though I understand it may be hard to find one that is > both non-trivial and simple and small enough to post to a mailing list... > > -CHB > > > > > > > > > On Fri, Mar 3, 2017 at 12:09 PM, Chris Barker > wrote: > > On Fri, Mar 3, 2017 at 12:02 PM, Sven R.
Kunze > wrote: > >> For me to think (list/tuple).get() was needed would be if >> lots of folk either cast their lists to dicts or made their >> own list-dict class to solve that problem. > > The easier solution would be to provide list.get ;-) > > > Exactly -- I think that was the point -- if there is a lot of > custom code out there essentially adding a get() to a list -- then > that would indicate that is is broadly useful. > > For my part, I think casting a list to a dict is often the RIGHT > way to address these issues. > > -CHB > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 > main reception > > Chris.Barker at noaa.gov > > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Mar 3 16:21:30 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 3 Mar 2017 13:21:30 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> Message-ID: On Fri, Mar 3, 2017 at 12:21 PM, Sven R. Kunze wrote: > For my part, I think casting a list to a dict is often the RIGHT way to > address these issues. > > > You can't be serious about this. Especially because it would negate your > response to Ed "conditional on a len() call is the way to go". Now you tell > people to use "convert to dict". > I am serious. It depends on the use case. 
If the data are an arbitrary-size collection of essentially the same thing, then a sequence is the right data structure, and examining len() (or catching the IndexError, maybe) is the right way to handle there being fewer than you expect of them. I think the key point is that there is nothing particularly different about them -- in fact, often order isn't important at all. If the data in question is a bunch of stuff where it matters where they land in the sequence, and there may be missing values (like a row in a CSV file, maybe) then a dict IS the right structure -- even if it has integer keys. This reminds me of a discussion by Guido years ago about the "usual" use cases for lists vs tuples -- lists are often a homogeneous sequence of items, whereas tuples are more likely to be heterogeneous -- more like a struct. Now that I write that -- maybe the structure you really want is a namedtuple. And maybe IT should have a get() so as to avoid catching attribute errors all over the place... though getattr() does have a default -- so maybe namedtuple IS the answer :-) -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Mar 3 16:26:50 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 3 Mar 2017 13:26:50 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> Message-ID: On Fri, Mar 3, 2017 at 12:33 PM, Sven R. Kunze wrote: > Right now, I could not think of an example "non-trivial and simple and small enough" especially in the context of JSON. But maybe the other proponents have.
Always a challenge -- sorry to lack imagination, I tend to need concrete examples to "get" some things. And so far the concrete examples in this thread seem to have been unconvincing... Which doesn't mean at all that there aren't good and common-enough use cases... > The part of data series from simulations (so proper data structures > available). So, data lists which aren't filled yet or have not filled till > a certain amount yet I need some special pieces from them like the first, a > sample, the 300th, etc. This was the case recently. > I deal with that a fair bit -- but in that case, if I need, say, the 300th sample, and there are not yet 300 available, then that IS an Exception I want to handle. If it didn't need to be the 300th, but rather a random sample, or maybe one "half way through the data", or .... then I would compute that index from the length or something... Though I'm probably misunderstanding this use case. There was also the case of a refactoring going on in some project, where > things changed from dicts to lists (upgrade of a third party lib, I think). > As a consequence, I needed to blow up certain functions from n one-liners > [a.get] to n four-liners [try/except]. > well, THAT I would blame on the third party lib..... :-) -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed...
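[A hedged editorial sketch of the distinction Chris draws above -- the function and variable names are invented for illustration. When the 300th sample is genuinely required, the IndexError from plain indexing is exactly the error you want, and any translation of it should be explicit:]

```python
def nth_sample(samples, n):
    # Plain indexing raises IndexError when the n-th sample is missing;
    # translate it into a clearer error only where a message helps.
    try:
        return samples[n]
    except IndexError:
        raise ValueError(
            f"need at least {n + 1} samples, have {len(samples)}")
```

[The point is that no default value is involved here at all: a missing required sample is a real error, not a case for `get()`.]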
URL: From desmoulinmichel at gmail.com Fri Mar 3 16:35:18 2017 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Fri, 3 Mar 2017 22:35:18 +0100 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> Message-ID: <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> Le 03/03/2017 à 22:21, Chris Barker a écrit : > On Fri, Mar 3, 2017 at 12:21 PM, Sven R. Kunze > wrote: > >> For my part, I think casting a list to a dict is often the RIGHT >> way to address these issues. > > You can't be serious about this. Especially because it would negate > your response to Ed "conditional on a len() call is the way to go". > Now you tell people to use "convert to dict". > > > I am serious. It depends on the use case. If the data are an But that's the whole problem, isn't it? Since the start of the discussion, contesters have been offering numerous solutions, all being contextual and with gotchas, none being obvious, simple or elegant. The best is still try/except. "There should be one obvious way to do it" right? Plus Sven already estimated the implementation would not be very hard. So we have one obvious solution to a problem that: - several professional programmers said they have - has a similar API in another built-in - has currently no elegant solutions The proposal is actionable, the cost of it seems low, and it's not remotely controversial. I get that on Python-ideas you get "no" by default, but here we are having such resistance for a feature that is light to implement, does not clutter anything, does solve a problem, and is congruent with other APIs. Honestly, what evil would happen if it gets accepted? This is not a "yeah but we can't accept everything that goes in or we would bloat Python" thing. If somebody tells me that no one wants to spend time to code it, I can understand.
Everybody has a life, and somebody else's pony can wait. And since I can't code in C I can't do it. But that doesn't seem to be the problem here. From ethan at stoneleaf.us Fri Mar 3 16:47:13 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 03 Mar 2017 13:47:13 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> Message-ID: <58B9E461.3010308@stoneleaf.us> On 03/03/2017 01:35 PM, Michel Desmoulin wrote: > The proposal is actionable, the cost of it seems low, and it's not > remotely controversial. Several people are against this idea, several of them are core-devs, and you claim it's not *remotely* controversial? Really?? -- ~Ethan~ From matt at getpattern.com Fri Mar 3 17:32:53 2017 From: matt at getpattern.com (Matt Gilson) Date: Fri, 3 Mar 2017 14:32:53 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> Message-ID: On Fri, Mar 3, 2017 at 1:35 PM, Michel Desmoulin wrote: > > > Le 03/03/2017 à 22:21, Chris Barker a écrit : > > On Fri, Mar 3, 2017 at 12:21 PM, Sven R. Kunze > > wrote: > > > >> For my part, I think casting a list to a dict is often the RIGHT > >> way to address these issues. > > > > You can't be serious about this. Especially because it would negate > > your response to Ed "conditional on a len() call is the way to go". > > Now you tell people to use "convert to dict". > > > > > > I am serious. It depends on the use case.
If the data are an > > But that's the all problem isn't it? > > Since the start of the discussion, contesters have been offering > numerous solutions, all being contextual and with gotchas, none being > obvious, simple or elegant. > > The best is still try/except. > Which really isn't a big deal if you use it in one or two places. If you use it everywhere, it's not too hard to roll your own helper function. > > "There should be one obvious way to do it" right? > > Plus Sven already estimated the implementation would not be very hard. > So we have one obvious solution to a problem that: > > - several professional programmers said they have > - has a similar API in another built-in > You state that like it's a good thing ;-). I'm not quite so sure. > - has currently no elegant solutions > > The proposal is actionable, the cost of it seems low, and it's not > remotely controversial. > It seems to be pretty controversial to me :-). > > Honestly what evil would happen if it's get accepted ? Lots of things. For one thing, when scanning a function, if I see something with a `.get` method I generally think that it is probably a Mapping. Obviously that assumption may be wrong, but it's at least a good place to start. If this gets accepted, that's no longer a clear starting assumption. It breaks backward compatibility (in small ways). People might be relying on the presence/absence of a `.get` method in order to make their function polymorphic in some convoluted way which would break when this change is introduced. (I'm not saying that would be a good programming design idea -- and it might not hold water as an argument when real-world usage is looked at but it should be at least investigated before we claim that this change isn't going to hurt anybody). It's also not clear to me why we should be singling out `tuple` and `list`. Why not `str`, `bytes` and other sequences? 
Maybe it isn't that useful on those types, but I'd argue that it's not really useful on `tuple` either other than to keep the "`tuple` is an immutable `list`" paradigm. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Mar 3 18:17:14 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 03 Mar 2017 23:17:14 +0000 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> Message-ID: On Fri, Mar 3, 2017 at 1:35 PM, Michel Desmoulin wrote: > I am serious. It depends on the use case. If the data are an But that's the whole problem, isn't it? Since the start of the discussion, contesters have been offering numerous solutions, all being contextual and with gotchas, none being obvious, simple or elegant. In the context above, I was offering that there were obvious, simple and elegant solutions, but that which one was dependent on the use case. EVERY choice in programming is dependent on the use case. What I haven't seen yet is a compelling use case for a sequence .get() that does not have an existing simple and elegant solution. Which doesn't mean they don't exist. (and for my part, the machinations with `or` short-circuiting are not, in my book, simple or elegant...) Plus Sven already estimated the implementation would not be very hard. ... The proposal is actionable, the cost of it seems low, This would not simply be adding one method to a class. It would be adding a method to the Sequence protocol (ABC, whatever you want to call it). So a much heavier lift and larger impact than you imply. -CHB -- Christopher Barker, Ph.D.
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at lucidity.plus.com Fri Mar 3 19:23:21 2017 From: python at lucidity.plus.com (Erik) Date: Sat, 4 Mar 2017 00:23:21 +0000 Subject: [Python-ideas] For/in/as syntax In-Reply-To: <387f686d-ae24-e8ed-d884-f899d16b5d6c@mapgears.com> References: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> <9588100b-bb90-7472-77bc-ead02b3a02a2@brice.xyz> <387f686d-ae24-e8ed-d884-f899d16b5d6c@mapgears.com> Message-ID: On 03/03/17 19:02, Alexandre Brault wrote: > I believe what Matthias is hoping for is an equivalent of Java's named > break feature. Breaking out of an outer loop implicitly breaks out of > all inner loops Yes, and although I think making this a runtime object is an interesting thought (in terms of perhaps allowing other funky stuff to be implemented by a custom object, in line with Python's general dynamic ethos), I think that it should perhaps be considered a lexer/parser level thing only. * Creating a real object at runtime for each loop which needs to be the target of a non-inner break or continue is quite a large overhead. How would this affect Python variants other than CPython? * For anything "funky" (my words, not yours ;)), there needs to be a way of creating a custom loop object - what would the syntax for that be? A callable needs to be invoked as well as the name bound (the current suggestion just binds a name to some magical object that appears from somewhere). 
* If nothing "funky" needs to be done then why not just make the whole thing syntax-only and have no real object, by making the 'as' name a parser-only token which is only valid as the optional subject of a break or continue statement: for foo in bar as LABEL: . # (a) . for spam in ham: . . if eggs(spam): continue LABEL . . if not frob(spam): break LABEL # (b) (a) is the code generator's implied 'continue' target for the LABEL loop. (b) is the code generator's implied 'break' target for the LABEL loop. I'm not saying that's a great solution either. It's probably not an option as there is now something that looks like a bound name but is not actually available at runtime - the following would not work: for foo in bar as LABEL: print(dir(LABEL)) (and presumably that is part of the reason why the proposal is the way it is). I'm generally +0 on the concept (it might be nice, but I'm not sure either the original proposal or what I mention above are particularly problem-free ;)). E. From martinezyander at gmail.com Fri Mar 3 20:49:59 2017 From: martinezyander at gmail.com (Yander Martinez) Date: Fri, 3 Mar 2017 21:49:59 -0400 Subject: [Python-ideas] Thank's for your confirmation. Message-ID: -------------- next part -------------- An HTML attachment was scrubbed... URL: From markusmeskanen at gmail.com Fri Mar 3 21:25:16 2017 From: markusmeskanen at gmail.com (Markus Meskanen) Date: Sat, 4 Mar 2017 04:25:16 +0200 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> Message-ID: Hi, I'm still new here, but if my vote has any value then this gets a heavy -1 from me. 
It makes no sense to have to access exact i:th element of a list without knowing if it exists, at least not in a scenario where checking against the length or using an exception (say CSV row should have index 2 but doesn't) wouldn't be better. I might've missed a message or two, but unless someone can provide a real example where get() has an actual use case, I see no reason to argue over this. - Markus -------------- next part -------------- An HTML attachment was scrubbed... URL: From contact at brice.xyz Sat Mar 4 03:45:09 2017 From: contact at brice.xyz (Brice PARENT) Date: Sat, 4 Mar 2017 09:45:09 +0100 Subject: [Python-ideas] For/in/as syntax In-Reply-To: References: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> <9588100b-bb90-7472-77bc-ead02b3a02a2@brice.xyz> <387f686d-ae24-e8ed-d884-f899d16b5d6c@mapgears.com> Message-ID: > > * Creating a real object at runtime for each loop which needs to be > the target of a non-inner break or continue is quite a large overhead. > How would this affect Python variants other than CPython? I don't know much about the specifics on how it works behind the scenes, and even less about other implementations of the language, so I just give my opinions on the usage and the benefits of such a syntax. However, I'm not sure the object should be constructed and fed for every loop usage. It should probably only be instanciated if explicitly asked by the coder (by the use of "as loop_name"). I don't know what happens to the caught exceptions when they are not set to a variable (using except ExceptionType as e), but I think we would be in a similar case, only, if the object is instanciated, its states would evolve each time the loop starts another iteration. > > * For anything "funky" (my words, not yours ;)), there needs to be a > way of creating a custom loop object - what would the syntax for that > be? 
A callable needs to be invoked as well as the name bound (the > current suggestion just binds a name to some magical object that > appears from somewhere). I don't really understand what this means, as I'm not aware of how those things work in the background. But it would create an object and feed it to the variable, yes. I guess it would be magical in the sense it's not the habitual way of constructing an object. But it's what we're already used to with "as". When we use a context manager, like "with MyPersonalStream() as my_stream:", my_stream is not an object of type "MyPersonalStream" that has been built using the constructor, but the return of __enter__() (at least, it's what I understood), and the MyPersonalStream instance is somewhere else waiting for its closing fate. And when catching an exception, we feed a new variable with an object created somewhere-else-in-the-code, not with a classic instanciation. So this behaviour, obtaining an object created somewhere else and not explicitly, seems logical with the other uses of "as", IMHO. > > * If nothing "funky" needs to be done then why not just make the whole > thing syntax-only and have no real object, by making the 'as' name a > parser-only token which is only valid as the optional subject of a > break or continue statement: > > for foo in bar as LABEL: > . # (a) > . > for spam in ham: > . > . > if eggs(spam): > continue LABEL > . > . > if not frob(spam): > break LABEL > # (b) > > (a) is the code generator's implied 'continue' target for the LABEL loop. > (b) is the code generator's implied 'break' target for the LABEL loop. > > I'm not saying that's a great solution either. It's probably not an > option as there is now something that looks like a bound name but is > not actually available at runtime - the following would not work: > > for foo in bar as LABEL: > print(dir(LABEL)) > > (and presumably that is part of the reason why the proposal is the way > it is). 
This solution, besides having been explicitly rejected by Guido himself, brings two functionalities that are part of the proposal, but are not its main purpose, which is having the object itself. Allowing to break and continue from it are just things that it could bring to us, but there are countless things it could also bring (not all of them being good ideas, of course), like the .skip() and the properties I mentioned, but we could discuss about some methods like forloop.reset(), forloop.is_first_iteration() which is just of shortcut to (forloop.count == 0), forloop.is_last_iteration() (which would probably only be available on fixed-length iterables), and probably many other things to play with the sequencing of the loop. > > I'm generally +0 on the concept (it might be nice, but I'm not sure > either the original proposal or what I mention above are particularly > problem-free ;)). I have no clue on the implementation side, and I'm pretty sure such a big change couldn't ever be problem-free, I'm just convinced it could bring a lot of control and readability on the user side. Brice From steve at pearwood.info Sat Mar 4 04:40:11 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 4 Mar 2017 20:40:11 +1100 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> References: <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> Message-ID: <20170304094009.GG5689@ando.pearwood.info> On Fri, Mar 03, 2017 at 10:35:18PM +0100, Michel Desmoulin wrote: > Since the start of the discussion, contesters have been offering > numerous solutions, all being contextual and with gotchas, none being > obvious, simple or elegant. I do not agree with that characterisation. > The best is still try/except. And I don't agree with that either. > "There should be one obvious way to do it" right? But what is "it" here? 
Don't say "look up an arbitrary-indexed item which may not exist from a sequence". That's too general, and in the most general case, the right way to do that is to use sequence[index] which will raise if the item doesn't exist. In other words, the status quo. Be specific. Show some code -- its okay if its simplified code, but it should be enough to demonstrate the *use-case* for this. "I have crappy JSON" is not a use-case. How is it crappy and how would you use list.get to fix it? This brings us back to the point I made really early on: this *seems* like an obviously useful method, by analogy with dicts. I agree! It *seems* useful, so obviously such that one of the first things I added to my own personal toolbox of helper functions was a sequence get() function: def get(sequence, index, default=None): try: return sequence[index] except IndexError: return default But then I never used it. "Seems useful" != "is useful", at least in my experience. > Plus Sven already estimated the implementation would not be very hard. The simplicity of the implementation argues *against* the need for this to be a built-in. If you really do need this, then why not add a sequence get() function to your project? Its only five lines! As far as I have seen, only one person apart from myself, Kyle Lahnakoski, has implemented this helper in their own code. And Kyle says he has talked himself out of supporting this change. One thing I haven't seen is anyone saying "I am constantly writing and re-writing this same helper function over and over again! I grepped my code base and I've recreated this helper in 30 different modules. Maybe it should be a built-in?" That would be a good argument, but nobody has made it. Lots of people saying that they desperately need this method, but apparently most of them don't need it enough to write a five line helper function to get it. They'd rather wait until they've migrated all their code to Python 3.7. 
> So we have one obvious solution to a problem that: > > - several professional programmers said they have I'm not convinced by claims that "I need to fetch arbitrary indexes from sequences ALL THE TIME, sorry I can't show any examples..." > - has a similar API in another built-in > - has currently no elegant solutions > > The proposal is actionable, the cost of it seems low, and it's not > remotely controversial. It is only "not remotely controversial" if you ignore all those who disagree that this is needed. > I get that on Python-idea you get "no" by default, but here we are > having such resistance for a feature that is light to implement, does > not clutter anything, does solve a problem, and is congruent with other > APIs. > > Honestly what evil would happen if it's get accepted ? > > This is not a "yeah but we can't accept everything that goes in or we > would bloat Python" thing. Yes it is. But fundamentally, although I really don't see the benefit to this, I'm not *strongly* against it either. I don't think the sky will fall if it is added to sequences. But if somebody wants to code this up (don't forget the Sequence ABC) and submit a patch or a PR for a senior developer to look up, I'm not going to deny you that opportunity, -- Steve From python at lucidity.plus.com Sat Mar 4 06:22:07 2017 From: python at lucidity.plus.com (Erik) Date: Sat, 4 Mar 2017 11:22:07 +0000 Subject: [Python-ideas] For/in/as syntax In-Reply-To: References: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> <9588100b-bb90-7472-77bc-ead02b3a02a2@brice.xyz> <387f686d-ae24-e8ed-d884-f899d16b5d6c@mapgears.com> Message-ID: <6eaec9aa-ae7b-13b4-4403-1f1921a131cb@lucidity.plus.com> Hi Brice, On 04/03/17 08:45, Brice PARENT wrote: >> * Creating a real object at runtime for each loop which needs to be >> the target of a non-inner break or continue > However, I'm not sure the object should be constructed and fed for every > loop usage. 
> It should probably only be instantiated if explicitly asked
> by the coder (by the use of "as loop_name").

That's what I meant by "needs to be the target of a non-inner break or continue" (OK, you are proposing something more than just a referenced break/continue target, but we are talking about the same thing). Only loops which use the syntax get a loop manager object.

>> * For anything "funky" (my words, not yours ;)), there needs to be a
>> way of creating a custom loop object - what would the syntax for that
>> be? A callable needs to be invoked as well as the name bound (the
>> current suggestion just binds a name to some magical object that
>> appears from somewhere).

> I don't really understand what this means, as I'm not aware of how those
> things work in the background.

What I mean is, in the syntax "for spam in ham as eggs:", the name "eggs" is bound to your loop manager object. Where is the constructor call for this object? What class is it? That's what I meant by "magical". If you are proposing the ability to create user-defined loop managers then there must be somewhere where your custom class's constructor is called. Otherwise, how does Python know what type of object to create? Something like (this is not a proposal, just something plucked out of the air to hopefully illustrate what I mean):

for spam in ham with MyLoop() as eggs:
    eggs.continue()

> I guess it would be magical in the sense it's not
> the habitual way of constructing an object. But it's what we're already
> used to with "as". When we use a context manager, like "with
> MyPersonalStream() as my_stream:", my_stream is not an object of type
> "MyPersonalStream" that has been built using the constructor, but the
> return of __enter__()

But you have to spell out the constructor (MyPersonalStream()) to see what type of object is being created: whether or not the eventual name bound in your context is the result of a method call on that object, the constructor of your custom context manager is explicitly called. If you are saying that the syntax always implicitly creates an instance of a builtin class which cannot be subclassed by a custom class, then that's a bit different.

> This solution, besides having been explicitly rejected by Guido himself,

I didn't realise that. Dead in the water then, probably, which is fine - I wasn't pushing it.

> brings two functionalities that are part of the proposal, but are not
> its main purpose, which is having the object itself. Allowing to break
> and continue from it are just things that it could bring to us, but
> there are countless things it could also bring (not all of them being
> good ideas, of course), like the .skip() and the properties I mentioned,

I understand that, but I concentrated on those because they were easily converted into syntax (and would probably be the only things I'd find useful - all the other stuff is mostly doable using a custom iterator, I think). I would agree that considering syntax for all of the extra things you mention would be a bad idea - which your loop manager object idea gets around.
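For what it's worth, the "named break target" part of the idea can be approximated today with a context manager whose __exit__ swallows a private exception. Everything below (the NamedLoop class, its break_() method, the _Break signal) is invented for this sketch and is not part of Brice's proposal:

```python
class _Break(Exception):
    """Private signal raised by NamedLoop.break_()."""

class NamedLoop:
    """A break target for nested loops: break_() unwinds out of every
    loop inside the `with` block."""
    def __enter__(self):
        return self  # the object that `as` binds, per the discussion above

    def __exit__(self, exc_type, exc, tb):
        return exc_type is _Break  # swallow only our own signal

    def break_(self):
        raise _Break

hits = []
with NamedLoop() as outer:
    for i in range(5):
        for j in range(5):
            if i * j == 6:
                hits.append((i, j))
                outer.break_()  # leaves both loops at once
print(hits)  # [(2, 3)]
```

Note that `as` binds the return value of __enter__() here, which happens to be the manager itself - exactly the point being made above about "with MyPersonalStream() as my_stream".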
> but we could discuss about some methods like forloop.reset(),
> forloop.is_first_iteration() which is just a shortcut for (forloop.count
> == 0), forloop.is_last_iteration()

Also, FWIW, if I knew that in addition to the overhead of creating a loop manager object I was also incurring the overhead of a loop counter being maintained (usually, one is not required - if it is, use enumerate()) I would probably not use this construct, and would instead find ways of restructuring my code to avoid it, using regular for loops.

I'm not beating up on you - like I said, I think the idea is interesting.

E.

From pavol.lisy at gmail.com Sat Mar 4 06:52:19 2017 From: pavol.lisy at gmail.com (Pavol Lisy) Date: Sat, 4 Mar 2017 12:52:19 +0100 Subject: [Python-ideas] add __contains__ into the "type" object In-Reply-To: References: <1a6bb123.269.15a86f5d855.Coremail.mlet_it_bew@126.com> <20170228231250.GN5689@ando.pearwood.info> Message-ID:

On 3/2/17, Stephan Houben wrote:
> A crucial difference between a set and a type is that you cannot
> explicitly iterate over the elements of a type, so while we could implement
>
> x in int
>
> to do something useful, we cannot make
>
> for x in int:
>     print(x)
>
> Because if we could, we could implement Russell's paradox in Python:
>
> R = set(x for x in object if x not in x)
>
> print(R in R)
>
> Bottom line: a set is not a type, even in mathematics.
>
> Stephan

:) The problem with Russell's paradox was that "set" was only defined intuitively at the time. And that is why **class** was then introduced into set theory! :)

In set theory you can ask whether x is a member of a set or a member of a class, and the difference between set and class is just that a set is not the same thing as a class. (A class is an intuitively defined concept introduced precisely to avoid paradoxes like Russell's.)

And Python does not implement Platonism (https://en.wikipedia.org/wiki/Philosophy_of_mathematics#Platonism) - objects don't exist until they are created. The class int is not infinite in actuality. (It does not occupy infinite memory.)

So I think that print(R in R) would print True, because while the set is being created this object (the set) is not yet a member of it, so it would be added. Or maybe it would return False, if the incomplete set is not evaluated during its own creation. It is a matter of implementation, but it could be perfectly deterministic. So maybe Russell just had to "implement" a GIL in mathematics ;)

In current Python you can ask about "membership" (the in operator) where the object is not a set:

if 3 in itertools.count():  # this is ok
    type(itertools.count())  # this is ok
else:
    set(itertools.count())  # I don't approve this :P

A Python set is not a mathematical set either:

a = set()
a.add(a)  # TypeError: unhashable type: 'set'

I am -1 on this proposal at the moment. I probably understand the motivation, but I don't yet see that this change would bring enough benefit.

From pobocks at gmail.com Sat Mar 4 11:54:37 2017 From: pobocks at gmail.com (David Mayo) Date: Sat, 4 Mar 2017 11:54:37 -0500 Subject: [Python-ideas] Suggestion: Collection type argument for argparse where nargs != None Message-ID:

A friend of mine (@bcjbcjbcj on twitter) came up with an idea for an argparse improvement that I'd like to propose for inclusion.

Currently, argparse with nargs= collects arguments into a list (or a list of lists in the case of action="append"). I would like to propose adding a "collection type" argument to the store and append actions and to add_argument, consisting of a callable that would be applied to the list of type-converted args before adding them to the Namespace. This would allow for alternate constructors (e.g. set), for modifying the list (e.g. with sorted), or for checking properties expected across all components of the argument at parse time.
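The proposed behaviour can be prototyped on current Python with a custom Action. The `colltype` keyword below is my invention for this sketch (argparse passes unrecognized add_argument() keyword arguments through to the Action constructor):

```python
import argparse

class CollectAction(argparse.Action):
    """Store colltype(values) instead of the raw nargs list."""
    def __init__(self, option_strings, dest, colltype=list, **kwargs):
        self.colltype = colltype
        super().__init__(option_strings, dest, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        # values is the list of already type-converted arguments
        setattr(namespace, self.dest, self.colltype(values))

parser = argparse.ArgumentParser()
parser.add_argument("--ports", nargs="+", type=int,
                    action=CollectAction, colltype=sorted)
parser.add_argument("--tags", nargs="*",
                    action=CollectAction, colltype=frozenset)

args = parser.parse_args(["--ports", "443", "80", "--tags", "a", "b", "a"])
print(args.ports)         # [80, 443]  -- type-converted, then sorted
print(sorted(args.tags))  # ['a', 'b'] -- duplicates collapsed by frozenset
```

The same hook covers the "checking properties across all components" case: pass a callable that validates the list and raises argparse.ArgumentTypeError on failure.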
I've worked up a set of examples in this gist: https://gist.github.com/pobocks/bff0bea494f2b7ec7eba1e8ae281b888 And a rough implementation here: https://github.com/python/cpython/compare/master...pobocks:argparse_colltype I think this would be genuinely useful, and would require very little change to argparse, which should be backwards compatible provided that the default for the collection type is list, or None with list specified if None. Thank you all for your time in considering this, - Dave Mayo @pobocks on twitter, github, various others -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Mar 4 22:17:31 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 5 Mar 2017 13:17:31 +1000 Subject: [Python-ideas] for/except/else In-Reply-To: References: Message-ID: On 3 March 2017 at 18:47, Wolfgang Maier < wolfgang.maier at biologie.uni-freiburg.de> wrote: > On 03/03/2017 04:36 AM, Nick Coghlan wrote: > >> On 2 March 2017 at 21:06, Wolfgang Maier >> > > wrote: >> >> - overall I looked at 114 code blocks that contain one or more breaks >>> >> >> >> Thanks for doing that research :) >> >> >> Of the remaining 19 non-trivial cases >>> >>> - 9 are variations of your classical search idiom above, i.e., >>> there's an else clause there and nothing more is needed >>> >>> - 6 are variations of your "nested side-effects" form presented >>> above with debatable (see above) benefit from except break >>> >>> - 2 do not use an else clause currently, but have multiple breaks >>> that do partly redundant things that could be combined in a single >>> except break clause >>> >> >> >> Those 8 cases could also be reviewed to see whether a flag variable >> might be clearer than relying on nested side effects or code repetition. >> >> > [...] 
> > >> This is a case where a flag variable may be easier to read than loop >> state manipulations: >> >> may_have_common_prefix = True >> while may_have_common_prefix: >> prefix = None >> for item in items: >> if not item: >> may_have_common_prefix = False >> break >> if prefix is None: >> prefix = item[0] >> elif item[0] != prefix: >> may_have_common_prefix = False >> break >> else: >> # all subitems start with a common "prefix". >> # move it out of the branch >> for item in items: >> del item[0] >> subpatternappend(prefix) >> >> Although the whole thing could likely be cleaned up even more via >> itertools.zip_longest: >> >> for first_uncommon_idx, aligned_entries in >> enumerate(itertools.zip_longest(*items)): >> if not all_true_and_same(aligned_entries): >> break >> else: >> # Everything was common, so clear all entries >> first_uncommon_idx = None >> for item in items: >> del item[:first_uncommon_idx] >> >> (Batching the deletes like that may even be slightly faster than >> deleting common entries one at a time) >> >> Given the following helper function: >> >> def all_true_and_same(entries): >> itr = iter(entries) >> try: >> first_entry = next(itr) >> except StopIteration: >> return False >> if not first_entry: >> return False >> for entry in itr: >> if not entry or entry != first_entry: >> return False >> return True >> >> - finally, 1 is a complicated break dance to achieve sth that >>> clearly would have been easier with except break; from typing.py: >>> >> >> > [...] > > >> I think is another case that is asking for the inner loop to be factored >> out to a named function, not for reasons of re-use, but for reasons of >> making the code more readable and self-documenting :) >> >> > It's true that using a flag or factoring out redundant code is always a > possibility. Having the except clause would clearly not let people do > anything they couldn't have done before. 
> On the other hand, the same is true for the else clause - it's only > advantage here is that it's existing already I forget where it came up, but I seem to recall Guido saying that if he were designing Python today, he wouldn't include the "else:" clause on loops, since it inevitably confuses folks the first time they see it. (Hence articles like mine that attempt to link it with try/except/else rather than if/else). > - because a single flag could always distinguish between a break having > occurred or not: > > brk = False > for item in iterable: > if some_condition: > brk = True > break > if brk: > do_stuff_upon_breaking_out() > else: > do_alternative_stuff() > > is a general pattern that would always work without except *and* else. > > However, the fact that else exists generates a regrettable asymmetry in > that there is direct language support for detecting one outcome, but not > the other. > It's worth noting that this asymmetry doesn't necessarily exist in the corresponding C idiom that I assume was the inspiration for the Python equivalent: int data_array_len = sizeof(data_array) / sizeof(data_array[0]); in idx = 0; for (idx = 0; idx < data_array_len; idx++) { if (condition(container[idx])) { break; } } if (idx < data_array_len) { // We found a relevant entry } else { // We didn't find anything } In Python prior to 2.1 (when PEP 234 added the iterator protocol), a similar approach could be used for Python's for loops: num_items = len(container): for idx in range(num_items): if condition(container[idx]): break if num_items and idx < num_items: # We found a relevant entry else: # We didn't find anything However, while my own experience with Python is mainly with 2.2+ (and hence largely after the era where "for i in range(len(container)):" was still common), I've spent a lot of time working with C and the corresponding iterator protocols in C++, and there it is pretty common to move the "entry found" code before the break and then invert the conditional 
check that appears after the loop: int data_array_len = sizeof(data_array) / sizeof(data_array[0]); int idx = 0; for (idx = 0; idx < data_array_len; idx++) { if (condition(container[idx])) { // We found a relevant entry break; } } if (idx >= data_array_len) { // We didn't find anything } And it's *this* version of the C/C++ idiom that Python's "else:" clause replicates. One key aspect of this particular idiomatic structure is that it retains the same overall shape regardless of whether the inner structure is: if condition(item): # Condition is true, so process the item process(item) break or: if maybe_process_item(item): # Item was processed, so we're done here break Whereas the "post-processing" model can't handle pre-composed helper functions that implement both the conditional check and the item processing, and then report back which branch they took. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Mar 4 23:29:07 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 5 Mar 2017 15:29:07 +1100 Subject: [Python-ideas] for/except/else In-Reply-To: References: Message-ID: <20170305042907.GI5689@ando.pearwood.info> On Sun, Mar 05, 2017 at 01:17:31PM +1000, Nick Coghlan wrote: > I forget where it came up, but I seem to recall Guido saying that if he > were designing Python today, he wouldn't include the "else:" clause on > loops, since it inevitably confuses folks the first time they see it. Heh, if we exclude all features that confuse people the first time they see it, we'd have to remove threads, Unicode, floating point maths, calls to external processes, anything OS-dependent, metaclasses, classes, ... :-) It took me the longest time to realise that the "else" clause didn't *only* run when the loop sequence is empty. 
That is, I expected that given:

for x in random.choice(["", "a"]):  # either empty, or a single item
    print("run the loop body")
else:
    print("loop sequence is empty")

That's because it *seems* to work that way, if you do insufficient testing:

for x in []:
    raise ValueError  # dead code is not executed
else:
    print("loop sequence is empty")

It is my belief that the problem here is not the else clause itself, but that the name used is misleading. I've seen people other than myself conclude that it means:

run the for-loop over the sequence
otherwise the sequence is empty, run the ELSE block

I've seen people think that it means:

set break_seen flag to false
run the for-loop
    if break is executed, set the break_seen flag to true then break
if break_seen is false, run the "ELSE NOT BREAK" clause

and consequently ask how they can access the break_seen flag for themselves. Presumably they want to write something like:

run the for-loop
if break_seen is true, do this
else (break_seen is false) do that

I think that the name "else" here is a "misunderstanding magnet": it leads people to misunderstand the nature of the clause and its implications. For example, I bet that right now there are people reading this and nodding along with me and thinking "maybe we should rename it something more explicit, like "else if no break"", completely oblivious to the fact that `break` is NOT the only way to avoid running the `else` clause.

I believe that the name should have been "then", not "else". It describes what the code does:

run the for-block
THEN run the "else" block

There's no flag to be tested, and the "else" block simply runs once, after the for-loop, regardless of whether the for-loop runs once or ten times or zero times (empty sequence). To avoid running the "else" ("then") block, you have to exit the entire block of code using:

- break
- return
- raise

which will all transfer execution past the end of the for...else (for...then) compound statement.
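A small runnable illustration of that reading: the `else` block runs whenever execution falls off the end of the loop - including the zero-iteration case - and is skipped only by a jump past the whole compound statement:

```python
def first_negative(seq):
    for x in seq:
        if x < 0:
            break       # jumps past the whole for...else block
    else:
        return None     # loop finished without break (even for an empty seq)
    return x

print(first_negative([3, -1, 2]))  # -1   (break taken, else skipped)
print(first_negative([1, 2, 3]))   # None (loop completed, else ran)
print(first_negative([]))          # None (empty sequence also runs else)
```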
Since I realised that the else block merely runs directly after the for, I've never had any problem with the concept. `break` merely jumps past the for...else block, just as `return` exits the function and `raise` triggers an exception which transfers execution to the surrounding `except` clause. -- Steve From elliot.gorokhovsky at gmail.com Sun Mar 5 02:19:43 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Sun, 05 Mar 2017 07:19:43 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) Message-ID: (Summary of results: my patch at https://bugs.python.org/issue28685 makes list.sort() 30-50% faster in common cases, and at most 1.5% slower in the uncommon worst case.) Hello all, You may remember seeing some messages on here about optimizing list.sort() by exploiting type-homogeneity: since comparing apples and oranges is uncommon (though possible, i.e. float to int), it pays off to check if the list is type-homogeneous (as well as homogeneous with respect to some other properties), and, if it is, to replace calls to PyObject_RichCompareBool with calls to ob_type->tp_richcompare (or, in common special cases, to optimized compare functions). The entire patch is less than 250 lines of code, most of which is pretty boilerplate (i.e. a lot of assertions in #ifdef Py_DEBUG blocks, etc). I originally wrote that patch back in November. I've learned a lot since then, both about CPython and about mailing list etiquette :). Previous discussion about this can be found at https://mail.python.org/pipermail/python-dev/2016-October/146648.html and https://mail.python.org/pipermail/python-ideas/2016-October/042807.html. Anyway, I recently redid the benchmarks much more rigorously (in preparation for presenting this project at my school's science fair), achieving a standard deviation of less than 0.5% of the mean for all measurements. 
The exact benchmark script used can be found at https://github.com/embg/python-fastsort-benchmark (it's just sorting random lists of/lists of tuples of [type]. While listsort.txt talks about benchmarking different kinds of structured lists, instead of just random lists, the results here would hold in those cases just as well, because this makes individual comparisons cheaper, instead of reducing the number of comparisons based on structure).

I also made a poster describing the optimization and including a pretty graph displaying the benchmark data: https://github.com/embg/python-fastsort-benchmark/blob/master/poster.pdf. For those who would rather read the results here (though it is a *really* pretty graph):

***
Percent improvement for sorting random lists of [type] (1-patched/unpatched):
float: 48%
bounded int (magnitude smaller than 2^32): 48.4%
latin string (all characters in [0,255]): 32.7%
general int (reasonably uncommon?): 17.2%
general string (reasonably uncommon?): 9.2%
tuples of float: 63.2%
tuples of bounded int: 64.8%
tuples of latin string: 55.8%
tuples of general int: 50.3%
tuples of general string: 44.1%
tuples of heterogeneous: 41.5%
heterogeneous (lots of float with an int at the end; worst-case): -1.5%
***

Essentially, it's a gamble where the payoff is 20-30 times greater than the cost, and the odds of losing are very small. Sorting is perhaps not a bottleneck in most applications, but considering how much work has gone into Python's sort (Timsort, etc; half of listobject.c is sort code), I think it's interesting that list.sort() can be made essentially twice as fast by a relatively simple optimization.

I would also add that Python dictionaries already implement this optimization: they start out optimizing based on the assumption that they'll only be seeing string keys, checking to make sure that assumption holds as they go. If they see a non-string key, they permanently switch over to the general implementation.
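The pre-check at the heart of the patch can be sketched at the Python level. This is only the idea, not the actual C implementation in listobject.c (which folds the check into a single pass and special-cases e.g. floats, bounded ints and latin-1 strings):

```python
def check_type_homogeneous(lst):
    # The gamble described above: one cheap O(n) scan of exact types up
    # front; if it succeeds, per-comparison type dispatch can be skipped.
    if len(lst) < 2:
        return True
    first_type = type(lst[0])
    # Exact-type check: subclasses (e.g. bool vs int) must fail it.
    return all(type(x) is first_type for x in lst[1:])

print(check_type_homogeneous([1.5, 2.0, 3.25]))  # True
print(check_type_homogeneous([1.5, 2.0, 3]))     # False: float/int mix
print(check_type_homogeneous([1, 2, True]))      # False: bool is a subclass
```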
So it's really the same idea, except here it doesn't matter as much what type we're dealing with, which is important, because lists are commonly used with lots of different types, as opposed to dictionaries, which overwhelmingly commonly use string keys, especially internally. (Correct me if I'm wrong in any of the above). I got a lot of great feedback/input on this patch as I was writing it, but after submitting it, I didn't hear much from anybody. (The reason I took so long to post was because I wanted to wait until I had the chance to do the benchmarks *right*). What do you all think? Thanks, Elliot -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sun Mar 5 06:27:07 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 5 Mar 2017 11:27:07 +0000 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> Message-ID: On 3 March 2017 at 18:29, Ed Kellett wrote: > I guess I don't have any hope of convincing people who think there's no need > to ever do this, but I have a couple of questions for the people who think > the existing solutions are fine: > > - Which of the existing things (slice + [default], conditional on a slice, > conditional on a len() call) do you think is the obvious way to do it? Write a small wrapper function that implements the functionality (however you want, but it doesn't have to be a single expression, so you've much more flexibility to make it readable) and then use that. > - Are there any examples where list.get would be applicable and not the > obviously best way to do it? I don't understand the question. 
If you're asking which is better between list.get and a custom written function as described above, then a custom written function is better because (a) it works on all Python versions, (b) list.get needs a language change where a helper function doesn't, (c) "writing a helper function" is a generally useful idiom that works for many, many things, but list.get only solves a single problem and every other such problem would need its own separate language change. The disadvantage that you have to write the helper is trivial, because it's only a few lines of simple code: def get_listitem(lst, n, default=None): try: return lst[n] except IndexError: return default Paul From g.rodola at gmail.com Sun Mar 5 06:45:34 2017 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Sun, 5 Mar 2017 18:45:34 +0700 Subject: [Python-ideas] Add socket utilities for IPv4/6 dual-stack servers Message-ID: Some years ago I started working on a patch for the socket module which added a couple of utility functions for being able to easily create a server socket, with the addition of being able to accept both IPv4 and IPv6 connections (as a single socket): https://bugs.python.org/issue17561 Given that not all OSes (Windows, many UNIXes) support this natively, I later submitted a recipe adding a "multiple socket listener" class. https://code.activestate.com/recipes/578504-server-supporting-ipv4-and-ipv6/ >From the user perspective, the whole thing can be summarized as follows: >>> sock = create_server_sock(("", 8000)) >>> if not has_dual_stack(sock): ... sock.close() ... sock = MultipleSocketsListener([("0.0.0.0", 8000), ("::", 8000)]) >>> Part of this work ended up being included into Tulip internals: https://github.com/python/cpython/blob/70d28a184c42d107cc8c69a95aa52a4469e7929c/Lib/asyncio/base_events.py#L966-L1067 ...and after that I basically forgot about the original patch. 
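For experimenting with the idea on current Python, here is a minimal sketch of just the dual-stack part; `create_dual_stack_server` and its return convention are invented here. The IPV6_V6ONLY step is precisely what Windows and several UNIXes may refuse, which is what motivates has_dual_stack() and the MultipleSocketsListener fallback. (Much of this proposal later landed in Python 3.8 as socket.create_server() and socket.has_dualstack_ipv6().)

```python
import socket

def create_dual_stack_server(host="::", port=0, backlog=128):
    """Try to make one IPv6 socket that also accepts IPv4 connections.

    Returns (sock, dual), where dual says whether IPV6_V6ONLY could be
    cleared; falls back to an IPv6-only listener where it cannot.
    """
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    dual = False
    if hasattr(socket, "IPPROTO_IPV6") and hasattr(socket, "IPV6_V6ONLY"):
        try:
            sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
            dual = True
        except OSError:
            pass  # platform insists on separate v4/v6 sockets
    sock.bind((host, port))
    sock.listen(backlog)
    return sock, dual

if socket.has_ipv6:
    try:
        server, dual = create_dual_stack_server(host="::1")
        print("listening on", server.getsockname()[:2], "dual-stack:", dual)
        server.close()
    except OSError:
        print("IPv6 not usable on this machine")
```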
The other day I bumped into a case where I needed exactly this (on Windows), so here I am, trying to revamp the original proposal. To be clear, the proposal is to add 3 new APIs in order to avoid the low-level cruft needed when creating a server socket (SO_REUSEADDR, getaddrinfo() etc.) and being able to support IPv4/6 dual-stack socket servers in a cross-platform fashion: - socket.has_dual_stack() - socket.create_server_sock() - socket.MultipleSocketsListener Whereas the first two functions are relatively straightforward, MultipleSocketsListener is more debatable because, for instance, it's not clear what methods like getsockname() should return (because there are 2 sockets involved). One possible solution is to *not* expose such (all get*?) methods and simply expose the underlying socket objects as in: >>> socket.MultipleSocketsListener(...).socks[0].getsockname() On the other hand, all set* / write methods (setblocking(), setsockopt(), shutdown(), ...) can be exposed and internally they can simply operate against both sockets. Thoughts? -- Giampaolo - http://grodola.blogspot.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From edk141 at gmail.com Sun Mar 5 07:51:03 2017 From: edk141 at gmail.com (Ed Kellett) Date: Sun, 05 Mar 2017 12:51:03 +0000 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: <20170304094009.GG5689@ando.pearwood.info> References: <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> <20170304094009.GG5689@ando.pearwood.info> Message-ID: On Sat, 4 Mar 2017 at 09:46 Steven D'Aprano wrote: On Fri, Mar 03, 2017 at 10:35:18PM +0100, Michel Desmoulin wrote: > Since the start of the discussion, contesters have been offering > numerous solutions, all being contextual and with gotchas, none being > obvious, simple or elegant. I do not agree with that characterisation. Which solution do you think is obvious, simple or elegant? 
> "There should be one obvious way to do it" right?

But what is "it" here?

...

Any of the alternatives mentioned thus far solve the "too general" problem. As far as I can see, none of them would even be a contender if list.get existed, so I think the fact that people have spent time coming up with them is telling.

In other words, the status quo.

Be specific. Show some code -- it's okay if it's simplified code, but it should be enough to demonstrate the *use-case* for this. "I have crappy JSON" is not a use-case. How is it crappy and how would you use list.get to fix it?

It's crappy because you have a list of things of unknown length (though I'm inclined to disagree with the "crappy", honestly - this would seem a perfectly reasonable thing to do if people weren't bizarrely against it). I haven't written much Python that is not secret for a while, so excuse the heavy paraphrasing: in my cases it is usually dealing with arguments: some sort of list has been sent to me down the wire, let's say ["kill", "edk"]. I could write a big fancy dispatcher, but I see that as unwarranted complexity, so I usually start with something simpler. I end up wanting to do something like:

target = args[1]
reason = args.get(2, "")

I could equally use list.pop, which seems to be common when solving this problem with dicts - except list.pop doesn't take a default either, for some reason I've never quite understood.

This brings us back to the point I made really early on: this *seems* like an obviously useful method, by analogy with dicts.

I agree! It *seems* useful -- so obviously so that one of the first things I added to my own personal toolbox of helper functions was a sequence get() function:

def get(sequence, index, default=None):
    try:
        return sequence[index]
    except IndexError:
        return default

But then I never used it. "Seems useful" != "is useful", at least in my experience.

Helper functions suck, in my view.
I'm all for a flat function namespace with a get() that works on anything, but that one doesn't, and "dict.get is for dicts and get is for sequences" seems ugly.

> Plus Sven already estimated the implementation would not be very hard.

The simplicity of the implementation argues *against* the need for this to be a built-in. If you really do need this, then why not add a sequence get() function to your project? It's only five lines!

Why not:

- Python tries very hard to stop you from adding get() to sequences. Mixing levels of namespacing feels wrong, to me.
- It's another function for everybody reading your program to have to remember about.
- Functions have other costs too, in terms of documentation and testing.
- It's likely incompatible with other copies of the same utility.

As far as I have seen, only one person apart from myself, Kyle Lahnakoski, has implemented this helper in their own code. And Kyle says he has talked himself out of supporting this change.

One thing I haven't seen is anyone saying "I am constantly writing and re-writing this same helper function over and over again! I grepped my code base and I've recreated this helper in 30 different modules. Maybe it should be a built-in?" That would be a good argument, but nobody has made it. Lots of people are saying that they desperately need this method, but apparently most of them don't need it enough to write a five-line helper function to get it. They'd rather wait until they've migrated all their code to Python 3.7.

The helper function doesn't solve the problem for me. The existing solutions are, to my mind, ugly and non-obvious, and writing a helper that is still ugly and non-obvious doesn't make anything better. The place to solve this problem is in the API.

> So we have one obvious solution to a problem that:
>
> - several professional programmers said they have

I'm not convinced by claims that "I need to fetch arbitrary indexes from sequences ALL THE TIME, sorry I can't show any examples...
It's hard to show examples because, generally speaking, when one can't do a thing one does something else instead. I can restructure my programs to avoid having this problem, or, if I'm in a hurry, I can use one of the many ugly h^W^Wobvious, simple and elegant solutions, like (args[2:3] + [""])[0]. In general, I remain curious about cases in which list.get could be used and would not be the preferred solution. Ed -------------- next part -------------- An HTML attachment was scrubbed... URL: From edk141 at gmail.com Sun Mar 5 08:03:33 2017 From: edk141 at gmail.com (Ed Kellett) Date: Sun, 05 Mar 2017 13:03:33 +0000 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> Message-ID: On Sun, 5 Mar 2017 at 11:27 Paul Moore wrote: > On 3 March 2017 at 18:29, Ed Kellett wrote: > > - Which of the existing things (slice + [default], conditional on a slice, > > conditional on a len() call) do you think is the obvious way to do it? > > Write a small wrapper function that implements the functionality > (however you want, but it doesn't have to be a single expression, so > you've much more flexibility to make it readable) and then use that. > It's hardly a question of readability at that point. A reader is at the very least going to have to look at the signature of the utility function in order to be sure about argument order. > - Are there any examples where list.get would be applicable and not the > > obviously best way to do it? > > I don't understand the question. If you're asking which is better > between list.get and a custom written function as described above, > No. I'm asking: if list.get did exist, are there any cases (compatibility with old versions aside) where list.get's semantics would be applicable, but one of the alternatives would be the better choice? 
"writing a helper function" is a generally > useful idiom that works for many, many things, but list.get only > solves a single problem and every other such problem would need its > own separate language change. Custom helper functions can obviously accomplish anything in any language. If we had to choose between def: and list.get, I'd obviously opt for the former. The disadvantage that you have to write > the helper is trivial, because it's only a few lines of simple code: > I don't think the size of a helper function is relevant to how much of a disadvantage it is. Most built-in list methods are trivial to implement in Python, but I'm glad not to have to. Ed -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sun Mar 5 09:08:25 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 5 Mar 2017 14:08:25 +0000 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> Message-ID: On 5 March 2017 at 13:03, Ed Kellett wrote: > > No. I'm asking: if list.get did exist, are there any cases (compatibility > with old versions aside) where list.get's semantics would be applicable, but > one of the alternatives would be the better choice? Self-evidently no. But what does that prove? That we should implement list.get? You could use the dientical argument for *anything*. There needs to be another reason for implementing it. >> "writing a helper function" is a generally >> useful idiom that works for many, many things, but list.get only >> solves a single problem and every other such problem would need its >> own separate language change. > > Custom helper functions can obviously accomplish anything in any language. > If we had to choose between def: and list.get, I'd obviously opt for the > former. 
> >> The disadvantage that you have to write >> the helper is trivial, because it's only a few lines of simple code: > > I don't think the size of a helper function is relevant to how much of a disadvantage it is. Most built-in list methods are trivial to implement in Python, but I'm glad not to have to. But you have yet to explain why you'd be glad not to write a helper for list.get, in any terms that don't simply boil down to "someone else did the work for me". I think we're going to have to just disagree. You won't convince me it's worth adding list.get unless you can demonstrate some *existing* costs that would be removed by adding list.get, and showing that they are greater than the costs of adding list.get (search this list if you don't know what those costs are - I'm not going to repeat them again, but they are *not* trivial). And I don't seem to be able to convince you that writing a helper function is a reasonable approach. Paul. From victor.stinner at gmail.com Sun Mar 5 09:26:06 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 5 Mar 2017 15:26:06 +0100 Subject: [Python-ideas] Add socket utilities for IPv4/6 dual-stack servers In-Reply-To: References: Message-ID: Would it be possible to create a PyPI project to experiment with the API and wait until we collected enough user feedback first? Currently socket is low level. Not sure if I like the idea of putting more high level features in it? Asyncio is a good place for high level features, but is limited to async programming.
Victor

On 5 Mar 2017 at 12:46, "Giampaolo Rodola'" wrote:
> Some years ago I started working on a patch for the socket module which
> added a couple of utility functions for being able to easily create a
> server socket, with the addition of being able to accept both IPv4 and IPv6
> connections (as a single socket):
> https://bugs.python.org/issue17561
>
> Given that not all OSes (Windows, many UNIXes) support this natively, I
> later submitted a recipe adding a "multiple socket listener" class.
> https://code.activestate.com/recipes/578504-server-supporting-ipv4-and-ipv6/
>
> From the user perspective, the whole thing can be summarized as follows:
>
> >>> sock = create_server_sock(("", 8000))
> >>> if not has_dual_stack(sock):
> ...     sock.close()
> ...     sock = MultipleSocketsListener([("0.0.0.0", 8000), ("::", 8000)])
> >>>
>
> Part of this work ended up being included into Tulip internals:
> https://github.com/python/cpython/blob/70d28a184c42d107cc8c69a95aa52a4469e7929c/Lib/asyncio/base_events.py#L966-L1067
> ...and after that I basically forgot about the original patch. The other
> day I bumped into a case where I needed exactly this (on Windows), so here
> I am, trying to revamp the original proposal.
>
> To be clear, the proposal is to add 3 new APIs in order to avoid the
> low-level cruft needed when creating a server socket
> (SO_REUSEADDR, getaddrinfo() etc.)
> and being able to support IPv4/6 dual-stack socket servers in a
> cross-platform fashion:
>
> - socket.has_dual_stack()
> - socket.create_server_sock()
> - socket.MultipleSocketsListener
>
> Whereas the first two functions are relatively straightforward,
> MultipleSocketsListener is more debatable because, for instance, it's not
> clear what methods like getsockname() should return (because there are 2
> sockets involved). One possible solution is to *not* expose such (all
> get*?)
methods and simply expose the underlying socket objects as in: > > >>> socket.MultipleSocketsListener(...).socks[0].getsockname() > > On the other hand, all set* / write methods (setblocking(), setsockopt(), > shutdown(), ...) can be exposed and internally they can simply operate > against both sockets. > > Thoughts? > > -- > Giampaolo - http://grodola.blogspot.com > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sun Mar 5 12:51:57 2017 From: mertz at gnosis.cx (David Mertz) Date: Sun, 5 Mar 2017 09:51:57 -0800 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Sat, Mar 4, 2017 at 11:19 PM, Elliot Gorokhovsky < elliot.gorokhovsky at gmail.com> wrote: > (Summary of results: my patch at https://bugs.python.org/issue28685 makes > list.sort() 30-50% faster in common cases, and at most 1.5% slower in the > uncommon worst case.) > Thanks for the hard work, this looks very promising. > While listsort.txt talks about benchmarking different kinds of structured > lists, instead of just random lists, the results here would hold in those > cases just as well, because this makes individual comparisons cheaper, > instead of reducing the number of comparisons based on structure). > This point seems wrong on its face. The main point of Timsort is to avoid comparisons in mostly-sorted lists. It makes a lot of difference whether the initial data is random or mostly-sorted. I presume the type homogeneity check takes a fixed O(N) time to perform. In the random case, the sort will take O(N * log N) comparisons. As a multiplier, naive comparisons are clearly more expensive than a type check.
But whether that type checking overhead is worthwhile obviously depends on the number of comparisons, which might be O(N) if the data is sorted. Real world data tends to be mostly-sorted. So it would be useful for your benchmarks to include:

A) Performance on completely sorted data
  i) Of homogeneous type
  ii) Of mixed type
B) Performance on mostly-sorted data
  i) Of homogeneous type
  ii) Of mixed type

-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sun Mar 5 13:13:53 2017 From: mertz at gnosis.cx (David Mertz) Date: Sun, 5 Mar 2017 10:13:53 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> <20170304094009.GG5689@ando.pearwood.info> Message-ID: On Sun, Mar 5, 2017 at 4:51 AM, Ed Kellett wrote: > It's hard to show examples because, generally speaking, when one can't do > a thing one does something else instead. I can restructure my programs to > avoid having this problem, or, if I'm in a hurry, I can use one of the many > ugly h^W^Wobvious, simple and elegant solutions, like (args[2:3] + ["<no reason given>"])[0]. > > For the record, even though I was the first in this thread to give that spelling, I don't think it's the best way to spell it in current Python. Bracketing a helper function (which *could* after all deal with lists, dicts, and whatever other type you wanted depending on what you implement), I think the best spelling is with a ternary:

reason = args[1] if len(args) > 1 else ""

The more the thread continues the more I actively want to avoid a list.get() method.
Initially I thought it added symmetry; but as I look at it I realize it is mostly code smell. Being able to get "a value" from an arbitrary position in a list that isn't long enough often suggests something is deeply wrong in the logic of the code. Moreover, it is likely to let bugs pass silently and cause deeper problems elsewhere downstream. If you have a list that is expected to have a length of either 1 or 2, I can imagine this making sense (e.g. ["kill"] vs. ["kill", "edk"]):

reason = args.get(1, "")

But if the next line is:

data = args.get(17, "")

Then I'm pretty sure the programmer thinks she's being passed a very different type of collection than is actually available. I'd rather that fail right away and in an obvious way than silently produce a value. Specifically, if I think I'm dealing with a list that is likely to have 20 items (rather than maybe 4 or fewer), I'm almost sure the best way to deal with it is in a list (or comprehension, map(), etc) and NOT by poking into large index positions that may or may not be present. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mertz at gnosis.cx Sun Mar 5 13:17:59 2017 From: mertz at gnosis.cx (David Mertz) Date: Sun, 5 Mar 2017 10:17:59 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> <20170304094009.GG5689@ando.pearwood.info> Message-ID: On Sun, Mar 5, 2017 at 10:13 AM, David Mertz wrote: > Specifically, if I think I'm dealing with a list that is likely to have 20 > items (rather than maybe 4 or fewer), I'm almost sure the best way to deal > with it is in a list (or comprehension, map(), etc) and NOT by poking into > large index positions that may or may not be present. > I meant "in a LOOP" above. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From edk141 at gmail.com Sun Mar 5 14:13:49 2017 From: edk141 at gmail.com (Ed Kellett) Date: Sun, 05 Mar 2017 19:13:49 +0000 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> Message-ID: On Sun, 5 Mar 2017 at 14:08 Paul Moore wrote: > Self-evidently no. But what does that prove? That we should implement > list.get? You could use the dientical argument for *anything*. There > needs to be another reason for implementing it. > I don't think that's true. It's not true for many other list methods, for example. It's not a reason for implementing it, but it does suggest to me that it would increase the one-obvious-way-to-do-it-ness of the language. 
But you have yet to explain why you'd be glad not to write a helper > for list.get, in any terms that don't simply boil down to "someone > else did the work for me". > What? Five lines isn't work, it's just ugly. I don't want to add a lot of random utility functions like this because it drastically increases the effort required to read my code. It's hardly any effort, but it doesn't solve any problems, so why bother? The point about this as a Python change is that it's a standard. Who does the work couldn't be less relevant; what matters is that it would add a consistent and easy spelling for something that doesn't have one. > I think we're going to have to just disagree. You won't convince me > it's worth adding list.get unless you can demonstrate some *existing* > costs that would be removed by adding list.get, and showing that they > are greater than the costs of adding list.get (search this list if you > don't know what those costs are - I'm not going to repeat them again, > but they are *not* trivial). They seem to be "it'd need to be added to Sequence too" and "it would mess with code that checks for a .get method to determine whether something is a mapping". It's easily implementable in Sequence as a mixin method, so I'm prepared to call that trivial, and I'm somewhat sceptical that the latter code does, let alone should, exist. > And I don't seem to be able to convince > you that writing a helper function is a reasonable approach. > I feel like I'm saying this a lot, but writing helper functions has its own readability cost. I'm not trying to get anyone to implement list.get, I'm trying to get it centrally documented and allowed into list's overly-mappingproxied namespace. Ed -------------- next part -------------- An HTML attachment was scrubbed...
URL: From edk141 at gmail.com Sun Mar 5 14:22:28 2017 From: edk141 at gmail.com (Ed Kellett) Date: Sun, 05 Mar 2017 19:22:28 +0000 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> <20170304094009.GG5689@ando.pearwood.info> Message-ID: On Sun, 5 Mar 2017 at 18:13 David Mertz wrote: > But if the next line is: > > data = args.get(17, "") > > Then I'm pretty sure the programmer thinks she's being passed a very > different type of collection than is actually available. I'd rather that > fails right away and in an obvious way than silently produce a value. > That's up to the programmer. args[17] exists and does fail immediately. If the programmer provides a default value, presumably they know they want one. > Specifically, if I think I'm dealing with a list that is likely to have 20 > items (rather than maybe 4 or fewer), I'm almost sure the best way to deal > with it is in a loop (or comprehension, map(), etc) and NOT by poking into > large index positions that may or may not be present. > I really think that depends what it's a list of. If the positions of things in the list are important (as with an argument parser, or perhaps a lookup table) I fail to see why it would be wrong to peek. If lists were really designed to be used only as you and some others in this thread are suggesting, I don't think they'd have indexed access at all. Ed -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mertz at gnosis.cx Sun Mar 5 14:54:12 2017 From: mertz at gnosis.cx (David Mertz) Date: Sun, 5 Mar 2017 11:54:12 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> <20170304094009.GG5689@ando.pearwood.info> Message-ID: On Sun, Mar 5, 2017 at 11:22 AM, Ed Kellett wrote: > On Sun, 5 Mar 2017 at 18:13 David Mertz wrote: > >> data = args.get(17, "") >> >> Then I'm pretty sure the programmer thinks she's being passed a very >> different type of collection than is actually available. I'd rather that >> fails right away and in an obvious way than silently produce a value. >> Specifically, if I think I'm dealing with a list that is likely to have 20 >> items (rather than maybe 4 or fewer), I'm almost sure the best way to deal >> with it is in a loop (or comprehension, map(), etc) and NOT by poking into >> large index positions that may or may not be present. >> > > That's up to the programmer. args[17] exists and does fail immediately. If > the programmer provides a default value, presumably they know they want one. > In terms of an actual use case, I can see it for "Lists no longer than 4". Any other use of this hypothetical method would be an anti-pattern and be a bad habit. Yes, programmers can do what they want, but providing a method is a hint to users (especially beginners, but not only) that that is the "right way" to do it. > I really think that depends what it's a list of. If the positions of > things in the list are important (as with an argument parser, or perhaps a > lookup table) I fail to see why it would be wrong to peek. > But the positions NEVER are important when you get to a 20 item list. If you design an argument parser that is looking for "the 17th argument" you are doing it wrong. I'm not saying that's impossible (nor even hard) to program, but it's not good practice.
Sure, I'm happy to take 20+ arguments, especially if they result from a glob pattern used at the command line, naming files. But when I'm doing that, I want to deal with those filenames in a loop, handling each one as necessary. In that pattern, I *never* want exactly 20 arguments, but rather "however many things there are to handle." > If lists were really designed to be used only as you and some others in > this thread are suggesting, I don't think they'd have indexed access at all. > Actually, when I teach I make a big point of telling students (for me, professional scientists and programmers who have used other PLs) that if they are indexing a list they should look again and question whether that's the right pattern. Of course there are times when it's needed, but they are fewer than C, Java, or Fortran programmers think. If this method existed, I'd want it implemented roughly like this:

In [1]: class GetList(list):
   ...:     def get(self, i, default=None):
   ...:         if i > 4:
   ...:             raise NotImplementedError("You should NOT use this method for long lists!")
   ...:         try:
   ...:             return self[i]
   ...:         except IndexError:
   ...:             return default
   ...:

In [2]: l = GetList(['err', 'my message'])

In [3]: l.get(1, 'no message')
Out[3]: 'my message'

In [4]: l.get(2, 'no details')
Out[4]: 'no details'

In [5]: l.get(10, 'some data')
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 l.get(10, 'some data')

<ipython-input> in get(self, i, default)
      2     def get(self, i, default=None):
      3         if i > 4:
----> 4             raise NotImplementedError("You should NOT use this method for long lists!")
      5         try:
      6             return self[i]

NotImplementedError: You should NOT use this method for long lists!

-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
-------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sun Mar 5 15:06:58 2017 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 6 Mar 2017 07:06:58 +1100 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> Message-ID: On Mon, Mar 6, 2017 at 6:13 AM, Ed Kellett wrote: > The point about this as a Python change is that it's a standard. Who does > the work couldn't be less relevant; what matters is that it would add a > consistent and easy spelling for something that doesn't have one. Oh, absolutely! Because a language totally needs to have a consistent and easy spelling for everything. How about reading one line of text from a compressed file and stripping HTML tags from it? http://php.net/manual/en/function.gzgetss.php Not everything has to be a single function/method. Some things are allowed to be more than one. To justify list.get(), you have to show that it's actually worth adding to the language, and so far, all you've said is "but wouldn't it be nice". Show us code that would be improved by this method. SHOW US CODE. > They seem to be "it'd need to be added to Sequence too" and "it would mess > with code that checks for a .get method to determine whether something is a > mapping". It's easily implementable in Sequence as a mixin method, so I'm > prepared to call that trivial, and I'm somewhat sceptical that the latter > code does, let alone should, exist. Classes don't generally inherit from Sequence, though. You can't simply add methods to these kinds of protocols.
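This point can be made concrete: a mixin method added to the `Sequence` ABC would only reach classes that genuinely inherit from it, and `list` is merely *registered* as a virtual subclass. A sketch (`SequenceWithGet` and `Squares` are hypothetical names, not anything proposed on the list):

```python
from collections.abc import Sequence

class SequenceWithGet(Sequence):
    """The Sequence ABC plus a hypothetical get() mixin method."""
    def get(self, index, default=None):
        try:
            return self[index]
        except IndexError:
            return default

class Squares(SequenceWithGet):
    """A toy sequence of the first n squares; inherits get() for free."""
    def __init__(self, n):
        self._n = n
    def __len__(self):
        return self._n
    def __getitem__(self, i):
        if i < 0:
            i += self._n
        if not 0 <= i < self._n:
            raise IndexError(i)
        return i * i

s = Squares(4)
assert s.get(2) == 4
assert s.get(99, "missing") == "missing"

# list is only registered with Sequence, so a mixin method on the ABC
# would never reach it:
assert isinstance([], Sequence) and not hasattr([], "get")
```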
ChrisA From edk141 at gmail.com Sun Mar 5 15:16:12 2017 From: edk141 at gmail.com (Ed Kellett) Date: Sun, 05 Mar 2017 20:16:12 +0000 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> <20170304094009.GG5689@ando.pearwood.info> Message-ID: On Sun, 5 Mar 2017 at 19:54 David Mertz wrote: > In terms of an actual use case, I can see it for "Lists no longer than 4". > That's an excessively hard limit. > Any other use of this hypothetical method would be an anti-pattern > What really is the point of this? You (not uniquely) have been quick to dismiss any uses for this as misguided. If you must do that, you might stick to the reasons; meaningless labels don't add anything. > Yes, programmers can do what they want, but providing a method is a hint > to users (especially beginners, but not only) that that is the "right way" > to do it. > > > I really think that depends what it's a list of. If the positions of > things in the list are important (as with an argument parser, or perhaps a > lookup table) I fail to see why it would be wrong to peek. > > > But the positions NEVER are important when you get to a 20 item list. If > you design an argument parser that is looking for "the 17th argument" you > are doing it wrong. I'm not saying that's impossible (nor even hard) to > program, but it's not good practice. > That's probably true of argument parsers, certainly not lookup tables. > Sure, I'm happy to take 20+ arguments, especially if they result from a > glob pattern used at the command line, naming files. But when I'm doing > that, I want to deal with those filenames in a loop, handling each one as > necessary. In that pattern, I *never* want exactly 20 arguments, but > rather "however many things there are to handle." 
> > > If lists were really designed to be used only as you and some others in > this thread are suggesting, I don't think they'd have indexed access at all. > > > Actually, when I teach I make a big point of telling students (for me, > professional scientists and programmers who have used other PLs) that if > they are indexing a list they should look again and question whether that's > the right pattern. Of course there are times when it's needed, but they > are fewer than C, Java, or Fortran programmers think. > I don't think that assessment applies neatly everywhere. Indexing is generally unnecessary when it's being used instead of iteration, but this thread is explicitly about cases where iteration isn't wanted. Ed -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sun Mar 5 15:29:13 2017 From: mertz at gnosis.cx (David Mertz) Date: Sun, 5 Mar 2017 12:29:13 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> <20170304094009.GG5689@ando.pearwood.info> Message-ID: On Sun, Mar 5, 2017 at 12:16 PM, Ed Kellett wrote: > On Sun, 5 Mar 2017 at 19:54 David Mertz wrote: > >> In terms of an actual use case, I can see it for "Lists no longer than 4". >> > Any other use of this hypothetical method would be an anti-pattern >> > > That's an excessively hard limit. > Maybe 5... in special circumstances :-) > But the positions NEVER are important when you get to a 20 item list. If >> you design an argument parser that is looking for "the 17th argument" you >> are doing it wrong. I'm not saying that's impossible (nor even hard) to >> program, but it's not good practice. >> > > That's probably true of argument parsers, certainly not lookup tables. > I can think of a few special cases where index positions are useful. 
But they aren't common enough to warrant a new method, nor hard to do with the existing language. E.g.: for i, data in enumerate(base_data): extra = extra_data[i] if len(extra_data) > i else DEFAULT combine(data, extra) This might well use lists thousands of items long, and maybe `extra_data` runs out before `base_data`. That code would look very slightly nicer with `extra_data.get()`. On the other hand, better than either is: from itertools import zip_longest for data, extra in zip_longest(base, extra_data, fillvalue=DEFAULT): combine(data, extra) So far no one in this thread has presented any (non-trivial) code that would be better if `list.get()` existed. I think I have personally come closest, but I actively want it not to happen because it's an anti-pattern. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From elliot.gorokhovsky at gmail.com Sun Mar 5 16:19:32 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Sun, 05 Mar 2017 21:19:32 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Sun, Mar 5, 2017 at 10:51 AM David Mertz wrote: > Thanks for the hard work, this looks very promising. > Thank you! > Real world data tends to be mostly-sorted. So it would be useful for your > benchmarks to include: > > A) Performance on completely sorted data > i) Of homogenous type > ii) Of mixed type > B) Performance on mostly-sorted data > i) Of homogenous type > ii) Of mixed type > You are entirely correct, as the benchmarks below demonstrate. 
I used the benchmark lists from Objects/listsort.txt, which are: \sort: descending data /sort: ascending data 3sort: ascending, then 3 random exchanges +sort: ascending, then 10 random at the end %sort: ascending, then randomly replace 1% of elements w/ random values ~sort: many duplicates =sort: all equal My results are below (the script can be found at https://github.com/embg/python-fastsort-benchmark/blob/master/bench-structured.py ): Homogeneous ([int]): \sort: 54.6% /sort: 56.5% 3sort: 53.5% +sort: 55.3% %sort: 52.4% ~sort: 48.0% =sort: 45.2% Heterogeneous ([int]*n + [0.0]): \sort: -17.2% /sort: -19.8% 3sort: -18.0% +sort: -18.8% %sort: -10.0% ~sort: -2.1% =sort: -21.3% As you can see, because there's a lot less non-comparison overhead in the structured lists, the impact of the optimization is much greater, both in performance gain and in worst-case cost. However, I would argue that these data do not invalidate the utility of my patch: the probability of encountering a type-heterogeneous list is certainly less than 5% in practice. So the expected savings, even for structured lists, is something like (5%)(-20%) + (95%)(50%) = 46.5%. And, of course, not *all* the lists one encounters in practice are structured; certainly not *this* structured. So, overall, I would say the numbers above are extremely encouraging. Thanks for pointing out the need for this benchmark, though! Elliot -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From breamoreboy at yahoo.co.uk Sun Mar 5 16:36:16 2017 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sun, 5 Mar 2017 21:36:16 +0000 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <58B9A1B6.80501@stoneleaf.us> <58B9C567.8020507@stoneleaf.us> <143be2c8-b1d2-40d5-8a3d-2fd11cd9b7b8@gmail.com> <20170304094009.GG5689@ando.pearwood.info> Message-ID: On 05/03/2017 20:16, Ed Kellett wrote: > On Sun, 5 Mar 2017 at 19:54 David Mertz > wrote: > > In terms of an actual use case, I can see it for "Lists no longer > than 4". > > That's an excessively hard limit. > > Any other use of this hypothetical method would be an anti-pattern > > > What really is the point of this? You (not uniquely) have been quick to > dismiss any uses for this as misguided. If you must do that, you might > stick to the reasons; meaningless labels don't add anything. > > > Yes, programmers can do what they want, but providing a method is a > hint to users (especially beginners, but not only) that that is the > "right way" to do it. > > > I really think that depends what it's a list of. If the > positions of things in the list are important (as with an > argument parser, or perhaps a lookup table) I fail to see why it > would be wrong to peek. > > > But the positions NEVER are important when you get to a 20 item > list. If you design an argument parser that is looking for "the > 17th argument" you are doing it wrong. I'm not saying that's > impossible (nor even hard) to program, but it's not good practice. > > > That's probably true of argument parsers, certainly not lookup tables. > > > Sure, I'm happy to take 20+ arguments, especially if they result > from a glob pattern used at the command line, naming files. But > when I'm doing that, I want to deal with those filenames in a loop, > handling each one as necessary. In that pattern, I *never* want > exactly 20 arguments, but rather "however many things there are to > handle." 
> > > If lists were really designed to be used only as you and some > others in this thread are suggesting, I don't think they'd have > indexed access at all. > > > Actually, when I teach I make a big point of telling students (for > me, professional scientists and programmers who have used other PLs) > that if they are indexing a list they should look again and question > whether that's the right pattern. Of course there are times when > it's needed, but they are fewer than C, Java, or Fortran programmers > think. > > > I don't think that assessment applies neatly everywhere. Indexing is > generally unnecessary when it's being used instead of iteration, but > this thread is explicitly about cases where iteration isn't wanted. > > Ed > Patches are always welcome. If you insist that it's needed, you do the work. Hopefully it's easier with the move to github. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From p.f.moore at gmail.com Sun Mar 5 18:22:33 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 5 Mar 2017 23:22:33 +0000 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> Message-ID: On 5 March 2017 at 19:13, Ed Kellett wrote: >> I think we're going to have to just disagree. You won't convince me >> it's worth adding list.get unless you can demonstrate some *existing* >> costs that would be removed by adding list.get, and showing that they >> are greater than the costs of adding list.get (search this list if you >> don't know what those costs are - I'm not going to repeat them again, >> but they are *not* trivial). > > > They seem to be "it'd need to be added to Sequence too" and "it would mess > with code that checks for a .get method to determine whether something is a > mapping". 
It's easily implementable in Sequence as a mixin method, so I'm > prepared to call that trivial, and I'm somewhat sceptical that the latter > code does, let alone should, exist. > You didn't seem to find the post(s) I referred to. I did a search for you. Here's one of the ones I was talking about - https://mail.python.org/pipermail/python-ideas/2017-February/044807.html You need to present sufficient benefits for list.get to justify all of the costs discussed there - or at least show some understanding of those costs and an appreciation that you're asking *someone* to pay those costs if you expect a proposal to add *anything* to the Python language to be taken seriously. But I quit at this point - you seem intent on not appreciating the other sides of this argument, so there's not really much point continuing. Paul PS And yes, I do appreciate your point here - a get method on lists may be useful. And helpers (if you don't name them well, for instance) aren't always the best solution. But I've never yet seen *any* code that would be improved by using a list.get method, so although I understand the argument in theory, I don't see the practical benefits. From tjreedy at udel.edu Sun Mar 5 18:50:21 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 5 Mar 2017 18:50:21 -0500 Subject: [Python-ideas] for/except/else In-Reply-To: References: Message-ID: On 3/4/2017 10:17 PM, Nick Coghlan wrote: > I forget where it came up, but I seem to recall Guido saying that if he > were designing Python today, he wouldn't include the "else:" clause on > loops, since it inevitably confuses folks the first time they see it. > (Hence articles like mine that attempt to link it with try/except/else > rather than if/else). The link needs to be done in two separate steps: if-else => while-else => for-else. Step 1.
The conceptual difference between an if-clause and a while-clause is that an if-clause ends with an implied 'break' (jump past the rest of the statement to the next statement) while a while-clause ends with an implied 'continue' (jump back to the condition test). In the CPython byte code, this difference is implemented by JUMP_FORWARD versus JUMP_ABSOLUTE (backwards). (The byte code for a while-clause also has two bookkeeping additions, SETUP_LOOP and POP_BLOCK.) In both if-statements and while-statements, the else-clause is executed if and when the condition is false. I wonder if having never programming with with 'jump' or 'goto' makes this harder to understand. I think our doc could better explain how if-statements and while-statements are similar but different. Step 2. A for-loop can be viewed as an initialized while-loop. Kernighan and Ritchie make this explicit in The C Programming Language (p. 56). ''' The for statement for (expr1; expr2; expr3) statement is equivalent to expr1; while (expr2) { statement expr3; } ''' For Python's more specialized for-loop, 'for target-list in iterable: for-suite' can be paraphrased as 'while the iterator yields an object (is not exhausted): do the assignment and suite'. An else-clause, if present, is executed if and when the condition is false, when the iterator is exhausted and raises StopIteration instead of yielding an object. If while-else is understood and the implicit for condition is understood, for-else is pretty straightforward. Equivalent code, with an else-clause, is trickier than in C, without for-else. I think the following is close. I think something like this should be in the doc. _it = iter(iterable) _exhausted = False while not _exhausted: try: _obj = next(_it) except StopIteration: _exhausted = True continue target-list = _obj for-suite else: else-suite Note 1: without the else-clause, _exhausted is not needed. The loop could be 'while True' and the except clause could just be 'break'. 
Note 2: _obj being assignment compatible with target list is NOT part of the implicit while condition. For example, for a,b in (1,2): print('for') else: print('else') prints nothing except the traceback for TypeError: 'int' object is not iterable. Note 3: C for typically assigns to the loop variable once before the loop and and again at the end of each loop. Python for does the assignment once at the top of the loop. -- Terry Jan Reedy From songofacandy at gmail.com Sun Mar 5 20:30:29 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Mon, 6 Mar 2017 10:30:29 +0900 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: Good job. I'll read your work later. > relatively simple optimization. I would also add that Python dictionaries > already implement this optimization: they start out optimizing based on the > assumption that they'll only be seeing string keys, checking to make sure > that assumption holds as they go. If they see a non-string key, they > permanently switch over to the general implementation. Please notice dict is very very special. Since Python is dynamic language, name resolution is done in runtime, not compile time. And most name resolution cause lookup string key from dict. So please don't use string-key dict specialization to rationalize list-sort specialization. Each specialization should be considered carefully about balance between maintenance cost and benefit. I feel 2 (int, string) or 3 (int, string, tuple) special case may be worth enough if code is clean and benefit seems good. But more complex cases can be done in third party libraries, like numpy. In third party library, you can use more powerful optimization like nan-boxing. In Python builtin list, this is hard because we guarantee object identity is not changed. 
Regards, From elliot.gorokhovsky at gmail.com Sun Mar 5 20:53:56 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Mon, 06 Mar 2017 01:53:56 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Sun, Mar 5, 2017 at 6:30 PM INADA Naoki wrote: > Good job. I'll read your work later. > Thanks! So please don't use string-key dict specialization to rationalize > list-sort specialization. > I'm not quite doing that: I agree that dict is a special case because of internal use. I'm just saying that I'm not the first person to try to exploit type homogeneity in data structures in CPython. However, my justification is independent of dict specialization. Namely, it's the following two considerations: (1) lists are idiomatically type-homogeneous. To quote the Python documentation, "Lists are mutable, and *their elements are usually homogeneous* and are accessed by iterating over the list". (2) it's not very common to have to "compare apples and oranges". While it's technically possible to define comparison between any two types you like, and in practice one sometimes compares e.g. ints and floats, in practice it's pretty safe to assume the lists you're sorting are going to be type-homogeneous 95% or 99% of the time. I feel 2 (int, string) or 3 (int, string, tuple) special case may be worth > enough if code is clean and benefit seems good. > I agree -- I only optimized for int, string, float, and tuple. However, my code optimizes sorting for *any* type-homogeneous list as well, by simply replacing PyObject_RichCompareBool with ob_type->richcompare, saving the method lookup time. This is what gives the speedups for non-latin strings and bigints in the benchmarks I shared in my original post: I don't have a special case compare function for them, but by setting the compare function pointer equal to ob_type->richcompare, I can at least save a little bit of method lookup time. 
In terms of clean code, the special-case compare functions include assertions in #ifdef Py_DEBUG blocks that list all the requirements necessary to make them safe. The pre-sort check then verifies the assumptions for whichever optimized compare is going to be used. So all that's necessary to verify correctness is to convince oneself that (1) the assertions are sufficient and (2) the pre-sort check will never use a compare function for which it cannot prove the assertions. These two points are pretty easy to check: (1) Just plug in the assertions in the original code, e.g. unicode_compare in Objects/unicodeobject.c. Then you have a bunch of if(1) and if(0) blocks, and if you delete the if(0) blocks you'll get exactly what I have in the patch. (2) The pre-sort check code is very simple, and well commented. I would add further that this patch only actually adds one line of code to listsort: assign_compare_function(lo.keys, saved_ob_size). The rest of the patch is just function definitions, and replacing PyObject_RichCompareBool with a function pointer (in the macro for ISLT(X,Y)). Anyway, thanks in advance for your time! Elliot -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Sun Mar 5 21:44:08 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 05 Mar 2017 18:44:08 -0800 Subject: [Python-ideas] get() method for list and tuples In-Reply-To: References: <2b7ac126-881e-f79b-6157-c45100904707@gmail.com> <20170228144519.GK5689@ando.pearwood.info> <58B9A1B6.80501@stoneleaf.us> Message-ID: <58BCCCF8.7020403@stoneleaf.us> On 03/05/2017 11:13 AM, Ed Kellett wrote: > I'm not trying to get anyone to implement list.get, I'm trying to get it centrally > documented and allowed into list's overly-mappingproxied namespace. 
--> dir(list) # non dunder methods 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort' --> dir(dict) # non dunder methods 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values' I will admit I'm not entirely sure what you meant with that last statement, but it sounds like you think `list` has a bunch of the same methods as `dict` does. I count 11 methods on dict, 11 methods on list, and only 3 methods that are shared (1 of which doesn't work precisely the same on both). So I'm not seeing a bunch of similarity between the two. And I'm not seeing any progress here in this discussion, so I'm dropping out now. -- ~Ethan~ From steve at pearwood.info Sun Mar 5 21:45:55 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 6 Mar 2017 13:45:55 +1100 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: <20170306024554.GK5689@ando.pearwood.info> On Sun, Mar 05, 2017 at 07:19:43AM +0000, Elliot Gorokhovsky wrote: > You may remember seeing some messages on here about optimizing list.sort() > by exploiting type-homogeneity: since comparing apples and oranges is > uncommon (though possible, i.e. float to int), it pays off to check if the > list is type-homogeneous I sometimes need to know if a list is homogenous, but unfortunately checking large lists for a common type in pure Python is quote slow. Here is a radical thought... why don't lists track their common type themselves? There's only a few methods which can add items: - append - extend - insert - __setitem__ Suppose we gave lists a read-only attrribute, __type_hint__, which returns None for hetrogeneous lists and the type for homogeneous lists. 
Adding an item to the list does as follows: - if the list is empty, adding an item sets __type_hint__ to type(item); - if the list is not empty, adding an item tests whether type(item) is identical to (not a subclass) of __type_hint__, and if not, sets __type_hint__ to None; - removing an item doesn't change the __type_hint__ unless the list becomes empty, in which case it is reset to None; - if the internal allocated space of the list shrinks, that triggers a recalculation of the __type_hint__ if it is currently None. (There's no need to recalculate the hint if it is not None.) Optional: doing a list.sort() could also recalculate the hint. The effect will be: - if __type_hint__ is a type object, then you can be sure that the list is homogeneous; - if the __type_hint__ is None, then it might still be homogeneous, but it isn't safe to assume so. Not only could sorting take advantage of the type hint without needing to do a separate O(N) scan of the list, but so could other code. I know I would be interested in using this. I have a fair amount of code that has to track the type of any items seen in a list, and swap to a "type agnostic but slow" version if the list is not homogeneous. I could probably replace that with some variation of: if thelist.__type_hint__ is None: process_slow(thelist) else: process_fast(thelist) At the very least, I'd be interested in experimenting with this. Thoughts? -- Steve From rosuav at gmail.com Sun Mar 5 21:46:46 2017 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 6 Mar 2017 13:46:46 +1100 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Mon, Mar 6, 2017 at 12:53 PM, Elliot Gorokhovsky wrote: > (1) lists are idiomatically type-homogeneous. To quote the Python > documentation, "Lists are mutable, and *their elements are usually > homogeneous* and are accessed by iterating over the list". > (2) it's not very common to have to "compare apples and oranges". 
While it's > technically possible to define comparison between any two types you like, > and in practice one sometimes compares e.g. ints and floats, in practice > it's pretty safe to assume the lists you're sorting are going to be > type-homogeneous 95% or 99% of the time. I would be rather curious to know how frequently a list consists of "numbers", but a mix of ints and floats. From the point of view of a list's purpose, they're all numbers (point 1 satisfied), and they're all comparable (point 2 satisfied), but from the POV of your patch, it's heterogeneous and suffers a performance penalty. Does it happen a lot in real-world code? (Apologies if this has already been mentioned; I've been skimming the thread, not reading it in detail.) ChrisA From elliot.gorokhovsky at gmail.com Sun Mar 5 21:57:44 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Mon, 06 Mar 2017 02:57:44 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: <20170306024554.GK5689@ando.pearwood.info> References: <20170306024554.GK5689@ando.pearwood.info> Message-ID: This is exactly what I would have done if I didn't think the changes would be too large-scale to ever be accepted (at least if proposed by someone without any experience). This is also what my friend suggested when I explained my project to him. In fact, this is exactly what dictionaries do: at each insert, they check whether a string is being inserted. If yes, then stay in a "string" state, if no, then switch to your None state. Again, the only reason I did it differently is because I wanted to be realistic about what changes I could get accepted. My patch does everything inside the list sort function, and doesn't modify anything outside of that. So it's presumably much easier to review. I mean, I submitted my patch in November, and it still hasn't been accepted :) so presumably something as large-scale as this would have very little chance. 
One more thing: changing mutating methods to check type would make them non-trivially slower; if you aren't resizing the list, appending should really just be as simple as storing a pointer and incrementing a counter. If we have to check type, that turns two simple things into three simple things, which could have significant performance implications. Most applications of lists *don't* care about type homogeneity (i.e. iterating through the list or whatever), so one would have to justify imposing this cost on all list use just to optimize for the special cases where you *do* care. I'm not sure how one would go about checking if that trade-off is a good one or not: what is the distribution of list use across all Python code? It's not something you can really check statically (it's undecidable if you're being pedantic). With my patch, it's easy: if you aren't sorting, you don't have to pay anything. If you are sorting, you either win or lose, based on whether or not you're sorting type-homogeneous data or not (which you basically always are). If anything, I think my patch could be a starting point for later optimization along these lines, depending on whether the problem raised in the previous paragraph can be addressed. What do you think? On Sun, Mar 5, 2017 at 7:47 PM Steven D'Aprano wrote: On Sun, Mar 05, 2017 at 07:19:43AM +0000, Elliot Gorokhovsky wrote: > You may remember seeing some messages on here about optimizing list.sort() > by exploiting type-homogeneity: since comparing apples and oranges is > uncommon (though possible, i.e. float to int), it pays off to check if the > list is type-homogeneous I sometimes need to know if a list is homogenous, but unfortunately checking large lists for a common type in pure Python is quote slow. Here is a radical thought... why don't lists track their common type themselves? 
There's only a few methods which can add items: - append - extend - insert - __setitem__ Suppose we gave lists a read-only attrribute, __type_hint__, which returns None for hetrogeneous lists and the type for homogeneous lists. Adding an item to the list does as follows: - if the list is empty, adding an item sets __type_hint__ to type(item); - if the list is not empty, adding an item tests whether type(item) is identical to (not a subclass) of __type_hint__, and if not, sets __type_hint__ to None; - removing an item doesn't change the __type_hint__ unless the list becomes empty, in which case it is reset to None; - if the internal allocated space of the list shrinks, that triggers a recalculation of the __type_hint__ if it is currently None. (There's no need to recalculate the hint if it is not None.) Optional: doing a list.sort() could also recalculate the hint. The effect will be: - if __type_hint__ is a type object, then you can be sure that the list is homogeneous; - if the __type_hint__ is None, then it might still be homogeneous, but it isn't safe to assume so. Not only could sorting take advantage of the type hint without needing to do a separate O(N) scan of the list, but so could other code. I know I would be interested in using this. I have a fair amount of code that has to track the type of any items seen in a list, and swap to a "type agnostic but slow" version if the list is not homogeneous. I could probably replace that with some variation of: if thelist.__type_hint__ is None: process_slow(thelist) else: process_fast(thelist) At the very least, I'd be interested in experimenting with this. Thoughts? -- Steve _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From elliot.gorokhovsky at gmail.com Sun Mar 5 22:03:22 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Mon, 06 Mar 2017 03:03:22 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Sun, Mar 5, 2017 at 7:50 PM Chris Angelico wrote: > > I would be rather curious to know how frequently a list consists of > "numbers", but a mix of ints and floats. Does it happen a > lot in real-world code? > > This is of course undecidable to verify statically, so we can't just crawl PyPI... however, I would argue that using mixed float-int lists is dangerous, and is more dangerous in Python 3 than in Python 2. So hopefully this is not very common. However, even if 10% (surely a vast overestimate) of sort calls are to mixed int-float lists, my patch would still yield a significant savings on average. (Apologies if this has already been mentioned; I've been skimming the > thread, not reading it in detail.) It hasn't been mentioned in this thread :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sun Mar 5 22:08:05 2017 From: mertz at gnosis.cx (David Mertz) Date: Sun, 5 Mar 2017 19:08:05 -0800 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: <20170306024554.GK5689@ando.pearwood.info> References: <20170306024554.GK5689@ando.pearwood.info> Message-ID: On Sun, Mar 5, 2017 at 6:45 PM, Steven D'Aprano wrote: > Here is a radical thought... why don't lists track their common type > themselves? There's only a few methods which can add items: > I had exactly the same thought. Lists would need to grow a new attribute, of course. I'm not sure how that would affect the object layout and word boundaries. But there might be free space for another attribute slot. The real question is whether doing this is a win. 
On each append/mutation operation we would need to do a comparison to the __type_hint__ (assuming Steven's spelling of the attribute). That's not free. Balancing that, however, when we actually *did* a sort, it would be O(1) to tell if it was homogeneous (and also the actual type if yes) rather than O(N). This question isn't really subject to microbenchmarks though. If we added __type_hint__ as None/type-object and added those comparisons to it on .insert()/.append()/etc, then we would be slower by some increment while all we were doing was adding things. There could only be a win when the list is sorted (but a big win for that case). In real world code, how often are lists sorted? Also I'm not really confident that Elliot's estimates of 95% of lists being homogeneous holds, but the speedup he proposes would seem to win even if that percentage is a lot lower than 95%. If only 10% of lists in real world code ever get `my_list.sort()` called on them, Steven's idea is probably not good. If 50% of lists do, it probably is. But then, it depends just how *often* lists that get sorted are re-sorted too. Yours, David... P.S. I think that given that we can .append() then delete items to make a heterogenous list become homogeneous again, Elliot's idea is somewhat orthogonal to Steven's. That is, even if we had .__type_hint__ on lists, it might be a "false None" and it could still be worth doing Elliot's linear scan anyway. On the other hand, the None-ness on non-empty lists might be a good enough predictor of heterogeneity in real world code that the linear scan would almost always be a waste. I do not know without benchmarks against real codebases. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sun Mar 5 22:09:49 2017 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 6 Mar 2017 14:09:49 +1100 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Mon, Mar 6, 2017 at 2:03 PM, Elliot Gorokhovsky wrote: > On Sun, Mar 5, 2017 at 7:50 PM Chris Angelico wrote: >> >> >> I would be rather curious to know how frequently a list consists of >> "numbers", but a mix of ints and floats. Does it happen a >> lot in real-world code? >> > > This is of course undecidable to verify statically, so we can't just crawl > PyPI... however, I would argue that using mixed float-int lists is > dangerous, and is more dangerous in Python 3 than in Python 2. So hopefully > this is not very common. However, even if 10% (surely a vast overestimate) > of sort calls are to mixed int-float lists, my patch would still yield a > significant savings on average. I agree that it's dangerous, but it is still common for programmers and Python alike to treat 10 as functionally identical to 10.0 - although as to being more dangerous in Py3, that's much of a muchness (for instance, the single-slash division operator in Py2 can potentially truncate, but in Py3 it's always going to give you a float). But, fair point. I very much doubt it's as high as 10%, so yeah, that would be advantageous. Also, the performance hit is so small, and even that is in the very worst case (a homogeneous list with one different type at the end). I like changes that make stuff run faster. ChrisA From mertz at gnosis.cx Sun Mar 5 22:18:55 2017 From: mertz at gnosis.cx (David Mertz) Date: Sun, 5 Mar 2017 19:18:55 -0800 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: In my experience and/or estimation... well two things: A. 
Mixtures of floats/ints seem a lot more common than the 10% being thrown around. I don't even consider that bad practice currently (it might become bad practice if Elliot's patch is used, since keeping elements float-only would allow much faster sorting). B. I think a very large percentage of lists are heterogeneous. But most of the time when they are, it's not because there are several different numeric types but rather because a list collects some sort of custom objects. Maybe those even all share some ancestor, but the point is they are user/library defined classes that won't have a fast comparison like ints or floats do. B (part 2): I haven't actually read his patch, but I'm assuming that Elliot's approach can start scanning the list, and as soon as it find a complex/custom object at index 0 ignore the rest of the scan. So for that case, there is very little harm in his linear scan (over one item). On Sun, Mar 5, 2017 at 7:09 PM, Chris Angelico wrote: > On Mon, Mar 6, 2017 at 2:03 PM, Elliot Gorokhovsky > wrote: > > On Sun, Mar 5, 2017 at 7:50 PM Chris Angelico wrote: > >> > >> > >> I would be rather curious to know how frequently a list consists of > >> "numbers", but a mix of ints and floats. Does it happen a > >> lot in real-world code? > >> > > > > This is of course undecidable to verify statically, so we can't just > crawl > > PyPI... however, I would argue that using mixed float-int lists is > > dangerous, and is more dangerous in Python 3 than in Python 2. So > hopefully > > this is not very common. However, even if 10% (surely a vast > overestimate) > > of sort calls are to mixed int-float lists, my patch would still yield a > > significant savings on average. 
> > I agree that it's dangerous, but it is still common for programmers > and Python alike to treat 10 as functionally identical to 10.0 - > although as to being more dangerous in Py3, that's much of a muchness > (for instance, the single-slash division operator in Py2 can > potentially truncate, but in Py3 it's always going to give you a > float). But, fair point. I very much doubt it's as high as 10%, so > yeah, that would be advantageous. > > Also, the performance hit is so small, and even that is in the very > worst case (a homogeneous list with one different type at the end). I > like changes that make stuff run faster. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From elliot.gorokhovsky at gmail.com Sun Mar 5 22:20:01 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Mon, 06 Mar 2017 03:20:01 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Sun, Mar 5, 2017 at 8:10 PM Chris Angelico wrote: > > Also, the performance hit is so small, and even that is in the very > worst case (a homogeneous list with one different type at the end). I > like changes that make stuff run faster. > I agree. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mertz at gnosis.cx Sun Mar 5 22:23:49 2017 From: mertz at gnosis.cx (David Mertz) Date: Sun, 5 Mar 2017 19:23:49 -0800 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: <20170306024554.GK5689@ando.pearwood.info> Message-ID: On Sun, Mar 5, 2017 at 7:16 PM, Elliot Gorokhovsky < elliot.gorokhovsky at gmail.com> wrote: > I would imagine that fewer than even 10% of lists in real world code ever > get sorted. I mean, just crawl PyPI and look for `.sort()` or `sorted()`; > you'll find it's not that common. > I think I must sort things a lot more often than most people :-). But also, it's not just the number of list objects that are sorted, but how often it's done. I could have a 10,000 line program with only one call to `my_list.sort()` in it... but that one line is something that is called inside an inner loop. This feels like it really needs profiling not just static analysis. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From elliot.gorokhovsky at gmail.com Sun Mar 5 22:16:25 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Mon, 06 Mar 2017 03:16:25 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: <20170306024554.GK5689@ando.pearwood.info> Message-ID: On Sun, Mar 5, 2017 at 8:09 PM David Mertz wrote: > > If we added __type_hint__ as None/type-object and added those comparisons > to it on .insert()/.append()/etc, then we would be slower by some increment > while all we were doing was adding things. There could only be a win when > the list is sorted (but a big win for that case). > Exactly. 
> > In real world code, how often are lists sorted? Also I'm not really > confident that Elliot's estimates of 95% of lists being homogeneous holds, > but the speedup he proposes would seem to win even if that percentage is a > lot lower than 95%. If only 10% of lists in real world code ever get > `my_list.sort()` called on them, Steven's idea is probably not good. If > 50% of lists do, it probably is. But then, it depends just how *often* > lists that get sorted are re-sorted too. > I would imagine that fewer than even 10% of lists in real world code ever get sorted. I mean, just crawl PyPI and look for `.sort()` or `sorted()`; you'll find it's not that common. I know, because I was hoping to be able to demonstrate non-trivial improvements on the benchmark suites, but they just don't sort enough for it to show up. You only see application-level speedups if you're doing a *lot* of sorting, like in DEAP Pareto selection (DEAP is a popular Python evolutionary algorithms library; you sometimes have to sort individuals in the population by fitness, and the population is huge). And the cost of doing the pre-sort check isn't that bad, anyway... I think that making *every* append slower just for sorting/etc would be a net loss. Anyway, like I said earlier, my patch could be a first step towards a more broad optimization like this. -------------- next part -------------- An HTML attachment was scrubbed... URL: From elliot.gorokhovsky at gmail.com Sun Mar 5 22:24:17 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Mon, 06 Mar 2017 03:24:17 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Sun, Mar 5, 2017 at 8:20 PM David Mertz wrote: > > B. I think a very large percentage of lists are heterogeneous. But most > of the time when they are, it's not because there are several different > numeric types but rather because a list collects some sort of custom > objects. 
Maybe those even all share some ancestor, but the point is they > are user/library defined classes that won't have a fast comparison like > ints or floats do. > As long as the ob_type for all the objects is the same, my patch will sort significantly faster, as it replaces PyObject_RichCompareBool with ob_type->tp_richcompare. It doesn't matter if your list is builtin-types or not, as long as the types are all the same, you get a speedup (though it's greater for built-in types). Also, I actually wrote a crawler to see how many PyPI packages implement a custom compare function (like you would have to for user-defined types). The answer is less than 6%. Code: https://github.com/embg/python-fast-listsort/tree/master/crawler > B (part 2): I haven't actually read his patch, but I'm assuming that > Elliot's approach can start scanning the list, and as soon as it find a > complex/custom object at index 0 ignore the rest of the scan. So for that > case, there is very little harm in his linear scan (over one item). > Yup, the pre-sort check breaks out of the loop as soon as it sees heterogeneity. So unless your float is at the end of the whole list of ints (as in my worst-case benchmark), the cost is basically 0. -------------- next part -------------- An HTML attachment was scrubbed... URL: From elliot.gorokhovsky at gmail.com Sun Mar 5 22:28:03 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Mon, 06 Mar 2017 03:28:03 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: <20170306024554.GK5689@ando.pearwood.info> Message-ID: On Sun, Mar 5, 2017 at 8:23 PM David Mertz wrote: > > But also, it's not just the number of list objects that are sorted, but > how often it's done. I could have a 10,000 line program with only one call > to `my_list.sort()` in it... but that one line is something that is called > inside an inner loop. This feels like it really needs profiling not just > static analysis. 
> Totally -- but what I mean is if you look at the performance benchmark suite, for example, most of the benchmarks do not have the string "sort" in their source code (IIRC). I think that most applications don't spend most of their time sorting. That's not to say sorting isn't important -- I wouldn't have written my patch if I didn't think it is. I'm just saying it probably isn't a good idea to penalize *all* list use just to benefit the minority of list use that involves sorting. -------------- next part -------------- An HTML attachment was scrubbed... URL: From prometheus235 at gmail.com Sun Mar 5 22:28:56 2017 From: prometheus235 at gmail.com (Nick Timkovich) Date: Sun, 5 Mar 2017 21:28:56 -0600 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: I see the benchmarks, and while I assume the asymptotic complexity is the same, is there a longer "start-up" time your optimizations need? Do you have benchmarks where you sort 10, 100...10**6 items to show that beyond the types you're sorting, you're not amortizing any increased overhead out to oblivion? On Sun, Mar 5, 2017 at 9:24 PM, Elliot Gorokhovsky < elliot.gorokhovsky at gmail.com> wrote: > On Sun, Mar 5, 2017 at 8:20 PM David Mertz wrote: > >> >> B. I think a very large percentage of lists are heterogeneous. But most >> of the time when they are, it's not because there are several different >> numeric types but rather because a list collects some sort of custom >> objects. Maybe those even all share some ancestor, but the point is they >> are user/library defined classes that won't have a fast comparison like >> ints or floats do. >> > > As long as the ob_type for all the objects is the same, my patch will sort > significantly faster, as it replaces PyObject_RichCompareBool with > ob_type->tp_richcompare. 
It doesn't matter if your list is builtin-types or > not, as long as the types are all the same, you get a speedup (though it's > greater for built-in types). > > Also, I actually wrote a crawler to see how many PyPI packages implement a > custom compare function (like you would have to for user-defined types). > The answer is less than 6%. Code: https://github.com/embg/ > python-fast-listsort/tree/master/crawler > > >> B (part 2): I haven't actually read his patch, but I'm assuming that >> Elliot's approach can start scanning the list, and as soon as it find a >> complex/custom object at index 0 ignore the rest of the scan. So for that >> case, there is very little harm in his linear scan (over one item). >> > > Yup, the pre-sort check breaks out of the loop as soon as it sees > heterogeneity. So unless your float is at the end of the whole list of ints > (as in my worst-case benchmark), the cost is basically 0. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From elliot.gorokhovsky at gmail.com Sun Mar 5 22:55:13 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Mon, 06 Mar 2017 03:55:13 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Sun, Mar 5, 2017 at 8:29 PM Nick Timkovich wrote: > I see the benchmarks, and while I assume the asymptotic complexity is the > same, is there a longer "start-up" time your optimizations need? Do you > have benchmarks where you sort 10, 100...10**6 items to show that beyond > the types you're sorting, you're not amortizing any increased overhead out > to oblivion? > This is addressed in my post to the bug tracker with a perf benchmark (link in the first email in this thread). 
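As a rough pure-Python model of the pre-sort check being discussed (illustrative only; the real patch does this scan in C by comparing ob_type pointers, and `is_homogeneous` is an invented name):

```python
# Pure-Python model of the pre-sort homogeneity scan (the real patch
# compares ob_type pointers in C; this name and code are illustrative).
def is_homogeneous(lst):
    if not lst:
        return True
    first_type = type(lst[0])
    # Bail out as soon as a second type shows up (the heterogeneous case).
    return all(type(x) is first_type for x in lst)

print(is_homogeneous([3, 1, 2]))    # homogeneous: the fast int compare applies
print(is_homogeneous([3, 1, 2.0]))  # heterogeneous: fall back to the general compare
```

The scan is a single pass of cheap identity checks, which is why the worst-case overhead (a homogeneous list with one odd element at the end) stays so small.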
In short: the pre-sort check is really cheap, even for tiny lists. Though it gets cheaper (percent-wise) as you get bigger. You could also trivially modify the benchmark script I linked to in my original post to check for this. -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Sun Mar 5 23:11:07 2017 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 6 Mar 2017 04:11:07 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: <75d521f8-c42b-4f53-cf15-951bf005fa33@mrabarnett.plus.com> On 2017-03-06 03:09, Chris Angelico wrote: > On Mon, Mar 6, 2017 at 2:03 PM, Elliot Gorokhovsky > wrote: >> On Sun, Mar 5, 2017 at 7:50 PM Chris Angelico wrote: >>> >>> >>> I would be rather curious to know how frequently a list consists of >>> "numbers", but a mix of ints and floats. Does it happen a >>> lot in real-world code? >>> >> >> This is of course undecidable to verify statically, so we can't just crawl >> PyPI... however, I would argue that using mixed float-int lists is >> dangerous, and is more dangerous in Python 3 than in Python 2. So hopefully >> this is not very common. However, even if 10% (surely a vast overestimate) >> of sort calls are to mixed int-float lists, my patch would still yield a >> significant savings on average. > > I agree that it's dangerous, but it is still common for programmers > and Python alike to treat 10 as functionally identical to 10.0 - > although as to being more dangerous in Py3, that's much of a muchness > (for instance, the single-slash division operator in Py2 can > potentially truncate, but in Py3 it's always going to give you a > float). But, fair point. I very much doubt it's as high as 10%, so > yeah, that would be advantageous. > > Also, the performance hit is so small, and even that is in the very > worst case (a homogeneous list with one different type at the end). 
I > like changes that make stuff run faster. > Although it's true that both programmers and Python might treat 10 as functionally identical to 10.0, in practice the numbers that are being added to the list probably come from some code that returns integers /or/ floats, rather than a mixture. From elliot.gorokhovsky at gmail.com Sun Mar 5 23:15:24 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Mon, 06 Mar 2017 04:15:24 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: <75d521f8-c42b-4f53-cf15-951bf005fa33@mrabarnett.plus.com> References: <75d521f8-c42b-4f53-cf15-951bf005fa33@mrabarnett.plus.com> Message-ID: On Sun, Mar 5, 2017 at 9:12 PM MRAB wrote: > > Although it's true that both programmers and Python might treat 10 as > functionally identical to 10.0, in practice the numbers that are being > added to the list probably come from some code that returns integers > /or/ floats, rather than a mixture. > Yes, exactly. So we can see how the homogeneity assumption is a reasonable one to make; unless you're defining custom compares (uncommon), I don't see where you would ever be sorting a heterogeneous list except for the int/float case. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Sun Mar 5 23:25:05 2017 From: tim.peters at gmail.com (Tim Peters) Date: Sun, 5 Mar 2017 22:25:05 -0600 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: [Elliot Gorokhovsky ] > (Summary of results: my patch at https://bugs.python.org/issue28685 makes list.sort() 30-50% > faster in common cases, and at most 1.5% slower in the uncommon worst case.) > ... Would someone please move the patch along? I expect it's my fault it's languished so long, since I'm probably the natural person to review it, but I've been buried under other stuff. 
But the patch doesn't change anything about the sorting algorithm itself - even shallow knowledge of how timsort works is irrelevant. It's just plugging in a different bottom-level object comparison _function_ when that appears valuable. I've said from the start that it's obvious (to me ;-) ) that it's an excellent tradeoff. At worst it adds one simple (pre)pass over the list doing C-level pointer equality comparisons. That's cheap. The worst-case damage is obviously small, the best-case gain is obviously large, and the best cases are almost certainly far more common than the worst cases in most code. In reviewing my own code, after it was fiddled to work under Python 3 there are no mixed-type lists that are ever sorted. There are lists with complex objects, but whenever those are sorted there's a `key=` argument that reduces the problem to sorting ints or tuples of builtin scalar types. I don't care about anyone else's code ;-) One subtle thing to look at: thread safety. IIRC, the patch plugged the comparison function into a file global. That's obviously hosed if multiple threads sort different kinds of lists simultaneously. -------------- next part -------------- An HTML attachment was scrubbed... URL: From elliot.gorokhovsky at gmail.com Sun Mar 5 23:33:11 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Mon, 06 Mar 2017 04:33:11 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Sun, Mar 5, 2017 at 9:25 PM Tim Peters wrote: > > Would someone please move the patch along? I expect it's my fault it's > languished so long, since I'm probably the natural person to review it, but > I've been buried under other stuff. > > But the patch doesn't change anything about the sorting algorithm itself - > even shallow knowledge of how timsort works is irrelevant. It's just > plugging in a different bottom-level object comparison _function_ when that > appears valuable. 
> > I've said from the start that it's obvious (to me ;-) ) that it's an > excellent tradeoff. At worst it adds one simple (pre)pass over the list > doing C-level pointer equality comparisons. That's cheap. The worst-case > damage is obviously small, the best-case gain is obviously large, and the > best cases are almost certainly far more common than the worst cases in > most code. > Thank you so much for the support! Yes to all of those things! > > In reviewing my own code, after it was fiddled to work under Python 3 > there are no mixed-type lists that are ever sorted. There are lists with > complex objects, but whenever those are sorted there's a `key=` argument > that reduces the problem to sorting ints or tuples of builtin scalar types. > I'm adding that quote to the next version of my poster :) > > One subtle thing to look at: thread safety. IIRC, the patch plugged the > comparison function into a file global. That's obviously hosed if multiple > threads sort different kinds of lists simultaneously. > Wow, that is a *very* good point. I never think about those kinds of things, being a n00b, so thanks for catching that... I'll have to go in and fix that. I'm not sure how, though, because the ISLT macro gets called in a bunch of different functions. The only way I can think of to fix it would be to pass the function pointer as an argument to *all* the functions that use ISLT, which would be pretty messy. What do you think would be the easiest fix? Thanks! Elliot -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Mon Mar 6 00:25:06 2017 From: mertz at gnosis.cx (David Mertz) Date: Sun, 5 Mar 2017 21:25:06 -0800 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Sun, Mar 5, 2017 at 8:33 PM, Elliot Gorokhovsky < elliot.gorokhovsky at gmail.com> wrote: > On Sun, Mar 5, 2017 at 9:25 PM Tim Peters wrote: >> One subtle thing to look at: thread safety. 
IIRC, the patch plugged the >> comparison function into a file global. That's obviously hosed if multiple >> threads sort different kinds of lists simultaneously. >> > > Wow, that is a *very* good point. I never think about those kinds of > things, being a n00b, so thanks for catching that... I'll have to go in and > fix that. I'm not sure how, though, because the ISLT macro gets called in a > bunch of different functions. The only way I can think of to fix it would > be to pass the function pointer as an argument to *all* the functions that > use ISLT, which would be pretty messy. What do you think would be the > easiest fix? > Could we make the file global a table of comparison functions and have each thread reference a position in the table? It would be fine if multiple threads happened to use the position of e.g. the int comparison, just so long as each chose the right slot in the table. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From elliot.gorokhovsky at gmail.com Mon Mar 6 00:45:14 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Mon, 06 Mar 2017 05:45:14 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Sun, Mar 5, 2017 at 10:25 PM David Mertz wrote: > > Could we make the file global a table of comparison functions and have > each thread reference a position in the table? It would be fine if multiple > threads happened to use the position of e.g. the int comparison, just so > long as each chose the right slot in the table. > I don't think that would solve anything. The problem is that the pre-sort check is carried out in the main sort function, but then lots of smaller functions need to know which compare to use. Since the scope isn't shared, then, how can we inform the smaller functions of which compare to use? I solved this problem by just putting the compare function pointer in global scope. Obviously, however, this is not thread-safe. So the question is, how can the smaller functions be made aware of the results of the pre-sort check? Your proposal would merely replace the problem of communicating the function pointer with the problem of communicating the appropriate index into the function table. The smaller functions wouldn't know which index they're supposed to use unless you passed it in. However, I believe the following *would* work: Suppose it were possible to access an identifier unique to each thread, such as that returned by gettid() on Linux. Then maintain a global table of (unique_thread_id, compare_func) pairs, and simply have the ISLT macro index into the table! A bit of a performance hit, sure, but hopefully not that bad? Certainly better than passing it in every time... the problem is, how can we get a unique identifier for the thread in a platform-independent way? Any ideas? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From elliot.gorokhovsky at gmail.com Mon Mar 6 00:52:43 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Mon, 06 Mar 2017 05:52:43 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Sun, Mar 5, 2017 at 10:45 PM Elliot Gorokhovsky < elliot.gorokhovsky at gmail.com> wrote: > > the problem is, how can we get a unique identifier for the thread in a > platform-independent way? Any ideas? > Oh, I could probably just copy over code from threading.get_ident()... not sure if the key-value table is a good solution, though. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pavol.lisy at gmail.com Mon Mar 6 01:08:01 2017 From: pavol.lisy at gmail.com (Pavol Lisy) Date: Mon, 6 Mar 2017 07:08:01 +0100 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On 3/6/17, Tim Peters wrote: > One subtle thing to look at: thread safety. One other subtle: Is it gilectomy neutral? From elliot.gorokhovsky at gmail.com Mon Mar 6 01:08:10 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Mon, 06 Mar 2017 06:08:10 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: Another solution: check if there is more than one thread; if there is, then disable optimization. Is sorting in multithreaded programs common enough to warrant adding the complexity to deal with it? On Sun, Mar 5, 2017 at 10:52 PM Elliot Gorokhovsky < elliot.gorokhovsky at gmail.com> wrote: > On Sun, Mar 5, 2017 at 10:45 PM Elliot Gorokhovsky < > elliot.gorokhovsky at gmail.com> wrote: > > > the problem is, how can we get a unique identifier for the thread in a > platform-independent way? Any ideas? > > > Oh, I could probably just copy over code from threading.get_ident()... not > sure if the key-value table is a good solution, though. 
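A Python-level sketch of that keyed-table idea (illustrative only; `COMPARES`, `set_compare`, and `get_compare` are invented names, and the real code would live in C):

```python
import threading

# Invented sketch: a global table mapping thread id -> specialized compare.
# Two threads sorting different kinds of lists no longer clobber each other.
COMPARES = {}

def set_compare(func):
    COMPARES[threading.get_ident()] = func

def get_compare():
    return COMPARES[threading.get_ident()]

def int_lt(a, b):          # stand-in for the specialized int compare
    return a < b

set_compare(int_lt)
print(get_compare()(1, 2))  # looks up this thread's entry

# Note: this still breaks if a compare function itself starts another
# sort in the same thread, since the inner sort overwrites the entry.
```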
> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jelle.zijlstra at gmail.com Mon Mar 6 01:21:23 2017 From: jelle.zijlstra at gmail.com (Jelle Zijlstra) Date: Sun, 5 Mar 2017 22:21:23 -0800 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: 2017-03-05 22:08 GMT-08:00 Elliot Gorokhovsky : > Another solution: check if there is more than one thread; if there is, then > disable optimization. Is sorting in multithreaded programs common enough to > warrant adding the complexity to deal with it? > I think using a global is unsafe even without multithreading, because the compare function itself could end up doing list.sort() (it's calling arbitrary Python code after all). > > On Sun, Mar 5, 2017 at 10:52 PM Elliot Gorokhovsky > wrote: >> >> On Sun, Mar 5, 2017 at 10:45 PM Elliot Gorokhovsky >> wrote: >>> >>> >>> the problem is, how can we get a unique identifier for the thread in a >>> platform-independent way? Any ideas? >> >> >> Oh, I could probably just copy over code from threading.get_ident()... not >> sure if the key-value table is a good solution, though. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From elliot.gorokhovsky at gmail.com Mon Mar 6 01:28:16 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Mon, 06 Mar 2017 06:28:16 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Sun, Mar 5, 2017 at 11:21 PM Jelle Zijlstra wrote: > > I think using a global is unsafe even without multithreading, because > the compare function itself could end up doing list.sort() (it's > calling arbitrary Python code after all). > Right, of course. 
So, clearly, the only safe solution is to just keep everything in local scope and pass the compare function pointer into every function that calls ISLT or IFLT. Too bad. I'll rewrite the patch to implement this and open a new issue on the bug tracker (and close my current issue). -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Mon Mar 6 01:31:05 2017 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 6 Mar 2017 00:31:05 -0600 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: [Elliot Gorokhovsky ] > Another solution: check if there is more than one thread; if there is, then > disable optimization. Is sorting in multithreaded programs common enough > to warrant adding the complexity to deal with it? Not a solution. Not even close ;-) Even if it made good sense, there's nothing to stop a custom __lt__ method from creating new threads _during_ a sort. The best approach is indeed to pass the function pointer to every location that needs it. Note that a MergeState struct is already created per sort invocation; that isn't a file global, for much the same reason. However, it's not just threads that are a potential problem. Suppose a custom __lt__ method, invoked during a sort, does a new sort of its own. That's in the same thread, but may well want a different specialized comparison function. Solve _that_, and "the thread problem" will almost certainly solve itself by magic too. But solving "the thread problem" doesn't necessarily solve "the same-thread reentrancy problem". That's why the MergeState struct is a function local ("auto" in silly C terminology). Since it lives in fresh stack space for each invocation of `listsort()`, it solves both the thread and reentrancy problems: every invocation of `listsort()` (regardless of whether from different threads or from the same thread) gets its own MergeState space. 
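The same-thread reentrancy hazard is easy to demonstrate from pure Python (a contrived class, purely illustrative; the point is that __lt__ may legally start another sort while one is in flight, so any file-global "current compare" would be clobbered):

```python
class Reentrant:
    """Contrived: its __lt__ runs a whole new sort on every comparison."""
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        [3.0, 1.0, 2.0].sort()   # an inner sort of floats, mid-outer-sort
        return self.v < other.v

xs = [Reentrant(2), Reentrant(1), Reentrant(3)]
xs.sort()                        # the outer sort still has to succeed
print([x.v for x in xs])         # [1, 2, 3]
```

Per-invocation state, like the MergeState struct described above, survives this pattern; a file global does not.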
You may or may not get simpler code by storing the function pointer as a new member of the MergeState struct. But however that's spelled, it does need to be passed to each function that needs it. From songofacandy at gmail.com Mon Mar 6 01:39:05 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Mon, 6 Mar 2017 15:39:05 +0900 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Mon, Mar 6, 2017 at 3:28 PM, Elliot Gorokhovsky wrote: > On Sun, Mar 5, 2017 at 11:21 PM Jelle Zijlstra > wrote: >> >> >> I think using a global is unsafe even without multithreading, because >> the compare function itself could end up doing list.sort() (it's >> calling arbitrary Python code after all). > > > Right, of course. So, clearly, the only safe solution is to just keep > everything in local scope and pass the compare function pointer into every > function that calls ISLT or IFLT. Too bad. I'll rewrite the patch to > implement this and open a new issue on the bug tracker (and close my current > issue). I think there is another safe solution: give up unsafe_object_compare. The compare functions of long, float, and unicode must not call list.sort(), and must not release the GIL. So all you need is to back up the old compare function before the sort and restore it after the sort. From rosuav at gmail.com Mon Mar 6 01:40:16 2017 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 6 Mar 2017 17:40:16 +1100 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Mon, Mar 6, 2017 at 5:31 PM, Tim Peters wrote: > Not a solution. Not even close ;-) Even if it made good sense, > there's nothing to stop a custom __lt__ method from creating new > threads _during_ a sort. Arbitrary comparison functions let you do anything.... but whoa, I cannot imagine any way that this would ever happen outside of "hey look, here's how you can trigger a SystemError"! 
ChrisA From elliot.gorokhovsky at gmail.com Mon Mar 6 01:50:55 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Mon, 06 Mar 2017 06:50:55 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Sun, Mar 5, 2017 at 11:39 PM INADA Naoki wrote: > > I think there is another safe solution: give up unsafe_object_compare. > > The compare functions of long, float, and unicode must not call list.sort(), > and must not release the GIL. > So all you need is to back up the old compare function before the sort and > restore it after the sort. > That would definitely work, but I think it's too much of a sacrifice for these reasons: (1) You would have to give up on optimizing tuples (because tuple compares call PyObject_RichCompareBool, which could itself call arbitrary code). That's a real shame, because sorting tuples is common (e.g., in DEAP, you sort tuples of floats representing the fitnesses of individuals). The alternative would be to only optimize tuples of long/string/float. However, I think the code complexity of *that* would be greater than the, admittedly messy, solution of passing in the compare everywhere it's needed. (2) There are lots of types, e.g. bytes, that would fall through the cracks. (3) Correct me if I'm wrong, but this would make us depend on the GIL, right? And we don't want to depend on that if we don't have to, right? It seems to me that the only solution here is to go in and add a compare function pointer to each function call. Extremely unfortunate, but necessary. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Mon Mar 6 01:52:33 2017 From: tim.peters at gmail.com (Tim Peters) Date: Mon, 6 Mar 2017 00:52:33 -0600 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: [Chris Angelico ] > Arbitrary comparison functions let you do anything.... 
but whoa, I > cannot imagine any way that this would ever happen outside of "hey > look, here's how you can trigger a SystemError"! CPython is full of defensive code protecting against malicious crap. That's why it rarely crashes ;-)

def __lt__(self, other):
    return self.size < other.size

Looks harmless? Can't tell! For all we know, there are proxy objects, and other.__getattr__ invokes some elaborate library to open a socket in a new thread to fetch the value of `size` over a network. From rosuav at gmail.com Mon Mar 6 01:56:12 2017 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 6 Mar 2017 17:56:12 +1100 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Mon, Mar 6, 2017 at 5:52 PM, Tim Peters wrote: > [Chris Angelico ] >> Arbitrary comparison functions let you do anything.... but whoa, I >> cannot imagine any way that this would ever happen outside of "hey >> look, here's how you can trigger a SystemError"! > > CPython is full of defensive code protecting against malicious crap. > That's why it rarely crashes ;-) > > def __lt__(self, other): >     return self.size < other.size > > Looks harmless? Can't tell! For all we know, there are proxy > objects, and other.__getattr__ invokes some elaborate library to open > a socket in a new thread to fetch the value of `size` over a network. Exactly. It's always fun to discover some nice tidy exception that cleanly copes with a ridiculous situation.

def gen():
    yield next(g)

g = gen()
next(g)

Fortunately in this case, the solution isn't to say "SystemError: cannot create threads while sorting", but even if that were the case, I can't imagine that much production code would be stopped by it. ChrisA From elliot.gorokhovsky at gmail.com Mon Mar 6 01:56:39 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Mon, 06 Mar 2017 06:56:39 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) 
In-Reply-To: References: Message-ID: On Sun, Mar 5, 2017 at 11:31 PM Tim Peters wrote: > > The best approach is indeed to pass the function pointer to every > location that needs it. Note that a MergeState struct is already > created per sort invocation; that isn't a file global, for much the > same reason. > Right. It's a real shame, because the patch as it stands right now is extremely surgical, but there's clearly no way around it. There are some functions that don't take in MergeState (e.g. gallop_left) but that call ISLT/IFLT, so I think I'll just add a compare function pointer parameter to all the function calls. I mean, the diff will be a lot hairier, which might make the patch more difficult to review, but when it comes down to it the code complexity won't really increase, so overall I don't think this is the end of the world. Thanks so much for pointing this out! P.S. Is it OK if I close my current issue on the bug tracker and open a new issue, where I'll post the revised patch? The writing on my current issue uses the old, less-rigorous benchmarks, and I feel it would be less confusing if I just made a new issue and posted the correct benchmarks/discussion at the top. The current issue doesn't have many comments, so not much would be lost by just linking to it from the new issue. If this violates bug tracker etiquette, however, :) I'll just post the revised patch on my current issue. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guettliml at thomas-guettler.de Mon Mar 6 05:12:28 2017 From: guettliml at thomas-guettler.de (=?UTF-8?Q?Thomas_G=c3=bcttler?=) Date: Mon, 6 Mar 2017 11:12:28 +0100 Subject: [Python-ideas] Smoothing transition: 'unicode' and 'basestring' as aliases for 'str'? In-Reply-To: References: <31701b30-03a5-3a8d-32e2-2dd59a26c09c@thomas-guettler.de> Message-ID: <6425c4aa-c421-fe65-8739-1aad7fda0d8d@thomas-guettler.de> yes, you are right. It's better to leave Python3 clean (without "basestring"). 
I see two ways now. six ---- six.string_types # replacement for basestring Source https://docs.djangoproject.com/en/1.10/topics/python3/#string-handling-with-six future ------ from past.builtins import basestring # pip install future Source http://python-future.org/compatible_idioms.html#basestring I have no clue which one I should use. Regards, Thomas On 03.03.2017 at 16:43, Joao S. O. Bueno wrote: > I see no reason to introduce clutter like this at this point in time - > code needing to run in both Py 2 and 3, if not using something like > "six" could do: > > compat.py > try: > unicode > except NameError: > unicode = basestring = str > > elsewhere: > from compat import unicode, basestring > > > Or rather: > > try: > unicode > except NameError: > pass > else: > str = basestring = unicode > > and > from compat import str > # therefore having Python3 valid and clear code from here. > > On 3 March 2017 at 11:37, Ryan Birmingham wrote: >> The thread is here in the archive >> (https://mail.python.org/pipermail/python-ideas/2016-June/040761.html) if >> anyone's wondering context, as I was. >> >> In short, someone wanted an alias from string to basestring. >> This is addressed in the "What's new in Python 3.0" >> (https://docs.python.org/3/whatsnew/3.0.html) page: >>> >>> The built-in basestring abstract type was removed. Use str instead. The >>> str and bytes types don't have functionality enough in common to warrant a >>> shared base class. The 2to3 tool (see below) replaces every occurrence of >>> basestring with str. >> >> Personally, I have no issue with leaving an alias like this in 2to3, since >> adding it to the language feels more like forced backwards compatibility to >> me. >> >> That said, there are more related subtleties on the "What's new in Python >> 3.0" page, some of which seem less intuitive, so I understand where a desire >> like this would come from. Would more specific and succinct documentation on >> this change alone help? 
>> >> -Ryan Birmingham >> >> On 3 March 2017 at 06:44, Thomas Güttler >> wrote: >>> >>> I found this in an old post: >>> >>>> Maybe too late now but there should have been 'unicode', >>>> 'basestring' as aliases for 'str'. >>> >>> I guess it is too late to think about it again ... >>> >>> Regards, >>> Thomas Güttler >>> >>> >>> -- >>> Thomas Guettler http://www.thomas-guettler.de/ >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ -- Thomas Guettler http://www.thomas-guettler.de/ From mmueller at python-academy.de Mon Mar 6 06:01:33 2017 From: mmueller at python-academy.de (=?UTF-8?Q?Mike_M=c3=bcller?=) Date: Mon, 6 Mar 2017 12:01:33 +0100 Subject: [Python-ideas] Smoothing transition: 'unicode' and 'basestring' as aliases for 'str'? In-Reply-To: <6425c4aa-c421-fe65-8739-1aad7fda0d8d@thomas-guettler.de> References: <31701b30-03a5-3a8d-32e2-2dd59a26c09c@thomas-guettler.de> <6425c4aa-c421-fe65-8739-1aad7fda0d8d@thomas-guettler.de> Message-ID: <3b88fd7f-bda9-1bce-78cc-252a1c69504c@python-academy.de> Am 06.03.17 um 11:12 schrieb Thomas Güttler: > yes, you are right. It's better to leave Python3 clean (without "basestring"). > > I see two ways now. > > > six > ---- > > six.string_types # replacement for basestring > > Source > https://docs.djangoproject.com/en/1.10/topics/python3/#string-handling-with-six > > > future > ------ > > from past.builtins import basestring # pip install future > > Source http://python-future.org/compatible_idioms.html#basestring > > > I have no clue which one I should use. I would recommend future.
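[Archive note: the try/except idiom quoted above can be written so that the same code runs unchanged on Python 2 and Python 3. This is a sketch; the name `string_types` is borrowed from six and is not required by either library.]

```python
# Version-agnostic alias: on Python 2 this picks up basestring,
# on Python 3 it falls back to str (basestring does not exist there).
try:
    string_types = basestring  # Python 2 only
except NameError:
    string_types = str         # Python 3

def is_text(obj):
    # Works the same under both interpreters.
    return isinstance(obj, string_types)

print(is_text("abc"), is_text(b"abc"))  # on Python 3 prints: True False
```

Both six and python-future ultimately give you a name equivalent to `string_types`; the choice between them is mostly about which compatibility style the rest of the codebase uses.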
It gives you a Python-3-like experience in Python 2. Once you fully transition to Python 3, you only need to remove the future imports and you don't have any dependency on it any more. For example: from builtins import bytes, str gives you Python 3 bytes and strings in Python 2. Now, you can replace basestring with str and it works the same in Python 2 and 3. Maybe this works for you. I am pretty happy with future. Best, Mike From tjreedy at udel.edu Mon Mar 6 16:47:32 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 6 Mar 2017 16:47:32 -0500 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On 3/6/2017 1:56 AM, Elliot Gorokhovsky wrote: > P.S. Is it OK if I close my current issue on the bug tracker and open a > new issue, where I'll post the revised patch? The writing on my current > issue uses the old, less-rigorous benchmarks, and I feel it would be > less confusing if I just made a new issue and posted the correct > benchmarks/discussion at the top. The current issue doesn't have many > comments, so not much would be lost by just linking to it from the new > issue. If this violates bug tracker etiquette, however, :) I'll just > post the revised patch on my current issue. That would be normal. Issues often get revised patches, sometimes with more severe changes than this. Or people post competing patches. Do reference this thread, and quote Tim's approval in principle, if he did not post on the tracker. -- Terry Jan Reedy From chris.barker at noaa.gov Mon Mar 6 20:13:22 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 6 Mar 2017 17:13:22 -0800 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)
In-Reply-To: <20170306024554.GK5689@ando.pearwood.info> References: <20170306024554.GK5689@ando.pearwood.info> Message-ID: On Sun, Mar 5, 2017 at 6:45 PM, Steven D'Aprano wrote: > I sometimes need to know if a list is homogenous, but unfortunately > checking large lists for a common type in pure Python is quite slow. > > Here is a radical thought... why don't lists track their common type > themselves? There's only a few methods which can add items: > For what it's worth, I suggested this a LONG time ago -- well before Python ideas existed.... I thought a "homogenous sequence" could be really useful for all sorts of optimizations. (at the time I was writing C extensions, and often converting a lot of lists to numpy arrays -- which ARE homogenous sequences) Anyway -- it was roundly rejected by Guido and others; no one had any interest in the idea. But maybe now that there is a compelling use-case for the built-in object the time is right?? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From elliot.gorokhovsky at gmail.com Mon Mar 6 22:41:40 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Tue, 07 Mar 2017 03:41:40 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: Message-ID: On Mon, Mar 6, 2017, 4:42 PM Jim J. Jewett wrote: (1) Good Job. Thanks! (3) Ideally, your graph would have the desired-to-be lines after the as-is lines; for English writing, that would mean putting your (short red) lines to the right of the (tall blue) lines. (4) I suspect colors other than red and blue would be helpful as well, but I am explicitly not a design wiz.
My best suggestion (other than "ask someone who isn't me") would be to use blue for as-is and green for to-be. Ya, I know the colors are terrible, I just have no graphic design experience so I figured I'd make the graph the same colors as the diagram. (5) I don't know that all-ASCII (or at least all-Latin1) is true for most applications, but it is certainly true for most datasets run in countries where latin1 is sufficient, which includes most places outside of Asia, Africa, and perhaps Eastern Europe. If anything, that strengthens your case, since you can win on plenty of datasets even for applications where it isn't always safe, and for those datasets that do require a wider charset, you're likely to discover this quickly. True. I think a much bigger part of the consideration is also that a lot of software (e.g. file systems) *don't* support unicode, at least by default, so if you're dealing with text you got from another program or from the OS, it's usually ASCII. And usually, our shell scripts are getting their text from other programs (e.g. file names). (6) When I saw the flowchart around f(v, w), at first I was thinking about the key function used in some sorting... I suppose that isn't relevant, since those sorts (If I Recall Correctly) already create a parallel array to avoid recomputing the keys, but ... it might be worth clarifying, if you can find a way to do it easily without adding too much complexity. Maybe just change "this f" to "the compare function f"? You are correct -- key sorts create a parallel array. In fact, *all* sorts create a parallel array, for safety: the only way to make sure the objects aren't getting mutated as you sort is to keep them safe! If the objects are getting mutated during the sort, something is clearly going horribly wrong, but at least you won't segfault. (This is not part of my patch -- it's part of the original implementation). (7) I hope you have put in a pull request to get this added to python 3.7. Thanks!
I don't think I can make pull requests, but I am going to submit a fixed version to the bug tracker. (Tim pointed out that my current code isn't thread-safe or adversary-safe because it stores compare_function in a global. I have to modify the code to keep it in local scope and pass it in to every function that needs it. This will make the diff hairier, but not increase code complexity.) -jJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From george at fischhof.hu Tue Mar 7 04:43:15 2017 From: george at fischhof.hu (George Fischhof) Date: Tue, 7 Mar 2017 10:43:15 +0100 Subject: [Python-ideas] Wrapper for ctypes Message-ID: Hi Guys, right now I had to call functions from a dll, and I started using ctypes. I found this library too https://pypi.python.org/pypi/pywrap/0.1.0 which says (quotation): Replace this: prototype = ctypes.WINFUNCTYPE(wintypes.HANDLE, wintypes.UINT, wintypes.HANDLE); paramflags = (1, "uFormat"), (1, "hMem"); SetClipboardData = prototype(("SetClipboardData", user32), paramflags); SetClipboardData.errcheck = null_errcheck With this: SetClipboardData = pywrap.wrap_winapi(name="SetClipboardData", library=user32, restype=wintypes.BOOL, params=[ Parameter("uFormat", wintypes.UINT), Parameter("hMem", wintypes.HANDLE) ], errcheck=null_errcheck) (end quotation) My idea: something like this library should be implemented in Python to make it more simple to use ctypes. BR, George -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephanh42 at gmail.com Tue Mar 7 06:32:02 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Tue, 7 Mar 2017 12:32:02 +0100 Subject: [Python-ideas] Wrapper for ctypes In-Reply-To: References: Message-ID: I have in the past written a small utility to use compact signatures like: i i -> i to indicate a function taking two ints and returning an int. See for example: https://github.com/stephanh42/armasm Op 7 mrt. 2017 10:43 a.m.
schreef "George Fischhof" : > Hi Guys, > > right now I had to call functions from a dll, and I started using ctypes. > > > I found this library too > > https://pypi.python.org/pypi/pywrap/0.1.0 > > which says (qutation): > > Replace this: > > prototype = ctypes.WINFUNCTYPE(wintypes.HANDLE, wintypes.UINT, wintypes.HANDLE)paramflags = (1, "uFormat"), (1, "hMem")SetClipboardData = prototype(("SetClipboardData", user32), paramflags)SetClipboardData.errcheck = null_errcheck > > With this: > > SetClipboardData = pywrap.wrap_winapi(name="SetClipboardData", > library=user32, > restype=wintypes.BOOL, > params=[ > Parameter("uFormat", wintypes.UINT), > Parameter("hMem", wintypes.HANDLE) > ], > errcheck=null_errcheck) > > > > (end qutation) > > My idea: something like this library should be implemented in Python to make it more simple to use ctypes. > > BR, > > George > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Tue Mar 7 10:13:42 2017 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Tue, 7 Mar 2017 09:13:42 -0600 Subject: [Python-ideas] Wrapper for ctypes In-Reply-To: References: Message-ID: Ever looked up cffi? You won't be disappointed. -- Ryan (????) Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else http://refi64.com On Mar 7, 2017 3:43 AM, "George Fischhof" wrote: > Hi Guys, > > right now I had to call functions from a dll, and I started using ctypes. 
> > > I found this library too > > https://pypi.python.org/pypi/pywrap/0.1.0 > > which says (qutation): > > Replace this: > > prototype = ctypes.WINFUNCTYPE(wintypes.HANDLE, wintypes.UINT, wintypes.HANDLE)paramflags = (1, "uFormat"), (1, "hMem")SetClipboardData = prototype(("SetClipboardData", user32), paramflags)SetClipboardData.errcheck = null_errcheck > > With this: > > SetClipboardData = pywrap.wrap_winapi(name="SetClipboardData", > library=user32, > restype=wintypes.BOOL, > params=[ > Parameter("uFormat", wintypes.UINT), > Parameter("hMem", wintypes.HANDLE) > ], > errcheck=null_errcheck) > > > > (end qutation) > > My idea: something like this library should be implemented in Python to make it more simple to use ctypes. > > BR, > > George > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guettliml at thomas-guettler.de Tue Mar 7 10:32:05 2017 From: guettliml at thomas-guettler.de (=?UTF-8?Q?Thomas_G=c3=bcttler?=) Date: Tue, 7 Mar 2017 16:32:05 +0100 Subject: [Python-ideas] Smoothing transition: 'unicode' and 'basestring' as aliases for 'str'? In-Reply-To: <3b88fd7f-bda9-1bce-78cc-252a1c69504c@python-academy.de> References: <31701b30-03a5-3a8d-32e2-2dd59a26c09c@thomas-guettler.de> <6425c4aa-c421-fe65-8739-1aad7fda0d8d@thomas-guettler.de> <3b88fd7f-bda9-1bce-78cc-252a1c69504c@python-academy.de> Message-ID: <243aa894-2bcd-a194-208a-862c8c6b5d3d@thomas-guettler.de> Thank you for guiding me, Mike. We see us on CLT this weekend :-) Regards, Thomas G?ttler Am 06.03.2017 um 12:01 schrieb Mike M?ller: > Am 06.03.17 um 11:12 schrieb Thomas G?ttler: >> yes, you are right. It's better to leave Python3 clean (without "basestring"). >> >> I see two ways now. 
>> >> >> six >> ---- >> >> six.string_types # replacement for basestring >> >> Source >> https://docs.djangoproject.com/en/1.10/topics/python3/#string-handling-with-six >> >> >> future >> ------ >> >> from past.builtins import basestring # pip install future >> >> Source http://python-future.org/compatible_idioms.html#basestring >> >> >> I have no clue which one I should use. > > I would recommend future. It gives you a Python-3-like experience in Python 2. > Once you fully transition to Python 3, you only need to remove the future > imports and you don't have any dependency on it any more. > > For example: > > from builtins import bytes, str > > gives you Python 3 bytes and strings in Python 2. Now, you can replace > basestring with str and it works the same in Python 2 and 3. > Maybe this works for you. > > I am pretty happy with future. > > Best, > Mike > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Thomas Guettler http://www.thomas-guettler.de/ From python at lucidity.plus.com Tue Mar 7 15:46:45 2017 From: python at lucidity.plus.com (Erik) Date: Tue, 7 Mar 2017 20:46:45 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: <20170306024554.GK5689@ando.pearwood.info> Message-ID: <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> On 06/03/17 03:08, David Mertz wrote: > On Sun, Mar 5, 2017 at 6:45 PM, Steven D'Aprano > wrote: > > Here is a radical thought... why don't lists track their common type > themselves? There's only a few methods which can add items: > > > I had exactly the same thought. Lists would need to grow a new > attribute, of course. I'm not sure how that would affect the object > layout and word boundaries. But there might be free space for another > attribute slot. 
> > The real question is whether doing this is a win. On each > append/mutation operation we would need to do a comparison to the > __type_hint__ (assuming Steven's spelling of the attribute). That's not > free. Balancing that, however, when we actually *did* a sort, it would > be O(1) to tell if it was homogeneous (and also the actual type if yes) > rather than O(N). I don't think anyone has mentioned this yet, but FWIW I think the 'type hint' may need to be tri-state: heterogeneous (NULL), homogeneous (the pointer to the type structure) and also "unknown" (a sentinel value - the address of a static char or something). Otherwise, a delete operation on the list would need to scan the list to work out if it had changed from heterogeneous to homogeneous (unless it was acceptable that once heterogeneous, a list is always considered heterogeneous - i.e., delete always sets the hint to NULL). Instead, delete would change a NULL hint to the sentinel (leaving a valid type hint as it is) and then prior to sorting - as the hint is being checked anyway - if it's the sentinel value, perform the pre-scan that the existing patch is doing to restore the knowledge of just what type of list it is. I'd prefer the sort optimization to be based on what my list contains NOW, not on what it may have contained some time in the past, so I'm not a fan of the "once heterogeneous, always considered heterogeneous" behaviour if it's cheap enough to avoid it. E. From python at lucidity.plus.com Tue Mar 7 15:55:31 2017 From: python at lucidity.plus.com (Erik) Date: Tue, 7 Mar 2017 20:55:31 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) 
In-Reply-To: <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> Message-ID: On 07/03/17 20:46, Erik wrote: > (unless it > was acceptable that once heterogeneous, a list is always considered > heterogeneous - i.e., delete always sets the hint to NULL). Rubbish. I meant that delete would not touch the hint at all. E. From elliot.gorokhovsky at gmail.com Tue Mar 7 16:10:00 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Tue, 07 Mar 2017 21:10:00 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> Message-ID: On Tue, Mar 7, 2017 at 1:47 PM Erik wrote: > > I'd prefer the sort optimization to be based on what my list contains > NOW, not on what it may have contained some time in the past, so I'm not > a fan of the "once heterogeneous, always considered heterogeneous" > behaviour if it's cheap enough to avoid it. > Sure. Dictionaries actually don't implement this, though: as soon as they see a non-string key, they permanently switch to a heterogeneous state (IIRC). I think the bigger problem, though, is that most list use does *not* involve sorting, so it would be a shame to impose the non-trivial overhead of type-checking on *all* list use. With dictionaries, you have to type-check inserts anyway (for equality testing), so it's less of a problem. But the fundamental list operations *don't* require type-checking currently, so why add it in? In practice, the pre-sort check is *very* cheap, and its cost is only imposed on sort usage. Anyway, my patch could always be a precursor to a more general optimization along these lines. 
I'm almost finished fixing the problem Tim identified earlier in this thread; after that, it'll be ready for review! -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at lucidity.plus.com Tue Mar 7 16:36:57 2017 From: python at lucidity.plus.com (Erik) Date: Tue, 7 Mar 2017 21:36:57 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> Message-ID: Hi Elliot, On 07/03/17 21:10, Elliot Gorokhovsky wrote: > On Tue, Mar 7, 2017 at 1:47 PM Erik > wrote: > > > I'd prefer the sort optimization to be based on what my list contains > NOW, not on what it may have contained some time in the past, so I'm not > a fan of the "once heterogeneous, always considered heterogeneous" > behaviour if it's cheap enough to avoid it. > > > Sure. Dictionaries actually don't implement this, though: as soon as > they see a non-string key, they permanently switch to a heterogeneous > state (IIRC). I'd be interested to know if this approach had been considered and rejected for dicts - but I think dicts are a bit of a special case anyway. Because they are historically a fundamental building block of the language (for name lookups etc) they are probably more sensitive to small changes than other objects. > I think the bigger problem, though, is that most list use does *not* > involve sorting, so it would be a shame to impose the non-trivial > overhead of type-checking on *all* list use. Yes, I understand that issue - I just thought I'd mention something that hadn't been pointed out yet _IF_ the idea of a type hint were to be considered (that's the sub-thread I'm replying to). If you're not doing that, then fine - I just wanted to put down things that occurred to me so they were documented (if only for rejection). 
So, while I'm at it ;), here are some other things I noticed scanning the list object source (again, only if a type hint was considered): * What is the type hint of an empty list? (this probably depends on how naturally the code for all of the type hint checking deals with NULL vs "unknown"). * listextend() - this should do the right thing with the type hint when extending one list with another. * Several other methods ('contains', 'remove', 'count', 'index') also use PyObject_RichCompareBool(). They could also presumably benefit from the same optimisation (perhaps it's not all about sort() - perhaps this gives a little more weight to the idea). > Anyway, my patch could always be a precursor to a more general > optimization along these lines. I'm almost finished fixing the problem > Tim identified earlier in this thread; after that, it'll be ready for > review! Nice - good job. E. From mertz at gnosis.cx Tue Mar 7 17:39:29 2017 From: mertz at gnosis.cx (David Mertz) Date: Tue, 7 Mar 2017 17:39:29 -0500 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> Message-ID: On Tue, Mar 7, 2017 at 4:36 PM, Erik wrote: > * listextend() - this should do the right thing with the type hint when >> extending one list with another. >> > > * Several other methods ('contains', 'remove', 'count', 'index') also use > PyObject_RichCompareBool(). They could also presumably benefit from the > same optimisation (perhaps it's not all about sort() - perhaps this gives a > little more weight to the idea). Good point about list.extend(). I don't think __type_hint__ could help with .__contains__() or .count() or .remove(). 
E.g.: In [7]: lst = [1.0, 2.0, 1+0j, F(1,1)] In [8]: from fractions import Fraction as F In [9]: lst = [1.0, 2.0, 1+0j, F(1,1)] In [10]: 1 in lst Out[10]: True In [11]: lst.count(1) Out[11]: 3 In [12]: l.index(1) Out[12]: 0 The list has absolutely nothing of the right type. Yet it contains an item, counts things that are equal, finds a position for an equal item. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at lucidity.plus.com Tue Mar 7 18:27:00 2017 From: python at lucidity.plus.com (Erik) Date: Tue, 7 Mar 2017 23:27:00 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> Message-ID: <344b9086-94f3-411b-bb1f-ff696d30c28d@lucidity.plus.com> Hi David, On 07/03/17 22:39, David Mertz wrote: > On Tue, Mar 7, 2017 at 4:36 PM, Erik > wrote: > > * Several other methods ('contains', 'remove', 'count', 'index') > also use PyObject_RichCompareBool(). They could also presumably > benefit from the same optimisation (perhaps it's not all about > sort() - perhaps this gives a little more weight to the idea). > > > Good point about list.extend(). I don't think __type_hint__ could help > with .__contains__() or .count() or .remove(). E.g.: > > In [7]: lst = [1.0, 2.0, 1+0j, F(1,1)] > In [8]: from fractions import Fraction as F > In [9]: lst = [1.0, 2.0, 1+0j, F(1,1)] > In [10]: 1 in lst > Out[10]: True > In [11]: lst.count(1) > Out[11]: 3 > In [12]: l.index(1) > Out[12]: 0 > > > The list has absolutely nothing of the right type. 
Yet it contains an > item, counts things that are equal, finds a position for an equal item. Sure, but if the needle doesn't have the same type as the (homogeneous) haystack, then the rich comparison would still need to be done as a fallback (and would produce the result you indicate). But if the needle and the homogeneous haystack have the _same_ type, then a more optimised version of the operation can be done. Regards, E. From mertz at gnosis.cx Tue Mar 7 18:53:18 2017 From: mertz at gnosis.cx (David Mertz) Date: Tue, 7 Mar 2017 18:53:18 -0500 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: <344b9086-94f3-411b-bb1f-ff696d30c28d@lucidity.plus.com> References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> <344b9086-94f3-411b-bb1f-ff696d30c28d@lucidity.plus.com> Message-ID: On Tue, Mar 7, 2017 at 6:27 PM, Erik wrote: > Good point about list.extend(). I don't think __type_hint__ could help >> with .__contains__() or .count() or .remove(). E.g.: >> >> In [7]: lst = [1.0, 2.0, 1+0j, F(1,1)] >> In [8]: from fractions import Fraction as F >> In [9]: lst = [1.0, 2.0, 1+0j, F(1,1)] >> In [10]: 1 in lst >> Out[10]: True >> >> The list has absolutely nothing of the right type. Yet it contains an >> item, counts things that are equal, finds a position for an equal item. >> > > Sure, but if the needle doesn't have the same type as the (homogeneous) > haystack, then the rich comparison would still need to be done as a > fallback (and would produce the result you indicate). > In [22]: class Eq(int): def __eq__(self, other): return True ....: In [23]: four, five, six = Eq(4), Eq(5), Eq(6) In [24]: lst = [four, five, six] In [25]: lst.count(Eq(7)) Out[25]: 3 How would this work (other than saying "don't do that it's perverse")? 
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From elliot.gorokhovsky at gmail.com Tue Mar 7 18:59:17 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Tue, 07 Mar 2017 23:59:17 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> <344b9086-94f3-411b-bb1f-ff696d30c28d@lucidity.plus.com> Message-ID: On Tue, Mar 7, 2017 at 4:53 PM David Mertz wrote: > > > In [22]: class Eq(int): > def __eq__(self, other): > return True > ....: > In [23]: four, five, six = Eq(4), Eq(5), Eq(6) > In [24]: lst = [four, five, six] > In [25]: lst.count(Eq(7)) > Out[25]: 3 > > > How would this work (other than saying "don't do that it's perverse")? > There would be two needless checks in the equality testing. First, PyObject_RichCompareBool would see if other is a subclass of self, in which case other->tp_richcompare would be used iff. it is non-null. Otherwise, we would check if self->tp_richcompare is non-null, the check would pass, and we would call self.__eq__. See the flow chart on my poster (linked to in the first email on this thread). -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue Mar 7 19:18:15 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 8 Mar 2017 11:18:15 +1100 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) 
In-Reply-To: <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> Message-ID: <20170308001814.GM5689@ando.pearwood.info> On Tue, Mar 07, 2017 at 08:46:45PM +0000, Erik wrote: > I don't think anyone has mentioned this yet, but FWIW I think the 'type > hint' may need to be tri-state: heterogeneous (NULL), homogeneous (the > pointer to the type structure) and also "unknown" (a sentinel value - > the address of a static char or something). I thought about that and rejected it as an unnecessary complication. Heterogeneous and unknown might as well be the same state: either way, you cannot use the homogeneous-type optimization. Part of the complexity here is that I'd like this flag to be available to Python code, not just a hidden internal state of the list. But my instinct for this could be wrong -- if anyone is interested to do some experiments on this, it might be that a three-state flag works out better in practice than two states. > Otherwise, a delete operation on the list would need to scan the list to > work out if it had changed from heterogeneous to homogeneous (unless it > was acceptable that once heterogeneous, a list is always considered > heterogeneous - i.e., delete always sets the hint to NULL). In a later email, you corrected this: a delete operation need not touch the type-hint (except when the last item is deleted, at which point it resets to None/unknown). With a three-state flag, you can make a three-way decision: If the flag is Unknown: - do an O(N) scan to determine whether the list is homogeneous or heterogeneous, then choose the optimized or unoptimized routine; if the flag is Heterogeneous: - there is no point doing a scan, so always choose the unoptimized routine; if the flag is a type: - the list is definitely homogeneous, so depending on the type, you may be able to choose an optimized routine.
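[Archive note: the three-way decision described above can be modelled in pure Python. This is only a model of the idea: the real flag would be a hidden field on the C list object, and `UNKNOWN`, `scan_type` and `choose_sort_path` are invented names.]

```python
UNKNOWN = object()  # sentinel for "haven't looked yet"

def scan_type(lst):
    # O(N) pre-scan: return the common type, or None if heterogeneous/empty.
    if not lst:
        return None
    first = type(lst[0])
    return first if all(type(x) is first for x in lst) else None

def choose_sort_path(lst, type_hint):
    # Three states: UNKNOWN -> scan first; None (heterogeneous) -> generic
    # routine; a type object -> specialized routine for that type.
    if type_hint is UNKNOWN:
        type_hint = scan_type(lst)
    if type_hint is None:
        return "generic"
    return "specialized:" + type_hint.__name__

print(choose_sort_path([1, 2, 3], UNKNOWN))  # specialized:int
print(choose_sort_path([1, "a"], UNKNOWN))   # generic
print(choose_sort_path(["x", "y"], str))     # specialized:str
```

With only two states, the UNKNOWN branch disappears and a heterogeneous hint can never recover without some other event (such as a resize) triggering a rescan.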
Compared to that, a two-state flag misses some opportunities to run the optimized routine: - list contains [1, 2, 3, 4, "foo"] so the hint is set to None; - delete the last item; - list is now homogeneous but the hint is still None. But it also avoids bothering with an O(N) scan in some situations where the list really is heterogeneous. So there's both an opportunity cost and a benefit. There may be other opportunities to do the scan, such as when the underlying pre-allocated array resizes, so even with the two-state flag, "unknown" need not stay unknown forever. > I'd prefer the sort optimization to be based on what my list contains > NOW, not on what it may have contained some time in the past, Remember, we're talking about opportunities for applying an optimization here, nothing more. You're not giving up anything: at worst, the ordinary, unoptimized routine will run and you're no worse off than you are today. > so I'm not > a fan of the "once heterogeneous, always considered heterogeneous" > behaviour if it's cheap enough to avoid it. It is not just a matter of the cost of tracking three states versus two. It is a matter of the complexity of the interface. I suppose this could be reported to Python code as None, False or the type. Although, I don't know about you, but I know I'll never be able to remember whether None means Unknown and False means Heterogeneous, or the other way around. Ultimately, this is all very pie-in-the-sky unless somebody tests just how expensive this is and whether the benefit is worthwhile. That's not going to be me: I'm not able to hack on the list C code. -- Steve From steve at pearwood.info Tue Mar 7 19:44:36 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 8 Mar 2017 11:44:36 +1100 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)
In-Reply-To: References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> Message-ID: <20170308004435.GN5689@ando.pearwood.info> On Tue, Mar 07, 2017 at 09:10:00PM +0000, Elliot Gorokhovsky wrote: > I think the bigger problem, though, is that most list use does *not* > involve sorting, There are other uses for a type hint apart from sorting. There may be optimized versions of other functions (sum, math.fsum, ...) and list methods (e.g. count, remove): anything that has to walk the list comparing items. All very hypothetical at the moment, I admit it... > so it would be a shame to impose the non-trivial overhead > of type-checking on *all* list use. You might be right, but my guess is that the overhead isn't quite as big as you may think, at least for some operations. The type-check itself is just a pointer compare, not an issubclass() operation. The types are either identical, or the list is heterogeneous. True, `mylist[i] = x` should be a quick assignment into an array of pointers, but there's a bounds check to ensure i is within the correct range. Other methods are even more expensive: `mylist[i:j] = [a...z]` has to move list items around, possibly allocate more memory, and update the list length; compared to that, the scan of a...z is probably minimal. `mylist.append(x)` is *usually* a simple assignment into the array (plus updating the list length) but every now and again it triggers a resize, which is costly. I don't think we can predict whether this is a nett gain or loss just from first principles. > Anyway, my patch could always be a precursor to a more general optimization > along these lines. Indeed! Even if nothing else comes of this than your patch, thank you! -- Steve From python at lucidity.plus.com Tue Mar 7 20:20:19 2017 From: python at lucidity.plus.com (Erik) Date: Wed, 8 Mar 2017 01:20:19 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)
In-Reply-To: <20170308001814.GM5689@ando.pearwood.info>
References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> <20170308001814.GM5689@ando.pearwood.info>
Message-ID: <0801f164-c8a2-c4cc-0c45-9bebbae1ec38@lucidity.plus.com>

On 08/03/17 00:18, Steven D'Aprano wrote:
> I thought about that and rejected it as an unnecessary complication.
> Heterogeneous and unknown might as well be the same state: either way,
> you cannot use the homogeneous-type optimization.

Knowing that it's definitely in one of two positive states, but not knowing which, is not the same thing when it comes to what one can and can't optimize cheaply :) It sort of depends on how cheaply one can track the states though ...

> Part of the complexity here is that I'd like this flag to be available
> to Python code, not just a hidden internal state of the list.

Out of interest, for what purpose? Generally, I thought Python code should not need to worry about low-level optimisations such as this (which are C-Python specific AIUI). A list.is_heterogeneous() method could be implemented if it was necessary, but how would that be used?

> But it also avoids bothering with an O(N) scan in some situations where
> the list really is heterogeneous. So there's both an opportunity cost and
> a benefit.

O(N) is worst case. Most of the anecdotal evidence in this thread so far seems to suggest that heterogeneous lists are not common. May or may not be true. Empirically, for me, it is true. Who knows? (and there is the question).

> Remember, we're talking about opportunities for applying an optimization
> here, nothing more. You're not giving up anything: at worst, the
> ordinary, unoptimized routine will run and you're no worse off than you
> are today.

You are a little bit - the extra overhead of checking all of this (which is the unknown factor we're all skirting around ATM) costs.
So converting a previously-heterogeneous list to a homogeneous list via a delete or whatever has a benefit if the optimisations can then be applied to that list many times in the future (i.e., once it becomes recognised as homogeneous again, it benefits from optimised paths in the interpreter). And of course, all that depends on your use case. It might work out better for one application over another. As you quite rightly point out, it needs someone to measure the alternatives and work out if _overall_ it has a positive impact ... >> so I'm not >> a fan of the "once heterogeneous, always considered heterogeneous" >> behaviour if it's cheap enough to avoid it. > > It is not just a matter of the cost of tracking three states versus two. > It is a matter of the complexity of the interface. > > I suppose this could be reported to Python code as None, False or the > type. I didn't think any of this stuff would come back to Python code (I thought we were talking about C-Python specific implementation only). How is this useful to Python code? > Ultimately, this is all very pie-in-the-sky unless somebody tests just > how expensive this is and whether the benefit is worthwhile. I agree. As I said before, I'm just pointing out things I noticed while looking at the current C code which could be picked up on if someone wants to try implementing and benchmarking any of this. It sort of feels like an argument, but I hope we're just violently agreeing on a generally shared goal ;) Regards, E. From steve at pearwood.info Wed Mar 8 06:07:45 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 8 Mar 2017 22:07:45 +1100 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) 
In-Reply-To: <0801f164-c8a2-c4cc-0c45-9bebbae1ec38@lucidity.plus.com>
References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> <20170308001814.GM5689@ando.pearwood.info> <0801f164-c8a2-c4cc-0c45-9bebbae1ec38@lucidity.plus.com>
Message-ID: <20170308110744.GO5689@ando.pearwood.info>

On Wed, Mar 08, 2017 at 01:20:19AM +0000, Erik wrote:
> >Part of the complexity here is that I'd like this flag to be available
> >to Python code, not just a hidden internal state of the list.
>
> Out of interest, for what purpose? Generally, I thought Python code
> should not need to worry about low-level optimisations such as this
> (which are C-Python specific AIUI).

I mentioned earlier that I have code which has to track the type of list items, and swaps to a different algorithm when the types are not all the same. In practice, I just run the "mixed types" algorithm regardless, because the cost of doing a scan of the items in pure Python is too expensive relative to the time saved. Moving some of the work into the C infrastructure might change that.

I'm not completely sure that this would, in fact, be useful to me, but I'd like the opportunity to experiment. I could try using a list subclass, but again, the cost of doing the type-checking in Python instead of C is (I think) prohibitive.

Nevertheless, I understand that the burden of proving the usefulness of this is on me. (Or anyone else that wants to argue the case.)

> A list.is_heterogeneous() method
> could be implemented if it was necessary, but how would that be used?

I would prefer to get the list item's type:

    if mylist.__type_hint__ is float:
        # run optimized float version
        ...
    elif mylist.__type_hint__ is int:
        ...
    else:
        # unoptimized version
        ...

> >But it also avoids bothering with an O(N) scan in some situations where
> >the list really is heterogeneous. So there's both an opportunity cost and
> >a benefit.
>
> O(N) is worst case.

It is also the best and average case.
Given a homogeneous list of N items, for some arbitrary N between 0 and infinity, you have to look at all N items before knowing that they're all the same type. So for the homogeneous case, the best, worst and average are identically O(N).

Given a heterogeneous list of N items where the first difference is found at index K, K can range from 1 through N-1. (By definition, it cannot be at index 0: you can only detect a difference in types after checking *two* items.) The worst case is that you don't find the difference until the last item, which is O(N). The best case is that you find the difference in position 1 and bail out early. On average, you will find the first difference halfway through the list, which makes it O(N/2) but that's just O(N). (Constant factors don't matter.)

If you want to call that O(1) for the best heterogeneous case, I won't argue except to say that's rather against the spirit of Big Oh analysis in my opinion. I think it's more realistic to say it's O(N) across all combinations of best/worst/average and homogeneous/heterogeneous. But of course if you bail out early, the constant multiplier may differ.

> Most of the anecdotal evidence in this thread so far seems to suggest
> that heterogeneous lists are not common. May or may not be true.
> Empirically, for me, it is true. Who knows? (and there is the question).
That will strongly depend on where the data is coming from, but when it comes to sorting, randomly mixed types will usually fail since comparisons between different types are generally illegal:

    py> [1, "a"].sort()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: str() < int()

-- Steve

From cescus92 at gmail.com  Wed Mar  8 11:01:22 2017
From: cescus92 at gmail.com (Francesco Franchina)
Date: Wed, 8 Mar 2017 17:01:22 +0100
Subject: [Python-ideas] Proposal: making __str__ count in time's class
Message-ID: 

Hello everyone,

I'm shortly writing to you about a reflection I lately made upon the current functioning of __str__ for the time class.

Before expressing my thought and proposal, I want to make sure we all agree on a simple and clear fact: the __str__ magic method is used to give a literal and human-readable representation to the object (unlike __repr__).

Generally this is true across the Python panorama. It's not true for the time class, for example.

    >>> import time
    >>> a = time.localtime()
    >>> a.__str__()
    'time.struct_time(tm_year=2017, tm_mon=3, tm_mday=8, tm_hour=16, tm_min=6, tm_sec=16, tm_wday=2, tm_yday=67, tm_isdst=0)'

Well, don't get me wrong: the main aim of the __str__ method has been accomplished but, imho, not in the most pythonic way.

I just wanted to ask you: what do you think about re-writing the __str__ of the time class so it would return something like the ISO 8601 [https://en.wikipedia.org/wiki/ISO_8601] format? Wouldn't it be more meaningful? Especially in the JS-everywhere era it could be more productive.

TL;DR: __str__ for dates should return a human-readable date format (eg: https://en.wikipedia.org/wiki/ISO_8601)

I'm waiting for your opinions. Thank you for your time and ideas!

Francesco Franchina

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From barry at barrys-emacs.org Wed Mar 8 11:16:43 2017 From: barry at barrys-emacs.org (Barry Scott) Date: Wed, 8 Mar 2017 16:16:43 +0000 Subject: [Python-ideas] Proposal: making __str__ count in time's class In-Reply-To: References: Message-ID: > On 8 Mar 2017, at 16:01, Francesco Franchina wrote: > > Hello everyone, > > I'm shortly writing to you about a reflection I lately made upon the current functioning of __str__ for the time's class. > > Before expressing my thought and proposal, I want to make sure we all agree on a simple and clear fact: > the __str__ magic method is used to give a literal and human-readable representation to the object (unlike __repr__). > > Generally this is true across the python panorama. It's not true for the time class, for example. > > >>> import time > >>> a = time.localtime() > >>> a.__str__() > 'time.struct_time(tm_year=2017, tm_mon=3, tm_mday=8, tm_hour=16, tm_min=6, tm_sec=16, tm_wday=2, tm_yday=67, tm_isdst=0)' > > Well, don't get me wrong: the main aim of the __str__ method has been accomplished but, imho, not in the most pythonic way. > > I just wanted to ask you: what do you think about re-writing the __str__ of the time class so it would return something like > ISO 8601 [https://en.wikipedia.org/wiki/ISO_8601 ] format? Wouldn't it be more meaningful? Especially in the JS-everywhere-era > it could be more more productive. > > > TL;DR > __str__ for dates should return a human-readable date format (eg: https://en.wikipedia.org/wiki/ISO_8601 ) Just use datetime module instead of time? >>> datetime.datetime.now().isoformat() '2017-03-08T16:14:58.448801' Barry -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Wed Mar 8 11:25:55 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 08 Mar 2017 08:25:55 -0800 Subject: [Python-ideas] Proposal: making __str__ count in time's class In-Reply-To: References: Message-ID: <58C03093.9020805@stoneleaf.us> On 03/08/2017 08:01 AM, Francesco Franchina wrote: > Before expressing my thought and proposal, I want to make sure we all agree on a simple and clear fact: > the __str__ magic method is used to give a literal and human-readable representation to the object (unlike __repr__). If __str__ has not been defined, then __repr__ is used instead. > time.struct_time(tm_year=2017, tm_mon=3, tm_mday=8, tm_hour=16, tm_min=6, tm_sec=16, tm_wday=2, tm_yday=67, tm_isdst=0) Which is what that looks like. -- ~Ethan~ From python at lucidity.plus.com Wed Mar 8 12:08:51 2017 From: python at lucidity.plus.com (Erik) Date: Wed, 8 Mar 2017 17:08:51 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: <20170308110744.GO5689@ando.pearwood.info> References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> <20170308001814.GM5689@ando.pearwood.info> <0801f164-c8a2-c4cc-0c45-9bebbae1ec38@lucidity.plus.com> <20170308110744.GO5689@ando.pearwood.info> Message-ID: <6d2924a5-cde1-e733-3b58-41f7ab05daa9@lucidity.plus.com> On 08/03/17 11:07, Steven D'Aprano wrote: > I mentioned earlier that I have code which has to track the type of list > items, and swaps to a different algorithm when the types are not all the > same. Hmmm. Yes, I guess if the expensive version requires a lot of isinstance() messing or similar for each element then it could be better to have optimized versions for homogeneous lists of ints or strings etc. 
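(As a concrete illustration of the pattern under discussion -- scan once, then pick an algorithm -- here is a rough pure-Python sketch. The `mean` function and its float fast path are invented for illustration; they are not the patch being discussed.)

```python
def mean(items):
    """Average the items, swapping algorithms based on type homogeneity."""
    if not items:
        raise ValueError("mean() of empty list")
    # One scan, one cheap type() identity check per element.
    homogeneous = all(type(x) is type(items[0]) for x in items)
    if homogeneous and type(items[0]) is float:
        # Fast path: no per-element coercion or dispatch needed.
        return sum(items) / len(items)
    # General "mixed types" path: coerce each element individually.
    return sum(float(x) for x in items) / len(items)

print(mean([1.0, 2.0, 3.0]))   # takes the fast path
print(mean([1, 2.0, True]))    # takes the mixed-types path
```

The point of the sketch is only the shape of the dispatch: the O(n) scan happens once, while the per-element work it avoids would otherwise run inside the hot loop.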
>> A list.is_heterogeneous() method
>> could be implemented if it was necessary,
>
> I would prefer to get the list item's type:
>
> if mylist.__type_hint__ is float:

If you know the list is homogeneous then the item's type is "type(mylist[0])".

Also, having it be a function call gives an obvious place to put the transition from "unknown" to known state if the tri-state hint approach was taken. Otherwise, that would have to be hooked into the attribute access somehow.

That's for someone who wants to try implementing it to decide and propose though :)

E.

From barry at barrys-emacs.org  Wed Mar  8 15:47:24 2017
From: barry at barrys-emacs.org (Barry)
Date: Wed, 8 Mar 2017 20:47:24 +0000
Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)
In-Reply-To: <6d2924a5-cde1-e733-3b58-41f7ab05daa9@lucidity.plus.com>
References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> <20170308001814.GM5689@ando.pearwood.info> <0801f164-c8a2-c4cc-0c45-9bebbae1ec38@lucidity.plus.com> <20170308110744.GO5689@ando.pearwood.info> <6d2924a5-cde1-e733-3b58-41f7ab05daa9@lucidity.plus.com>
Message-ID: <081FC1D7-5F3F-4B8C-98B1-D6396C3E34ED@barrys-emacs.org>

Can you assume that the list is of type(list[0]) and use that type's optimised sort? But in the optimised sort code, check that the types are as required. If you hit an element that is not of the required type then fall back to the unoptimised sort.

So part of the list is sorted using optimised code and the sort is completed with the unoptimised code.

Is that semantically clean?

Barry

> On 8 Mar 2017, at 17:08, Erik wrote:
>
>> On 08/03/17 11:07, Steven D'Aprano wrote:
>> I mentioned earlier that I have code which has to track the type of list
>> items, and swaps to a different algorithm when the types are not all the
>> same.
>
> Hmmm.
> Yes, I guess if the expensive version requires a lot of isinstance() messing or similar for each element then it could be better to have optimized versions for homogeneous lists of ints or strings etc.
>
>>> A list.is_heterogeneous() method
>>> could be implemented if it was necessary,
>>
>> I would prefer to get the list item's type:
>>
>> if mylist.__type_hint__ is float:
>
> If you know the list is homogeneous then the item's type is "type(mylist[0])".
>
> Also, having it be a function call gives an obvious place to put the transition from "unknown" to known state if the tri-state hint approach was taken. Otherwise, that would have to be hooked into the attribute access somehow.
>
> That's for someone who wants to try implementing it to decide and propose though :)
>
> E.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

From spiantado at gmail.com  Wed Mar  8 17:09:23 2017
From: spiantado at gmail.com (Steven Piantadosi)
Date: Wed, 8 Mar 2017 17:09:23 -0500
Subject: [Python-ideas] dict(default=int)
Message-ID: 

Hi All,

I find importing defaultdict from collections to be clunky and it seems like having a default should just be an optional keyword to dict. Thus, something like,

    d = dict(default=int)

would be the same as

    from collections import defaultdict
    d = defaultdict(int)

Any thoughts?

Thanks,

++Steve

From mahmoud at hatnote.com  Wed Mar  8 17:23:54 2017
From: mahmoud at hatnote.com (Mahmoud Hashemi)
Date: Wed, 8 Mar 2017 14:23:54 -0800
Subject: [Python-ideas] dict(default=int)
In-Reply-To: 
References: 
Message-ID: 

That's already valid dict syntax:

    >>> dict(default=int)
    {'default': <class 'int'>}

Generally that in itself makes this a no go.
Mahmoud

On Wed, Mar 8, 2017 at 2:09 PM, Steven Piantadosi wrote:
> Hi All,
>
> I find importing defaultdict from collections to be clunky and it seems
> like having a default should just be an optional keyword to dict. Thus,
> something like,
>
> d = dict(default=int)
>
> would be the same as
>
> from collections import defaultdict
> d = defaultdict(int)
>
> Any thoughts?
>
> Thanks,
>
> ++Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From contact at brice.xyz  Wed Mar  8 17:23:48 2017
From: contact at brice.xyz (Brice PARENT)
Date: Wed, 8 Mar 2017 23:23:48 +0100
Subject: [Python-ideas] dict(default=int)
In-Reply-To: 
References: 
Message-ID: 

On 08/03/17 at 23:09, Steven Piantadosi wrote:
> Hi All,
>
> I find importing defaultdict from collections to be clunky and it
> seems like having a default should just be an optional keyword to
> dict. Thus, something like,
>
> d = dict(default=int)
>
> would be the same as
>
> from collections import defaultdict
> d = defaultdict(int)
>
> Any thoughts?

I have never really used it, so I might say something stupid, but doesn't it prevent the use of "default" as a key in the generated dict?

Would those 2 dicts be equal?

    d1 = dict(default=5)
    d2 = {'default': 5}

Brice

From rosuav at gmail.com  Wed Mar  8 17:30:24 2017
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 9 Mar 2017 09:30:24 +1100
Subject: [Python-ideas] dict(default=int)
In-Reply-To: 
References: 
Message-ID: 

On Thu, Mar 9, 2017 at 9:23 AM, Brice PARENT wrote:
> Would those 2 dicts be equal?
> d1 = dict(default=5)
> d2 = {'default': 5}

Easy to find out:

    >>> d1 = dict(default=5)
    >>> d2 = {'default': 5}
    >>> d1 == d2
    True

ChrisA

From contact at brice.xyz  Wed Mar  8 17:39:23 2017
From: contact at brice.xyz (Brice PARENT)
Date: Wed, 8 Mar 2017 23:39:23 +0100
Subject: [Python-ideas] dict(default=int)
In-Reply-To: 
References: 
Message-ID: <7d2eb6ec-e662-7bbe-2b2d-3d3a31d816f2@brice.xyz>

On 08/03/17 at 23:30, Chris Angelico wrote:
> On Thu, Mar 9, 2017 at 9:23 AM, Brice PARENT wrote:
>> Would those 2 dicts be equal?
>> d1 = dict(default=5)
>> d2 = {'default': 5}
> Easy to find out:
>
>>>> d1 = dict(default=5)
>>>> d2 = {'default': 5}
>>>> d1 == d2
> True
>
> ChrisA

That's my point... If they are equal, it clearly means that the declaration of d1 is *not* declaring a defaultdict right now, and is a valid syntax. So the behaviour would have to be changed, which would make legacy code erroneous in such a case.

But a possible workaround is if we used the first positional argument of dict() as the default value. As right now it doesn't accept positional arguments (or at least if they are not iterable, which complicates a bit the thing), we could allow a syntax like:

    d = dict([default, ][*args, ]**kwargs)

where default is a callable, *args made of iterables, and kwargs any kwargs.

From rosuav at gmail.com  Wed Mar  8 17:43:48 2017
From: rosuav at gmail.com (Chris Angelico)
Date: Thu, 9 Mar 2017 09:43:48 +1100
Subject: [Python-ideas] dict(default=int)
In-Reply-To: <7d2eb6ec-e662-7bbe-2b2d-3d3a31d816f2@brice.xyz>
References: <7d2eb6ec-e662-7bbe-2b2d-3d3a31d816f2@brice.xyz>
Message-ID: 

On Thu, Mar 9, 2017 at 9:39 AM, Brice PARENT wrote:
> But a possible workaround is if we used the first positional argument of
> dict() as the default value.
> As right now it doesn't accept positional
> arguments (or at least if they are not iterable, which complicates a bit the
> thing), we could allow a syntax like:
> d = dict([default, ][*args, ]**kwargs)
> where default is a callable, *args made of iterables, and kwargs any kwargs.

There'd still be a pile of special cases. Granted, there aren't going to be very many objects that are both callable and iterable, but there certainly _can be_, and if one were passed as the first argument, it would be ambiguous.

Safer to keep this out of the signature of dict itself.

ChrisA

From eric at trueblade.com  Wed Mar  8 17:49:38 2017
From: eric at trueblade.com (Eric V. Smith)
Date: Wed, 8 Mar 2017 17:49:38 -0500
Subject: [Python-ideas] dict(default=int)
In-Reply-To: 
References: <7d2eb6ec-e662-7bbe-2b2d-3d3a31d816f2@brice.xyz>
Message-ID: <70aca7b8-6d7b-a95c-5df0-24c438c1c0ad@trueblade.com>

On 3/8/2017 5:43 PM, Chris Angelico wrote:
> On Thu, Mar 9, 2017 at 9:39 AM, Brice PARENT wrote:
>> But a possible workaround is if we used the first positional argument of
>> dict() as the default value. As right now it doesn't accept positional
>> arguments (or at least if they are not iterable, which complicates a bit the
>> thing), we could allow a syntax like:
>> d = dict([default, ][*args, ]**kwargs)
>> where default is a callable, *args made of iterables, and kwargs any kwargs.
>
> There'd still be a pile of special cases. Granted, there aren't going
> to be very many objects that are both callable and iterable, but there
> certainly _can be_, and if one were passed as the first argument, it
> would be ambiguous.
>
> Safer to keep this out of the signature of dict itself.

If we really want to make defaultdict feel more "builtin" (and I don't see any reason to do so), I'd suggest adding a factory function:

    dict.defaultdict(int)

Similar in spirit to dict.fromkeys(), except of course returning a defaultdict, not a dict.

Eric.
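(To make the suggestion concrete: `dict` itself can't be extended from pure Python, so the sketch below models the proposed factory with a classmethod on a subclass. The `Dict` name and the delegation to collections.defaultdict are illustrative assumptions, not part of the actual proposal.)

```python
from collections import defaultdict

class Dict(dict):
    @classmethod
    def defaultdict(cls, default_factory, *args, **kwargs):
        # Like dict(*args, **kwargs), but returns a defaultdict,
        # leaving "default" free to be used as an ordinary key.
        return defaultdict(default_factory, *args, **kwargs)

d = Dict.defaultdict(int, default=5)
d["missing"] += 1
print(d["missing"], d["default"])  # 1 5
```

Because the default factory travels as the first positional argument of a dedicated constructor, there is no ambiguity with keyword items, which is the problem the `dict(default=...)` spelling runs into.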
From elliot.gorokhovsky at gmail.com Wed Mar 8 17:58:10 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Wed, 08 Mar 2017 22:58:10 +0000 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: <081FC1D7-5F3F-4B8C-98B1-D6396C3E34ED@barrys-emacs.org> References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> <20170308001814.GM5689@ando.pearwood.info> <0801f164-c8a2-c4cc-0c45-9bebbae1ec38@lucidity.plus.com> <20170308110744.GO5689@ando.pearwood.info> <6d2924a5-cde1-e733-3b58-41f7ab05daa9@lucidity.plus.com> <081FC1D7-5F3F-4B8C-98B1-D6396C3E34ED@barrys-emacs.org> Message-ID: On Wed, Mar 8, 2017 at 2:14 PM Barry wrote: > Can you assume that list of of type(list[0]) and use that type's optimised > sort? > But in the optimised sort code check that the types are as required. > If you hit an element that is not of the required type then fall back to > the unoptimised sort. > Well, how would you tell if you've hit an element that is not of the required type? You'd have to check during every compare, right? And that's precisely what we're trying to avoid! The whole point of my patch is that we do O(nlogn) compares, but only have O(n) elements, so it's much cheaper to do all the type checks in advance, and in the very likely case that our list is homogeneous, switch to an optimized special-case compare function. Even when we only do O(n) compares, my patch is still much faster (see benchmarks higher up in this thread). Why? Well, if you're doing the type checks during the compares, you're doing them across different function calls, with other stuff interspersed in between. So pipeline/branch prediction/caching is less able to optimize away the overhead of the safety checks (I don't know how CPUs work, but one of those is relevant here). 
With the pre-sort check, it's all in a single iteration through the list, and we're taking the same exact branch every time; it's much faster. Think iterating over a matrix row-wise versus iterating column-wise (not exactly relevant here since that's about cache, but that's the idea. Again, I don't really know how CPUs work). So in short: no, we can't just check as we go, because that's what we're already doing. But we can optimize significantly by checking in advance. I mean, practically speaking, the advance check is basically free. The compare-time checks, in sum, are not, both asymptotically and practically. Best Elliot -------------- next part -------------- An HTML attachment was scrubbed... URL: From dan at tombstonezero.net Wed Mar 8 18:22:13 2017 From: dan at tombstonezero.net (Dan Sommers) Date: Wed, 8 Mar 2017 23:22:13 +0000 (UTC) Subject: [Python-ideas] dict(default=int) References: <7d2eb6ec-e662-7bbe-2b2d-3d3a31d816f2@brice.xyz> Message-ID: On Thu, 09 Mar 2017 09:43:48 +1100, Chris Angelico wrote: > On Thu, Mar 9, 2017 at 9:39 AM, Brice PARENT wrote: >> But a possible workaround, is if we used the first positional >> argument of dict() as the default value [...] > ... Granted, there aren't going to be very many objects that are both > callable and iterable ... Many stream-like objects are both callable (get the next element of the stream) and iterable (iterate through the remainder of the stream). Or at least that's the way I make mine. Sometimes. > Safer to keep this out of the signature of dict itself. Agreed. Dan From chris.barker at noaa.gov Wed Mar 8 18:30:50 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 8 Mar 2017 15:30:50 -0800 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) 
In-Reply-To: 
References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> <20170308001814.GM5689@ando.pearwood.info> <0801f164-c8a2-c4cc-0c45-9bebbae1ec38@lucidity.plus.com> <20170308110744.GO5689@ando.pearwood.info> <6d2924a5-cde1-e733-3b58-41f7ab05daa9@lucidity.plus.com> <081FC1D7-5F3F-4B8C-98B1-D6396C3E34ED@barrys-emacs.org>
Message-ID: 

On Wed, Mar 8, 2017 at 2:58 PM, Elliot Gorokhovsky <elliot.gorokhovsky at gmail.com> wrote:

> The whole point of my patch is that we do O(nlogn) compares, but only have
> O(n) elements, so it's much cheaper to do all the type checks in advance,
>
> I mean, practically speaking, the advance check is basically free. The
> compare-time checks, in sum, are not, both asymptotically and practically.

hmm -- I know folks like to say that "the constants don't matter", but it seems they do in this case:

without pre-checking: O(nlogn)

with pre-checking, if homogeneous: O(n) + O(nlogn)

so the pre-checking only adds overhead if you ignore the constants. But I'm assuming (particularly with locality and branch prediction and all that included) the constant to type-check is much smaller than the constant to compare two unknown types, so:

    TC*n + KC*n*logn  vs  UC*n*logn

where:

    TC -- constant to type-check
    KC -- constant for a known-type compare
    UC -- constant for an unknown-type compare

So if UC > TC/logn + KC, then this optimization makes sense.

If UC > KC and UC >>> TC, then this all works out. But if n is HUGE, it may end up being slower (maybe more so with cache locality???) Which is why you need to profile all this carefully.

So far Elliot's experiments seem to show it works out well. Which doesn't surprise me for built-ins like float and int that have a native compare, but apparently also for more complex types? How about custom classes?

-CHB

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From barry at barrys-emacs.org  Thu Mar  9 05:04:44 2017
From: barry at barrys-emacs.org (Barry Scott)
Date: Thu, 9 Mar 2017 10:04:44 +0000
Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)
In-Reply-To: 
References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> <20170308001814.GM5689@ando.pearwood.info> <0801f164-c8a2-c4cc-0c45-9bebbae1ec38@lucidity.plus.com> <20170308110744.GO5689@ando.pearwood.info> <6d2924a5-cde1-e733-3b58-41f7ab05daa9@lucidity.plus.com> <081FC1D7-5F3F-4B8C-98B1-D6396C3E34ED@barrys-emacs.org>
Message-ID: <5BA5FFB7-ECE1-49AB-B2F6-F9650A3D6DCE@barrys-emacs.org>

> On 8 Mar 2017, at 22:58, Elliot Gorokhovsky wrote:
>
> On Wed, Mar 8, 2017 at 2:14 PM Barry wrote:
> Can you assume that the list is of type(list[0]) and use that type's optimised sort?
> But in the optimised sort code check that the types are as required.
> If you hit an element that is not of the required type then fall back to the unoptimised sort.
>
> Well, how would you tell if you've hit an element that is not of the required type? You'd have to check during every compare, right? And that's precisely what we're trying to avoid!

It seems the trick for optimisation is to compare the type pointer of an object to see if it's the same as a type supported by the chosen optimised sort. It was not clear to me that you need to scan the list at the start to make sure it's homogeneous. Given that the type check is so cheap, will it slow the sort if you do the pointer check in the compare code? I am not suggesting you run rich compare full fat on each compare.
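(A rough pure-Python model of that idea -- do the cheap type check inside each compare and abandon the specialised path on the first mismatch. Every name here is invented, and a real C implementation would presumably resume the partially-completed sort rather than restart it as this sketch does.)

```python
import functools

class Heterogeneous(Exception):
    """Raised when the specialised compare hits an unexpected type."""

def sort_with_fallback(items):
    if not items:
        return []
    expected = type(items[0])  # assume the list is of type(items[0])

    def specialised_cmp(a, b):
        # The per-compare check: morally a single type-pointer comparison.
        if type(a) is not expected or type(b) is not expected:
            raise Heterogeneous
        return (a > b) - (a < b)

    try:
        return sorted(items, key=functools.cmp_to_key(specialised_cmp))
    except Heterogeneous:
        # Fall back to an unoptimised, mixed-type-tolerant sort.
        return sorted(items, key=str)

print(sort_with_fallback([3, 1, 2]))     # stays on the specialised path
print(sort_with_fallback([2, "10", 1]))  # first mismatch triggers fallback
```

The open question in the thread -- whether the extra branch per compare costs more than one up-front O(n) scan -- is exactly the part a toy like this cannot answer; it only shows where the check would live.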
> The whole point of my patch is that we do O(nlogn) compares, but only have O(n) elements, so it's much cheaper to do all the type checks in advance, and in the very likely case that our list is homogeneous, switch to an optimized special-case compare function.

So you do O(nlogn)*2 pointer compares with my suggestion, it seems? Which is smaller than the O(n) pointer checks?

> Even when we only do O(n) compares, my patch is still much faster (see benchmarks higher up in this thread). Why? Well, if you're doing the type checks during the compares, you're doing them across different function calls, with other stuff interspersed in between. So pipeline/branch prediction/caching is less able to optimize away the overhead of the safety checks (I don't know how CPUs work, but one of those is relevant here). With the pre-sort check, it's all in a single iteration through the list, and we're taking the same exact branch every time; it's much faster. Think iterating over a matrix row-wise versus iterating column-wise (not exactly relevant here since that's about cache, but that's the idea. Again, I don't really know how CPUs work).

Provided the code is small, I think both versions of the algorithm will benefit from cache and branch prediction.

> So in short: no, we can't just check as we go, because that's what we're already doing. But we can optimize significantly by checking in advance. I mean, practically speaking, the advance check is basically free. The compare-time checks, in sum, are not, both asymptotically and practically.

I'm not clear this is true. But I have not read the code.

Barry

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From contact at brice.xyz  Thu Mar  9 07:08:26 2017
From: contact at brice.xyz (Brice PARENT)
Date: Thu, 9 Mar 2017 13:08:26 +0100
Subject: [Python-ideas] For/in/as syntax
In-Reply-To: <6eaec9aa-ae7b-13b4-4403-1f1921a131cb@lucidity.plus.com>
References: <825ac651-225e-877b-6808-e6f36e7bd311@brice.xyz> <9588100b-bb90-7472-77bc-ead02b3a02a2@brice.xyz> <387f686d-ae24-e8ed-d884-f899d16b5d6c@mapgears.com> <6eaec9aa-ae7b-13b4-4403-1f1921a131cb@lucidity.plus.com>
Message-ID: <95a188ab-d9dd-1014-ad57-f3b3a68d6b62@brice.xyz>

Hi Erik,

>> I don't really understand what this means, as I'm not aware of how those
>> things work in the background.
>
> What I mean is, in the syntax "for spam in ham as eggs:" the name
> "eggs" is bound to your loop manager object. Where is the constructor
> call for this object? what class is it? That's what I meant by "magical".

It's what I thought, thanks for the clarification.

> If you are proposing the ability to create user-defined loop managers
> then there must be somewhere where your custom class's constructor is
> called. Otherwise how does Python know what type of object to create?

I wasn't thinking about a custom object, although this syntax wouldn't ease the process of customizing the looping behaviour itself.

> Something like (this is not a proposal, just something plucked out of
> the air to hopefully illustrate what I mean):
>
> for spam in ham with MyLoop() as eggs:
> eggs.continue()

I get it. If there are use cases where we'd like to use a custom loop type, it surely gets a bit more complicated, but we could find some alternative syntaxes like yours, or one of these:

    with LoopIterator.use(MyLoop):
        for spam in ham as eggs:
            eggs.continue()

or

    for spam in ham as (MyLoop, eggs):
        # When it's a 2-tuple, first element is the class, second the instance.
        # But I'm not sure about a tuple where one value is to be read while
        # the other is assigned to...
        eggs.continue()

or

    eggs = MyLoop()
    for spam in ham using eggs:  # I particularly dislike this one...
        eggs.continue()

But anyway, I'm not sure we're yet at this point of the thinking!

>> brings two functionalities that are part of the proposal, but are not
>> its main purpose, which is having the object itself. Allowing to break
>> and continue from it are just things that it could bring to us, but
>> there are countless things it could also bring (not all of them being
>> good ideas, of course), like the .skip() and the properties I mentioned,
>
> I understand that, but I concentrated on those because they were
> easily converted into syntax (and would probably be the only things
> I'd find useful - all the other stuff is mostly doable using a custom
> iterator, I think).

Probably. I'll probably try to implement those, as I've needed one or the other a few times already.

> I would agree that considering syntax for all of the extra things you
> mention would be a bad idea - which your loop manager object idea gets
> around.
>
>> but we could discuss about some methods like forloop.reset(),
>> forloop.is_first_iteration() which is just a shortcut to (forloop.count
>> == 0), forloop.is_last_iteration()
>
> Also, FWIW, if I knew that in addition to the overhead of creating a
> loop manager object I was also incurring the overhead of a loop
> counter being maintained (usually, one is not required - if it is, use
> enumerate()) I would probably not use this construct and instead find
> ways of restructuring my code to avoid it using regular for loops.

I would certainly not force users to use this syntax, nor the system to maintain the object if it is not instantiated explicitly by the user. Also, I don't know how it works behind the doors, so I have no idea whether having such an object and a counter (which are probably the only things to maintain, as everything else just depends on the counter) would change the cost of the loop much in terms of speed and memory (and anything else).

> I'm not beating up on you - like I said, I think the idea is interesting.

Don't worry, I didn't think you were. I like this idea as I think it could be of great help to many people, but it doesn't get much traction here, so I probably overestimate it!

Brice

From elliot.gorokhovsky at gmail.com  Thu Mar  9 10:30:05 2017
From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky)
Date: Thu, 09 Mar 2017 15:30:05 +0000
Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)
In-Reply-To: <5BA5FFB7-ECE1-49AB-B2F6-F9650A3D6DCE@barrys-emacs.org>
References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> <20170308001814.GM5689@ando.pearwood.info> <0801f164-c8a2-c4cc-0c45-9bebbae1ec38@lucidity.plus.com> <20170308110744.GO5689@ando.pearwood.info> <6d2924a5-cde1-e733-3b58-41f7ab05daa9@lucidity.plus.com> <081FC1D7-5F3F-4B8C-98B1-D6396C3E34ED@barrys-emacs.org> <5BA5FFB7-ECE1-49AB-B2F6-F9650A3D6DCE@barrys-emacs.org>
Message-ID: 

On Thu, Mar 9, 2017 at 3:04 AM Barry Scott wrote:

> So you do O(nlogn)*2 pointer compares with my suggestion, it seems? Which
> is smaller than the O(n) pointer checks?

Not sure what you mean here... pretty sure your inequality is backwards?

Look -- the point is, we already *do* the pointer checks during every compare. That's the current implementation. I believe my benchmarks are sufficient to prove that my patch is much faster than the current implementation. Which makes sense, because we do one type check per object, as opposed to something like log n per object, and we do them all at once, which I do believe is faster than doing them across calls.
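(For readers following along, a pure-Python caricature of that strategy: pay one type check per element up front, then select a compare function once. The `unsafe_*` references in the comments echo the spirit of the patch, but the code below is an invented illustration, not the actual C implementation.)

```python
def choose_compare(lst):
    """One O(n) pre-sort scan, instead of type checks inside every
    one of the O(n log n) compares."""
    if lst and all(type(x) is type(lst[0]) for x in lst):
        elem_type = type(lst[0])
        if elem_type is int:
            return int.__lt__   # stand-in for a specialised unsafe int compare
        if elem_type is str:
            return str.__lt__   # stand-in for a specialised unsafe str compare
    # Unknown or mixed types: fall back to the generic rich compare.
    return lambda a, b: a < b

lt = choose_compare([30, 10, 20])
print(lt is int.__lt__)  # the specialised compare was selected once, up front
```

The whole debate then reduces to constants: the scan costs n cheap checks once, while the per-compare alternative pays a check inside each of the roughly n log n comparisons.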
        eggs.continue()

or

    eggs = MyLoop()
    for spam in ham using eggs:  # I particularly dislike this one...
        eggs.continue()

But anyway, I'm not sure we're yet at this point of the thinking!

>
>> brings two functionalities that are part of the proposal, but are not
>> its main purpose, which is having the object itself. Allowing to break
>> and continue from it are just things that it could bring to us, but
>> there are countless things it could also bring (not all of them being
>> good ideas, of course), like the .skip() and the properties I mentioned,
>
> I understand that, but I concentrated on those because they were
> easily converted into syntax (and would probably be the only things
> I'd find useful - all the other stuff is mostly doable using a custom
> iterator, I think).

Probably - I'll try to implement those, as I've needed one or the other
a few times already.

> I would agree that considering syntax for all of the extra things you
> mention would be a bad idea - which your loop manager object idea gets
> around.
>
>> but we could discuss about some methods like forloop.reset(),
>> forloop.is_first_iteration() which is just a shortcut to (forloop.count
>> == 0), forloop.is_last_iteration()
>
> Also, FWIW, if I knew that in addition to the overhead of creating a
> loop manager object I was also incurring the overhead of a loop
> counter being maintained (usually, one is not required - if it is, use
> enumerate()) I would probably not use this construct and instead find
> ways of restructuring my code to avoid it using regular for loops.

I would certainly not force users to use this syntax, nor the system to
maintain the object if it is not instantiated explicitly by the user.
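As a rough illustration, the object part of the idea (without the break/continue methods, which would need interpreter support) can already be sketched in pure Python today - all names here are hypothetical:

```python
class ForLoop:
    """Hypothetical loop-manager object tracking the state the syntax
    would expose. break/continue-as-methods are not modelled, since
    they cannot be implemented in pure Python."""

    def __init__(self, iterable):
        self._items = list(iterable)
        self.count = -1  # index of the current iteration

    def __iter__(self):
        # Update the public counter as we hand out items.
        for self.count, item in enumerate(self._items):
            yield item

    def is_first_iteration(self):
        return self.count == 0

    def is_last_iteration(self):
        return self.count == len(self._items) - 1


eggs = ForLoop("abc")
firsts = [eggs.is_first_iteration() for spam in eggs]
# firsts == [True, False, False]
```

Of course, this only demonstrates the counter and the derived properties; the interesting part of the proposal is precisely what a wrapper class *can't* do.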
Also, I don't know how it works under the hood, so I have no idea
whether having such an object and a counter (which are probably the only
things to maintain, as everything else just depends on the counter)
would change the cost of the loop much in terms of speed and memory (and
anything else).

>
> I'm not beating up on you - like I said, I think the idea is interesting.

Don't worry, I didn't think you were. I like this idea as I think it
could be a good help to many people, but it doesn't get much traction
here, so I probably overestimate it!

Brice

From elliot.gorokhovsky at gmail.com  Thu Mar  9 10:30:05 2017
From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky)
Date: Thu, 09 Mar 2017 15:30:05 +0000
Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)
In-Reply-To: <5BA5FFB7-ECE1-49AB-B2F6-F9650A3D6DCE@barrys-emacs.org>
References: <20170306024554.GK5689@ando.pearwood.info>
 <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com>
 <20170308001814.GM5689@ando.pearwood.info>
 <0801f164-c8a2-c4cc-0c45-9bebbae1ec38@lucidity.plus.com>
 <20170308110744.GO5689@ando.pearwood.info>
 <6d2924a5-cde1-e733-3b58-41f7ab05daa9@lucidity.plus.com>
 <081FC1D7-5F3F-4B8C-98B1-D6396C3E34ED@barrys-emacs.org>
 <5BA5FFB7-ECE1-49AB-B2F6-F9650A3D6DCE@barrys-emacs.org>
Message-ID:

On Thu, Mar 9, 2017 at 3:04 AM Barry Scott wrote:

> > So you do O(nlogn)*2 pointer compares with my suggestion it seems? Which
> > is smaller the O(n) pointer checks?

Not sure what you mean here... pretty sure your inequality is backwards?

Look -- the point is, we already *do* the pointer checks during every
compare. That's the current implementation. I believe my benchmarks are
sufficient to prove that my patch is much faster than the current
implementation. Which makes sense, because we do one type check per
object, as opposed to something like log n per object, and we do them
all at once, which I do believe is faster than doing them across calls.
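For readers following along, the pre-sort check can be modelled in pure Python - this is only a sketch of the idea (the real patch does this in C inside listobject.c, and additionally special-cases float/int/str/tuple):

```python
def check_homogeneity(lst):
    """O(n) pre-sort pass: return the common type, or None if mixed.

    Mirrors the patch's rule: exact type match only (a subclass does
    not count as homogeneous)."""
    if not lst:
        return None
    first_type = type(lst[0])
    for item in lst:
        if type(item) is not first_type:
            return None  # heterogeneous: fall back to the generic path
    return first_type


assert check_homogeneity([3, 1, 2]) is int
assert check_homogeneity([3, 1.0, 2]) is None  # int/float mix: generic path
```

One pass, one branch taken the same way almost every time - which is the point being made above about branch prediction.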
Now, you're not entirely wrong -- the current implementation doesn't do the type checks as efficiently as it could. Specifically, it does them once in PyObject_RichCompareBool, and then *again* in ob_type->tp_richcompare (which it has to for safety: ob_type->tp_richcompare doesn't know the arguments have already been type-checked!) You could totally define optimized compares that do the type-checks more efficiently, and you would see a benefit, just not nearly as extreme as the benefit I demonstrate. Thanks again for your feedback! Elliot -------------- next part -------------- An HTML attachment was scrubbed... URL: From elliot.gorokhovsky at gmail.com Thu Mar 9 10:38:23 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Thu, 09 Mar 2017 15:38:23 +0000 Subject: [Python-ideas] Submitted a PR! In-Reply-To: References: Message-ID: Just submitted a PR implementing this: https://github.com/python/cpython/pull/582 -- just need someone to review it now :) Thanks for all your feedback, everyone! On Sun, Mar 5, 2017 at 12:19 AM Elliot Gorokhovsky < elliot.gorokhovsky at gmail.com> wrote: > (Summary of results: my patch at https://bugs.python.org/issue28685 makes > list.sort() 30-50% faster in common cases, and at most 1.5% slower in the > uncommon worst case.) > > Hello all, > > You may remember seeing some messages on here about optimizing list.sort() > by exploiting type-homogeneity: since comparing apples and oranges is > uncommon (though possible, i.e. float to int), it pays off to check if the > list is type-homogeneous (as well as homogeneous with respect to some other > properties), and, if it is, to replace calls to PyObject_RichCompareBool > with calls to ob_type->tp_richcompare (or, in common special cases, to > optimized compare functions). The entire patch is less than 250 lines of > code, most of which is pretty boilerplate (i.e. a lot of assertions in > #ifdef Py_DEBUG blocks, etc). > > I originally wrote that patch back in November. 
I've learned a lot since > then, both about CPython and about mailing list etiquette :). Previous > discussion about this can be found at > https://mail.python.org/pipermail/python-dev/2016-October/146648.html and > https://mail.python.org/pipermail/python-ideas/2016-October/042807.html. > > Anyway, I recently redid the benchmarks much more rigorously (in > preparation for presenting this project at my school's science fair), > achieving a standard deviation of less than 0.5% of the mean for all > measurements. The exact benchmark script used can be found at > https://github.com/embg/python-fastsort-benchmark (it's just sorting > random lists of/lists of tuples of [type]. While listsort.txt talks about > benchmarking different kinds of structured lists, instead of just random > lists, the results here would hold in those cases just as well, because > this makes individual comparisons cheaper, instead of reducing the number > of comparisons based on structure). > > I also made a poster describing the optimization and including a pretty > graph displaying the benchmark data: > https://github.com/embg/python-fastsort-benchmark/blob/master/poster.pdf. > For those who would rather read the results here (though it is a *really* > pretty graph): > > *** > Percent improvement for sorting random lists of [type] > (1-patched/unpatched): > float: 48% > bounded int (magnitude smaller than 2^32): 48.4% > latin string (all characters in [0,255]): 32.7% > general int (reasonably uncommon?): 17.2% > general string (reasonably uncommon?): 9.2% > tuples of float: 63.2% > tuples of bounded int: 64.8% > tuples of latin string: 55.8% > tuples of general int: 50.3% > tuples of general string: 44.1% > tuples of heterogeneous: 41.5% > heterogeneous (lots of float with an int at the end; worst-case): -1.5% > *** > > Essentially, it's a gamble where the payoff is 20-30 times greater than > the cost, and the odds of losing are very small. 
Sorting is perhaps not a > bottleneck in most applications, but considering how much work has gone > into Python's sort (Timsort, etc; half of listobject.c is sort code), I > think it's interesting that list.sort() can be made essentially twice > faster by a relatively simple optimization. I would also add that Python > dictionaries already implement this optimization: they start out optimizing > based on the assumption that they'll only be seeing string keys, checking > to make sure that assumption holds as they go. If they see a non-string > key, they permanently switch over to the general implementation. So it's > really the same idea, except here it doesn't matter as much what type we're > dealing with, which is important, because lists are commonly used with lots > of different types, as opposed to dictionaries, which overwhelmingly > commonly use string keys, especially internally. (Correct me if I'm wrong > in any of the above). > > I got a lot of great feedback/input on this patch as I was writing it, but > after submitting it, I didn't hear much from anybody. (The reason I took so > long to post was because I wanted to wait until I had the chance to do the > benchmarks *right*). What do you all think? > > Thanks, > Elliot > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Thu Mar 9 11:08:26 2017 From: barry at python.org (Barry Warsaw) Date: Thu, 9 Mar 2017 11:08:26 -0500 Subject: [Python-ideas] dict(default=int) References: <7d2eb6ec-e662-7bbe-2b2d-3d3a31d816f2@brice.xyz> <70aca7b8-6d7b-a95c-5df0-24c438c1c0ad@trueblade.com> Message-ID: <20170309110826.66097a63@subdivisions.wooz.org> On Mar 08, 2017, at 05:49 PM, Eric V. Smith wrote: >If we really want to make defaultdict feel more "builtin" (and I don't see >any reason to do so), I'd suggest adding a factory function: > >dict.defaultdict(int) > >Similar in spirit to dict.fromkeys(), except of course returning a >defauldict, not a dict. Nice. 
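For concreteness, the suggested semantics could be prototyped with a dict subclass today (the subclass and classmethod below are hypothetical stand-ins; the actual proposal would put the classmethod on the C-level dict, like fromkeys):

```python
from collections import defaultdict


class Dict(dict):
    # Stand-in for the built-in dict; the suggestion is that dict itself
    # would grow this alternate-constructor classmethod.
    @classmethod
    def defaultdict(cls, default_factory, *args, **kwargs):
        return defaultdict(default_factory, *args, **kwargs)


counts = Dict.defaultdict(int)
for word in ["spam", "spam", "eggs"]:
    counts[word] += 1
# counts == {'spam': 2, 'eggs': 1}
```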
-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 801 bytes
Desc: OpenPGP digital signature
URL:

From timothy.c.delaney at gmail.com  Thu Mar  9 16:20:38 2017
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Fri, 10 Mar 2017 08:20:38 +1100
Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)
In-Reply-To: <5BA5FFB7-ECE1-49AB-B2F6-F9650A3D6DCE@barrys-emacs.org>
References: <20170306024554.GK5689@ando.pearwood.info>
 <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com>
 <20170308001814.GM5689@ando.pearwood.info>
 <0801f164-c8a2-c4cc-0c45-9bebbae1ec38@lucidity.plus.com>
 <20170308110744.GO5689@ando.pearwood.info>
 <6d2924a5-cde1-e733-3b58-41f7ab05daa9@lucidity.plus.com>
 <081FC1D7-5F3F-4B8C-98B1-D6396C3E34ED@barrys-emacs.org>
 <5BA5FFB7-ECE1-49AB-B2F6-F9650A3D6DCE@barrys-emacs.org>
Message-ID:

On 9 March 2017 at 21:04, Barry Scott wrote:

> It was not clear to me that you need to scan the list at the start to make
> sure its homogeneous. Given that the type check is so cheap will it
> slow the sort if you do the pointer check in the compare code? I am not
> suggesting you run rich compare full fat on each compare.

Isn't there already always a scan of the iterable to build the keys array
for sorting (even if no key keyword param is specified)? In which case
adding the homogeneity check there seems like it shouldn't add much
overhead at all (I have to say that I was surprised by the 10+% reductions
in speed in some of the heterogeneous TimSort tests for this reason).

And could specific richcompares be refactored so there was a "we really
know what the types are, no need to check" version available to sort()
(with the typechecking version available for general use/unoptimised
sorting)?

Tim Delaney
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From chris.barker at noaa.gov Thu Mar 9 17:57:05 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 9 Mar 2017 14:57:05 -0800 Subject: [Python-ideas] dict(default=int) In-Reply-To: <20170309110826.66097a63@subdivisions.wooz.org> References: <7d2eb6ec-e662-7bbe-2b2d-3d3a31d816f2@brice.xyz> <70aca7b8-6d7b-a95c-5df0-24c438c1c0ad@trueblade.com> <20170309110826.66097a63@subdivisions.wooz.org> Message-ID: >If we really want to make defaultdict feel more "builtin" (and I don't see > >any reason to do so), I'd suggest adding a factory function: > > > >dict.defaultdict(int) > > Nice. > I agree -- what about: dict.sorteddict() ?? make easy access to various built-in dict variations... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From spencerb21 at live.com Thu Mar 9 18:04:23 2017 From: spencerb21 at live.com (Spencer Brown) Date: Thu, 9 Mar 2017 23:04:23 +0000 Subject: [Python-ideas] dict(default=int) In-Reply-To: References: <7d2eb6ec-e662-7bbe-2b2d-3d3a31d816f2@brice.xyz> <70aca7b8-6d7b-a95c-5df0-24c438c1c0ad@trueblade.com> <20170309110826.66097a63@subdivisions.wooz.org>, Message-ID: Might make more sense to be dict.default(int), that way it doesn't have redundant dict names. Only problem is then it might be a bit confusing, since you could do {1:2, 3:4}.default(int) and not get the values back. Maybe 'withdefault', and return a copy if called on the instance? From chris.barker at noaa.gov Thu Mar 9 18:19:36 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 9 Mar 2017 15:19:36 -0800 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) 
In-Reply-To: <5BA5FFB7-ECE1-49AB-B2F6-F9650A3D6DCE@barrys-emacs.org>
References: <20170306024554.GK5689@ando.pearwood.info>
 <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com>
 <20170308001814.GM5689@ando.pearwood.info>
 <0801f164-c8a2-c4cc-0c45-9bebbae1ec38@lucidity.plus.com>
 <20170308110744.GO5689@ando.pearwood.info>
 <6d2924a5-cde1-e733-3b58-41f7ab05daa9@lucidity.plus.com>
 <081FC1D7-5F3F-4B8C-98B1-D6396C3E34ED@barrys-emacs.org>
 <5BA5FFB7-ECE1-49AB-B2F6-F9650A3D6DCE@barrys-emacs.org>
Message-ID:

On Thu, Mar 9, 2017 at 2:04 AM, Barry Scott wrote:

> It was not clear to me that you need to scan the list at the start to make
> sure its homogeneous. Given that the type check is so cheap will it
> slow the sort if you do the pointer check in the compare code? I am not
> suggesting you run rich compare full fat on each compare.

I think you have a point here. IIUC, the current code makes no
assumptions about type homogeneity, so it must do something generic at
each compare. Perhaps that generic thing (of course, Elliot knows what it
is) does do a pointer compare, and then something smart and fast for some
built-ins. But there are still a few steps:

- are these both the same type?
- what type are they?
- how do I compare them?
- do the compare

These may well all be fast, but it's still a few steps, and they have to
be done O(n*log(n)) times.

IIUC, Elliot's patch does something like:

First go through the whole list and do:
- What is the type of the first item? (only once)
- How do I compare that type? (only once)
- Is everything else in the list the same type? ( O(n) )

Then the actual sort:
- do the compare ( O(n*log(n)) )

So there is one operation done n times, and one done n*log(n) times,
rather than 4 done n*log(n) times. If the constants are about the same,
that saves 3*n*log(n) - n operations; if n is non-trivial, that's roughly
3*n*log(n) operations, so we go from 4*n*log(n) to n*log(n) -- about a 4
times speed-up, in theory.
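The theory is easy to probe empirically on any interpreter build - a rough timing harness along these lines (numbers will of course vary by machine and Python version) compares Elliot's common case against his stated worst case:

```python
import random
import timeit

# Homogeneous: all floats.
homogeneous = [random.random() for _ in range(10_000)]
# Elliot's stated worst case: lots of floats with an int at the end.
heterogeneous = homogeneous[:-1] + [1]

t_homo = timeit.timeit(lambda: sorted(homogeneous), number=50)
t_hetero = timeit.timeit(lambda: sorted(heterogeneous), number=50)
print(f"homogeneous: {t_homo:.4f}s  heterogeneous: {t_hetero:.4f}s")
```

On an unpatched interpreter the two numbers should be close; the patch is claimed to shrink the first substantially while barely changing the second.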
with all the other complications of computer performance, only profiling
will tell you!

(also, it's not 4 times on the whole sort -- just the compare part -- you
still need to shuffle the values around, which presumably takes at least
as long as each of the above operations)

(not sure about all four of those steps, it may be only three -- but
still a 3-times speed-up)

Now Barry's idea: assume that the list is homogeneous, and crap out if it
turns out it's not:

Start off:
- What is the type of the first item? (only once)
- How do I compare that type? (only once)

Do the sort:
- Is the next item the correct type? ( O(n*log(n)) )
- Do the compare ( O(n*log(n)) )

So we now have 2 operations to be run O(n*log(n)) times -- not as good as
only one, but still better than the 3 or 4 of the fully general sort. And
this would have less impact on the non-homogeneous case. Though Elliot
points out that this would be much harder to branch-predict, etc... so
maybe not. Maybe worth trying out and profiling??

NOTE: I really have no idea what really goes on in that code -- so I may
have this all wrong...

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From python at lucidity.plus.com  Thu Mar  9 19:29:09 2017
From: python at lucidity.plus.com (Erik)
Date: Fri, 10 Mar 2017 00:29:09 +0000
Subject: [Python-ideas] dict(default=int)
In-Reply-To:
References: <7d2eb6ec-e662-7bbe-2b2d-3d3a31d816f2@brice.xyz>
 <70aca7b8-6d7b-a95c-5df0-24c438c1c0ad@trueblade.com>
 <20170309110826.66097a63@subdivisions.wooz.org>
Message-ID: <647b5e0f-9274-2b76-318a-dd910e121c29@lucidity.plus.com>

On 09/03/17 23:04, Spencer Brown wrote:
> Might make more sense to be dict.default(int), that way it doesn't
> have redundant dict names.

I thought that, too.
> since you could do {1:2, 3:4}.default(int)

Could you?

Python 3.6.0 (default, Mar  9 2017, 00:43:06)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> type(dict())
<class 'dict'>
>>> type({})
<class 'dict'>
>>> type(dict)
<class 'type'>

The thing bound to the name 'dict' is not the same as the object
returned by _calling_ 'dict'.

E.

From rosuav at gmail.com  Thu Mar  9 19:42:02 2017
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 10 Mar 2017 11:42:02 +1100
Subject: [Python-ideas] dict(default=int)
In-Reply-To: <647b5e0f-9274-2b76-318a-dd910e121c29@lucidity.plus.com>
References: <7d2eb6ec-e662-7bbe-2b2d-3d3a31d816f2@brice.xyz>
 <70aca7b8-6d7b-a95c-5df0-24c438c1c0ad@trueblade.com>
 <20170309110826.66097a63@subdivisions.wooz.org>
 <647b5e0f-9274-2b76-318a-dd910e121c29@lucidity.plus.com>
Message-ID:

On Fri, Mar 10, 2017 at 11:29 AM, Erik wrote:
> On 09/03/17 23:04, Spencer Brown wrote:
>> Might make more sense to be dict.default(int), that way it doesn't
>> have redundant dict names.
>
> I thought that, too.
>
>> since you could do {1:2, 3:4}.default(int)
>
> Could you?
>
> Python 3.6.0 (default, Mar  9 2017, 00:43:06)
> [GCC 5.4.0 20160609] on linux
> Type "help", "copyright", "credits" or "license" for more information.
>>>> type(dict())
> <class 'dict'>
>>>> type({})
> <class 'dict'>
>>>> type(dict)
> <class 'type'>
>
> The thing bound to the name 'dict' is not the same as the object returned by
> _calling_ 'dict'.

Yes, you could; it'd be a classmethod, like dict.fromkeys. IMO it
should just ignore any instance argument - same as you see here:

>>> dict.fromkeys(range(3))
{0: None, 1: None, 2: None}
>>> {1:2,3:4}.fromkeys(range(3))
{0: None, 1: None, 2: None}

ChrisA

From python at lucidity.plus.com  Thu Mar  9 20:18:21 2017
From: python at lucidity.plus.com (Erik)
Date: Fri, 10 Mar 2017 01:18:21 +0000
Subject: [Python-ideas] Submitted a PR!
In-Reply-To:
References:
Message-ID: <1ee74cf2-1470-25a5-0850-a5609ff39ef4@lucidity.plus.com>

Hi.
I may be way off-base here, but having scanned the patch I'm not sure I agree that it's the right way forward. What seems to be happening is that the homogeneity of the list is determined somehow (whether tracked with a hint or scanned just-in-time) and then a specific comparison function for a known subset of built-in types is selected if appropriate. I had assumed that there would be an "apples-to-apples" comparison function in the type structure and that the patch was simply tracking the list's homogeneity in order to enter a (generic) alternative loop to call that function over PyObject_RichCompare(). Why is that not the case? When a new C-level type is introduced (either a built-in or an extension module), why does the list object's code need to know about it in order to perform this optimisation? Why is there not a "tp_apple2apple" slot in the type structure which higher level functions (including the RichCompare() stuff - the first thing that function does is check the type of the objects anyway) can call if it determines that the two objects have the same type? Such a slot would also speed up "contains", "count", etc (for all classes) with no extra work, and no overhead of tracking or scanning the sequence's homogeneity. E. From elliot.gorokhovsky at gmail.com Thu Mar 9 21:55:24 2017 From: elliot.gorokhovsky at gmail.com (Elliot Gorokhovsky) Date: Fri, 10 Mar 2017 02:55:24 +0000 Subject: [Python-ideas] Submitted a PR! In-Reply-To: <1ee74cf2-1470-25a5-0850-a5609ff39ef4@lucidity.plus.com> References: <1ee74cf2-1470-25a5-0850-a5609ff39ef4@lucidity.plus.com> Message-ID: There is an "apple to apple" compare function. It's unsafe_object_compare. If you look at the pre-sort check, you will find that if the list is homogeneous but not of float, int, string, or tuple, it sets compare_funcs.key_richcompare = ob_type->tp_richcompare and sets compare_funcs.key_compare = unsafe_object_compare. 
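In other words, the dispatch in the pre-sort check looks roughly like this pure-Python paraphrase - only unsafe_object_compare is a name taken from the patch; the returned strings are purely illustrative:

```python
def select_compare(lst):
    """Toy model of the pre-sort check's compare-function dispatch."""
    common = type(lst[0]) if lst else None
    if any(type(x) is not common for x in lst):
        # Heterogeneous: keep the fully general, type-checking path.
        return "generic PyObject_RichCompareBool path"
    if common in (float, int, str, tuple):
        # Homogeneous and special-cased: optimized compare.
        return f"specialized {common.__name__} compare"
    # Homogeneous but not special-cased: call tp_richcompare directly,
    # which is the role unsafe_object_compare plays in the patch.
    return "unsafe_object_compare"


assert select_compare([1.5, 2.5]) == "specialized float compare"
assert select_compare([{1}, {2}]) == "unsafe_object_compare"
```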
The latter is a wrapper for the former, bypassing the type checks in PyObject_RichCompareBool. Examples of benchmarks that use this functionality include the non-latin string and unbounded int benchmarks. In the table at the bottom of my patch description, it's described as follows: Compare function for general homogeneous lists; just a wrapper for ob_type->tp_richcompare, which is stored by the pre-sort check at compare_funcs.key_richcompare. This yields modest optimization (neighbourhood of 10%), but we generally hope we can do better. Further, in the code, the comments describe it as follows: /* Homogeneous compare: safe for any two compareable objects of the same type. * (compare_funcs.key_richcompare is set to ob_type->tp_richcompare in the * pre-sort check.) */ Does that answer your question? On Thu, Mar 9, 2017 at 6:18 PM Erik wrote: > Hi. > > I may be way off-base here, but having scanned the patch I'm not sure I > agree that it's the right way forward. > > What seems to be happening is that the homogeneity of the list is > determined somehow (whether tracked with a hint or scanned just-in-time) > and then a specific comparison function for a known subset of built-in > types is selected if appropriate. > > I had assumed that there would be an "apples-to-apples" comparison > function in the type structure and that the patch was simply tracking > the list's homogeneity in order to enter a (generic) alternative loop to > call that function over PyObject_RichCompare(). > > Why is that not the case? When a new C-level type is introduced (either > a built-in or an extension module), why does the list object's code need > to know about it in order to perform this optimisation? > > Why is there not a "tp_apple2apple" slot in the type structure which > higher level functions (including the RichCompare() stuff - the first > thing that function does is check the type of the objects anyway) can > call if it determines that the two objects have the same type? 
> > Such a slot would also speed up "contains", "count", etc (for all > classes) with no extra work, and no overhead of tracking or scanning the > sequence's homogeneity. > > E. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.peters at gmail.com Thu Mar 9 23:43:45 2017 From: tim.peters at gmail.com (Tim Peters) Date: Thu, 9 Mar 2017 22:43:45 -0600 Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!) In-Reply-To: References: <20170306024554.GK5689@ando.pearwood.info> <6481ff57-a368-462f-01e9-3077092185c7@lucidity.plus.com> <20170308001814.GM5689@ando.pearwood.info> <0801f164-c8a2-c4cc-0c45-9bebbae1ec38@lucidity.plus.com> <20170308110744.GO5689@ando.pearwood.info> <6d2924a5-cde1-e733-3b58-41f7ab05daa9@lucidity.plus.com> <081FC1D7-5F3F-4B8C-98B1-D6396C3E34ED@barrys-emacs.org> <5BA5FFB7-ECE1-49AB-B2F6-F9650A3D6DCE@barrys-emacs.org> Message-ID: [Tim Delaney ] > Isn't there already always a scan of the iterable to build the keys array > for sorting (even if no key keyword param is specified)? No - `.sort()` is a list method, and as such has nothing to do with arbitrary iterables, just lists (perhaps you're thinking of the `sorted()` function?). If no `key=` argument is passed, the list guts itself is used (as-is) as the vector of keys. > In which case adding the homogenity check there seems like it shouldn't > add much overhead at all (I have to say that I was surprised with 10+% > reductions in speed in some of the heterogenous TimSort tests for this reason). Those are worst cases where the current sort does very few compares (like just N-1 for a list of length N). Because they do "amazingly" few compares, they're already "amazingly" fast. And they also do little data movement (e.g., none at all for /sort & =sort, and N//2 pointer swaps for \sort). 
Because of that any new O(N) overhead would make them significantly slower - unless the new overhead pays off by allowing a larger time saving than it costs.(as it does when the list is same-type). There is a "natural" place to insert "same type?" checks: the outer loop of the sort marches over the vector once, left to right, alternately identifying the next natural run, then possibly extending it and/or merging it into previous runs. The checks could be put there instead, but the code would be ugly and more invasive - I wouldn't even bother trying it. > And could specific richcompares be refactored so there was a "we really know > what the types are is, no need to check" version available to sort() (with > the typechecking version available for general use/unoptimised sorting)? They're not _just_ type-aware. For example, the special function for ints is specialized to integers that happen to fit in one (internal) "digit", and the special function for strings is specialized to those that happen to be stored in PyUnicode_1BYTE_KIND format. Adding such stuff to the public C API would be ... well, for a start, tedious ;-) From tjreedy at udel.edu Fri Mar 10 02:41:20 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 10 Mar 2017 02:41:20 -0500 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: Message-ID: On 3/2/2017 3:03 AM, Serhiy Storchaka wrote: > Function implemented in Python can have optional parameters with default > value. It also can accept arbitrary number of positional and keyword > arguments if use var-positional or var-keyword parameters (*args and > **kwargs). In other words, Python signature possibilities are already unusually complex. > But there is no way to declare an optional parameter that > don't have default value. ... [moving the following up] > I propose to add a new syntax for optional parameters. 
If the argument
> corresponding to the optional parameter without default value is not
> specified, the parameter takes no value.

-1

Being able to do this would violate what I believe is the fundamental
precondition for python-coded function bodies: all parameters are bound
to an object (so that using a parameter name is never a NameError); all
arguments are used exactly once in the binding process; the binding is
done without ambiguity (or resort to disambiguation rules). Calls that
prevent establishment of this precondition result in an exception.

This precondition is normal in computing languages. I believe that all
of the ~20 languages I have used over decades have had it. In any case,
I believe it is important in understanding Python signatures and calls,
and that there would need to be a strong reason to alter this
precondition. (Stronger than I judge the one given here to be.)

> Currently you need to use the sentinel idiom

which binds a special object to a parameter, thus fulfilling the
precondition.

> for implementing this:
>
> _sentinel = object()
> def get(store, key, default=_sentinel):
>     if store.exists(key):
>         return store.retrieve(key)
>     if default is _sentinel:
>         raise LookupError
>     else:
>         return default
>
> There are drawbacks of this:
>
> * Module's namespace is polluted with sentinel's variables.

If one cares, one can change the internal reference to 'get._sentinel'
and add get._sentinel = _sentinel; del _sentinel after the def (or
package this in a decorator).

> * You need to check for the sentinel before passing it to other function
> by accident.

This might be a feature.

> * Possible name conflicts between sentinels for different functions of
> the same module.

Since None can be used as a sentinel for multiple functions, I don't
understand the problem you are pointing to.

> * Since the sentinel is accessible outside of the function, it possible
> to pass it to the function.

1. Give it a more private name (___private___?)
similar to a reserved name.
2. Hide it better (as an object attribute, for instance).

> * help() of the function shows reprs of default values.
> "foo(bar=<object object at 0xb713c698>)" looks ugly.

Someone suggested a subclass of object with str = repr that prints
something like 'bar = '. I think functools would be the appropriate
place for the class, predefined instance, and possibly a decorator.
Make the instance an attribute of the class so it a) would have the same
name both in the header and body, and b) would not be an attribute of
user module or user functions.

--
Terry Jan Reedy

From k7hoven at gmail.com  Fri Mar 10 07:12:29 2017
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Fri, 10 Mar 2017 14:12:29 +0200
Subject: [Python-ideas] Exploiting type-homogeneity in list.sort() (again!)
In-Reply-To: <20170306024554.GK5689@ando.pearwood.info>
References: <20170306024554.GK5689@ando.pearwood.info>
Message-ID:

On Mon, Mar 6, 2017 at 4:45 AM, Steven D'Aprano wrote:
> On Sun, Mar 05, 2017 at 07:19:43AM +0000, Elliot Gorokhovsky wrote:
>
>> You may remember seeing some messages on here about optimizing list.sort()
>> by exploiting type-homogeneity: since comparing apples and oranges is
>> uncommon (though possible, i.e. float to int), it pays off to check if the
>> list is type-homogeneous
>
> I sometimes need to know if a list is homogeneous, but unfortunately
> checking large lists for a common type in pure Python is quite slow.
>
> Here is a radical thought... why don't lists track their common type
> themselves? There's only a few methods which can add items:
>
> - append
> - extend
> - insert
> - __setitem__

I can also imagine other places where knowing those one or two bits of
information about homogeneity might potentially allow speedups:
converting lists or tuples to numpy arrays, min, max, sum etc.
If this extra one or two bits of information were tracked, and the
overhead of doing that was very small, then operations over the
collections could benefit regardless of the complexity of the algorithm,
so also O(n) operations. Too bad that one common thing done with lists
(iterating) does not have obvious benefits from type homogeneity in
CPython.

-- Koos

> Suppose we gave lists a read-only attribute, __type_hint__, which
> returns None for heterogeneous lists and the type for homogeneous
> lists. Adding an item to the list does as follows:
>
> - if the list is empty, adding an item sets __type_hint__ to type(item);
>
> - if the list is not empty, adding an item tests whether type(item)
>   is identical to (not a subclass of) __type_hint__, and if not, sets
>   __type_hint__ to None;
>
> - removing an item doesn't change the __type_hint__ unless the list
>   becomes empty, in which case it is reset to None;
>
> - if the internal allocated space of the list shrinks, that triggers
>   a recalculation of the __type_hint__ if it is currently None.
>
> (There's no need to recalculate the hint if it is not None.)
>
> Optional: doing a list.sort() could also recalculate the hint.
>
> The effect will be:
>
> - if __type_hint__ is a type object, then you can be sure that
>   the list is homogeneous;
>
> - if the __type_hint__ is None, then it might still be
>   homogeneous, but it isn't safe to assume so.
>
> Not only could sorting take advantage of the type hint without needing
> to do a separate O(N) scan of the list, but so could other code. I know
> I would be interested in using this. I have a fair amount of code that
> has to track the type of any items seen in a list, and swap to a "type
> agnostic but slow" version if the list is not homogeneous. I could
> probably replace that with some variation of:
>
> if thelist.__type_hint__ is None:
>     process_slow(thelist)
> else:
>     process_fast(thelist)
>
> At the very least, I'd be interested in experimenting with this.
>
> Thoughts?
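As a toy illustration of the tracking being proposed (pure Python, so it only shows the bookkeeping, not the C-level speedup; the class name is made up and method coverage is deliberately incomplete - extend, insert, __setitem__ and removal would need the same treatment):

```python
class HintedList(list):
    """Toy list subclass tracking __type_hint__ as sketched above."""

    def __init__(self, iterable=()):
        super().__init__(iterable)
        self._recalculate_hint()

    def _recalculate_hint(self):
        # None for empty or heterogeneous; the common type otherwise.
        kinds = {type(item) for item in self}
        self.__type_hint__ = kinds.pop() if len(kinds) == 1 else None

    def append(self, item):
        if not self:
            self.__type_hint__ = type(item)
        elif type(item) is not self.__type_hint__:
            self.__type_hint__ = None
        super().append(item)


xs = HintedList([1, 2, 3])
assert xs.__type_hint__ is int
xs.append(4.0)
assert xs.__type_hint__ is None  # heterogeneous now; not safe to assume
```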
> > > > --
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

--
+ Koos Zevenhoven
+ http://twitter.com/k7hoven
+

From toddrjen at gmail.com Fri Mar 10 18:06:40 2017 From: toddrjen at gmail.com (Todd) Date: Fri, 10 Mar 2017 18:06:40 -0500 Subject: [Python-ideas] Optional parameters without default value In-Reply-To: References: Message-ID: On Mar 10, 2017 02:42, "Terry Reedy" wrote: On 3/2/2017 3:03 AM, Serhiy Storchaka wrote:

> Functions implemented in Python can have optional parameters with default
> values. They can also accept an arbitrary number of positional and keyword
> arguments if var-positional or var-keyword parameters (*args and
> **kwargs) are used.

In other words, Python signature possibilities are already unusually complex.

> But there is no way to declare an optional parameter that
> doesn't have a default value.
> ... [moving the following up]
> I propose to add a new syntax for optional parameters. If the argument
> corresponding to the optional parameter without default value is not
> specified, the parameter takes no value.

-1

Being able to do this would violate what I believe is the fundamental precondition for python-coded function bodies: all parameters are bound to an object (so that using a parameter name is never a NameError); all arguments are used exactly once in the binding process; the binding is done without ambiguity (or resort to disambiguation rules). Calls that prevent establishment of this precondition result in an exception.

This precondition is normal in computing languages. I believe that all of the ~20 languages I have used over decades have had it. In any case, I believe it is important in understanding Python signatures and calls, and that there would need to be a strong reason to alter this precondition. (Stronger than I judge the one given here to be.)
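The precondition described above is also why today's idiom for "optional, but with no meaningful default" uses a module-private sentinel object: the parameter is still always bound, yet the function can tell whether the caller supplied an argument. A sketch (the helper name is made up, echoing the list.get discussion earlier in this digest):

```python
_MISSING = object()  # private sentinel; no caller can pass it by accident

def get(seq, index, default=_MISSING):
    """Return seq[index]; if out of range, return `default` when one
    was supplied, otherwise re-raise the IndexError."""
    try:
        return seq[index]
    except IndexError:
        if default is _MISSING:
            raise  # no default given: propagate the error
        return default
```

Unlike `default=None`, the sentinel distinguishes "caller passed None" from "caller passed nothing", without any new syntax.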
-1 also. Having used a language that does not enforce this precondition extensively (MATLAB), I agree that not being able to count on arguments existing makes handling function arguments much more difficult.

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From markusmeskanen at gmail.com Tue Mar 14 03:49:24 2017 From: markusmeskanen at gmail.com (Markus Meskanen) Date: Tue, 14 Mar 2017 09:49:24 +0200 Subject: [Python-ideas] dict(default=int) In-Reply-To: References: Message-ID: Hi All,

> I find importing defaultdict from collections to be clunky

I don't. And if this really turns out to be an issue, why not expose defaultdict as a builtin instead of messing with the dict class?

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From cognetta.marco at gmail.com Tue Mar 14 05:38:06 2017 From: cognetta.marco at gmail.com (Marco Cognetta) Date: Tue, 14 Mar 2017 18:38:06 +0900 Subject: [Python-ideas] Additions to collections.Counter and a Counter derived class Message-ID: Hi all,

I have been using the Counter class recently and came across several things that I was hoping to get feedback on. (This is my first time mailing this list, so any advice is greatly appreciated.)

1) Addition of a Counter.least_common method:

This would add a method to Counter that is basically the opposite of the pre-existing Counter.most_common method. In this case, the least common elements are considered the elements in c with the lowest (non-zero) frequency. This was addressed in https://bugs.python.org/issue16994, but it was never resolved and is still open (since Jan. 2013). This is a small change, but I think that it is useful to include in the stdlib. I have written a patch for this, but have not submitted a PR yet.
It can be found at https://github.com/mcognetta/cpython/tree/collections_counter_least_common

2) Undefined behavior when using Counter.most_common:

Consider the case c = Counter([1, 1, 2, 2, 3, 3, 'a', 'a', 'b', 'b', 'c', 'c']). When calling c.most_common(3), there are more than 3 "most common" elements in c, and c.most_common(3) will not always return the same list, since there is no defined total order on the elements in c. Should this be mentioned in the documentation?

Additionally, perhaps there is room for a method that produces all of the elements with the n highest frequencies in order of their frequencies. For example, in the case of c = Counter([1, 1, 1, 2, 2, 3, 3, 4, 4, 5]), c.aforementioned_method(2) would return [(1, 3), (2, 2), (3, 2), (4, 2)] since the two highest frequencies are 3 and 2.

3) Addition of a collections.Frequency or collections.Proportion class derived from collections.Counter:

This is sort of discussed in https://bugs.python.org/issue25478. The idea behind this would be a dictionary that, instead of returning the integer frequency of an element, would return its proportional representation in the iterable. So, for example f = Frequency('aabbcc'), f would hold Frequency({'a': 0.3333333333333333, 'b': 0.3333333333333333, 'c': 0.3333333333333333}). To address the following from the issue:

> The pitfall I imagine here is that if you continue adding elements after
> normalize() is called, the results will be nonsensical.

this would not be a problem because we could just build it entirely on top of a Counter, keep a count of the total number of elements in the Counter, and just divide by that every time we output or return the object or any of its elements. I think that this would be a pretty useful addition especially for code related to discrete probability distributions (which is what motivated this in the first place).
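For reference, the behaviour proposed in (1) can already be emulated in a few lines; this hypothetical `least_common` helper just reverses `most_common`, so "least common" means the lowest counts among keys actually present (zero-count keys are absent and can never be reported):

```python
from collections import Counter

def least_common(counter, n=None):
    """Hypothetical helper mirroring Counter.most_common, but with
    the lowest-count elements first."""
    ordered = counter.most_common()  # descending by count
    ordered.reverse()                # now ascending by count
    return ordered if n is None else ordered[:n]
```

As with most_common, the ordering of elements with equal counts is arbitrary.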
Thanks in advance,

-Marco

From wes.turner at gmail.com Tue Mar 14 09:26:08 2017 From: wes.turner at gmail.com (Wes Turner) Date: Tue, 14 Mar 2017 08:26:08 -0500 Subject: [Python-ideas] dict(default=int) In-Reply-To: References: <7d2eb6ec-e662-7bbe-2b2d-3d3a31d816f2@brice.xyz> <70aca7b8-6d7b-a95c-5df0-24c438c1c0ad@trueblade.com> <20170309110826.66097a63@subdivisions.wooz.org> Message-ID: class OrderedDefaultDict, __missing__(), kwargs.pop('default_factory')

- Src: https://gist.github.com/westurner/be22dba8110be099a35e#file-ordereddefaultdict-py

>From https://groups.google.com/d/msg/python-ideas/9bpR8-bNC6o/tQ92g7wLGAAJ :

On Fri, Oct 16, 2015 at 9:08 PM, Andrew Barnert via Python-ideas <python-ideas at python.org> wrote:

> Actually, forget all that; it's even simpler.
>
> At least in recent 3.x, the only thing wrong with inheriting from both
> types, assuming you put OrderedDict first, is the __init__ signature. So:
>
> class OrderedDefaultDict(OrderedDict, defaultdict):
>     def __init__(self, default_factory=None, *a, **kw):
>         OrderedDict.__init__(self, *a, **kw)
>         self.default_factory = default_factory
>
> More importantly, because __missing__ support is built into dict, despite
> the confusing docs for defaultdict, you don't really need defaultdict at
> all here:
>
> class OrderedDefaultDict(OrderedDict):
>     def __init__(self, default_factory=None, *a, **kw):
>         OrderedDict.__init__(self, *a, **kw)
>         self.default_factory = default_factory
>     def __missing__(self, key):
>         self[key] = value = self.default_factory()
>         return value
>
> And either of these should work with 2.5+ (according to
> https://docs.python.org/2/library/stdtypes.html#dict that's when
> dict.__missing__ was added).
> ...
This seems to keep a consistent __init__ signature with OrderedDict (by .pop()-ing 'default_factory' from kwargs instead of specifying it as a positional kwarg):

> class OrderedDefaultDict(OrderedDict):
>     def __init__(self, *a, **kw):
>         default_factory = kw.pop('default_factory', self.__class__)
>         OrderedDict.__init__(self, *a, **kw)
>         self.default_factory = default_factory
>     def __missing__(self, key):
>         self[key] = value = self.default_factory()
>         return value

I've added a few tests (as well as to_json, and _repr_json_): https://gist.github.com/westurner/be22dba8110be099a35e/c1a3a7394e401d4742df0617900bde6ab2643300#file-ordereddefaultdict-py-L120-L122

(Without this fix, json.loads(output_json, object_pairs_hook=OrderedDefaultDict) doesn't seem to work.)

On Thu, Mar 9, 2017 at 4:57 PM, Chris Barker wrote:

> >If we really want to make defaultdict feel more "builtin" (and I don't see
> >any reason to do so), I'd suggest adding a factory function:
> >
> >dict.defaultdict(int)
>
> Nice.
>
> I agree -- what about:
>
> dict.sorteddict() ??
>
> make easy access to various built-in dict variations...
>
> -CHB
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception
>
> Chris.Barker at noaa.gov
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From mertz at gnosis.cx Tue Mar 14 11:52:52 2017 From: mertz at gnosis.cx (David Mertz) Date: Tue, 14 Mar 2017 08:52:52 -0700 Subject: [Python-ideas] Additions to collections.Counter and a Counter derived class In-Reply-To: References: Message-ID: On Tue, Mar 14, 2017 at 2:38 AM, Marco Cognetta wrote: > 1) Addition of a Counter.least_common method: > This was addressed in https://bugs.python.org/issue16994, but it was > never resolved and is still open (since Jan. 2013). This is a small > change, but I think that it is useful to include in the stdlib. -1 on adding this. I read the issue, and do not find a convincing use case that is common enough to merit a new method. As some people noted in the issue, the "least common" is really the infinitely many keys not in the collection at all. But I can imagine an occasional need to, e.g. "find outliers." However, that is not hard to spell as `mycounter.most_common()[-1*N:]`. Or if your program does this often, write a utility function `find_outliers(...)` 2) Undefined behavior when using Counter.most_common: > 'c', 'c']), when calling c.most_common(3), there are more than 3 "most > common" elements in c and c.most_common(3) will not always return the > same list, since there is no defined total order on the elements in c. > Should this be mentioned in the documentation? > +1. I'd definitely support adding this point to the documentation. > Additionally, perhaps there is room for a method that produces all of > the elements with the n highest frequencies in order of their > frequencies. For example, in the case of c = Counter([1, 1, 1, 2, 2, > 3, 3, 4, 4, 5]) c.aforementioned_method(2) would return [(1, 3), (2, > 2), (3, 2), (4, 2)] since the two highest frequencies are 3 and 2. > -0 on this. I can see wanting this, but I'm not sure often enough to add to the promise of the class. The utility function to do this would be somewhat less trivial to write than `find_outliers(..)` but not enormously hard. 
I think I'd be +0 on adding a recipe to the documentation for a utility function.

> 3) Addition of a collections.Frequency or collections.Proportion class
> derived from collections.Counter:
>
> This is sort of discussed in https://bugs.python.org/issue25478.
> The idea behind this would be a dictionary that, instead of returning
> the integer frequency of an element, would return its proportional
> representation in the iterable.

One could write a subclass easily enough. The essential feature in my mind would be to keep an attribute Counter.total around to perform the normalization. I'm +1 on adding that to collections.Counter itself.

I'm not sure if this would be better as an attribute kept directly or as a property that called `sum(self.values())` when accessed. I believe that having `mycounter.total` would provide the right normalization in a clean API, and also expose easy access to other questions one would naturally ask (e.g. "How many observations were made?")

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From cognetta.marco at gmail.com Wed Mar 15 05:14:01 2017 From: cognetta.marco at gmail.com (Marco Cognetta) Date: Wed, 15 Mar 2017 18:14:01 +0900 Subject: [Python-ideas] Additions to collections.Counter and a Counter derived class In-Reply-To: References: Message-ID: Thanks for the reply. I will add it to the documentation and will work on a use case and recipe for the n highest frequency problem.

As for

> I'm not sure if this would be better as an attribute kept directly or as
> a property that called `sum(self.values())` when accessed.
I believe that having `mycounter.total` would provide the right normalization in a clean API, and also expose easy access to other questions one would naturally ask (e.g. "How many observations were made?") I am not quite sure what you mean, especially the observations part. For the attribute part, do you mean we would just have a hidden class variable like num_values that was incremented or decremented whenever something is added or removed, so we have O(1) size queries instead of O(n) (where n is the number of keys)? Then there could be a method like 'normalize' that printed all elements with their frequencies divided by the total count? On Wed, Mar 15, 2017 at 12:52 AM, David Mertz wrote: > On Tue, Mar 14, 2017 at 2:38 AM, Marco Cognetta > wrote: >> >> 1) Addition of a Counter.least_common method: >> This was addressed in https://bugs.python.org/issue16994, but it was >> never resolved and is still open (since Jan. 2013). This is a small >> change, but I think that it is useful to include in the stdlib. > > > -1 on adding this. I read the issue, and do not find a convincing use case > that is common enough to merit a new method. As some people noted in the > issue, the "least common" is really the infinitely many keys not in the > collection at all. > > But I can imagine an occasional need to, e.g. "find outliers." However, > that is not hard to spell as `mycounter.most_common()[-1*N:]`. Or if your > program does this often, write a utility function `find_outliers(...)` > >> 2) Undefined behavior when using Counter.most_common: >> 'c', 'c']), when calling c.most_common(3), there are more than 3 "most >> common" elements in c and c.most_common(3) will not always return the >> same list, since there is no defined total order on the elements in c. >> >> Should this be mentioned in the documentation? > > > +1. I'd definitely support adding this point to the documentation. 
> >> >> Additionally, perhaps there is room for a method that produces all of >> the elements with the n highest frequencies in order of their >> frequencies. For example, in the case of c = Counter([1, 1, 1, 2, 2, >> 3, 3, 4, 4, 5]) c.aforementioned_method(2) would return [(1, 3), (2, >> 2), (3, 2), (4, 2)] since the two highest frequencies are 3 and 2. > > > -0 on this. I can see wanting this, but I'm not sure often enough to add to > the promise of the class. The utility function to do this would be somewhat > less trivial to write than `find_outliers(..)` but not enormously hard. I > think I'd be +0 on adding a recipe to the documentation for a utility > function. > >> >> 3) Addition of a collections.Frequency or collections.Proportion class >> derived from collections.Counter: >> >> This is sort of discussed in https://bugs.python.org/issue25478. >> The idea behind this would be a dictionary that, instead of returning >> the integer frequency of an element, would return it's proportional >> representation in the iterable. > > > One could write a subclass easily enough. The essential feature in my mind > would be to keep an attributed Counter.total around to perform the > normalization. I'm +1 on adding that to collections.Counter itself. > > I'm not sure if this would be better as an attribute kept directly or as a > property that called `sum(self.values())` when accessed. I believe that > having `mycounter.total` would provide the right normalization in a clean > API, and also expose easy access to other questions one would naturally ask > (e.g. "How many observations were made?") > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. 
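The subclass being discussed here can be written down in a few lines; `Frequency` and `total` are names taken from this thread, and computing `total` on demand (rather than caching it in the mutators) is just one of the two options under debate:

```python
from collections import Counter

class Frequency(Counter):
    """Sketch of a proportion-reporting Counter subclass."""

    @property
    def total(self):
        # recomputed on every access, so mutating counts can never
        # leave a cached total stale (the trade-off is O(n) per access)
        return sum(self.values())

    def proportions(self):
        """Return a plain dict of proportions, leaving counts intact."""
        t = self.total
        return {k: v / t for k, v in self.items()}
```

Returning a plain dict from `proportions()` matches the point made in the thread that a normalized mapping is no longer really a Counter.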
From mertz at gnosis.cx Wed Mar 15 12:59:54 2017 From: mertz at gnosis.cx (David Mertz) Date: Wed, 15 Mar 2017 09:59:54 -0700 Subject: [Python-ideas] Additions to collections.Counter and a Counter derived class In-Reply-To: References: Message-ID: I added a couple comments at https://bugs.python.org/issue25478 about what I mean. Raymond replied as well. So it feels like we should use that thread there.

In a scientific context I often think of a Counter as a way to count observations of a categorical variable. "I saw 3 As, then 7 Bs, etc." My proposed attribute/property would let you ask `num_observations = mycounter.total`. As I comment on the issue, we could ask for frequency by spelling it like:

freqs = {k: v/c.total for k, v in c.items()}

And yes, the question in my mind is whether we should define:

@property
def total(self):
    return sum(self.values())

Or keep a running count in all the mutator methods. Probably the property, actually, since we don't want to slow down all the existing code.

I don't really want .normalize() as a method, but if we did, the spelling should be `.normalized()` instead. This mirrors reverse/reversed and sort/sorted, which we already have to reflect mutators versus copies. But notice that copying is almost always in functions rather than in methods. The thing is, once something is normalized, it really IS NOT a Counter anymore. My dict comprehension emphasizes that by creating a plain dictionary. Even if .normalized() existed, I don't believe it should return a Counter object (instead either a plain dict, or some new specialized subclass like Frequencies).

On Wed, Mar 15, 2017 at 2:14 AM, Marco Cognetta wrote:

> Thanks for the reply. I will add it to the documentation and will work
> on a use case and recipe for the n highest frequency problem.
>
> As for
>
> >I'm not sure if this would be better as an attribute kept directly or as
> a property that called `sum(self.values())` when accessed.
I believe that > having `mycounter.total` would provide the right normalization in a clean > API, and also expose easy access to other questions one would naturally ask > (e.g. "How many observations were made?") > > I am not quite sure what you mean, especially the observations part. > For the attribute part, do you mean we would just have a hidden class > variable like num_values that was incremented or decremented whenever > something is added or removed, so we have O(1) size queries instead of > O(n) (where n is the number of keys)? Then there could be a method > like 'normalize' that printed all elements with their frequencies > divided by the total count? > > On Wed, Mar 15, 2017 at 12:52 AM, David Mertz wrote: > > On Tue, Mar 14, 2017 at 2:38 AM, Marco Cognetta < > cognetta.marco at gmail.com> > > wrote: > >> > >> 1) Addition of a Counter.least_common method: > >> This was addressed in https://bugs.python.org/issue16994, but it was > >> never resolved and is still open (since Jan. 2013). This is a small > >> change, but I think that it is useful to include in the stdlib. > > > > > > -1 on adding this. I read the issue, and do not find a convincing use > case > > that is common enough to merit a new method. As some people noted in the > > issue, the "least common" is really the infinitely many keys not in the > > collection at all. > > > > But I can imagine an occasional need to, e.g. "find outliers." However, > > that is not hard to spell as `mycounter.most_common()[-1*N:]`. Or if > your > > program does this often, write a utility function `find_outliers(...)` > > > >> 2) Undefined behavior when using Counter.most_common: > >> 'c', 'c']), when calling c.most_common(3), there are more than 3 "most > >> common" elements in c and c.most_common(3) will not always return the > >> same list, since there is no defined total order on the elements in c. > >> > >> Should this be mentioned in the documentation? > > > > > > +1. 
I'd definitely support adding this point to the documentation. > > > >> > >> Additionally, perhaps there is room for a method that produces all of > >> the elements with the n highest frequencies in order of their > >> frequencies. For example, in the case of c = Counter([1, 1, 1, 2, 2, > >> 3, 3, 4, 4, 5]) c.aforementioned_method(2) would return [(1, 3), (2, > >> 2), (3, 2), (4, 2)] since the two highest frequencies are 3 and 2. > > > > > > -0 on this. I can see wanting this, but I'm not sure often enough to > add to > > the promise of the class. The utility function to do this would be > somewhat > > less trivial to write than `find_outliers(..)` but not enormously hard. > I > > think I'd be +0 on adding a recipe to the documentation for a utility > > function. > > > >> > >> 3) Addition of a collections.Frequency or collections.Proportion class > >> derived from collections.Counter: > >> > >> This is sort of discussed in https://bugs.python.org/issue25478. > >> The idea behind this would be a dictionary that, instead of returning > >> the integer frequency of an element, would return it's proportional > >> representation in the iterable. > > > > > > One could write a subclass easily enough. The essential feature in my > mind > > would be to keep an attributed Counter.total around to perform the > > normalization. I'm +1 on adding that to collections.Counter itself. > > > > I'm not sure if this would be better as an attribute kept directly or as > a > > property that called `sum(self.values())` when accessed. I believe that > > having `mycounter.total` would provide the right normalization in a clean > > API, and also expose easy access to other questions one would naturally > ask > > (e.g. 
"How many observations were made?") > > > > > > > > -- > > Keeping medicines from the bloodstreams of the sick; food > > from the bellies of the hungry; books from the hands of the > > uneducated; technology from the underdeveloped; and putting > > advocates of freedom in prisons. Intellectual property is > > to the 21st century what the slave trade was to the 16th. > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed Mar 15 13:39:58 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 16 Mar 2017 04:39:58 +1100 Subject: [Python-ideas] Additions to collections.Counter and a Counter derived class In-Reply-To: References: Message-ID: <20170315173956.GW5689@ando.pearwood.info> On Tue, Mar 14, 2017 at 08:52:52AM -0700, David Mertz wrote: > But I can imagine an occasional need to, e.g. "find outliers." However, > that is not hard to spell as `mycounter.most_common()[-1*N:]`. Or if your > program does this often, write a utility function `find_outliers(...)` That's not how you find outliers :-) Just because a data point is uncommon doesn't mean it is an outlier. I don't think there's any good reason to want to find the "least common" values in a statistics context, but there might be other use-cases for it. For example, suppose we are interested in the *least* popular products being sold: Counter(order.item for order in orders) We can get the best selling products easily, but not the duds that don't sell much at all. However, the problem is that what we really need to see is the items that don't sell at all (count=0), and they won't show up! So I think that this is not actually a useful feature. 
> 2) Undefined behavior when using Counter.most_common:
> > 'c', 'c']), when calling c.most_common(3), there are more than 3 "most
> common" elements in c and c.most_common(3) will not always return the
> same list, since there is no defined total order on the elements in c.
>
> > Should this be mentioned in the documentation?
>
> > +1. I'd definitely support adding this point to the documentation.

The docs already say that "Elements with equal counts are ordered arbitrarily" so I'm not sure what more is needed.

--
Steve

From mertz at gnosis.cx Wed Mar 15 14:06:20 2017 From: mertz at gnosis.cx (David Mertz) Date: Wed, 15 Mar 2017 11:06:20 -0700 Subject: [Python-ideas] Additions to collections.Counter and a Counter derived class In-Reply-To: <20170315173956.GW5689@ando.pearwood.info> References: <20170315173956.GW5689@ando.pearwood.info> Message-ID: On Wed, Mar 15, 2017 at 10:39 AM, Steven D'Aprano wrote:

> > But I can imagine an occasional need to, e.g. "find outliers." However,
> > that is not hard to spell as `mycounter.most_common()[-1*N:]`. Or if
> > your program does this often, write a utility function `find_outliers(...)`
>
> That's not how you find outliers :-)
> Just because a data point is uncommon doesn't mean it is an outlier.

That's kinda *by definition* what an outlier is in categorical data! E.g.:

In [1]: from glob import glob
In [2]: from collections import Counter
In [3]: names = Counter()
In [4]: for fname in glob('babynames/yob*.txt'):
   ...:     for line in open(fname):
   ...:         name, sex, num = line.strip().split(',')
   ...:         num = int(num)
   ...:         names[name] += num
   ...:
In [5]: names.most_common(3)
Out[5]: [('James', 5086540), ('John', 5073452), ('Robert', 4795444)]
In [6]: rare_names = names.most_common()[-3:]
In [7]: rare_names
Out[7]: [('Zyerre', 5), ('Zylas', 5), ('Zytavion', 5)]
In [8]: sum(names.values())  # nicer would be `names.total`
Out[8]: 326086290

This isn't exactly statistics, but it's like your product example.
There are infinitely many random strings that occurred zero times among US births. But a "rare name" is one that occurred at least once, not one of these zero-occurring possible strings. I realize from my example, however, that I'm probably more interested in the actual uncommonality, not the specific `.least_common()`. I.e. I'd like to know which names occurred fewer than 10 times... but I don't know how many items that will include. Or as a percentage, which names occur in fewer than 0.01% of births? I don't think there's any good reason to want to find the "least common" > values in a statistics context, but there might be other use-cases for > it. For example, suppose we are interested in the *least* popular > products being sold: > > Counter(order.item for order in orders) > > > We can get the best selling products easily, but not the duds that don't > sell much at all. > > However, the problem is that what we really need to see is the items > that don't sell at all (count=0), and they won't show up! So I think > that this is not actually a useful feature. > > > > 2) Undefined behavior when using Counter.most_common: > > > 'c', 'c']), when calling c.most_common(3), there are more than 3 "most > > > common" elements in c and c.most_common(3) will not always return the > > > same list, since there is no defined total order on the elements in c. > > > > > Should this be mentioned in the documentation? > > > > > > > +1. I'd definitely support adding this point to the documentation. > > The docs already say that "Elements with equal counts are ordered > arbitrarily" so I'm not sure what more is needed. 
> > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brenbarn at brenbarn.net Wed Mar 15 14:14:39 2017 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Wed, 15 Mar 2017 11:14:39 -0700 Subject: [Python-ideas] Additions to collections.Counter and a Counter derived class In-Reply-To: References: <20170315173956.GW5689@ando.pearwood.info> Message-ID: <58C9848F.8040503@brenbarn.net> On 2017-03-15 11:06, David Mertz wrote: > Just because a data point is uncommon doesn't mean it is an outlier. > > > That's kinda *by definition* what an outlier is in categorical data! Not really. Or rather, it depends what you mean by "uncommon". But this thread is about adding "least_common", and just because a data point is among the least frequent doesn't mean it's an outlier. You explained why yourself: > I realize from my example, however, that I'm probably more interested in the actual uncommonality, not the specific `.least_common()`. Exactly. If you have one data point that occurs once, another that occurs twice, another that occurs three times, and so on up to 10, then the "least common" one (or two or three) isn't an outlier. To be an outlier, it would have to be "much less common than the rest". That is, what matters is not the frequency rank but the magnitude of the separation in frequency between the outliers and the nonoutliers. But that's a much subtler notion than just "least common". 
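The threshold-style queries David raises ("fewer than 10 times", "fewer than 0.01% of births") do not need a least_common method at all; they are one-line comprehensions over the Counter. A sketch with made-up counts:

```python
from collections import Counter

names = Counter({'James': 5086540, 'Mary': 4120692,
                 'Zyerre': 5, 'Zylas': 5, 'Floxton': 9})

# absolute threshold: names seen fewer than 10 times
rare = {name: n for name, n in names.items() if n < 10}

# relative threshold: names under 0.01% of all observations
total = sum(names.values())
very_rare = {name: n for name, n in names.items() if n / total < 0.0001}
```

Unlike `most_common()[-N:]`, these return however many items actually satisfy the criterion, rather than a fixed N.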
-- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From mertz at gnosis.cx Wed Mar 15 15:38:40 2017 From: mertz at gnosis.cx (David Mertz) Date: Wed, 15 Mar 2017 12:38:40 -0700 Subject: [Python-ideas] Additions to collections.Counter and a Counter derived class In-Reply-To: <58C9848F.8040503@brenbarn.net> References: <20170315173956.GW5689@ando.pearwood.info> <58C9848F.8040503@brenbarn.net> Message-ID: On Wed, Mar 15, 2017 at 11:14 AM, Brendan Barnwell wrote: > Exactly. If you have one data point that occurs once, another that occurs > twice, another that occurs three times, and so on up to 10, then the "least > common" one (or two or three) isn't an outlier. To be an outlier, it would > have to be "much less common than the rest". That is, what matters is not > the frequency rank but the magnitude of the separation in frequency between > the outliers and the nonoutliers. But that's a much subtler notion than > just "least common". OK. Fair enough. Although exactly what separation in frequencies makes an outlier is pretty fuzzy, especially in large samples where there are likely to be no/few gaps at all per se. It does tend to convince me that what we need is a more specialized class in the `statistics` module. In my large babynames dataset (a common one available from US Census), the distribution in a linear scale is basically a vertical line followed by a horizontal line. It's much starker than a Zipf distribution. On a semilog scale, it has some choppiness in the tail, where you might define some as "outliers" but it's not obvious what that cutoff would be. 
import matplotlib.pyplot as plt  # import implied by the snippet

x = range(len(names))
y = [t[1] for t in names.most_common()]
plt.plot(x, y)
plt.title("Baby name frequencies USA 1880-2011")
plt.semilogy()

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

-------------- next part -------------- An HTML attachment was scrubbed... URL:
-------------- next part -------------- A non-text attachment was scrubbed... Name: Baby-name-freq.png Type: image/png Size: 1205 bytes Desc: not available URL:

From steve at pearwood.info Wed Mar 15 20:34:21 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 16 Mar 2017 11:34:21 +1100 Subject: [Python-ideas] Additions to collections.Counter and a Counter derived class In-Reply-To: References: <20170315173956.GW5689@ando.pearwood.info> Message-ID: <20170316003418.GX5689@ando.pearwood.info> On Wed, Mar 15, 2017 at 11:06:20AM -0700, David Mertz wrote:

> On Wed, Mar 15, 2017 at 10:39 AM, Steven D'Aprano wrote:
>
> > > But I can imagine an occasional need to, e.g. "find outliers." However,
> > > that is not hard to spell as `mycounter.most_common()[-1*N:]`. Or if
> > > your program does this often, write a utility function `find_outliers(...)`
> >
> > That's not how you find outliers :-)
> > Just because a data point is uncommon doesn't mean it is an outlier.
>
> That's kinda *by definition* what an outlier is in categorical data!
[...]
> This isn't exactly statistics, but it's like your product example. There
> are infinitely many random strings that occurred zero times among US
> births. But a "rare name" is one that occurred at least once, not one of
> these zero-occurring possible strings.

I'm not sure that "outlier" is defined for non-numeric data, or at least not formally defined.
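For numeric data there is a conventional rule, Tukey's fences, which comes up in this thread; a sketch using the stdlib `statistics` module, whose `quantiles` function requires Python 3.8+ (the sample data is made up):

```python
import statistics

def tukey_fences(data, k=1.5):
    """Return (low, high) fences; points outside are suspected outliers."""
    q1, q2, q3 = statistics.quantiles(data, n=4)  # Python 3.8+
    iqr = q3 - q1                                  # interquartile range
    return q1 - k * iqr, q3 + k * iqr

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]
low, high = tukey_fences(data)
suspects = [x for x in data if not low <= x <= high]
```

Note that this rests on the data being ordered and subtractable, which is exactly what categorical data lacks.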
You'd need a definition of central location (which would be the mode) and a definition of spread, and I'm not sure how you would measure spread for categorical data. What's the spread of this data?

["Jack", "Jack", "Jill", "Jack"]

The mode is clearly "Jack", but beyond that I'm not sure what can be said except to give the frequencies themselves.

One commonly used definition of outlier (due to John Tukey) is:

- divide your data into four equal quarters;
- the points between each quarter are known as quartiles, and there are three of them: Q1, Q2 (the median), Q3;
- define the Interquartile Range IQR = Q3 - Q1;
- define lower and upper fences as Q1 - 1.5*IQR and Q3 + 1.5*IQR;
- anything not between the lower and upper fences is an outlier.

Or to be precise, a *suspected* outlier, since for very long tailed distributions, rare values are to be expected and should not be discarded without good reason. If your data is Gaussian, that corresponds to discarding roughly 1% of the most extreme values.

> I realize from my example, however, that I'm probably more interested in
> the actual uncommonality, not the specific `.least_common()`. I.e. I'd
> like to know which names occurred fewer than 10 times... but I don't know
> how many items that will include. Or as a percentage, which names occur in
> fewer than 0.01% of births?

Indeed. While the frequencies themselves are useful, the least_common(count) (by analogy with Counter.most_common) is not so useful.

--
Steve

From joshringuk at gmail.com Mon Mar 20 06:40:34 2017
From: joshringuk at gmail.com (Josh Ring)
Date: Mon, 20 Mar 2017 10:40:34 +0000
Subject: [Python-ideas] Multi threaded computation, thread local storage reducing false sharing, read only access to globals and a paused GIL to allow parallel scaling reads
Message-ID:

Hi, my name is Josh Ring. I am interested in raw compute performance from C modules called from Python.
Most of these are single-threaded (e.g. NumPy, SciPy). Some things are sensible with many threads:

1. Read-only global state can avoid locks entirely in multithreaded code; this is to avoid cache line invalidations killing scaling beyond 2-4 threads.
2. Incr/decr can be paused when entering the parallel region to avoid invalidating caches of objects; read-only access makes this safe.
3. Locality of memory, so using a thread-local stack by default and heap allocations bound per thread; this is essential to scale beyond 4 threads and with NUMA server systems.
4. Leaving the GIL intact for single-threaded code to do the "cleanup stage" of temporaries after parallel computation has finished.

- I liked the approach of a "parallel region", where data does not need to be pickled, and can directly read-only access shared memory.
- If global state is unchangeable from a threaded region we can avoid many gotchas and races, leaving the GIL alone, almost.
- If reference counting can be "paused" during the parallel region we can avoid cache invalidation from multiple threads due to incr/decr, which limits scaling with more threads; this is evident even with 2 threads.
- Thread-local storage is the default and only option, avoiding clunky "threading.local()" storage classes; "thread bound" heap allocations would also be a good thing to increase efficiency and reduce "false sharing". https://en.wikipedia.org/wiki/False_sharing
- Implement the parallel region using a function with a decorator, akin to OpenMP? The function then defines the scope for the local variables and the start and end of the parallel region, when to return etc. in a straightforward manner.
- By default, the objects returned from the parallel region return into separate objects (avoiding GIL contention); these temporary objects are then merged into a list once control is returned to just a single thread.
- Objects marked @thread_shared have their state merged from the thread-local copies once execution from all the threads has finished. This could be made intelligent if some index is provided to put the entries in the right place in a list/dict etc.

Thoughts?

This proposal is borrowing several ideas from PyParallel, but without the Windows-only focus and the focus on IO. It is more focused on accelerating raw compute performance, and is tailored for high performance computing, for instance at CERN etc.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From julien at palard.fr Tue Mar 21 18:05:31 2017
From: julien at palard.fr (Julien Palard)
Date: Tue, 21 Mar 2017 18:05:31 -0400
Subject: [Python-ideas] PEP: Python Documentation Translations
Message-ID:

Hi,

Here is the follow-up to the "Translated Python documentation" thread on python-dev [1]_, as a PEP describing the steps to make Python Documentation Translations official and accessible on docs.python.org.

.. [1] [Python-Dev] Translated Python documentation (https://mail.python.org/pipermail/python-dev/2017-February/147416.html)

You can also read it online at https://github.com/JulienPalard/peps/blob/master/pep-0xxx.rst

==============================

PEP: XXX
Title: Python Documentation Translations
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner , Inada Naoki , Julien Palard
Status: Draft
Type: Process
Created: 04-Mar-2017
Post-History:

Abstract
========

The intent of this PEP is to make existing translations of the Python Documentation more accessible and discoverable. By doing so, we hope to attract and motivate new translators and new translations.

Translated documentation will be hosted on python.org. Examples of two active translation teams:

* http://docs.python.org/fr/: French
* http://docs.python.org/jp/: Japanese

http://docs.python.org/en/ will redirect to http://docs.python.org/.
Sources of translated documentation will be hosted in the Python Documentation organization on GitHub: https://github.com/python-docs/. Contributors will have to sign the Python Contributor Agreement (CLA) and the license will be the PSF License.

Motivation
==========

On the French ``#python-fr`` IRC channel on freenode, it's not rare to meet people who don't speak English and so are unable to read the official Python documentation. Python wants to be widely available, to all users, in any language: that's also why Python 3 now allows non-ASCII identifiers: https://www.python.org/dev/peps/pep-3131/#rationale

There are at least 3 groups of people who are translating the Python documentation into their mother tongue (French [16]_ [17]_ [18]_, Japanese [19]_ [20]_, Spanish [21]_), even though their translations are not visible on d.p.o. Other less visible and less organized groups are also translating into their mother tongue; we have heard of Russian, Chinese and Korean, and maybe some others we haven't found yet. This PEP defines rules to move translations onto docs.python.org so they can easily be found by developers, newcomers and potential translators.

The Japanese team has currently (March 2017) translated ~80% of the documentation, the French team ~20%. The French translation went from 6% to 23% in 2016 [13]_ with 7 contributors [14]_, proving that a translation team can translate faster than the documentation changes.

Quoting Xiang Zhang about Chinese translations:

   I have seen several groups trying to translate part of our official doc. But their efforts are disperse and quickly become lost because they are not organized to work towards a single common result and their results are hold anywhere on the Web and hard to find. An official one could help ease the pain.
Rationale
=========

Translation
-----------

Issue tracker
'''''''''''''

Considering that issues opened about translations may be written in the translation's language, which can be considered noise and is at best inconsistent, issues should be placed outside `bugs.python.org `_ (b.p.o).

As all translations must have their own GitHub project (see `Repository for Po Files`_), they must use the associated GitHub issue tracker.

Considering the noise induced by translation issues written in any language, which may land in b.p.o despite every warning, triage will have to be done. Considering that translations already exist and are not actually a source of noise in b.p.o, an unmanageable amount of work is not to be expected. Considering that Xiang Zhang and Victor Stinner are already triaging, and Julien Palard is willing to help with this task, noise on b.p.o is not to be expected.

Also, language team coordinators (see `Language Team`_) should help triage b.p.o by properly indicating, in the issue author's language if needed, the right issue tracker.

Branches
''''''''

Translation teams should focus on the latest stable versions, and use tools (scripts, translation memory, ...) to automatically carry over what is done in one branch to the other branches.

.. note:: Translation memories are a kind of database of previously translated paragraphs, even removed ones. See also `Sphinx Internationalization `_.

The three stable branches will be translated [12]_: 2.7, 3.5, and 3.6. The scripts to build the documentation of older branches have to be modified to support translation [12]_, whereas these branches now only accept security fixes.

The development branch (master) should have a lower translation priority than the stable branches. But docsbuild-scripts should build it anyway, so it is possible for a team to work on it to be ready for the next release.
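To make the "translation memory" note above concrete: at its simplest, a translation memory is a mapping from source paragraphs to previously seen translations, used to pre-fill the catalog of another branch. A toy sketch in Python follows; the catalog format and function names are purely illustrative (real teams work on gettext ``.po`` files, not dicts), and this is not any team's actual tooling:

```python
# Toy translation-memory sketch. Catalogs are plain {msgid: msgstr} dicts;
# an empty msgstr means "not translated yet".

def build_memory(*catalogs):
    """Collect every non-empty translation ever seen, across catalogs."""
    memory = {}
    for catalog in catalogs:
        for msgid, msgstr in catalog.items():
            if msgstr:
                memory[msgid] = msgstr
    return memory

def prefill(catalog, memory):
    """Fill the untranslated entries of a branch's catalog from the memory."""
    return {msgid: msgstr or memory.get(msgid, "")
            for msgid, msgstr in catalog.items()}

# A 3.5 catalog that is already translated, and a fresh 3.6 catalog.
branch_35 = {"Python is fun.": "Python est amusant.",
             "See the tutorial.": "Voir le tutoriel."}
branch_36 = {"Python is fun.": "", "See the tutorial.": "", "New in 3.6.": ""}

memory = build_memory(branch_35)
print(prefill(branch_36, memory))
# {'Python is fun.': 'Python est amusant.', 'See the tutorial.': 'Voir le tutoriel.', 'New in 3.6.': ''}
```

The paragraph new in 3.6 stays untranslated, which is exactly the part a human translator still has to do.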
Hosting
-------

Domain Name, Content Negotiation and URL
''''''''''''''''''''''''''''''''''''''''

Different translations can be told apart by changing one of: the Country Code Top Level Domain (CCTLD), a path segment, a subdomain, or by content negotiation.

Buying a CCTLD for each translation is expensive, time-consuming, and sometimes almost impossible when the domain is already registered, so this solution should be avoided.

Using subdomains like "es.docs.python.org" or "docs.es.python.org" is possible but confusing ("is it `es.docs.python.org` or `docs.es.python.org`?"). Hyphens in subdomains, as in `pt-br.doc.python.org`, are uncommon, and SEOMoz [23]_ correlated the presence of hyphens with a negative ranking factor. Usage of underscores in subdomains is prohibited by RFC 1123 [24]_, section 2.1. Finally, using subdomains means creating TLS certificates for each language. This is more maintenance, and will probably cause us trouble in language pickers if, as for the version picker, we want a preflight request to check whether the translation exists in the given version: the preflight will probably be blocked by the same-origin policy. Wildcard TLS certificates are very expensive.

Using content negotiation (HTTP headers ``Accept-Language`` in the request and ``Vary: Accept-Language``) leads to a bad user experience where users can't easily change the language. According to Mozilla: "This header is a hint to be used when the server has no way of determining the language via another way, like a specific URL, that is controlled by an explicit user decision." [25]_. As we want the language to be easy to change, we should not use content negotiation as the main way of determining the language, so we need something else.

The last solution is to use the URL path, which looks readable, allows for an easy switch from one language to another, and nicely accepts hyphens. Typically something like: "docs.python.org/de/".
Example with a hyphen: "docs.python.org/pt-BR/"

As for versions, sphinx-doc does not support compiling for multiple languages, so we'll have full builds rooted under a path, exactly like we're already doing with versions. So we can have "docs.python.org/de/3.6/" or "docs.python.org/3.6/de/". The question is: "Does a language contain multiple versions, or does a version contain multiple languages?" As versions exist in any case, and translations for a given version may or may not exist, we may prefer "docs.python.org/3.6/de/", but doing so scatters languages everywhere. Having "/de/3.6/" is clearer about "everything under /de/ is written in German". Having the version at the end is also a habit taken by readers of the documentation: they like to easily change the version by changing the end of the path.

So we should use the following pattern: "docs.python.org/LANGUAGE_TAG/VERSION/".

The current documentation is not moved to "/en/", but "docs.python.org/en/" will redirect to "docs.python.org/".

Language Tag
''''''''''''

A common notation for language tags is the IETF Language Tag [3]_ [4]_ based on ISO 639, although gettext uses ISO 639 tags with underscores (ex: ``pt_BR``) instead of dashes to join tags [5]_ (ex: ``pt-BR``).

Examples of IETF Language Tags: ``fr`` (French), ``ja`` (Japanese), ``pt-BR`` (Orthographic formulation of 1943 - Official in Brazil).

It is more common to see dashes instead of underscores in URLs [6]_, so we should use IETF language tags, even if sphinx uses gettext internally: URLs are not meant to leak the underlying implementation.

It's uncommon to see capitalized letters in URLs, and docs.python.org doesn't use any, so capitals may hurt readability by attracting the eye, as in: "https://docs.python.org/pt-BR/3.6/library/stdtypes.html". RFC 5646 (Tags for Identifying Languages (IETF)), section 2.1 [7]_ states that tags are not case-sensitive. As the RFC allows lower case, and it enhances readability, we should use lowercased tags like ``pt-br``.
It's redundant to display both the language and country code when they convey the same thing, as in "de-DE" or "fr-FR": although they make sense (respectively "German as spoken in Germany" and "French as spoken in France"), they carry no useful information for the reader. So we may drop those redundancies. We should obviously keep the country part when it makes sense, like "pt-BR" for "Portuguese as spoken in Brazil".

So we should use IETF language tags, lowercased, like ``/fr/``, ``/pt-br/``, ``/de/`` and so on.

Fetching And Building Translations
''''''''''''''''''''''''''''''''''

Currently docsbuild-scripts are building the documentation [8]_. These scripts should be modified to fetch and build translations.

Building new translations is like building new versions, so we're adding complexity, but not that much.

Two steps should be configurable separately: building a new language, and adding it to the language picker. This allows a transition step between "we accepted the language" and "it is translated enough to be made public". During this step, translators can review their modifications on d.p.o without having to build the documentation locally.

From the translation repositories, only the ``.po`` files should be opened by the docsbuild-scripts, to keep the attack surface and probable sources of bugs at a minimum. This means no translation can patch sphinx to advertise their translation tool. (This specific feature should be handled by sphinx anyway [9]_.)

Community
---------

Mailing List
''''''''''''

The `doc-sig`_ mailing list will be used to discuss cross-language changes on translated documentation.

There is also the i18n-sig list, but it's more oriented towards i18n APIs [1]_ than towards translating the Python documentation.

.. _i18n-sig: https://mail.python.org/mailman/listinfo/i18n-sig
..
_doc-sig: https://mail.python.org/mailman/listinfo/doc-sig

Chat
''''

The Python community being highly active on IRC, we should create a new IRC channel on freenode, typically #python-doc, for consistency with the mailing list name.

Each language coordinator can organize their own team, even choosing another chat system if local usage asks for it. As local teams will write in their native languages, we don't want every team in a single channel, and it's also natural for the local teams to reuse their existing local channels, like "#python-fr" for French translators.

Repository for PO Files
'''''''''''''''''''''''

Considering that each translation team may want to use different translation tools, and that those tools should easily be synchronized with git, all translations should expose their ``.po`` files via a git repository.

Considering that each translation will be exposed via a git repository, and that Python has migrated to GitHub, translations will be hosted on GitHub.

For consistency and discoverability, all translations should be in the same GitHub organization and named according to a common pattern. Considering that we want translations to be official, and that Python already has a GitHub organization, translations should be hosted as projects of the `Python documentation GitHub organization`_. For consistency, translation repositories should be called ``python-docs-LANGUAGE_TAG`` [22]_.

The docsbuild-scripts may enforce this rule by refusing to fetch outside of the Python organization or from a wrongly named repository.

The CLA bot may be used on the translation repositories, but with limited effect, as local coordinators may synchronize translations from an external tool like Transifex, losing in the process the record of who translated what.

Versions can be hosted in different repositories, different directories or different branches. Storing them in different repositories would probably pollute the Python documentation GitHub organization.
As it is typical and natural to use branches to separate versions, branches should be used to do so.

.. _Python documentation GitHub organization: https://github.com/python-docs/

Translation tools
'''''''''''''''''

Most of the translation work is currently done on Transifex [15]_. Other tools may be used later, such as https://pontoon.mozilla.org/ and http://zanata.org/.

Contributor Agreement
'''''''''''''''''''''

Contributors to translated documentation will be requested to sign the Python Contributor Agreement (CLA):

https://www.python.org/psf/contrib/contrib-form/

Language Team
'''''''''''''

Each language team should have one coordinator responsible to:

- Manage the team
- Choose and manage the tools the team will use (chat, mailing list, ...)
- Ensure contributors understand and agree with the CLA
- Ensure quality (grammar, vocabulary, consistency, filtering spam, ads, ...)
- Redirect issues related to their language that land on bugs.python.org to the GitHub issue tracker

The license will be the `PSF License `_, and the copyright should be transferable to the PSF later.

Alternatives
------------

Simplified English
''''''''''''''''''

It would be possible to introduce a "Simplified English" version like Wikipedia did [10]_, as discussed on python-dev [11]_, targeting English learners and children.

Pros: It yields a single additional translation, theoretically readable by everyone, and reviewable by the current maintainers.

Cons: Subtle details may be lost, and translators from English to English may be hard to find, as stated by Wikipedia:

> The main English Wikipedia has 5 million articles, written by nearly 140K active users; the Swedish Wikipedia is almost as big, 3M articles from only 3K active users; but the Simple English Wikipedia has just 123K articles and 871 active users. That's fewer articles than Esperanto!
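Before moving on to the concrete changes, the tag and repository naming rules above (gettext underscores become hyphens, tags are lowercased, a redundant country code is dropped, repositories are named ``python-docs-LANGUAGE_TAG``) can be summed up in a few lines of code. This is only an illustration of the rules as stated, with illustrative function names, not proposed tooling:

```python
def url_language_tag(gettext_locale):
    """Convert a gettext locale name to the URL tag described above.

    Underscores become hyphens (pt_BR -> pt-br), everything is
    lowercased, and a country code equal to the language code is
    redundant and dropped (fr_FR -> fr).
    """
    tag = gettext_locale.replace("_", "-").lower()
    parts = tag.split("-")
    if len(parts) == 2 and parts[0] == parts[1]:
        return parts[0]
    return tag

def repository_name(gettext_locale):
    """Name of the GitHub repository holding the .po files for a language."""
    return "python-docs-" + url_language_tag(gettext_locale)

print(url_language_tag("pt_BR"))  # pt-br
print(url_language_tag("fr_FR"))  # fr
print(repository_name("ja"))      # python-docs-ja
```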
Changes
=======

Migrate GitHub Repositories
---------------------------

We (the authors of this PEP) already own the French and Japanese Git repositories, so moving them to the Python documentation organization will not be a problem. We'll however follow the `New Translation Procedure`_.

Patch docsbuild-scripts to Compile Translations
-----------------------------------------------

docsbuild-scripts must be patched to:

- List the language tags to build along with the branches to build.
- List the language tags to display in the language picker.
- Find translation repositories by formatting ``github.com:python-docs/python-docs-{language_tag}.git`` (See `Repository for Po Files`_)
- Build translations for each branch and each language

The patched docsbuild-scripts must only open ``.po`` files from translation repositories.

List coordinators in the devguide
---------------------------------

Add a page or a section with an empty list of coordinators to the devguide; each new coordinator will be added to this list.

Create sphinx-doc Language Picker
---------------------------------

Highly similar to the version picker, a language picker must be implemented. This language picker must be configurable to hide or show a given language.

Enhance rendering of untranslated fuzzy translations
----------------------------------------------------

It's an open sphinx issue [9]_, but we'll need it, so we'll have to work on it. Translated, fuzzy, and untranslated paragraphs should be differentiated. (Fuzzy paragraphs have to warn the reader that what they are reading may be out of date.)

New Translation Procedure
=========================

Designate a Coordinator
-----------------------

The first step is to designate a coordinator, see `Language Team`_. The coordinator must sign the CLA. The coordinator should be added to the list of translation coordinators in the devguide.
Create GitHub repository
------------------------

Create a repository named "python-docs-{LANGUAGE_TAG}" in the Python documentation GitHub organization (see `Repository For Po Files`_), and grant the language coordinator push rights to this repository.

Add support for translations in docsbuild-scripts
-------------------------------------------------

As soon as the translation gets its first commits, update the docsbuild-scripts configuration to build the translation (but without displaying it in the language picker yet).

Add translation to the language picker
--------------------------------------

As soon as the translation reaches:

- 100% of bugs.html with proper links to the language repository issue tracker
- 100% of the tutorial
- 100% of library/functions (builtins)

the translation can be added to the language picker.

Previous discussions
====================

- `[Python-ideas] Cross link documentation translations (January, 2016)`_
- `[Python-Dev] Translated Python documentation (February 2016)`_
- `[Python-ideas] https://docs.python.org/fr/ ? (March 2016)`_

.. _[Python-ideas] Cross link documentation translations (January, 2016): https://mail.python.org/pipermail/python-ideas/2016-January/038010.html
.. _[Python-Dev] Translated Python documentation (February 2016): https://mail.python.org/pipermail/python-dev/2017-February/147416.html
.. _[Python-ideas] https://docs.python.org/fr/ ? (March 2016): https://mail.python.org/pipermail/python-ideas/2016-March/038879.html

References
==========

.. [1] [I18n-sig] Hello Python members, Do you have any idea about Python documents? (https://mail.python.org/pipermail/i18n-sig/2013-September/002130.html)
.. [2] [Doc-SIG] Localization of Python docs (https://mail.python.org/pipermail/doc-sig/2013-September/003948.html)
.. [3] Tags for Identifying Languages (http://tools.ietf.org/html/rfc5646)
.. [4] IETF language tag (https://en.wikipedia.org/wiki/IETF_language_tag)
..
[5] GNU Gettext manual, section 2.3.1: Locale Names (https://www.gnu.org/software/gettext/manual/html_node/Locale-Names.html)
.. [6] Semantic URL: Slug (https://en.wikipedia.org/wiki/Semantic_URL#Slug)
.. [7] Tags for Identifying Languages: Formatting of Language Tags (https://tools.ietf.org/html/rfc5646#section-2.1.1)
.. [8] Docsbuild-scripts github repository (https://github.com/python/docsbuild-scripts/)
.. [9] i18n: Highlight untranslated paragraphs (https://github.com/sphinx-doc/sphinx/issues/1246)
.. [10] Wikipedia: Simple English (https://simple.wikipedia.org/wiki/Main_Page)
.. [11] Python-dev discussion about simplified english (https://mail.python.org/pipermail/python-dev/2017-February/147446.html)
.. [12] Passing options to sphinx from Doc/Makefile (https://github.com/python/cpython/commit/57acb82d275ace9d9d854b156611e641f68e9e7c)
.. [13] French translation progression (https://mdk.fr/pycon2016/#/11)
.. [14] French translation contributors (https://github.com/AFPy/python_doc_fr/graphs/contributors?from=2016-01-01&to=2016-12-31&type=c)
.. [15] Python-doc on Transifex (https://www.transifex.com/python-doc/)
.. [16] French translation (https://www.afpy.org/doc/python/)
.. [17] French translation github (https://github.com/AFPy/python_doc_fr)
.. [18] French mailing list (http://lists.afpy.org/mailman/listinfo/traductions)
.. [19] Japanese translation (http://docs.python.jp/3/)
.. [20] Japanese github (https://github.com/python-doc-ja/python-doc-ja)
.. [21] Spanish translation (http://docs.python.org.ar/tutorial/3/index.html)
.. [22] [Python-Dev] Translated Python documentation: doc vs docs (https://mail.python.org/pipermail/python-dev/2017-February/147472.html)
.. [23] Domains - SEO Best Practices | Moz (https://moz.com/learn/seo/domain)
.. [24] Requirements for Internet Hosts -- Application and Support (https://www.ietf.org/rfc/rfc1123.txt)
..
[25] Accept-Language (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language)

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

--
Julien Palard
https://mdk.fr
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From victor.stinner at gmail.com Tue Mar 21 21:03:57 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 22 Mar 2017 02:03:57 +0100
Subject: [Python-ideas] PEP: Python Documentation Translations
In-Reply-To:
References:
Message-ID:

I assigned the number 545 to the PEP. It should be online in less than 2 hours (I don't know where and how PEPs are rendered on python.org) at: https://www.python.org/dev/peps/pep-0545/

Victor

From victor.stinner at gmail.com Tue Mar 21 21:14:17 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 22 Mar 2017 02:14:17 +0100
Subject: [Python-ideas] PEP: Python Documentation Translations
In-Reply-To:
References:
Message-ID:

> Python documentation GitHub organization: https://github.com/python-docs/

I tried to create a team in the GitHub Python organization. It works. But then I don't have the right to add new members, since "I'm not an organization owner". IMHO the Python organization is too strict for such a translation project. That's why I proposed to use https://github.com/python-docs/ (which was reserved by Naoki, if I recall correctly), to give more freedom to translation subteams: to more easily delegate permissions, and not have to trust everyone.
Victor

From victor.stinner at gmail.com Tue Mar 21 21:18:42 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 22 Mar 2017 02:18:42 +0100
Subject: [Python-ideas] PEP: Python Documentation Translations
In-Reply-To:
References:
Message-ID:

> Contributor Agreement
> '''''''''''''''''''''
>
> Contributions to translated documentation will be requested to sign the
> Python Contributor Agreement (CLA):
>
> https://www.python.org/psf/contrib/contrib-form/

I'm not sure about this requirement, but I'm not a lawyer. I guess that in case of doubt, it's better to require it?

By the way, we may also use the Python Code of Conduct: https://www.python.org/psf/codeofconduct/

Victor

From ncoghlan at gmail.com Wed Mar 22 06:34:43 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 22 Mar 2017 20:34:43 +1000
Subject: [Python-ideas] PEP: Python Documentation Translations
In-Reply-To:
References:
Message-ID:

On 22 March 2017 at 08:05, Julien Palard via Python-ideas <python-ideas at python.org> wrote:

> Hi,
>
> Here is the follow-up to the "Translated Python documentation" thread on
> python-dev [1]_, as a PEP describing the steps to make Python Documentation
> Translation official on accessible on docs.python.org.

Thanks folks, this looks really good to me!

> So we can have "docs.python.org/de/3.6/" or
> "docs.python.org/3.6/de/". Question is "Does the language contains
> multiple version or does version contains multiple languages?" As
> versions exists in any cases, and translations for a given version may
> or may not exists, we may prefer "docs.python.org/3.6/de/", but doing
> so scatter languages everywhere. Having "/de/3.6/" is clearer about
> "everything under /de/ is written in deutch". Having the version at
> the end is also an habit taken by readers of the documentation: they
> like to easily change the version by changing the end of the path.
>
> So we should use the following pattern:
> "docs.python.org/LANGUAGE_TAG/VERSION/".
>

That structure also has the advantage of being able to cope if we ever decide that content like the tutorial and the howto guides would be better maintained as unversioned resources that note any version differences inline in the text.

My one request for clarification would be whether or not there would be redirects back to the /en/ versions in place when there is no translation for a particular version (e.g. the older security-fix-only branches). I'm not sure it matters all that much either way, but the PEP should be explicit.

Cheers, Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From songofacandy at gmail.com Wed Mar 22 07:33:26 2017
From: songofacandy at gmail.com (INADA Naoki)
Date: Wed, 22 Mar 2017 20:33:26 +0900
Subject: [Python-ideas] PEP: Python Documentation Translations
In-Reply-To:
References:
Message-ID:

On Wed, Mar 22, 2017 at 10:18 AM, Victor Stinner wrote:
>> Contributor Agreement
>> '''''''''''''''''''''
>>
>> Contributions to translated documentation will be requested to sign the
>> Python Contributor Agreement (CLA):
>>
>> https://www.python.org/psf/contrib/contrib-form/
>
> I'm not sure about this requirement, but I'm not a lawyer. I guess
> that in case of doubt, it's better to require it?

To publish / redistribute translations under d.p.o, we need to get agreement from all translators.

Maybe we can use a GitHub pull request (with a template) to confirm that members agree with how their translations are used, instead of the CLA.
>
> By the way, we may also use the Python Code of Conduct:
> https://www.python.org/psf/codeofconduct/
>
> Victor
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From george at fischhof.hu Wed Mar 22 08:51:26 2017
From: george at fischhof.hu (George Fischhof)
Date: Wed, 22 Mar 2017 13:51:26 +0100
Subject: [Python-ideas] Third party module in standard library
Message-ID:

Hi Guys,

I would like to ask you: what is the process to propose a module to be part of the standard library?

I would like to propose the following modules:

requests
https://pypi.python.org/pypi/requests

and

xmltodict
https://pypi.python.org/pypi/xmltodict

Both of them make life easier and simpler ;-) Of course there are several similar libraries -- I know -- and actually several good modules could be part of the standard library. ...

Kind regards,
George

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From d.f.hilliard at gmail.com Wed Mar 22 09:04:10 2017
From: d.f.hilliard at gmail.com (Jim F.Hilliard)
Date: Wed, 22 Mar 2017 15:04:10 +0200
Subject: [Python-ideas] Third party module in standard library
In-Reply-To:
References:
Message-ID:

You might be interested in reading [1] for the inclusion of new modules. A discussion on the inclusion of requests has already happened [2], so you might want to take a look there too; I wouldn't be surprised if there's an old thread on python-ideas somewhere for requests. For xmltodict, I have no idea :-).
Posting on python-ideas was the right step to take, though :-)

[1]: https://docs.python.org/devguide/stdlibchanges.html#adding-a-new-module
[2]: https://github.com/kennethreitz/requests/issues/2424

Best Regards, Jim Fasarakis Hilliard

On Wed, Mar 22, 2017 at 2:51 PM, George Fischhof wrote:
> Hi Guys,
>
> I would like to ask You:
> What is the process to propose a module to be part of the standard library?
>
> I would like to propose the following modules:
> requests
> https://pypi.python.org/pypi/requests
>
> and
> xmltodict
> https://pypi.python.org/pypi/xmltodict
>
> Both of them makes the life easier, and more simple ;-)
> Of course there are several similar libraries -- I know -- and actually
> several good modules could be part of standard library. ...
>
> Kind regards,
> George
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tjreedy at udel.edu Wed Mar 22 10:04:31 2017
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 22 Mar 2017 10:04:31 -0400
Subject: [Python-ideas] PEP: Python Documentation Translations
In-Reply-To:
References:
Message-ID:

On 3/21/2017 9:18 PM, Victor Stinner wrote:
>> Contributor Agreement
>> '''''''''''''''''''''
>>
>> Contributions to translated documentation will be requested to sign the
>> Python Contributor Agreement (CLA):
>>
>> https://www.python.org/psf/contrib/contrib-form/
> I'm not sure about this requirement, but I'm not a lawyer. I guess
> that in case of doubt, it's better to require it?

Unless the PSF lawyer says otherwise, I think anything published on docs.python.org should have the same contribution agreement, PSF copyright on the collective work, and collective license, which in this case made it possible for people to do the translations.
Filling out the online form is trivial, and people should know that they keep their copyright on what they write and that their contribution to PSF is permanent but non-exclusive. (This is a much nicer agreement than scientific journals that require assignment of copyright, which occasionally results in journals suing researchers for re-using their own writing.) -- Terry Jan Reedy From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Mar 22 11:12:49 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Thu, 23 Mar 2017 00:12:49 +0900 Subject: [Python-ideas] PEP: Python Documentation Translations In-Reply-To: References: Message-ID: <22738.38001.787798.314478@turnbull.sk.tsukuba.ac.jp> INADA Naoki writes: > On Wed, Mar 22, 2017 at 10:18 AM, Victor Stinner > wrote: > >> Contributor Agreement > >> ''''''''''''''''''''' > >> > >> Contributions to translated documentation will be requested to sign the > >> Python Contributor Agreement (CLA): > >> > >> https://www.python.org/psf/contrib/contrib-form/ > > > > I'm not sure about this requirement, but I'm not a lawyer. I guess > > that in case of doubt, it's better to require it? You have to ask the PSF lawyer about that, but I would say it's a good idea. We *do* need an appropriate license from each contributor IMO (IANAL). If we require it now and decide it's unneeded, at worst a few people with mild objections will sign a contributor license they didn't really have to. Since they retain copyright, any harm done should be rare and small. (If they really cared about copyleft etc, they'd be working in a GNU project.) If we don't do it now, somebody will likely need to identify contributions and chase down CLAs later, which is yuck. I've done that, I hope nobody ever has to do it again! > To publish / redistribute translations under d.p.o, we need to get > agreement from all translators. 
> > Maybe, we can use Github pull request (with template) to confirm member > agreed how translations are used, instead of CLA. IMO, the current web-based CLA is the right template to use. DRY - if the lawyers change the wording (unlikely, but possible) or the licenses (also unlikely but possible), we get that automatically. Of course, hook it into the GitHub process, since that's how you're going to collect contributions. Of course if the lawyers decide to use one or more documentation- oriented licenses for contributions, we'd need a different template. Unlikely but possible, but this would be PSF Legal-driven so again, gotta talk to them. Steve "IANAL but I've been through license wars and CLA collection drives" From brett at python.org Wed Mar 22 12:29:48 2017 From: brett at python.org (Brett Cannon) Date: Wed, 22 Mar 2017 16:29:48 +0000 Subject: [Python-ideas] PEP: Python Documentation Translations In-Reply-To: References: Message-ID: On Wed, 22 Mar 2017 at 07:06 Terry Reedy wrote: > On 3/21/2017 9:18 PM, Victor Stinner wrote: > >> Contributor Agreement > >> ''''''''''''''''''''' > >> > >> Contributions to translated documentation will be requested to sign the > >> Python Contributor Agreement (CLA): > >> > >> https://www.python.org/psf/contrib/contrib-form/ > > > I'm not sure about this requirement, but I'm not a lawyer. I guess > > that in case of doubt, it's better to require it? > > Unless the PSF lawyer says otherwise, I think anything published on > docs.python.org should have the same contribution agreement, PSF > copyright on the collective work, and collective license, which in this > case made it possible for people to do the translations. > IANAL either, so I would ask if you should use the PSF license versus e.g. Apache. (In the past Van has said that the PSF license shouldn't be used for new projects and instead a more modern license with less cruft should be used). 
-Brett > > Filling out the online form is trivial, and people should know that they > keep their copyright on what they write and that their contribution to > PSF is permanent but non-exclusive. (This is a much nicer agreement > than scientific journals that require assignment of copyright, which > occasionally results in journals suing researchers for re-using their > own writing.) > > -- > Terry Jan Reedy > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From julien at palard.fr Wed Mar 22 17:10:59 2017 From: julien at palard.fr (Julien Palard) Date: Wed, 22 Mar 2017 17:10:59 -0400 Subject: [Python-ideas] PEP: Python Documentation Translations In-Reply-To: References: Message-ID: Hi Nick, My one request for clarification would be whether or not there would be redirects back to the /en/ versions in place when there is no translation for a particular version (e.g. the older security-fix only branches). I'm not sure it matters all that much either way, but the PEP should be explicit. About switching language in the same version: A 404 can't occur (both builds are from the *same* rst files), although the destination page may not be translated. In this case the page should still be displayed (as it exists); the page should warn about its untranslated state. It's more a sphinx-doc issue than a Python doc issue, but it should still be addressed, see "Enhance rendering of untranslated fuzzy translations". About switching version in the same language: A 404 can occur, but is already handled by the current version switcher, which preflights the request and gives you the home page in case the destination does not exist. This behavior will be kept in the version switcher.
However, the current version switcher should be updated to understand the language segment in the path; I reviewed its code and it's not ready, so I updated the PEP and changed the Language Switcher paragraph to: Create sphinx-doc Language Switcher ----------------------------------- Highly similar to the version switcher, a language switcher must be implemented. This language switcher must be configurable to hide or show a given language. The language switcher will only have to update or add the language segment to the path like the current version switcher does. Unlike the version switcher, no preflight is required as the destination page always exists (translations do not add or remove pages). Untranslated (but existing) pages still exist; they should however be rendered as such, see `Enhance Rendering of Untranslated and Fuzzy Translations`_. And added: Update sphinx-doc Version Switcher ---------------------------------- The ``patch_url`` function of the version switcher in ``version_switch.js`` has to be updated to understand and allow the presence of the language segment in the path. I also replaced occurrences of "picker" with "switcher" as it's the vocabulary used in the current version pi^Wswitcher. -- Julien Palard https://mdk.fr -------------- next part -------------- An HTML attachment was scrubbed...
> > That's why I proposed to use https://github.com/python-docs/ (which > was reserved by Naoki if I recall correctly), to give more freedom to > translation subteams. To more easily delegate permissions and don't > have to trust everyone. I discussed this point with Brett Cannon. First, we don't have to create teams. Each project can have its own list of contributors. About teams, if I become a team maintainer, I will be able to invite people to a team without an organization manager doing it for me. So it seems ok to move to the Python organization, github.com/python, rather than using a different organization. What do you think? Victor From cory at lukasa.co.uk Thu Mar 23 04:45:12 2017 From: cory at lukasa.co.uk (Cory Benfield) Date: Thu, 23 Mar 2017 08:45:12 +0000 Subject: [Python-ideas] Third party module in standard library In-Reply-To: References: Message-ID: > On 22 Mar 2017, at 12:51, George Fischhof wrote: > > Hi Guys, > > I would like to ask You: > What is the process to propose a module to be part of the standard library? > > I would like to propose the following modules: > requests > https://pypi.python.org/pypi/requests Thanks for the consideration, and we're glad that you find Requests helpful. However, the Requests project is not interested in being part of the standard library for as long as it's under active development. This LWN article provides a good summary of the discussion that was had at the 2015 Python Language Summit, which covers the arguments for and against: https://lwn.net/Articles/640838/ . We believe that the most important place to spend time is not adding more modules to the standard library, but making it easier to get hold of the best-in-class third-party modules. Cory -------------- next part -------------- An HTML attachment was scrubbed...
URL: From julien at palard.fr Thu Mar 23 17:09:31 2017 From: julien at palard.fr (Julien Palard) Date: Thu, 23 Mar 2017 17:09:31 -0400 Subject: [Python-ideas] PEP: Python Documentation Translations In-Reply-To: References: Message-ID: Hi, So it seems ok to move to the Python organization, github.com/python, rather than using a different organization. What do you think? I think it makes more sense, so if it's possible, let's do it; I updated the PEP. -- Julien Palard https://mdk.fr -------------- next part -------------- An HTML attachment was scrubbed... URL: From pavel.velikhov at gmail.com Fri Mar 24 11:10:30 2017 From: pavel.velikhov at gmail.com (Pavel Velikhov) Date: Fri, 24 Mar 2017 18:10:30 +0300 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) Message-ID: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> Hi folks! We started a project to extend Python with a full-blown query language about a year ago. The project is called PythonQL, the links are given below in the references section. We have implemented what is kind of an alpha version now, and gained some experience and insights about why and where this is really useful. So I'd like to share those with you and gather some opinions whether you think we should try to include these extensions in the Python core. Intro What we have done is (mostly) extended Python's comprehensions with group by, order by, let and window clauses, which can come in any order, thus comprehensions become a query language a bit cleaner and more powerful than SQL. And we added a couple small convenience extensions, like a We have identified three top motivations for folks to use these extensions: Our Motivations 1. This can become a standard for running queries against database systems.
Instead of learning a large number of different SQL dialects (the pain point here is the libraries of functions and operators that are different for each vendor), the Python developer needs only to learn PythonQL and he can query any SQL and NoSQL database. 2. A single PythonQL expression can integrate a number of databases/files/memory structures seamlessly, with the PythonQL optimizer figuring out which pieces of plans to ship to which databases. This is a cool virtual database integration story that can be very convenient, especially now, when a lot of data scientists use Python to wrangle the data all day long. 3. Querying data structures inside Python with the full power of SQL (and a bit more) is also really convenient on its own. Usually folks that are well-versed in SQL have to resort to completely different means when they need to run a query in Python on top of some data structures. Current Status We have PythonQL running, it's installed via pip and an encoding hack that runs our preprocessor. We currently compile PythonQL into Python using our executor functions and execute Python subexpressions via eval. We don't do any optimization / rewriting of queries into languages of underlying systems. And the query processor is basic too, with naive implementations of operators. But we've built DBMS systems before, so if there is a good amount of support for this project, we'll be able to build a real system here. Your take on this Extending Python's grammar is surely a painful thing for the community. We're now convinced that it is well worth it, because of all the wonderful functionality and convenience this extension offers. We'd like to get your feedback on this and maybe you'll suggest some next steps for us.
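For a sense of the boilerplate such a query language would condense, here is a hedged pure-Python sketch (data and names invented for illustration) of a "group by customer, sum, order by sum descending" query written with today's comprehensions and itertools:

```python
from itertools import groupby

# Hypothetical order data, invented for illustration.
orders = [
    {"customer": "ann", "total": 30},
    {"customer": "bob", "total": 10},
    {"customer": "ann", "total": 20},
]

# groupby only merges *adjacent* equal keys, so the data must be
# pre-sorted on the grouping key; grouping and the final ordering
# are then two more separate passes.
keyed = sorted(orders, key=lambda o: o["customer"])
sums = [
    (cust, sum(o["total"] for o in grp))
    for cust, grp in groupby(keyed, key=lambda o: o["customer"])
]
result = sorted(sums, key=lambda s: s[1], reverse=True)
print(result)  # [('ann', 50), ('bob', 10)]
```

A single comprehension with a built-in group-by clause would fold the three passes above into one expression; whether that is worth new syntax is exactly the question of this thread.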
References PythonQL GitHub page: https://github.com/pythonql/pythonql PythonQL Intro and Tutorial (this is all User Documentation we have right now): https://github.com/pythonql/pythonql/wiki/PythonQL-Intro-and-Tutorial A use-case of querying Event Logs and doing Process Mining with PythonQL: https://github.com/pythonql/pythonql/wiki/Event-Log-Querying-and-Process-Mining-with-PythonQL PythonQL demo site: www.pythonql.org Best regards, PythonQL Team -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Fri Mar 24 11:41:58 2017 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Fri, 24 Mar 2017 10:41:58 -0500 Subject: [Python-ideas] Adding an 'errors' argument to print In-Reply-To: References: Message-ID: Recently, I was working on a Windows GUI application that ends up running ffmpeg, and I wanted to see the command that was being run. However, the file name had a Unicode character in it (it's a Sawano song), and when I tried to print it to the console, it crashed during the encode/decode. (The encoding used in cmd doesn't support Unicode characters.) The workaround was to do: print(mystring.encode(sys.stdout.encoding, errors='replace').decode(sys.stdout.encoding)) Not fun, especially since this was *just* a debug print. The proposal: why not add an 'errors' argument to print? That way, I could've just done: print(mystring, errors='replace') without having to worry about it crashing. -- Ryan (????) Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else http://refi64.com -------------- next part -------------- An HTML attachment was scrubbed...
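The round-trip workaround can be made reproducible anywhere by hard-coding the console encoding; 'ascii' and the sample string below are assumptions chosen purely for illustration (on a real console you would use sys.stdout.encoding):

```python
# Stand-in for sys.stdout.encoding on a limited cmd console.
encoding = "ascii"
mystring = "Sawano \u266a"  # a string the console encoding cannot represent

# encode with errors='replace' swaps unencodable characters for '?',
# and decoding back yields a str that is safe to print anywhere.
safe = mystring.encode(encoding, errors="replace").decode(encoding)
print(safe)  # Sawano ?
```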
URL: From p.f.moore at gmail.com Fri Mar 24 11:54:55 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 24 Mar 2017 15:54:55 +0000 Subject: [Python-ideas] Adding an 'errors' argument to print In-Reply-To: References: Message-ID: On 24 March 2017 at 15:41, Ryan Gonzalez wrote: > Recently, I was working on a Windows GUI application that ends up running > ffmpeg, and I wanted to see the command that was being run. However, the > file name had a Unicode character in it (it's a Sawano song), and when I > tried to print it to the console, it crashed during the encode/decode. (The > encoding used in cmd doesn't support Unicode characters.) > > The workaround was to do: > > > print(mystring.encode(sys.stdout.encoding, > errors='replace).decode(sys.stdout.encoding)) > > > Not fun, especially since this was *just* a debug print. > > The proposal: why not add an 'errors' argument to print? That way, I > could've just done: > > > print(mystring, errors='replace') > > > without having to worry about it crashing. When I've hit issues like this before, I've written a helper function: def sanitise(str, enc): """Ensure that str can be encoded in encoding enc""" return str.encode(enc, errors='replace').decode(enc) An errors argument to print would be very similar, but would only apply to the print function, whereas I've used my sanitise function in other situations as well. I understand the attraction of a dedicated "just print the best representation you can" argument to print, but I'm not sure it's a common enough need to be worth adding like this. Paul From victor.stinner at gmail.com Fri Mar 24 12:37:33 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 24 Mar 2017 17:37:33 +0100 Subject: [Python-ideas] Adding an 'errors' argument to print In-Reply-To: References: Message-ID: *If* we change something, I would prefer to modify sys.stdout. 
The following issue proposes to add sys.stdout.set_encoding(errors='replace'): http://bugs.python.org/issue15216 You can already set the PYTHONIOENCODING environment variable to ":replace" to use "replace" on sys.stdout (and sys.stderr). Victor 2017-03-24 16:41 GMT+01:00 Ryan Gonzalez : > Recently, I was working on a Windows GUI application that ends up running > ffmpeg, and I wanted to see the command that was being run. However, the > file name had a Unicode character in it (it's a Sawano song), and when I > tried to print it to the console, it crashed during the encode/decode. (The > encoding used in cmd doesn't support Unicode characters.) > > The workaround was to do: > > > print(mystring.encode(sys.stdout.encoding, > errors='replace).decode(sys.stdout.encoding)) > > > Not fun, especially since this was *just* a debug print. > > The proposal: why not add an 'errors' argument to print? That way, I > could've just done: > > > print(mystring, errors='replace') > > > without having to worry about it crashing. > > -- > Ryan (????) > Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else > http://refi64.com > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From tjreedy at udel.edu Fri Mar 24 13:50:13 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 24 Mar 2017 13:50:13 -0400 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> Message-ID: On 3/24/2017 11:10 AM, Pavel Velikhov wrote: > Hi folks! > > We started a project to extend Python with a full-blown query language > about a year ago. The project is call PythonQL, the links are given > below in the references section. 
We have implemented what is kind of an > alpha version now, and gained some experience and insights about why and > where this is really useful. So I'd like to share those with you and > gather some opinions whether you think we should try to include these > extensions in the Python core. No. PythonQL defines a comprehension-inspired SQL-like domain-specific (specialized) language. Its style of packing programs into expressions is contrary to that of Python. It appears to me that most of the added features duplicate ones already in Python (sorted, itertools, named tuples?). I think it should remain a separate project with its own development group and schedule. This is not to say that I would never use PQL. I like the idea of a uniform method of accessing in-memory and on-disk data, and like Python's current method of making files an iterable of lines. I believe the current DB API allows something similar. I think that the misuse of coding cookies, which makes it unusable in code that already has a proper coding cookie, should be replaced by normal imports. PQL expressions should be quoted and passed to the dsl processor, as done with SQL and other DSLs. -- Terry Jan Reedy From guido at python.org Fri Mar 24 14:15:28 2017 From: guido at python.org (Guido van Rossum) Date: Fri, 24 Mar 2017 11:15:28 -0700 Subject: [Python-ideas] Adding an 'errors' argument to print In-Reply-To: References: Message-ID: On Fri, Mar 24, 2017 at 9:37 AM, Victor Stinner wrote: > *If* we change something, I would prefer to modify sys.stdout. The > following issue proposes to add > sys.stdout.set_encoding(errors='replace'): > http://bugs.python.org/issue15216 > I like that. > You can already set the PYTHONIOENCODING environment variable to > ":replace" to use "replace" on sys.stdout (and sys.stderr). > Great tip, I've needed this! -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed...
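The ":replace" suffix in PYTHONIOENCODING selects the standard "replace" codec error handler for the standard streams; its effect can be seen directly with str.encode (the sample string here is invented for illustration):

```python
s = "caf\u00e9"  # 'café'

# The default 'strict' handler raises on an unencodable character;
# 'replace' substitutes '?' so output on a limited console encoding
# no longer aborts the program.
try:
    s.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)
print("replace:", s.encode("ascii", errors="replace"))  # b'caf?'
```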
URL: From p.f.moore at gmail.com Fri Mar 24 14:29:48 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 24 Mar 2017 18:29:48 +0000 Subject: [Python-ideas] Adding an 'errors' argument to print In-Reply-To: References: Message-ID: On 24 March 2017 at 16:37, Victor Stinner wrote: > *If* we change something, I would prefer to modify sys.stdout. The > following issue proposes to add > sys.stdout.set_encoding(errors='replace'): > http://bugs.python.org/issue15216 I thought I recalled seeing something like that discussed somewhere. I agree that this is a better approach (even though it's not as granular as being able to specify on an individual print statement). > You can already set the PYTHONIOENCODING environment variable to > ":replace" to use "replace" on sys.stdout (and sys.stderr). That's something I didn't know. Thanks for the pointer! Paul From pavel.velikhov at gmail.com Fri Mar 24 15:03:38 2017 From: pavel.velikhov at gmail.com (Pavel Velikhov) Date: Fri, 24 Mar 2017 22:03:38 +0300 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> Message-ID: Hi Terry! Thanks for your feedback, I have a couple comments below. > On 24 Mar 2017, at 20:50, Terry Reedy wrote: > > On 3/24/2017 11:10 AM, Pavel Velikhov wrote: >> Hi folks! >> >> We started a project to extend Python with a full-blown query language >> about a year ago. The project is called PythonQL, the links are given >> below in the references section. We have implemented what is kind of an >> alpha version now, and gained some experience and insights about why and >> where this is really useful. So I'd like to share those with you and >> gather some opinions whether you think we should try to include these >> extensions in the Python core. > > No. PythonQL defines a comprehension-inspired SQL-like domain-specific (specialized) language.
Its style of packing programs into expressions is contrary to that of Python. It appears to me that most of the added features duplicate ones already in Python (sorted, itertools, named tuples?). I think it should remain a separate project with its own development group and schedule. These features do exist separately, but usually they are quite a bit less powerful and convenient than the ones we propose, and can't be put into a single query/expression. For example namedtuple requires one to define it first, the groupby in itertools doesn't create named tuples, and sorted takes a single expression. I do agree that synching releases with the overall Python code can become a problem... > This is not to say that I would never use PQL. I like the idea of a uniform method of accessing in-memory and on-disk data, and like Python's current method of making files an iterable of lines. I believe the current DB API allows something similar. > > I think that the misuse of coding cookies, which makes it unusable in code that already has a proper coding cookie, should be replaced by normal imports. PQL expressions should be quoted and passed to the dsl processor, as done with SQL and other DSLs. So we see a lot of value of having PythonQL as a syntax extension, instead of having to define string queries and then executing them. It would definitely make our lives much simpler if we go with strings. But a lot of developers use ORMs just to avoid having to construct query strings and have them break with a syntax error in different cases. In our case I guess we can avoid a lot of the hassle associated with query strings, since we can access all variables and functions from the context of the query. But when you're doing interactive stuff in rapid development mode (e.g. doing some data science with Jupyter notebook) language integrated queries could be quite convenient. This is something we'll need to think about...
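The set-up cost described above is visible in miniature in this hedged sketch (the Grouped type and the row data are invented for illustration): the result type must be declared before the query, and itertools.groupby hands back plain key/iterator pairs that need manual repacking:

```python
from collections import namedtuple
from itertools import groupby

# The result type has to be declared up front, separately from the query.
Grouped = namedtuple("Grouped", ["city", "people"])

rows = [("oslo", "ann"), ("lima", "bob"), ("oslo", "cleo")]  # invented data

# groupby yields (key, iterator) pairs, not named records, so the
# comprehension repacks each group into the namedtuple by hand.
grouped = [
    Grouped(city, [name for _, name in grp])
    for city, grp in groupby(sorted(rows), key=lambda r: r[0])
]
print(grouped)
# [Grouped(city='lima', people=['bob']), Grouped(city='oslo', people=['ann', 'cleo'])]
```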
> > -- > Terry Jan Reedy > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From steve at pearwood.info Fri Mar 24 18:38:26 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 25 Mar 2017 09:38:26 +1100 Subject: [Python-ideas] Adding an 'errors' argument to print In-Reply-To: References: Message-ID: <20170324223825.GA27969@ando.pearwood.info> On Fri, Mar 24, 2017 at 10:41:58AM -0500, Ryan Gonzalez wrote: > Recently, I was working on a Windows GUI application that ends up running > ffmpeg, and I wanted to see the command that was being run. However, the > file name had a Unicode character in it (it's a Sawano song), and when I > tried to print it to the console, it crashed during the encode/decode. (The > encoding used in cmd doesn't support Unicode characters.) *Crash* crash, or just an exception? If it crashed the interpreter, you ought to report that as a bug. > The workaround was to do: > > > print(mystring.encode(sys.stdout.encoding, > errors='replace).decode(sys.stdout.encoding)) I think that this would be both simpler and more informative: print(ascii(mystring)) -- Steve From greg.ewing at canterbury.ac.nz Fri Mar 24 19:19:55 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 25 Mar 2017 12:19:55 +1300 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> Message-ID: <58D5A99B.4030906@canterbury.ac.nz> Terry Reedy wrote: > PQL expressions should be quoted and passed to the dsl > processor, as done with SQL and other DSLs. But embedding one language as quoted strings inside another is a horrible way to program. I really like the idea of a data manipulation language that is seamlessly integrated with the host language. Unfortunately, PQL does not seem to be that. 
It appears to only work on Python data, and their proposed solutions for hooking it up to databases and the like is to use some existing DB interfacing method to get the data into Python, and then use PQL on that. I can see little point in that, since as Terry points out, most of what PQL does can already be done fairly easily with existing Python facilities. To be worth extending the language, PQL queries would need to be able to operate directly on data in the database, and that would mean hooking into the semantics somehow so that PQL expressions are evaluated differently from normal Python expressions. I don't see anything there that mentions any such hooks, either existing or planned. -- Greg From contact at brice.xyz Sat Mar 25 03:58:48 2017 From: contact at brice.xyz (Brice PARENT) Date: Sat, 25 Mar 2017 08:58:48 +0100 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> Message-ID: <211f8a08-958f-c5c1-4a81-04900419e5ce@brice.xyz> Hello! If I had to provide a unified way of dealing with data, whatever its source is, I would probably go with creating an standardized ORM, probably based on Django's or PonyORM, because: - it doesn't require any change in the Python language - it allows queries for both reading and writing. I didn't see a way to write (UPDATE, CREATE and DELETE equivalents), but maybe I didn't look right. Or maybe I didn't understand the purpose at all! - it makes it easy (although not fast) to extend to any backend (files, databases, or any other kind of data storage) - as a pure python object, any IDE already supports it. - It can live as a separate module until it is stable enough to be integrated into standard library (if it should ever be integrated there). - it is way easier to learn and use. Comprehensions are not the easier things to work with. 
Most of the places I worked in forbid their use except for really easy and short cases (single line, single loop, simple tests). You're adding a whole new syntax with new keywords to one of the most complicated and less readable (yet efficient in many cases, don't get me wrong) part of the language. I'm not saying there is no need for what you're developing, there probably is if you did it, but maybe the solution you chose isn't the easiest way to have it merged to the core language, and if it was, it would really be a long shot, as there are many new keywords and new syntax to discuss, implement and test. But I like the idea of a standard API to deal with data, a nice battery to be included. Side note : I might not have understood what you were doing, so if I'm off-topic, tell me ! -Brice On 24/03/17 at 16:10, Pavel Velikhov wrote: > Hi folks! > > We started a project to extend Python with a full-blown query > language about a year ago. The project is called PythonQL, the links are > given below in the references section. We have implemented what is > kind of an alpha version now, and gained some experience and insights > about why and where this is really useful. So I'd like to share those > with you and gather some opinions whether you think we should try to > include these extensions in the Python core. > > *Intro* > > What we have done is (mostly) extended Python's comprehensions with > group by, order by, let and window clauses, which can come in any > order, thus comprehensions become a query language a bit cleaner and > more powerful than SQL. And we added a couple small convenience > extensions, like a We have identified three top motivations for folks > to use these extensions: > > *Our Motivations* > > 1. This can become a standard for running queries against database > systems.
Instead of learning a large number of different SQL dialects > (the pain point here are libraries of functions and operators that are > different for each vendor), the Python developer needs only to learn > PythonQL and he can query any SQL and NoSQL database. > > 2. A single PythonQL expression can integrate a number of > databases/files/memory structures seamlessly, with the PythonQL > optimizer figuring out which pieces of plans to ship to which > databases. This is a cool virtual database integration story that can > be very convenient, especially now, when a lot of data scientists use > Python to wrangle the data all day long. > > 3. Querying data structures inside Python with the full power of SQL > (and a bit more) is also really convenient on its own. Usually folks > that are well-versed in SQL have to resort to completely different > means when they need to run a query in Python on top of some data > structures. > > *Current Status* > > We have PythonQL running, it's installed via pip and an encoding hack > that runs our preprocessor. We currently compile PythonQL into Python > using our executor functions and execute Python subexpressions via > eval. We don't do any optimization / rewriting of queries into > languages of underlying systems. And the query processor is basic too, > with naive implementations of operators. But we've built DBMS systems > before, so if there is a good amount of support for this project, > we'll be able to build a real system here.
> > *References* > > PythonQL GitHub page: https://github.com/pythonql/pythonql > PythonQL Intro and Tutorial (this is all User Documentation we have > right now): > https://github.com/pythonql/pythonql/wiki/PythonQL-Intro-and-Tutorial > A use-case of querying Event Logs and doing Process Mining with > PythonQL: > https://github.com/pythonql/pythonql/wiki/Event-Log-Querying-and-Process-Mining-with-PythonQL > PythonQL demo site: www.pythonql.org > > Best regards, > PythonQL Team > > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pavel.velikhov at gmail.com Sat Mar 25 07:24:43 2017 From: pavel.velikhov at gmail.com (Pavel Velikhov) Date: Sat, 25 Mar 2017 14:24:43 +0300 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: <58D5A99B.4030906@canterbury.ac.nz> References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> <58D5A99B.4030906@canterbury.ac.nz> Message-ID: <9AA92CCF-661F-4A19-AB53-0D17E733C827@gmail.com> > On 25 Mar 2017, at 02:19, Greg Ewing wrote: > > Terry Reedy wrote: >> PQL expressions should be quoted and passed to the dsl processor, as done with SQL and other DSLs. > > But embedding one language as quoted strings inside another > is a horrible way to program. > > I really like the idea of a data manipulation language that > is seamlessly integrated with the host language. > > Unfortunately, PQL does not seem to be that. It appears to > only work on Python data, and their proposed solutions for > hooking it up to databases and the like is to use some > existing DB interfacing method to get the data into Python, > and then use PQL on that. 
No, the current solution is temporary because we just don't have the manpower to implement the full thing: a real system that will rewrite parts of PythonQL queries and ship them to underlying databases. We need a real query optimizer and smart wrappers for this purpose. But we'll build one of these for demo purposes soon (either a Spark wrapper or a PostgreSQL wrapper). > > I can see little point in that, since as Terry points out, > most of what PQL does can already be done fairly easily > with existing Python facilities. You can solve any problem with basic language facilities. But instead of a simple query expression you will end up with a bunch of for loops (in case of groupby) and numeric indexes into tuples. You can take a look at some of the queries in this use-case and see how they would look in pure Python: https://github.com/pythonql/pythonql/wiki/Event-Log-Querying-and-Process-Mining-with-PythonQL > > To be worth extending the language, PQL queries would need > to be able to operate directly on data in the database, > and that would mean hooking into the semantics somehow so > that PQL expressions are evaluated differently from normal > Python expressions. > > I don't see anything there that mentions any such hooks, > either existing or planned. This is definitely planned; currently PythonQL expressions are evaluated separately because we run them through a pre-processor when you specify the pythonql encoding. We might need to add some hooks like PonyORM does, otherwise we might have to trace iterators to their source through the AST, which can become messy. It is a lot of work, so we're not promising this in the near future. As far as hooks go (if the language is integrated into core Python), we will probably have to define some kind of "Datasource" wrapper function that would wrap database cursors and Spark RDDs, etc.
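To make the "bunch of for loops" comparison concrete, here is the kind of hand-written plain-Python group-by a query expression would replace (the data and the aggregate are made up purely for illustration):

```python
from collections import defaultdict

# A hand-rolled "group rows by x, aggregate sum(y)" done without any
# query syntax: explicit loops plus an accumulator dict.
rows = [(x, y) for x in range(1, 8) for y in range(1, 7)
        if x % 2 == 0 and y % 2 != 0 and x > y]

groups = defaultdict(list)
for x, y in rows:            # group rows by their first element
    groups[x].append(y)

# Keep only the groups whose sum of y is odd (a "having" clause by hand).
result = [(x, sum(ys)) for x, ys in groups.items() if sum(ys) % 2 != 0]
print(result)  # [(2, 1), (6, 9)]
```

A single group-by-with-aggregate query becomes three separate steps, and more complex queries multiply the bookkeeping accordingly.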
> > -- > Greg From p.f.moore at gmail.com Sat Mar 25 07:35:53 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 25 Mar 2017 11:35:53 +0000 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: <9AA92CCF-661F-4A19-AB53-0D17E733C827@gmail.com> References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> <58D5A99B.4030906@canterbury.ac.nz> <9AA92CCF-661F-4A19-AB53-0D17E733C827@gmail.com> Message-ID: On 25 March 2017 at 11:24, Pavel Velikhov wrote: > No, the current solution is temporary because we just don't have the > manpower to > implement the full thing: a real system that will rewrite parts of PythonQL > queries and > ship them to underlying databases. We need a real query optimizer and smart > wrappers > for this purpose. But we'll build one of these for demo purposes soon > (either a Spark > wrapper or a PostgreSQL wrapper). One thought: if you're lacking in manpower now, then proposing inclusion into core Python means that the core dev team will be taking on an additional chunk of code that is already under-resourced. That rings alarm bells for me - how would you imagine the work needed to merge PythonQL into the core Python grammar would be resourced? I should say that in practice, I think that the solution is relatively niche, and overlaps quite significantly with existing Python features, so I don't really see a compelling case for inclusion. The parallel with C# and LINQ is interesting here - LINQ is a pretty cool technology, but I don't see it in widespread use in general-purpose C# projects (disclaimer: I don't get to see much C# code, so my experience is limited).
Paul From pavel.velikhov at gmail.com Sat Mar 25 08:01:49 2017 From: pavel.velikhov at gmail.com (Pavel Velikhov) Date: Sat, 25 Mar 2017 15:01:49 +0300 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: <211f8a08-958f-c5c1-4a81-04900419e5ce@brice.xyz> References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> <211f8a08-958f-c5c1-4a81-04900419e5ce@brice.xyz> Message-ID: > On 25 Mar 2017, at 10:58, Brice PARENT wrote: > > Hello! > > Hello! > If I had to provide a unified way of dealing with data, whatever its source is, I would probably go with creating an standardized ORM, probably based on Django's or PonyORM, because: > > - it doesn't require any change in the Python language > > So we basically want to be just like PonyORM (the part that executes comprehensions against objects and databases), except we find that the current comprehension syntax is not powerful enough to express all the queries we want. So we extended the comprehension syntax, and then our strategy is a lot like PonyORM. > - it allows queries for both reading and writing. I didn't see a way to write (UPDATE, CREATE and DELETE equivalents), but maybe I didn't look right. Or maybe I didn't understand the purpose at all! > > Good point! This is a hard one, a single query in PythonQL can go against multiple databases, so doing updates with queries can be a big problem. We can add an insert/update/delete syntax to PythonQL and just track that these operations make sense. > - it makes it easy (although not fast) to extend to any backend (files, databases, or any other kind of data storage) > > Going to different backends is not a huge problem, just a bit of work. > - as a pure python object, any IDE already supports it. > > - It can live as a separate module until it is stable enough to be integrated into standard library (if it should ever be integrated there). > > So we live as an external module via an encoding hack :) > - it is way easier to learn and use. 
Comprehensions are not the easiest things to work with. Most of the places I worked in forbid their use except for really easy and short cases (single line, single loop, simple tests). You're adding a whole new syntax with new keywords to one of the most complicated and least readable (yet efficient in many cases, don't get me wrong) parts of the language. > Yes, that's true. But comprehensions are basically a small subset of SQL, and we're extending it a bit to get the full power of a query language. So we're catering to folks that already know this language or are willing to learn it for their daily needs. > I'm not saying there is no need for what you're developing, there probably is if you did it, but maybe the solution you chose isn't the easiest way to have it merged to the core language, and if it was, it would really be a long shot, as there are many new keywords and new syntax to discuss, implement and test. > Yes, we started off by catering to folks like data scientists or similar folks that have to write really complex queries all the time. > But I like the idea of a standard API to deal with data, a nice battery to be included. > > Side note: I might not have understood what you were doing, so if I'm off-topic, tell me! > > -Brice > Thanks for the feedback! This is very useful. > > On 24/03/17 at 16:10, Pavel Velikhov wrote: >> Hi folks! >> >> We started a project to extend Python with a full-blown query language about a year ago. The project is called PythonQL, the links are given below in the references section. We have implemented what is kind of an alpha version now, and gained some experience and insights about why and where this is really useful. So I'd like to share those with you and gather some opinions whether you think we should try to include these extensions in the Python core.
>> >> Intro >> >> What we have done is (mostly) extended Python's comprehensions with group by, order by, let and window clauses, which can come in any order, thus comprehensions become a query language a bit cleaner and more powerful than SQL. And we added a couple small convenience extensions, like a We have identified three top motivations for folks to use these extensions: >> >> Our Motivations >> >> 1. This can become a standard for running queries against database systems. Instead of learning a large number of different SQL dialects (the pain point here is the libraries of functions and operators that are different for each vendor), the Python developer needs only to learn PythonQL and he can query any SQL and NoSQL database. >> >> 2. A single PythonQL expression can integrate a number of databases/files/memory structures seamlessly, with the PythonQL optimizer figuring out which pieces of plans to ship to which databases. This is a cool virtual database integration story that can be very convenient, especially now, when a lot of data scientists use Python to wrangle the data all day long. >> >> 3. Querying data structures inside Python with the full power of SQL (and a bit more) is also really convenient on its own. Usually folks that are well-versed in SQL have to resort to completely different means when they need to run a query in Python on top of some data structures. >> >> Current Status >> >> We have PythonQL running, it's installed via pip and an encoding hack that runs our preprocessor. We currently compile PythonQL into Python using our executor functions and execute Python subexpressions via eval. We don't do any optimization / rewriting of queries into languages of underlying systems. And the query processor is basic too, with naive implementations of operators. But we've built DBMS systems before, so if there is a good amount of support for this project, we'll be able to build a real system here.
>> >> Your take on this >> >> Extending Python's grammar is surely a painful thing for the community. We're now convinced that it is well worth it, because of all the wonderful functionality and convenience this extension offers. We'd like to get your feedback on this and maybe you'll suggest some next steps for us. >> >> References >> >> PythonQL GitHub page: https://github.com/pythonql/pythonql >> PythonQL Intro and Tutorial (this is all User Documentation we have right now): https://github.com/pythonql/pythonql/wiki/PythonQL-Intro-and-Tutorial >> A use-case of querying Event Logs and doing Process Mining with PythonQL: https://github.com/pythonql/pythonql/wiki/Event-Log-Querying-and-Process-Mining-with-PythonQL >> PythonQL demo site: www.pythonql.org >> >> Best regards, >> PythonQL Team From pavel.velikhov at gmail.com Sat Mar 25 08:28:49 2017 From: pavel.velikhov at gmail.com (Pavel Velikhov) Date: Sat, 25 Mar 2017 15:28:49 +0300 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> <58D5A99B.4030906@canterbury.ac.nz> <9AA92CCF-661F-4A19-AB53-0D17E733C827@gmail.com> Message-ID: <7043309B-256A-4F60-A86D-3C9BD712E46E@gmail.com> Hi Paul!
> On 25 March 2017 at 11:24, Pavel Velikhov wrote: >> No, the current solution is temporary because we just don't have the >> manpower to >> implement the full thing: a real system that will rewrite parts of PythonQL >> queries and >> ship them to underlying databases. We need a real query optimizer and smart >> wrappers >> for this purpose. But we'll build one of these for demo purposes soon >> (either a Spark >> wrapper or a PostgreSQL wrapper). > > One thought: if you're lacking in manpower now, then proposing > inclusion into core Python means that the core dev team will be taking > on an additional chunk of code that is already under-resourced. That > rings alarm bells for me - how would you imagine the work needed to > merge PythonQL into the core Python grammar would be resourced? An inclusion in core would definitely help us to grow the team, but I see your point. If we could get an idea that we'd be in the core if we do a) b) c) and have a big enough team to be responsive, that could also help us grow. > > I should say that in practice, I think that the solution is relatively > niche, and overlaps quite significantly with existing Python features, > so I don't really see a compelling case for inclusion. The parallel > with C# and LINQ is interesting here - LINQ is a pretty cool > technology, but I don't see it in widespread use in general-purpose C# > projects (disclaimer: I don't get to see much C# code, so my > experience is limited). I'm not sure about the usual crowd of Python developers, but data scientists like the idea a lot, especially the future plans. If we'll really have millions of data scientists soon, this could become pretty big. We're also seeing a lot of much more advanced use-cases popping up, where PythonQL could really shine. Yes, LINQ didn't go big, but maybe it was a bit ahead of its time.
> > Paul From gerald.britton at gmail.com Sat Mar 25 08:51:58 2017 From: gerald.britton at gmail.com (Gerald Britton) Date: Sat, 25 Mar 2017 08:51:58 -0400 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) Message-ID: > > On 25 March 2017 at 11:24, Pavel Velikhov > wrote: > > No, the current solution is temporary because we just don't have the > > manpower to > > implement the full thing: a real system that will rewrite parts of > PythonQL > > queries and > > ship them to underlying databases. We need a real query optimizer and > smart > > wrappers > > for this purpose. But we'll build one of these for demo purposes soon > > (either a Spark > > wrapper or a PostgreSQL wrapper). > One thought: if you're lacking in manpower now, then proposing > inclusion into core Python means that the core dev team will be taking > on an additional chunk of code that is already under-resourced. That > rings alarm bells for me - how would you imagine the work needed to > merge PythonQL into the core Python grammar would be resourced? > I should say that in practice, I think that the solution is relatively > niche, and overlaps quite significantly with existing Python features, > so I don't really see a compelling case for inclusion. The parallel > with C# and LINQ is interesting here - LINQ is a pretty cool > technology, but I don't see it in widespread use in general-purpose C# > projects (disclaimer: I don't get to see much C# code, so my experience is limited). I see lots of C# code, but (thankfully) not so much LINQ to SQL. Yes, it is a cool technology. But I sometimes have a problem with the SQL it generates. Since I'm also a SQL developer, I'm sensitive to how queries are constructed, for performance reasons, as well as how they look, for readability and aesthetic reasons. LINQ queries can generate poorly-performing SQL, since LINQ is basically a translator, but not an AI.
As far as appearances go, LINQ queries can look pretty gnarly, especially if they include subqueries or a few joins. That makes it hard for the SQL dev (me!) to read and understand if there are performance problems (which there often are, in my experience). So, I would tend to code the SQL separately and put it in a SQL view, function or stored procedure. I can still parse the results with LINQ (not LINQ to SQL), which is fine. For similar reasons, I'm not a huge fan of ORMs either. Probably my bias towards designing the database first and building up queries to meet the business goals before writing a line of Python, C#, or the language du jour. -- Gerald Britton, MCSE-DP, MVP LinkedIn Profile: http://ca.linkedin.com/in/geraldbritton From klahnakoski at mozilla.com Sat Mar 25 11:40:40 2017 From: klahnakoski at mozilla.com (Kyle Lahnakoski) Date: Sat, 25 Mar 2017 11:40:40 -0400 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> Message-ID: <416e2e44-5949-d33a-1cdd-2a3c50508efd@mozilla.com> Pavel, I like PythonQL. I perform a lot of data transformation, and often find Python's list comprehensions too limiting, leaving me wishing for LINQ-like language features. As an alternative to extending Python with PythonQL, Terry Reedy suggested interpreting a DSL string, and Pavel Velikhov alluded to using magic method tricks found in ORM libraries. I can see how both of these are not satisfactory. A third alternative could be to encode the query clauses as JSON objects.
For example:

    result = [ select (x, sum_y)
               for x in range(1,8), y in range(1,7)
               where x % 2 == 0 and y % 2 != 0 and x > y
               group by x
               let sum_y = sum(y)
               where sum_y % 2 != 0
             ]

    result = pq([
        {"select": ["x", "sum_y"]},
        {"for": {"x": range(1,8), "y": range(1,7)}},
        {"where": lambda x, y: x % 2 == 0 and y % 2 != 0 and x > y},
        {"groupby": "x"},
        {"with": {"sum_y": {"SUM": "y"}}},
        {"where": {"neq": [{"mod": ["sum_y", 2]}, 0]}}
    ])

This representation does look a little lispy, and it may resemble PythonQL's parse tree. I think the benefits are:

1) no python language change
2) easier to parse
3) better than string-based DSL for catching syntax errors
4) {"clause": parameters} format is flexible for handling common query patterns **
5) works in javascript too
6) easy to compose with automation (my favorite)

It is probably easy for you to see the drawbacks. ** The `where` clause can accept a native lambda function, or an expression tree "If you are writing a loop, you are doing it wrong!" :) On 2017-03-24 11:10, Pavel Velikhov wrote: > Hi folks! > > We started a project to extend Python with a full-blown query > language about a year ago. The project is called PythonQL, the links are > given below in the references section. We have implemented what is > kind of an alpha version now, and gained some experience and insights > about why and where this is really useful. So I'd like to share those > with you and gather some opinions whether you think we should try to > include these extensions in the Python core. > > *Intro* > > What we have done is (mostly) extended Python's comprehensions with > group by, order by, let and window clauses, which can come in any > order, thus comprehensions become a query language a bit cleaner and > more powerful than SQL. And we added a couple small convenience > extensions, like a We have identified three top motivations for folks > to use these extensions: > > *Our Motivations* > > 1.
This can become a standard for running queries against database > systems. Instead of learning a large number of different SQL dialects > (the pain point here is the libraries of functions and operators that are > different for each vendor), the Python developer needs only to learn > PythonQL and he can query any SQL and NoSQL database. > > 2. A single PythonQL expression can integrate a number of > databases/files/memory structures seamlessly, with the PythonQL > optimizer figuring out which pieces of plans to ship to which > databases. This is a cool virtual database integration story that can > be very convenient, especially now, when a lot of data scientists use > Python to wrangle the data all day long. > > 3. Querying data structures inside Python with the full power of SQL > (and a bit more) is also really convenient on its own. Usually folks > that are well-versed in SQL have to resort to completely different > means when they need to run a query in Python on top of some data > structures. > > *Current Status* > > We have PythonQL running, it's installed via pip and an encoding hack, > that runs our preprocessor. We currently compile PythonQL into Python > using our executor functions and execute Python subexpressions via > eval. We don't do any optimization / rewriting of queries into > languages of underlying systems. And the query processor is basic too, > with naive implementations of operators. But we've built DBMS systems > before, so if there is a good amount of support for this project, > we'll be able to build a real system here. > > *Your take on this* > > Extending Python's grammar is surely a painful thing for the > community. We're now convinced that it is well worth it, because of > all the wonderful functionality and convenience this extension offers. > We'd like to get your feedback on this and maybe you'll suggest some > next steps for us.
> > *References* > > PythonQL GitHub page: https://github.com/pythonql/pythonql > PythonQL Intro and Tutorial (this is all User Documentation we have > right > now): https://github.com/pythonql/pythonql/wiki/PythonQL-Intro-and-Tutorial > A use-case of querying Event Logs and doing Process Mining with > PythonQL: https://github.com/pythonql/pythonql/wiki/Event-Log-Querying-and-Process-Mining-with-PythonQL > PythonQL demo site: www.pythonql.org > > Best regards, > PythonQL Team From ncoghlan at gmail.com Sat Mar 25 12:40:52 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 26 Mar 2017 02:40:52 +1000 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: References: Message-ID: First off, I think PythonQL (and PonyORM before it) is a very interesting piece of technology. However, I think some of the answers so far suggest we may need to discuss a couple of meta-issues around target audiences and available technical options before continuing on. I'm quoting Gerald's post here because it highlights the "target audience" problem, but my comments apply to the thread generally. On 25 March 2017 at 22:51, Gerald Britton wrote: > > I see lots of C# code, but (thankfully) not so much LINQ to SQL. Yes, it is a cool technology. But I sometimes have a problem with the SQL it generates. Since I'm also a SQL developer, I'm sensitive to how queries are constructed, for performance reasons, as well as how they look, for readability and aesthetic reasons. > > LINQ queries can generate poorly-performing SQL, since LINQ is basically a translator, but not an AI.
As far as appearances go, LINQ queries can look pretty gnarly, especially if they include sub queries or a few joins. That makes it hard for the SQL dev (me!) to read and understand if there are performance problems (which there often are, in my experience) > > So, I would tend to code the SQL separately and put it in a SQL view, function or stored procedure. I can still parse the results with LINQ (not LINQ to SQL), which is fine. > > For similar reasons, I'm not a huge fan of ORMs either. Probably my bias towards designing the database first and building up queries to meet the business goals before writing a line of Python, C#, or the language de jour. Right, the target audience here *isn't* folks who already know how to construct their own relational queries in SQL, and it definitely isn't folks that know how to tweak their queries to get optimal performance from the specific database they're using. Rather, it's folks that already know Python's comprehensions, and perhaps some of the itertools features, and helping to provide them with a smoother on-ramp into the world of relational data processing. There's no question that folks dealing with sufficiently large data sets with sufficiently stringent performance requirements are eventually going to want to reach for handcrafted SQL or a distributed computation framework like dask, but that's not really any different from our standard position that when folks are attempting to optimise a hot loop, they're eventually going to have to switch to something that can eliminate the interpreter's default runtime object management overhead (whether that's Cython, PyPy's or Numba's JIT, or writing an extension module in a different language entirely). It isn't an argument against making it easier for folks to postpone the point where they find it necessary to reach for the "something else" that takes them beyond Python's default capabilities. 
However, at the same time, PythonQL *is* a DSL for data manipulation operations, and map and filter are far and away the most common of those. Even reduce, which was previously a builtin, was pushed into functools for Python 3.0, with the preferred alternative being to just write a suitably named function that accepts an iterable and returns a single value. And while Python is a very popular tool for data manipulation, it would be a big stretch to assume that that was its primary use case in all contexts. So it makes sense to review some of the technical options that are available to help make projects like PythonQL more maintainable, without necessarily gating improvements to them on the relatively slow update and rollout cycle of new Python versions. = Option 1 = Fully commit to the model of allowing alternate syntactic dialects to run atop Python interpreters. In Hylang and PythonQL we have at least two genuinely interesting examples of that working through the text encoding system, as well as other examples like Cython that work through the extension module system. So that's an opportunity to take this from "Possible, but a bit hacky" to "Pluggable source code translation is supported at all levels of the interpreter, including debugger source maps, etc" (perhaps by borrowing ideas from other ecosystems like Java, JavaScript, and .NET, where this kind of thing is already a lot more common). The downside of this approach is that actually making it happen would be getting pretty far afield from the original PythonQL goal of "provide nicer data manipulation abstractions in Python", and it wouldn't actually deliver anything new that can't already be done with existing import and codec system features.
= Option 2 = Back when f-strings were added for 3.6, I wrote PEP 501 to generalise the idea as "i-strings": exposing the intermediate interpolated form of f-strings, such that you could write code like `myquery = sql(i"SELECT {column} FROM {table};")` where the "sql" function received an "InterpolationTemplate" object that it could render however it wanted, but the "column" and "table" references were just regular Python expressions. It's currently deferred indefinitely, as I didn't have any concrete use cases that Guido found sufficiently compelling to make the additional complexity worthwhile. However, given optionally delayed rendering of interpolated strings, PythonQL could be used in the form:

    result = pyql(i"""
        (x,y)
        for x in {range(1,8)}
        for y in {range(1,7)}
        if x % 2 == 0 and y % 2 != 0 and x > y
    """)

I personally like this idea (otherwise I wouldn't have written PEP 501 in the first place), and the necessary technical underpinnings to enable it are all largely already in place to support f-strings. If the PEP were revised to show examples of using it to support relatively seamless calling back and forth between Hylang, PythonQL and regular Python code in the same process, that might be intriguing enough to pique Guido's interest (and I'm open to adding co-authors that are interested in pursuing that). = Option 3 = Go all the way to expanding comprehensions to natively be a full data manipulation DSL. I'm personally not a fan of that approach, as syntax is really hard to search for help on (keywords are better for that than punctuation, but not by much), while methods and functions get to have docstrings. It also means the query language gets tightly coupled to the Python grammar, which not only makes the query language difficult to update, but also makes Python's base syntax harder for new users to learn.
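The delayed-rendering idea behind Option 2 can be sketched in today's Python with an ordinary class standing in for the proposed template object (the `InterpolationTemplate` class and `sql` renderer below are hypothetical stand-ins, not the actual PEP 501 API; the real PEP proposes dedicated i"..." syntax that would build such objects automatically):

```python
# A rough stand-in for delayed rendering: the template text and the
# already-evaluated interpolated values are kept separate, and the
# renderer decides what to do with them.
class InterpolationTemplate:
    def __init__(self, raw, values):
        self.raw = raw        # template text with {name} placeholders
        self.values = values  # name -> already-evaluated Python object

def sql(template):
    """Render a template as a parameterized query instead of raw text."""
    query, params = template.raw, []
    for name, value in template.values.items():
        query = query.replace("{" + name + "}", "?")
        params.append(value)
    return query, params

# Roughly what `sql(i"SELECT * FROM t WHERE x = {limit};")` might desugar to:
limit = 10
tmpl = InterpolationTemplate("SELECT * FROM t WHERE x = {limit};",
                             {"limit": limit})
print(sql(tmpl))  # ('SELECT * FROM t WHERE x = ?;', [10])
```

The point of the delay is visible here: because rendering happens inside `sql` rather than at the string literal, the renderer can emit a parameterized query (or a PythonQL plan) instead of naively substituting text.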
By contrast, when DSLs are handled as interpolation templates with delayed rendering, then the rendering function gets to provide runtime documentation, and the definition of the DSL is coupled to the update cycle of the rendering function, *not* that of the Python language definition. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mehaase at gmail.com Sat Mar 25 12:54:08 2017 From: mehaase at gmail.com (Mark E. Haase) Date: Sat, 25 Mar 2017 12:54:08 -0400 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> Message-ID: Hi Pavel, This is a really impressive body of work. I had looked at this project in the past but it is great to get back up to speed and see all the progress made. I use Python + databases almost every day, and the major unanswered question is what benefit does dedicated language syntax have over using a DBAL/ORM with a Builder style API? It obviously has huge costs (as all syntax changes do) but the benefit is not obvious to me: I have never found myself wanting built-in syntax for writing database queries. My second thought is that every database layer I've ever used was unavoidably leaky or incomplete. Database functionality (even if we constrain "database" to mean RDBMS) is too diverse to be completely abstracted away. This is why so many different abstractions already exist, e.g. low-level like DBAPI and high-level like SQLAlchemy. You're not going to find much support for cementing an imperfect abstraction right into the Python grammar. In order to make the abstraction relatively complete, you'd need to almost completely merge the ANSI SQL grammar into the Python grammar, which sounds terrifying. Third thought: is the implementation of a "Python query language" as generic as the name implies? The docs mention support for document databases, but can I run Redis queries?
LDAP queries? DNS queries? > We haven't built a real SQL Database wrapper yet, but in the meanwhile you can use libraries like psycopg2 or SQLAlchemy to get data from the database into an iterator, and then PythonQL can run on top of such an iterator. Fourth thought: until PythonQL can abstract over a real database, it's far too early to consider putting it into the language itself. These kinds of "big change" projects typically need to stabilize on their own for a long time before anybody will even consider putting them into the core language. Finally, to end on a positive note, the coolest part of this project from my point of view is using SQL as an abstraction over in-memory objects or raw files. I can see how somebody that is comfortable with SQL would prefer this declarative approach. I could see myself using an API like this to search a Pandas dataframe, for example. Cheers, Mark On Fri, Mar 24, 2017 at 11:10 AM, Pavel Velikhov wrote: > Hi folks! > > We started a project to extend Python with a full-blown query language > about a year ago. The project is called PythonQL, the links are given below > in the references section. We have implemented what is kind of an alpha > version now, and gained some experience and insights about why and where > this is really useful. So I'd like to share those with you and gather some > opinions whether you think we should try to include these extensions in the > Python core. > > *Intro* > > What we have done is (mostly) extended Python's comprehensions with > group by, order by, let and window clauses, which can come in any order, > thus comprehensions become a query language a bit cleaner and more powerful > than SQL. And we added a couple small convenience extensions, like a We > have identified three top motivations for folks to use these extensions: > > *Our Motivations* > > 1. This can become a standard for running queries against database > systems.
Instead of learning a large number of different SQL dialects (the > pain point here being the libraries of functions and operators that are different > for each vendor), the Python developer needs only to learn PythonQL and he > can query any SQL and NoSQL database. > > 2. A single PythonQL expression can integrate a number of > databases/files/memory structures seamlessly, with the PythonQL optimizer > figuring out which pieces of plans to ship to which databases. This is a > cool virtual database integration story that can be very convenient, > especially now, when a lot of data scientists use Python to wrangle the > data all day long. > > 3. Querying data structures inside Python with the full power of SQL (and > a bit more) is also really convenient on its own. Usually folks that are > well-versed in SQL have to resort to completely different means when they > need to run a query in Python on top of some data structures. > > *Current Status* > > We have PythonQL running; it's installed via pip and an encoding hack that > runs our preprocessor. We currently compile PythonQL into Python using our > executor functions and execute Python subexpressions via eval. We don't do > any optimization / rewriting of queries into the languages of underlying > systems. And the query processor is basic too, with naive implementations > of operators. But we've built DBMS systems before, so if there is a good > amount of support for this project, we'll be able to build a real system > here. > > *Your take on this* > > Extending Python's grammar is surely a painful thing for the community. > We're now convinced that it is well worth it, because of all the wonderful > functionality and convenience this extension offers. We'd like to get your > feedback on this and maybe you'll suggest some next steps for us.
> > *References* > > PythonQL GitHub page: https://github.com/pythonql/pythonql > PythonQL Intro and Tutorial (this is all the User Documentation we have right > now): https://github.com/pythonql/pythonql/wiki/PythonQL-Intro-and-Tutorial > A use-case of querying Event Logs and doing Process Mining with PythonQL: > https://github.com/pythonql/pythonql/wiki/Event-Log-Querying-and-Process-Mining-with-PythonQL > PythonQL demo site: www.pythonql.org > > Best regards, > PythonQL Team > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sat Mar 25 13:08:55 2017 From: mertz at gnosis.cx (David Mertz) Date: Sat, 25 Mar 2017 10:08:55 -0700 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> Message-ID: I think it's extraordinarily unlikely that a big change in Python syntax to support query syntax will ever happen. Moreover, I would oppose such a change myself. But such a change also really is not necessary. Pandas already abstracts all the things mentioned using only Python methods. It is true that Pandas sometimes does some black magic within those methods to get there; and it also uses a somewhat non-Pythonic style of long chains of method calls. But it does everything PythonQL does, as well as much, much more. Pandas builds in DataFrame readers for every data source you are likely to encounter, including leveraging all the abstractions provided by RDBMS drivers, etc. It does groupby, join, etc.
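[A concrete illustration, not part of the original mail: the running example query from this thread -- even x, odd y, x > y, grouped by x, keeping only groups whose sum is odd -- comes out in pandas roughly as the chain below.]

```python
# Hypothetical sketch of the thread's example query as a pandas chain.
import itertools

import pandas as pd

# Cross product of the two ranges, as in the PythonQL comprehension.
pairs = pd.DataFrame(list(itertools.product(range(1, 8), range(1, 7))),
                     columns=["x", "y"])

# The "where" clause: even x, odd y, x greater than y.
filtered = pairs[(pairs.x % 2 == 0) & (pairs.y % 2 != 0) & (pairs.x > pairs.y)]

# "group by x" with "let sum_y = sum(y)".
sums = filtered.groupby("x")["y"].sum().reset_index(name="sum_y")

# The post-aggregation "where sum_y % 2 != 0".
result = sums[sums.sum_y % 2 != 0]
```

[This is the "long chain of method calls" style David refers to; whether it is clearer than the comprehension form is exactly the point under debate.]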
See, e.g.: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html Now there's one reasonable objection to Pandas: It doesn't handle larger-than-memory datasets well. I don't see that PythonQL is better in that regard. But there is an easy next step for that larger data. Blaze provides generic interfaces to many, many larger-than-memory data sources. It is largely a subset of the Pandas API, although not precisely that. See, e.g.: http://blaze.readthedocs.io/en/latest/rosetta-sql.html Moreover, within the Blaze "orbit" is Dask. This is a framework for parallel computation, one of whose abstractions is a DataFrame based on Pandas. This gives you 90% of those methods for slicing-and-dicing data that Pandas does, but deals seamlessly with larger-than-memory datasets. See, e.g.: http://dask.pydata.org/en/latest/dataframe.html So I think your burden is even higher than showing the usefulness of PythonQL. You have to show why it's worth adding new syntax to do somewhat LESS than is available in very widely used 3rd party tools that avoid new syntax. On Fri, Mar 24, 2017 at 8:10 AM, Pavel Velikhov wrote: > [...] -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. From desmoulinmichel at gmail.com Sat Mar 25 15:43:06 2017 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Sat, 25 Mar 2017 20:43:06 +0100 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> Message-ID: Hello, I've been following PythonQL with interest. I like the clever hack using the Python encoding. It's definitely not something I would recommend for inclusion in Python, as it hijacks the Python encoding method, which prevents you from... well, choosing an encoding. And it requires you to have a file. However, I find the idea great for demonstration purposes. Like LINQ, the strength of your tool is the integrated syntax.
I myself found it annoying to import itertools all the time. I eventually wrote a wrapper so I could use slicing on generators, callables in slicing, etc., for this very reason. However, I have good news. When the debate about f-strings was on this list, the concept was split into several parts: the f-string, currently implemented in Python 3.6, and a more advanced type of string interpolation, the i-string from PEP 501 (https://www.python.org/dev/peps/pep-0501/), that is still to be implemented. The idea of the i-string was to allow something like this: mycommand = sh(i"cat (unknown)") myquery = sql(i"SELECT {column} FROM {table};") myresponse = html(i"{response.body}") Which would then pass an object to sql/sh/html() with the string, the placeholders and the variable context, then allow it to do whatever you want. Evaluation of the i-string would of course be lazy. So while I don't think PythonQL can be integrated in Python the way it is, you may want to champion PEP 501. This way you will be able to provide a PQL hook allowing you to do something like: pql(i"""select (x, sum_y) for x in range(1,8), y in {stuff} where x % 2 == 0 and y % 2 != 0 and x > y group by x let sum_y = sum(y) where sum_y % 2 != 0 """) Granted, this is not as elegant as your DSL, but that would make it easier to adopt anywhere: the REPL, IPython notebooks, files in Python with a different encoding, embedded Python, alternative Python implementations, compiled Python, etc. Plus the sooner we start with i-strings, the sooner editors will implement syntax highlighting for the popular dialects. This would allow you to spread the popularity of your tool and maybe change the way it's seen on this list. From desmoulinmichel at gmail.com Sat Mar 25 15:49:48 2017 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Sat, 25 Mar 2017 20:49:48 +0100 Subject: [Python-ideas] Adding an 'errors' argument to print In-Reply-To: References: Message-ID: On 24/03/2017 at 17:37, Victor Stinner wrote: > *If* we change something, I would prefer to modify sys.stdout. The > following issue proposes to add > sys.stdout.set_encoding(errors='replace'): > http://bugs.python.org/issue15216 > > You can already set the PYTHONIOENCODING environment variable to > ":replace" to use "replace" on sys.stdout (and sys.stderr). > > Victor This is not the same. You may want to locally apply "errors=replace" and not to the whole program. Indeed, this can silence encoding problems, so I would probably never set it that way at dev time, except for the few places where I know I can explicitly silence errors. I quite like this print(errors="replace|ignore"). This is not going to cause any trouble, and can only help. From tjreedy at udel.edu Sun Mar 26 00:23:08 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 26 Mar 2017 00:23:08 -0400 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: <416e2e44-5949-d33a-1cdd-2a3c50508efd@mozilla.com> References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> <416e2e44-5949-d33a-1cdd-2a3c50508efd@mozilla.com> Message-ID: On 3/25/2017 11:40 AM, Kyle Lahnakoski wrote: > > Pavel, > > I like PythonQL. I perform a lot of data transformation, and often find > Python's list comprehensions too limiting; leaving me wishing for > LINQ-like language features. > > As an alternative to extending Python with PythonQL, Terry Reedy > suggested interpreting a DSL string, and Pavel Velikhov alluded to using > magic method tricks found in ORM libraries. I can see how both these are > not satisfactory. > > A third alternative could be to encode the query clauses as JSON > objects. For example: PythonQL version > result = [ select (x, sum_y) > for x in range(1,8), > y in range(1,7) > where x % 2 == 0 and y % 2 != 0 and x > y > group by x > let sum_y = sum(y) > where sum_y % 2 != 0 > ] Someone mentioned the problem of adding multiple new keywords.
Even 1 requires a proposal to meet a high bar; I think we average less than 1 new keyword per release in the last 20 years. Searching '\bgroup\b' just in /lib (the 3.6 stdlib on Windows) gets over 300 code hits in about 30 files; re's match.group() accounts for many. I think this makes it ineligible to be a keyword. 'select' has a fair number of code uses also. I also see 'where', 'let', and 'by' in the above. > result = pq([ > {"select": ["x", "sum_y"]}, > {"for": {"x": range(1,8), "y": range(1,7)}}, > {"where": lambda x, y: x % 2 == 0 and y % 2 != 0 and x > y}, > {"groupby": "x"}, > {"with": {"sum_y": {"SUM": "y"}}}, > {"where": {"neq": [{"mod": ["sum_y", 2]}, 0]}} > ]) > > This representation does look a little lispy, and it may resemble > PythonQL's parse tree. I think the benefits are: > > 1) no python language change > 2) easier to parse > 3) better than string-based DSL for catching syntax errors > 4) {"clause": parameters} format is flexible for handling common query > patterns > 5) works in javascript too > 6) easy to compose with automation (my favorite) > > It is probably easy for you to see the drawbacks. -- Terry Jan Reedy From victor.stinner at gmail.com Sun Mar 26 04:31:09 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 26 Mar 2017 10:31:09 +0200 Subject: [Python-ideas] Adding an 'errors' argument to print In-Reply-To: References: Message-ID: print(msg) calls sys.stdout.write(msg): write() expects text, not bytes. I dislike the idea of putting encoding options in print. It's too specific. What if tomorrow you replace print() with file.write()? Do you want to add errors there too? No, it's better to write your own formatter function, as shown in a previous email. Victor On 25 March 2017 at 8:50 PM, "Michel Desmoulin" wrote: On 24/03/2017 at 17:37, Victor Stinner wrote: > *If* we change something, I would prefer to modify sys.stdout.
The > following issue proposes to add > sys.stdout.set_encoding(errors='replace'): > http://bugs.python.org/issue15216 > > You can already set the PYTHONIOENCODING environment variable to > ":replace" to use "replace" on sys.stdout (and sys.stderr). > > Victor This is not the same. You may want to locally apply "errors=replace" and not to the whole program. Indeed, this can silence encoding problems, so I would probably never set it that way at dev time, except for the few places where I know I can explicitly silence errors. I quite like this print(errors="replace|ignore"). This is not going to cause any trouble, and can only help. From pavel.velikhov at gmail.com Sun Mar 26 06:39:01 2017 From: pavel.velikhov at gmail.com (Pavel Velikhov) Date: Sun, 26 Mar 2017 13:39:01 +0300 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: References: Message-ID: <5A0D92A2-AE4C-437B-90B8-48B639488939@icloud.com> > On 25 Mar 2017, at 15:51, Gerald Britton wrote: > > On 25 March 2017 at 11:24, Pavel Velikhov > wrote: > > No, the current solution is temporary because we just don't have the > > manpower to > > implement the full thing: a real system that will rewrite parts of PythonQL > > queries and > > ship them to underlying databases. We need a real query optimizer and smart > > wrappers > > for this purpose. But we'll build one of these for demo purposes soon > > (either a Spark > > wrapper or a PostgreSQL wrapper). > One thought, if you're lacking in manpower now, then proposing > inclusion into core Python means that the core dev team will be taking > on an additional chunk of code that is already under-resourced. That
That > rings alarm bells for me - how would you imagine the work needed to > merge PythonQL into the core Python grammar would be resourced? > I should say that in practice, I think that the solution is relatively > niche, and overlaps quite significantly with existing Python features, > so I don't really see a compelling case for inclusion. The parallel > with C# and LINQ is interesting here - LINQ is a pretty cool > technology, but I don't see it in widespread use in general-purpose C# > projects (disclaimer: I don't get to see much C# code, so my > experience is limited). > > I see lots of C# code, but (thankfully) not so much LINQ to SQL. Yes, it is a cool technology. But I sometimes have a problem with the SQL it generates. Since I'm also a SQL developer, I'm sensitive to how queries are constructed, for performance reasons, as well as how they look, for readability and aesthetic reasons. > > LINQ queries can generate poorly-performing SQL, since LINQ is a basically a translator, but not an AI. As far as appearances go, LINQ queries can look pretty gnarly, especially if they include sub queries or a few joins. That makes it hard for the SQL dev (me!) to read and understand if there are performance problems (which there often are, in my experience) > We want to go beyond being a basic translator. Especially if the common use-case will be integrating multiple databases. We can also introduce decent-looking hints (maybe not always decent looking) to generate better plans. Not sure about asethetics though... > So, I would tend to code the SQL separately and put it in a SQL view, function or stored procedure. I can still parse the results with LINQ (not LINQ to SQL), which is fine. > > For similar reasons, I'm not a huge fan of ORMs either. Probably my bias towards designing the database first and building up queries to meet the business goals before writing a line of Python, C#, or the language de jour. 
This sounds completely reasonable, but this means you're tied to a specific DBMS (especially if you're using a lot of built-in functions that are usually very specific to a database). PythonQL (when it has enough functionality) should give you independence. > > -- > Gerald Britton, MCSE-DP, MVP > LinkedIn Profile: http://ca.linkedin.com/in/geraldbritton From pavel.velikhov at gmail.com Sun Mar 26 06:50:22 2017 From: pavel.velikhov at gmail.com (Pavel Velikhov) Date: Sun, 26 Mar 2017 13:50:22 +0300 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> Message-ID: <9D203965-3D42-4E82-B7B9-178B26084B6A@gmail.com> Hi David > On 25 Mar 2017, at 20:08, David Mertz wrote: > > I think it's extraordinarily unlikely that a big change in Python syntax to support query syntax will ever happen. Moreover, I would oppose such a change myself. > > But such a change also really is not necessary. Pandas already abstracts all the things mentioned using only Python methods. It is true that Pandas sometimes does some black magic within those methods to get there; and it also uses a somewhat non-Pythonic style of long chains of method calls. But it does everything PythonQL does, as well as much, much more. Pandas builds in DataFrame readers for every data source you are likely to encounter, including leveraging all the abstractions provided by RDBMS drivers, etc. It does groupby, join, etc. > I work daily with pandas, and of course it does have the functionality that PythonQL introduces, but it's a completely different beast. One of the reasons I started PythonQL is that pandas is so difficult to master (just like any function-based database API would be). The key benefit of PythonQL is that with minimal grammar extensions you get the power of a real query language. So a Python programmer who knows comprehensions well and has a good idea of SQL or other query languages can start writing complex queries right away. With pandas you need to read the docs all the time and complex data transformations become incredibly cryptic.
We have implemented what is kind of an alpha version now, and gained some experience and insights about why and where this is really useful. So I?d like to share those with you and gather some opinions whether you think we should try to include these extensions in the Python core. > > Intro > > What we have done is (mostly) extended Python?s comprehensions with group by, order by, let and window clauses, which can come in any order, thus comprehensions become a query language a bit cleaner and more powerful than SQL. And we added a couple small convenience extensions, like a We have identified three top motivations for folks to use these extensions: > > Our Motivations > > 1. This can become a standard for running queries against database systems. Instead of learning a large number of different SQL dialects (the pain point here are libraries of functions and operators that are different for each vendor), the Python developer needs only to learn PythonQL and he can query any SQL and NoSQL database. > > 2. A single PythonQL expression can integrate a number of databases/files/memory structures seamlessly, with the PythonQL optimizer figuring out which pieces of plans to ship to which databases. This is a cool virtual database integration story that can be very convenient, especially now, when a lot of data scientists use Python to wrangle the data all day long. > > 3. Querying data structures inside Python with the full power of SQL (and a bit more) is also really convenient on its own. Usually folks that are well-versed in SQL have to resort to completely different means when they need to run a query in Python on top of some data structures. > > Current Status > > We have PythonQL running, its installed via pip and an encoding hack, that runs our preprocessor. We currently compile PythonQL into Python using our executor functions and execute Python subexpressions via eval. We don?t do any optimization / rewriting of queries into languages of underlying systems. 
And the query processor is basic too, with naive implementations of operators. But we?ve build DBMS systems before, so if there is a good amount of support for this project, we?ll be able to build a real system here. > > Your take on this > > Extending Python?s grammar is surely a painful thing for the community. We?re now convinced that it is well worth it, because of all the wonderful functionality and convenience this extension offers. We?d like to get your feedback on this and maybe you?ll suggest some next steps for us. > > References > > PythonQL GitHub page: https://github.com/pythonql/pythonql > PythonQL Intro and Tutorial (this is all User Documentation we have right now): https://github.com/pythonql/pythonql/wiki/PythonQL-Intro-and-Tutorial > A use-case of querying Event Logs and doing Process Mining with PythonQL: https://github.com/pythonql/pythonql/wiki/Event-Log-Querying-and-Process-Mining-with-PythonQL > PythonQL demo site: www.pythonql.org > > Best regards, > PythonQL Team > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pavel.velikhov at gmail.com Sun Mar 26 07:06:13 2017 From: pavel.velikhov at gmail.com (Pavel Velikhov) Date: Sun, 26 Mar 2017 14:06:13 +0300 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> Message-ID: Hi Mark, > On 25 Mar 2017, at 19:54, Mark E. 
Haase wrote: > > Hi Pavel, > > This is a really impressive body of work. I had looked at this project in the past but it is great to get back up to speed and see all the progress made. > > I use Python + databases almost every day, and the major unanswered question is what benefit does dedicated language syntax have over using a DBAL/ORM with a Builder style API? It obviously has huge costs (as all syntax changes do) but the benefit is not obvious to me: I have never found myself wanting built-in syntax for writing database queries. > We do a lot work with data every day doing data science. We have to use tool like pandas, and they don?t work in a lot of cases and in many cases we end up with very cryptic notebooks that only the authors can work with? I actually use PythonQL in daily work and it simplified a lot of things greatly. The general insight is that a small language can add a lot more value than a huge library, because its easy to combine good ideas in a language. If you look at some examples of complex PythonQL, I think you?ll might change your mind: https://github.com/pythonql/pythonql/wiki/Event-Log-Querying-and-Process-Mining-with-PythonQL > My second thought is that every database layer I've ever used was unavoidably leaky or incomplete. Database functionality (even if we constrain "database" to mean RDBMS) is too diverse to be completely abstracted away. This is why so many different abstractions already exist, e.g. low-level like DBAPI and high-level like SQL Alchemy. You're not going to find much support for cementing an imperfect abstraction right into the Python grammar. In order to make the abstraction relatively complete, you'd need to almost complete merge ANSI SQL grammar into Python grammar, which sounds terrifying. Don?t see a problem here, expect for a performance problem. I.e. you?ll be able to write queries of any complexity in PythonQL, and most of the work will be pushed into the underlying database. 
Stuff that can?t be pushed, will be finished up at the Python layer. We don?t have to guarantee the other direction - i.e. if a DBMS has transitive closure for instance, we don?t have to support it PythonQL. > > Third thought: is the implementation of a "Python query language" as generic as the name implies? The docs mention support for document databases, but I can run Redis queries? LDAP queries? DNS queries? We definitely can support Redis. LDAP and DNS - don?t know if we want to go there, I would stop at databases for now. > > > We haven't build a real SQL Database wrapper yet, but in the meanwhile you can use libraries like psycopg2 or SQLAlchemy to get data from the database into an iterator, and then PythonQL can run on top of such iterator. > > Fourth thought: until PythonQL can abstract over a real database, it's far too early to consider putting it into the language itself. These kinds of "big change" projects typically need to stabilize on their own for a long time before anybody will even consider putting them into the core language. We?re definitely at the start of this, because we have huge plans for PythonQL, including a powerful planner/optimizer and wrappers for most popular DBMSs. If we get the support of the Python community though it would help us to move faster for sure. > > Finally ? to end on a positive note ? the coolest part of this project from my point of view is using SQL as an abstraction over in-memory objects or raw files. I can see how somebody that is comfortable with SQL would prefer this declarative approach. I could see myself using an API like this to search a Pandas dataframe, for example. I think if we get this right, we might unlock some cool new usages. I really believe that if we simplify integration of multiple data sources sufficiently, a lot of dirty work of data scientists will become much simpler. > > Cheers, > Mark > > On Fri, Mar 24, 2017 at 11:10 AM, Pavel Velikhov > wrote: > Hi folks! 
> > We started a project to extend Python with a full-blown query language about a year ago. The project is call PythonQL, the links are given below in the references section. We have implemented what is kind of an alpha version now, and gained some experience and insights about why and where this is really useful. So I?d like to share those with you and gather some opinions whether you think we should try to include these extensions in the Python core. > > Intro > > What we have done is (mostly) extended Python?s comprehensions with group by, order by, let and window clauses, which can come in any order, thus comprehensions become a query language a bit cleaner and more powerful than SQL. And we added a couple small convenience extensions, like a We have identified three top motivations for folks to use these extensions: > > Our Motivations > > 1. This can become a standard for running queries against database systems. Instead of learning a large number of different SQL dialects (the pain point here are libraries of functions and operators that are different for each vendor), the Python developer needs only to learn PythonQL and he can query any SQL and NoSQL database. > > 2. A single PythonQL expression can integrate a number of databases/files/memory structures seamlessly, with the PythonQL optimizer figuring out which pieces of plans to ship to which databases. This is a cool virtual database integration story that can be very convenient, especially now, when a lot of data scientists use Python to wrangle the data all day long. > > 3. Querying data structures inside Python with the full power of SQL (and a bit more) is also really convenient on its own. Usually folks that are well-versed in SQL have to resort to completely different means when they need to run a query in Python on top of some data structures. > > Current Status > > We have PythonQL running, its installed via pip and an encoding hack, that runs our preprocessor. 
We currently compile PythonQL into Python using our executor functions and execute Python subexpressions via eval. We don't do any optimization / rewriting of queries into the languages of the underlying systems. And the query processor is basic too, with naive implementations of operators. But we've built DBMS systems before, so if there is a good amount of support for this project, we'll be able to build a real system here. > > Your take on this > > Extending Python's grammar is surely a painful thing for the community. We're now convinced that it is well worth it, because of all the wonderful functionality and convenience this extension offers. We'd like to get your feedback on this, and maybe you'll suggest some next steps for us. > > References > > PythonQL GitHub page: https://github.com/pythonql/pythonql > PythonQL Intro and Tutorial (this is all the user documentation we have right now): https://github.com/pythonql/pythonql/wiki/PythonQL-Intro-and-Tutorial > A use-case of querying Event Logs and doing Process Mining with PythonQL: https://github.com/pythonql/pythonql/wiki/Event-Log-Querying-and-Process-Mining-with-PythonQL > PythonQL demo site: www.pythonql.org > > Best regards, > PythonQL Team > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pavel.velikhov at gmail.com Sun Mar 26 07:10:54 2017 From: pavel.velikhov at gmail.com (Pavel Velikhov) Date: Sun, 26 Mar 2017 14:10:54 +0300 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> Message-ID: <87E120E5-9743-4FCC-91D5-5D3B848E6131@gmail.com> Hi Michel! 
> On 25 Mar 2017, at 22:43, Michel Desmoulin wrote: > > Hello, > > I've been following PythonQL with interest. I like the clever hack using > Python encoding. It's definitely not something I would recommend > for inclusion in Python, as it hijacks the Python encoding method, > which prevents you from... well, choosing an encoding. And it requires you to > have a file. This was done as a temporary hack; we definitely want to move away from this at some point. > > However, I find the idea great for demonstration purposes. Like LINQ, > the strength of your tool is the integrated syntax. I myself found it > annoying to import itertools all the time. I eventually wrote a wrapper > so I could use slicing on generators, callables in slicing, etc. for this > very reason. > > However, I have good news. > > When the debate about f-strings was on this list, the concept was > split into several parts: the f-string, currently implemented in Python > 3.6, and a more advanced type of string interpolation, the i-string > from PEP 501 (https://www.python.org/dev/peps/pep-0501/), which is still > to be implemented. > > The idea of the i-string was to allow something like this: > > mycommand = sh(i"cat (unknown)") > myquery = sql(i"SELECT {column} FROM {table};") > myresponse = html(i"{response.body}") > > These would then pass an object to sql/sh/html() with the string, the > placeholders and the variable context, and then allow it to do whatever you > want. Evaluation of the i-string would of course be lazy. > > So while I don't think PythonQL can be integrated in Python the way it > is, you may want to champion PEP 501. 
This way you will be able to > provide a PQL hook allowing you to do something like: > > pql(i"""select (x, sum_y) > for x in range(1,8), > y in {stuff} > where x % 2 == 0 and y % 2 != 0 and x > y > group by x > let sum_y = sum(y) > where sum_y % 2 != 0 > """) > > Granted, this is not as elegant as your DSL, but it would make the tool > easier to adopt anywhere: the REPL, IPython notebooks, files in Python with a > different encoding, embedded Python, alternative Python implementations, > compiled Python, etc. > > Plus, the sooner we start with i-strings, the sooner editors will > implement syntax highlighting for the popular dialects. > > This would allow you to spread the popularity of your tool and maybe > change the way it's seen on this list. Hmm, I don't quite understand the difference between this and regular strings that we execute; i.e., I think we can do this right now without i-strings. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From pavel.velikhov at gmail.com Sun Mar 26 07:14:56 2017 From: pavel.velikhov at gmail.com (Pavel Velikhov) Date: Sun, 26 Mar 2017 14:14:56 +0300 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> <416e2e44-5949-d33a-1cdd-2a3c50508efd@mozilla.com> Message-ID: <9FDAF5C1-2871-4AF9-A3D2-0219279F65D1@icloud.com> Terry, > On 26 Mar 2017, at 07:23, Terry Reedy wrote: > > On 3/25/2017 11:40 AM, Kyle Lahnakoski wrote: >> >> Pavel, >> >> I like PythonQL. I perform a lot of data transformation, and often find >> Python's list comprehensions too limiting; leaving me wishing for >> LINQ-like language features. 
>> >> As an alternative to extending Python with PythonQL, Terry Reedy >> suggested interpreting a DSL string, and Pavel Velikhov alluded to using >> magic method tricks found in ORM libraries. I can see how both of these are >> not satisfactory. >> >> A third alternative could be to encode the query clauses as JSON >> objects. For example: > > PythonQL version > >> result = [ select (x, sum_y) >> for x in range(1,8), >> y in range(1,7) >> where x % 2 == 0 and y % 2 != 0 and x > y >> group by x >> let sum_y = sum(y) >> where sum_y % 2 != 0 >> ] > > Someone mentioned the problem of adding multiple new keywords. Even 1 requires a proposal to meet a high bar; I think we have averaged less than 1 new keyword per release in the last 20 years. > > Searching '\bgroup\b' just in /lib (the 3.6 stdlib on Windows) gets over 300 code hits in about 30 files; I think this makes it ineligible to be a keyword. re's match.group() accounts for many of the hits. 'select' has a fair number of code uses also. I also see 'where', 'let', and 'by' in the above. Yes, we add quite a few keywords. If you look at the window clause we have, there are even more keywords there. This is definitely a huge concern and, in my view, the main reason the community would oppose the change. I'm not too experienced with the Python parser, but could we make all these keywords not be real keywords (interpreted as keywords only inside comprehensions, so no other code breaks)? > >> result = pq([ >> {"select":["x", "sum_y"]}, >> {"for":{"x": range(1,8), "y": range(1,7)}}, >> {"where": lambda x,y: x % 2 == 0 and y % 2 != 0 and x > y}, >> {"groupby": "x"}, >> {"with":{"sum_y":{"SUM":"y"}}}, >> {"where": {"neq":[{"mod":["sum_y", 2]}, 0]}} >> ]) >> >> This representation does look a little lispy, and it may resemble >> PythonQL's parse tree. 
I think the benefits are: >> >> 1) no python language change >> 2) easier to parse >> 3) better than string-based DSL for catching syntax errors >> 4) {"clause": parameters} format is flexible for handling common query >> patterns ** >> 5) works in javascript too >> 6) easy to compose with automation (my favorite) >> >> It is probably easy for you to see the drawbacks. > > > -- > Terry Jan Reedy > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pavel.velikhov at gmail.com Sun Mar 26 07:40:41 2017 From: pavel.velikhov at gmail.com (Pavel Velikhov) Date: Sun, 26 Mar 2017 14:40:41 +0300 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: References: Message-ID: <2BF89181-9E13-47E7-9759-086A0617EA70@gmail.com> Hi Nick, Thanks for such a detailed response! > On 25 Mar 2017, at 19:40, Nick Coghlan wrote: > > First off, I think PythonQL (and PonyORM before it) is a very > interesting piece of technology. However, I think some of the answers > so far suggest we may need to discuss a couple of meta-issues around > target audiences and available technical options before continuing on. > > I'm quoting Gerald's post here because it highlights the "target > audience" problem, but my comments apply to the thread generally. > > On 25 March 2017 at 22:51, Gerald Britton wrote: >> >> I see lots of C# code, but (thankfully) not so much LINQ to SQL. Yes, it is a cool technology. But I sometimes have a problem with the SQL it generates. Since I'm also a SQL developer, I'm sensitive to how queries are constructed, for performance reasons, as well as how they look, for readability and aesthetic reasons. 
>> >> LINQ queries can generate poorly-performing SQL, since LINQ is basically a translator, not an AI. As far as appearances go, LINQ queries can look pretty gnarly, especially if they include subqueries or a few joins. That makes it hard for the SQL dev (me!) to read and understand if there are performance problems (which there often are, in my experience). >> >> So, I would tend to code the SQL separately and put it in a SQL view, function or stored procedure. I can still parse the results with LINQ (not LINQ to SQL), which is fine. >> >> For similar reasons, I'm not a huge fan of ORMs either. Probably my bias towards designing the database first and building up queries to meet the business goals before writing a line of Python, C#, or the language du jour. > > > Right, the target audience here *isn't* folks who already know how to > construct their own relational queries in SQL, and it definitely isn't > folks that know how to tweak their queries to get optimal performance > from the specific database they're using. Rather, it's folks that > already know Python's comprehensions, and perhaps some of the > itertools features, and helping to provide them with a smoother > on-ramp into the world of relational data processing. Actually, I myself am a user of PythonQL, even though I'm an SQL expert. I work in data science, so I do a lot of ad-hoc querying, and we always get some new datasets we need to check out and work with. Some things like nested data models are also much better handled by PythonQL, and data like JSON or XML will also be easier to handle. I see even more use-cases coming up once we get further with smart database wrappers in PythonQL. 
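To make the nested-data point concrete in plain Python (the JSON document below is made up for the example): flattening even one level of nesting already takes a two-level comprehension plus a manual aggregation loop, which is the kind of boilerplate a group-by clause would absorb.

```python
import json

# Hypothetical nested document of the kind Pavel describes
doc = json.loads("""
{"orders": [
  {"id": 1, "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}]},
  {"id": 2, "items": [{"sku": "a", "qty": 5}]}
]}
""")

# Flatten the nesting with a two-level comprehension...
pairs = [(item["sku"], item["qty"])
         for order in doc["orders"]
         for item in order["items"]]

# ...then aggregate by hand, since comprehensions have no group-by
totals = {}
for sku, qty in pairs:
    totals[sku] = totals.get(sku, 0) + qty

print(totals)  # {'a': 7, 'b': 1}
```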
> > There's no question that folks dealing with sufficiently large data > sets with sufficiently stringent performance requirements are > eventually going to want to reach for handcrafted SQL or a distributed > computation framework like dask, but that's not really any different > from our standard position that when folks are attempting to optimise > a hot loop, they're eventually going to have to switch to something > that can eliminate the interpreter's default runtime object management > overhead (whether that's Cython, PyPy's or Numba's JIT, or writing an > extension module in a different language entirely). It isn't an > argument against making it easier for folks to postpone the point > where they find it necessary to reach for the "something else" that > takes them beyond Python's default capabilities. Don't know - for example, one of the wrappers is going to be an Apache Spark wrapper, so you could quickly hack up a PythonQL query that would be run on a distributed platform. > > However, at the same time, PythonQL *is* a DSL for data manipulation > operations, and map and filter are far and away the most common of > those. Even reduce, which was previously a builtin, was pushed into > functools for Python 3.0, with the preferred alternative being to just > write a suitably named function that accepts an iterable and returns a > single value. And while Python is a very popular tool for data > manipulation, it would be a big stretch to assume that that was its > primary use case in all contexts. > > So it makes sense to review some of the technical options that are > available to help make projects like PythonQL more maintainable, > without necessarily gating improvements to them on the relatively slow > update and rollout cycle of new Python versions. > > = Option 1 = > > Fully commit to the model of allowing alternate syntactic dialects to > run atop Python interpreters. 
In Hylang and PythonQL we have at least > two genuinely interesting examples of that working through the text > encoding system, as well as other examples like Cython that work > through the extension module system. > > So that's an opportunity to take this from "Possible, but a bit hacky" > to "Pluggable source code translation is supported at all levels of > the interpreter, including debugger source maps, etc." (perhaps by > borrowing ideas from other ecosystems like Java, JavaScript, and .NET, > where this kind of thing is already a lot more common). > > The downside of this approach is that actually making it happen would > be getting pretty far afield from the original PythonQL goal of > "provide nicer data manipulation abstractions in Python", and it > wouldn't actually deliver anything new that can't already be done with > existing import and codec system features. This would be great anyway - if we could rely on some preprocessor directive instead of hacking encodings, that would be nice. 
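Stripped to its essence, either mechanism - the encoding hack or a preprocessor directive - is a source-to-source rewrite that runs before compile(). A minimal sketch of that pipeline (the `where` keyword here is a made-up illustration, not actual PythonQL syntax):

```python
# Toy source preprocessor: rewrite a made-up 'where' keyword into plain
# Python 'if' before handing the text to compile(). This stands in for
# the encoding hack / PEP 511-style hooks discussed in this thread.
def preprocess(src):
    return src.replace(" where ", " if ")

dsl_source = "result = [x for x in range(10) where x % 2 == 0]"

namespace = {}
exec(compile(preprocess(dsl_source), "<dsl>", "exec"), namespace)
print(namespace["result"])  # [0, 2, 4, 6, 8]
```

A real hook would do a proper parse rather than a textual replace, but the pipeline - rewrite, compile, exec - is exactly the one PEP 511 proposes to make pluggable.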
However, given optionally delayed > rendering of interpolated strings, PythonQL could be used in the form: > > result =pyql(i""" > (x,y) > for x in {range(1,8)} > for y in {range(1,7)} > if x % 2 == 0 and > y % 2 != 0 and > x > y > """) > > I personally like this idea (otherwise I wouldn't have written PEP 501 > in the first place), and the necessary technical underpinnings to > enable it are all largely already in place to support f-strings. If > the PEP were revised to show examples of using it to support > relatively seamless calling back and forth between Hylang, PythonQL > and regular Python code in the same process, that might be intriguing > enough to pique Guido's interest (and I'm open to adding co-authors > that are interested in pursuing that). What would be the difference between this and just executing a PythonQL string for us, getting local and global variables into PythonQL scope? > > Option 3: > > Go all the way to expanding comprehensions to natively be a full data > manipulation DSL. > > I'm personally not a fan of that approach, as syntax is really hard to > search for help on (keywords are better for that than punctuation, but > not by much), while methods and functions get to have docstrings. It > also means the query language gets tightly coupled to the Python > grammar, which not only makes the query language difficult to update, > but also makes Python's base syntax harder for new users to learn. > > By contrast, when DSLs are handled as interpolation templates with > delayed rendering, then the rendering function gets to provide runtime > documentation, and the definition of the DSL is coupled to the update > cycle of the rendering function, *not* that of the Python language > definition. > > Cheers, > Nick. 
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From ncoghlan at gmail.com Sun Mar 26 10:07:15 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 27 Mar 2017 00:07:15 +1000 Subject: [Python-ideas] Adding an 'errors' argument to print In-Reply-To: References: Message-ID: On 26 March 2017 at 18:31, Victor Stinner wrote: > print(msg) calls sys.stdout.write(msg): write() expects text, not bytes. I > dislike the idea of putting encoding options in print. It's too specific. > What if tomorrow you replace print() with file.write()? Do you want to add > errors there too? > > No, it's better to write your own formatter function, as shown in a previous > email. While I agree with that, folks that are thinking in terms of error handlers for str.encode may not immediately jump to using the `ascii()` builtin or the "%a" or "!a" format specifiers, and if you don't use those existing tools, you have the hassle of deciding where to put your custom helper function. Perhaps it would be worth noting in the table of error handlers at https://docs.python.org/3/library/codecs.html#error-handlers that backslashreplace is used by the `ascii()` builtin and the associated format specifiers, as well as noting the format specifiers in the documentation of the builtin function? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rymg19 at gmail.com Sun Mar 26 10:40:44 2017 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sun, 26 Mar 2017 09:40:44 -0500 Subject: [Python-ideas] Adding an 'errors' argument to print In-Reply-To: References: Message-ID: FWIW, using the ascii function does have the problem that Unicode characters will be escaped, even if the terminal could have handled them perfectly fine. -- Ryan (????) 
Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else http://refi64.com On Mar 26, 2017 9:07 AM, "Nick Coghlan" wrote: > On 26 March 2017 at 18:31, Victor Stinner > wrote: > > print(msg) calls sys.stdout.write(msg): write() expects text, not bytes. > I > > dislike the idea of putting encoding options in print. It's too specific. > > What if tomorrow you replace print() with file.write()? Do you want to > add > > errors there too? > > > > No, it's better to write own formatter function as shown in a previous > > email. > > While I agree with that, folks that are thinking in terms of errors > handlers for str.encode may not immediately jump to using the > `ascii()` builtin or the "%a" or "!a" format specifiers, and if you > don't use those existing tools, you have the hassle of deciding where > to put your custom helper function. > > Perhaps it would be worth noting in the table of error handlers at > https://docs.python.org/3/library/codecs.html#error-handlers that > backslashreplace is used by the `ascii()` builtin and the associated > format specifiers, as well as noting the format specifiers in the > documentation of the builtin function? > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Sun Mar 26 11:02:09 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 27 Mar 2017 01:02:09 +1000 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: <2BF89181-9E13-47E7-9759-086A0617EA70@gmail.com> References: <2BF89181-9E13-47E7-9759-086A0617EA70@gmail.com> Message-ID: On 26 March 2017 at 21:40, Pavel Velikhov wrote: > On 25 Mar 2017, at 19:40, Nick Coghlan wrote: >> Right, the target audience here *isn't* folks who already know how to >> construct their own relational queries in SQL, and it definitely isn't >> folks that know how to tweak their queries to get optimal performance >> from the specific database they're using. Rather, it's folks that >> already know Python's comprehensions, and perhaps some of the >> itertools features, and helping to provide them with a smoother >> on-ramp into the world of relational data processing. > > > Actually I myself am a user of PythonQL, even though I'm an SQL expert. I work in data science, so > I do a lot of ad-hoc querying and we always get some new datasets we need to check out and work with. > Some things like nested data models are also much better handled by PythonQL, and data like > JSON or XML will also be easier to handle. So perhaps a better way of framing it would be to say that PythonQL aims to provide a middle ground between interfaces that are fully in "Python mode" (e.g. ORMs, pandas DataFrames), where the primary interface is methods-on-objects, and those that are fully in "data manipulation mode" (e.g. raw SQL, lower level XML and JSON APIs). 
At the Python level, success for PythonQL would look like people being able to seamlessly transfer their data manipulation skills from a Django ORM project to an SQL Alchemy project to a pandas analysis project to a distributed data analysis project in dask, without their data manipulation code really having to change - only the backing data structures and the runtime performance characteristics would differ. At the data manipulation layer, success for PythonQL would look like people being able to easily get "good enough" performance for one-off scripts, regardless of the backing data store, with closer attention to detail only being needed for genuinely large data sets (where efficiency matters even for one-off analyses), or for frequently repeated operations (where wasted CPU hours show up as increased infrastructure expenses). >> There's no question that folks dealing with sufficiently large data >> sets with sufficiently stringent performance requirements are >> eventually going to want to reach for handcrafted SQL or a distributed >> computation framework like dask, but that's not really any different >> from our standard position that when folks are attempting to optimise >> a hot loop, they're eventually going to have to switch to something >> that can eliminate the interpreter's default runtime object management >> overhead (whether that's Cython, PyPy's or Numba's JIT, or writing an >> extension module in a different language entirely). It isn't an >> argument against making it easier for folks to postpone the point >> where they find it necessary to reach for the "something else" that >> takes them beyond Python's default capabilities. > > Don?t know, for example one of the wrappers is going to be an Apache Spark > wrappers, so you could quickly hack up a PythonQL query that would be run > on a distributed platform. 
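That portability goal can already be approximated today for anything that yields rows of tuples. The sketch below (stdlib only; the `manipulate` helper and the table are made up for illustration) runs one filtering pipeline unchanged over a plain in-memory list and over an SQLite cursor:

```python
import sqlite3

def manipulate(rows):
    # One data-manipulation pipeline, independent of the backing store:
    # keep only the rows whose quantity exceeds 1.
    return [(name, qty) for name, qty in rows if qty > 1]

data = [("a", 2), ("b", 1), ("c", 5)]

# Backend 1: a plain in-memory list
print(manipulate(data))  # [('a', 2), ('c', 5)]

# Backend 2: an SQLite table feeding the very same pipeline
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (name TEXT, qty INTEGER)")
con.executemany("INSERT INTO items VALUES (?, ?)", data)
rows = con.execute("SELECT name, qty FROM items ORDER BY name")
print(manipulate(rows))  # [('a', 2), ('c', 5)]
```

Only the runtime characteristics differ between the two backends, which is the kind of seamless transfer described above.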
Right, I meant this in the same sense that folks using an ORM like SQL Alchemy may eventually hit a point where rather than trying to convince the ORM to emit the SQL they want to run, it's easier to just bypass the ORM layer and write the exact SQL they want. It's worthwhile attempting to reduce the number of cases where folks feel obliged to do that, but at the same time, abstraction layers need to hide at least some lower level details if they're going to actually work properly. >> = Option 1 = >> >> Fully commit to the model of allowing alternate syntactic dialects to >> run atop Python interpreters. In Hylang and PythonQL we have at least >> two genuinely interesting examples of that working through the text >> encoding system, as well as other examples like Cython that work >> through the extension module system. >> >> So that's an opportunity to take this from "Possible, but a bit hacky" >> to "Pluggable source code translation is supported at all levels of >> the interpreter, including debugger source maps, etc." (perhaps by >> borrowing ideas from other ecosystems like Java, JavaScript, and .NET, >> where this kind of thing is already a lot more common). >> >> The downside of this approach is that actually making it happen would >> be getting pretty far afield from the original PythonQL goal of >> "provide nicer data manipulation abstractions in Python", and it >> wouldn't actually deliver anything new that can't already be done with >> existing import and codec system features. > > This would be great anyways, if we could rely on some preprocessor directive, > instead of hacking encodings, this could be nice. Victor Stinner wrote up some ideas about that in PEP 511: https://www.python.org/dev/peps/pep-0511/ Preprocessing is one of the specific use cases considered: https://www.python.org/dev/peps/pep-0511/#usage-2-preprocessor >> = Option 2 = >> ... 
given optionally delayed >> rendering of interpolated strings, PythonQL could be used in the form: >> >> result = pyql(i""" >> (x,y) >> for x in {range(1,8)} >> for y in {range(1,7)} >> if x % 2 == 0 and >> y % 2 != 0 and >> x > y >> """) >> >> I personally like this idea (otherwise I wouldn't have written PEP 501 >> in the first place), and the necessary technical underpinnings to >> enable it are all largely already in place to support f-strings. If >> the PEP were revised to show examples of using it to support >> relatively seamless calling back and forth between Hylang, PythonQL >> and regular Python code in the same process, that might be intriguing >> enough to pique Guido's interest (and I'm open to adding co-authors >> that are interested in pursuing that). > > What would be the difference between this and just executing a PythonQL > string for us, getting local and global variables into PythonQL scope? The big new technical capability that f-strings introduced is that the compiler can see the variable references in the embedded expressions, so f-strings "just work" with closure references, whereas passing locals() and globals() explicitly is: 1. slow (since you have to generate a full locals dict); 2. incompatible with the use of closure variables (since they're not visible in either locals() *or* globals()) The i-strings concept takes that closure-compatible interpolation capability and separates it from the str.format based rendering step. From a speed perspective, the interpolation aspects of this approach are so efficient they rival simple string concatenation: $ python -m perf timeit -s 'first = "Hello"; second = " World!"' 'first + second' ..................... Mean +- std dev: 71.7 ns +- 2.1 ns $ python -m perf timeit -s 'first = "Hello"; second = " World!"' 'f"{first}{second}"' ..................... 
Mean +- std dev: 77.8 ns +- 2.5 ns Something like pyql that did more than just concatenate the text sections with the text values of the embedded expressions would still need some form of regex-style caching strategy to avoid parsing the same query string multiple times, but the Python interpreter would handle the task of breaking up the string into the text sections and the interpolated Python expressions. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From pavel.velikhov at gmail.com Sun Mar 26 12:31:57 2017 From: pavel.velikhov at gmail.com (Pavel Velikhov) Date: Sun, 26 Mar 2017 19:31:57 +0300 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: References: <2BF89181-9E13-47E7-9759-086A0617EA70@gmail.com> Message-ID: <36CB6C2D-18E9-44CD-BD7C-21149C772E82@gmail.com> Hi Nick! > On 26 Mar 2017, at 18:02, Nick Coghlan wrote: > > On 26 March 2017 at 21:40, Pavel Velikhov wrote: >> On 25 Mar 2017, at 19:40, Nick Coghlan wrote: >>> Right, the target audience here *isn't* folks who already know how to >>> construct their own relational queries in SQL, and it definitely isn't >>> folks that know how to tweak their queries to get optimal performance >>> from the specific database they're using. Rather, it's folks that >>> already know Python's comprehensions, and perhaps some of the >>> itertools features, and helping to provide them with a smoother >>> on-ramp into the world of relational data processing. >> >> >> Actually I myself am a user of PythonQL, even though I?m an SQL expert. I work in data science, so >> I do a lot of ad-hoc querying and we always get some new datasets we need to check out and work with. >> Some things like nested data models are also much better handled by PythonQL, and data like >> JSON or XML will also be easier to handle. 
> > So perhaps a better way of framing it would be to say that PythonQL > aims to provide a middle ground between interfaces that are fully in > "Python mode" (e.g ORMs, pandas DataFrames), where the primary > interface is methods-on-objects, and those that are fully in "data > manipulation mode" (e.g. raw SQL, lower level XML and JSON APIs). > > At the Python level, success for PythonQL would look like people being > able to seamlessly transfer their data manipulation skills from a > Django ORM project to an SQL Alchemy project to a pandas analysis > project to a distributed data analysis project in dask, without their > data manipulation code really having to change - only the backing data > structures and the runtime performance characteristics would differ. > > At the data manipulation layer, success for PythonQL would look like > people being able to easily get "good enough" performance for one-off > scripts, regardless of the backing data store, with closer attention > to detail only being needed for genuinely large data sets (where > efficiency matters even for one-off analyses), or for frequently > repeated operations (where wasted CPU hours show up as increased > infrastructure expenses). Yes, more in this line. It is possible for us to provide decent-looking hints for query optimization and we are planning a sophisticated optimizer in the future, but especially in the beginning of the project this sounds quite fair. 
> >>> There's no question that folks dealing with sufficiently large data >>> sets with sufficiently stringent performance requirements are >>> eventually going to want to reach for handcrafted SQL or a distributed >>> computation framework like dask, but that's not really any different >>> from our standard position that when folks are attempting to optimise >>> a hot loop, they're eventually going to have to switch to something >>> that can eliminate the interpreter's default runtime object management >>> overhead (whether that's Cython, PyPy's or Numba's JIT, or writing an >>> extension module in a different language entirely). It isn't an >>> argument against making it easier for folks to postpone the point >>> where they find it necessary to reach for the "something else" that >>> takes them beyond Python's default capabilities. >> >> Don?t know, for example one of the wrappers is going to be an Apache Spark >> wrappers, so you could quickly hack up a PythonQL query that would be run >> on a distributed platform. > > Right, I meant this in the same sense that folks using an ORM like SQL > Alchemy may eventually hit a point where rather than trying to > convince the ORM to emit the SQL they want to run, it's easier to just > bypass the ORM layer and write the exact SQL they want. > > It's worthwhile attempting to reduce the number of cases where folks > feel obliged to do that, but at the same time, abstraction layers need > to hide at least some lower level details if they're going to actually > work properly. > >>> = Option 1 = >>> >>> Fully commit to the model of allowing alternate syntactic dialects to >>> run atop Python interpreters. In Hylang and PythonQL we have at least >>> two genuinely interesting examples of that working through the text >>> encoding system, as well as other examples like Cython that work >>> through the extension module system. 
>>> >>> So that's an opportunity to take this from "Possible, but a bit hacky" >>> to "Pluggable source code translation is supported at all levels of >>> the interpreter, including debugger source maps, etc" (perhaps by >>> borrowing ideas from other ecosytems like Java, JavaScript, and .NET, >>> where this kind of thing is already a lot more common. >>> >>> The downside of this approach is that actually making it happen would >>> be getting pretty far afield from the original PythonQL goal of >>> "provide nicer data manipulation abstractions in Python", and it >>> wouldn't actually deliver anything new that can't already be done with >>> existing import and codec system features. >> >> This would be great anyways, if we could rely on some preprocessor directive, >> instead of hacking encodings, this could be nice. > > Victor Stinner wrote up some ideas about that in PEP 511: > https://www.python.org/dev/peps/pep-0511/ > > Preprocessing is one of the specific uses cases considered: > https://www.python.org/dev/peps/pep-0511/#usage-2-preprocessor > >>> = Option 2 = >>> >>> ... given optionally delayed >>> rendering of interpolated strings, PythonQL could be used in the form: >>> >>> result =pyql(i""" >>> (x,y) >>> for x in {range(1,8)} >>> for y in {range(1,7)} >>> if x % 2 == 0 and >>> y % 2 != 0 and >>> x > y >>> """) >>> >>> I personally like this idea (otherwise I wouldn't have written PEP 501 >>> in the first place), and the necessary technical underpinnings to >>> enable it are all largely already in place to support f-strings. If >>> the PEP were revised to show examples of using it to support >>> relatively seamless calling back and forth between Hylang, PythonQL >>> and regular Python code in the same process, that might be intriguing >>> enough to pique Guido's interest (and I'm open to adding co-authors >>> that are interested in pursuing that). 
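[For readers comparing notation: this particular query is simple enough that it can also be written as a plain Python comprehension today, which is the baseline that PythonQL's extra clauses (grouping, windowing, etc.) are meant to extend.]

```python
# Plain-comprehension equivalent of the pyql i-string example above:
# pairs (x, y) with even x, odd y, and x > y.
result = [
    (x, y)
    for x in range(1, 8)
    for y in range(1, 7)
    if x % 2 == 0 and y % 2 != 0 and x > y
]
print(result)  # [(2, 1), (4, 1), (4, 3), (6, 1), (6, 3), (6, 5)]
```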
>> >> What would be the difference between this and just executing a PythonQL >> string for us, getting local and global variables into PythonQL scope? > > The big new technical capability that f-strings introduced is that the > compiler can see the variable references in the embedded expressions, > so f-strings "just work" with closure references, whereas passing > locals() and globals() explicitly is: > > 1. slow (since you have to generate a full locals dict); > 2. incompatible with the use of closure variables (since they're not > visible in either locals() *or* globals()) > > The i-strings concept takes that closure-compatible interpolation > capability and separates it from the str.format based rendering step. > > From a speed perspective, the interpolation aspects of this approach > are so efficient they rival simple string concatenation: > > $ python -m perf timeit -s 'first = "Hello"; second = " World!"' > 'first + second' > ..................... > Mean +- std dev: 71.7 ns +- 2.1 ns > > $ python -m perf timeit -s 'first = "Hello"; second = " World!"' > 'f"{first}{second}"' > ..................... > Mean +- std dev: 77.8 ns +- 2.5 ns > > Something like pyql that did more than just concatenate the text > sections with the text values of the embedded expressions would still > need some form of regex-style caching strategy to avoid parsing the > same query string multiple times, but the Python interpreter would > handle the task of breaking up the string into the text sections and > the interpolated Python expressions. Thanks, will start following this proposal! > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From desmoulinmichel at gmail.com Sun Mar 26 14:22:18 2017 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Sun, 26 Mar 2017 20:22:18 +0200 Subject: [Python-ideas] Adding an 'errors' argument to print In-Reply-To: References: Message-ID: <6a918709-a833-6681-03d5-3d6abe42eb3c@gmail.com> Le 26/03/2017 ? 
10:31, Victor Stinner a écrit :
> print(msg) calls sys.stdout.write(msg): write() expects text, not bytes.

What you are saying right now is that the API is not granular enough to
just add a parameter. Not that it can't be done. It just means we need to
expose stdout.write()'s encoding behavior.

> I dislike the idea of putting encoding options in print. It's too
> specific. What if tomorrow you replace print() with file.write()? Do you
> want to add errors there too?

You would have to rewrite all your calls anyway, because print() calls
str() on things and already accepts many parameters, while file.write()
doesn't.

> No, it's better to write your own formatter function as shown in a previous
> email.

print(encoding) is short, easy to use, unobtrusive, and would only be used
occasionally. How is writing your own formatter function better?

From rosuav at gmail.com  Sun Mar 26 14:42:06 2017
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 27 Mar 2017 05:42:06 +1100
Subject: [Python-ideas] Adding an 'errors' argument to print
In-Reply-To: <6a918709-a833-6681-03d5-3d6abe42eb3c@gmail.com>
References: <6a918709-a833-6681-03d5-3d6abe42eb3c@gmail.com>
Message-ID: 

On Mon, Mar 27, 2017 at 5:22 AM, Michel Desmoulin wrote:
> Le 26/03/2017 à 10:31, Victor Stinner a écrit :
>> print(msg) calls sys.stdout.write(msg): write() expects text, not bytes.
>
> What you are saying right now is that the API is not granular enough to
> just add a parameter. Not that it can't be done. It just means we need to
> expose stdout.write()'s encoding behavior.
>
>> I dislike the idea of putting encoding options in print. It's too
>> specific. What if tomorrow you replace print() with file.write()? Do you
>> want to add errors there too?
>
> You would have to rewrite all your calls anyway, because print() calls
> str() on things and already accepts many parameters, while file.write()
> doesn't.

You can easily make a wrapper around print(), though.
For example, suppose you want a timestamped log file as well as the console:

import functools
import logging

from builtins import print as pront  # mess with people

@functools.wraps(pront)
def print(*a, **kw):
    if "file" not in kw:
        # print() stringifies its arguments, so mirror that before joining
        logging.info(kw.get("sep", " ").join(map(str, a)))
    return pront(*a, **kw)

Now what happens if you add the errors handler? Does this function
need to handle that somehow?

ChrisA

From desmoulinmichel at gmail.com  Sun Mar 26 14:45:48 2017
From: desmoulinmichel at gmail.com (Michel Desmoulin)
Date: Sun, 26 Mar 2017 20:45:48 +0200
Subject: [Python-ideas] Adding an 'errors' argument to print
In-Reply-To: 
References: <6a918709-a833-6681-03d5-3d6abe42eb3c@gmail.com>
Message-ID: 

Yes, Python is Turing complete; there is always a solution to everything.
You can also do decorators with func = wrapper(func) instead of @wrapper,
no need for a new syntax.

Le 26/03/2017 à 20:42, Chris Angelico a écrit :
> On Mon, Mar 27, 2017 at 5:22 AM, Michel Desmoulin wrote:
>> Le 26/03/2017 à 10:31, Victor Stinner a écrit :
>>> print(msg) calls sys.stdout.write(msg): write() expects text, not bytes.
>>
>> What you are saying right now is that the API is not granular enough to
>> just add a parameter. Not that it can't be done. It just means we need to
>> expose stdout.write()'s encoding behavior.
>>
>>> I dislike the idea of putting encoding options in print. It's too
>>> specific. What if tomorrow you replace print() with file.write()? Do you
>>> want to add errors there too?
>>
>> You would have to rewrite all your calls anyway, because print() calls
>> str() on things and already accepts many parameters, while file.write()
>> doesn't.
>
> You can easily make a wrapper around print(), though.
> For example, suppose you want a timestamped log file as well as the console:
>
> import functools
> import logging
>
> from builtins import print as pront  # mess with people
>
> @functools.wraps(pront)
> def print(*a, **kw):
>     if "file" not in kw:
>         logging.info(kw.get("sep", " ").join(map(str, a)))
>     return pront(*a, **kw)
>
> Now what happens if you add the errors handler? Does this function
> need to handle that somehow?
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From 14bscsfasim at seecs.edu.pk  Sun Mar 26 15:08:21 2017
From: 14bscsfasim at seecs.edu.pk (Faaiz Asim)
Date: Mon, 27 Mar 2017 00:08:21 +0500
Subject: [Python-ideas] IDEA
Message-ID: 

Hi,
I am currently enrolled in a computer science program and am currently in
my 6th semester. I am comfortable using C++, Java, and Python in general.
I know this is a little late for proposing an idea, but I was busy with
exams. Besides, I wanted to get into something where I was determined to
contribute to society. It took a little while, but I found this ultimately.
I think I have a good idea for a project that might further facilitate the
use of Python.
My experience with Python suggests that Python is one of the best
all-rounders on the market. It's a good and easy head start for newbies and
a powerful tool in good hands.

Although I haven't used any IDE for Python, I have serious complaints about
the lack of good code editors and Python IDEs in general. My friends
suggested I use Atom, but I preferred rolling back to a plain text editor
for the trouble it cost me to shift to Atom. Not to degrade Atom or Python
IDEs, but the lack of popularity of any Python IDE (at least to the best of
my knowledge) suggests a need for improvement in current IDEs.
I am aware of the default IDE (IDLE) which ships with the default python package but I find it unsatisfactory in terms of debugging, aesthetics and usability etc. The main idea is to improve it or rebase it from the grounds up, and making it capable enough to ship with the python package. (PS. i would love to develop a new IDE of my own even if that is besides GSoc ) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Sun Mar 26 16:21:33 2017 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sun, 26 Mar 2017 15:21:33 -0500 Subject: [Python-ideas] IDEA In-Reply-To: References: Message-ID: There are quite a few Python IDEs, like PyCharm, Ninja, Spyder, PyDev, and more. In addition, I would say that almost every currently existent text editor has at least *some* Python support (I personally use Howl, though I'll admit I'm rather biased, being part of the development team and all... ;) To top things off, there have also been several efforts to improve IDLE: - http://www.tkdocs.com/tutorial/idle.html - http://idlex.sourceforge.net/ FWIW writing a good, stable, usable IDE is hard, too... On Sun, Mar 26, 2017 at 2:08 PM, Faaiz Asim via Python-ideas < python-ideas at python.org> wrote: > Hi, > I am currently enrolled in computer science program and am currently in > 6th semester. I am comfortable using c++,java,python in general. I know > this is a little late for proposing an idea but i was busy in exams. > Besides i wanted to get into something where i was determined to contribute > to the society. It took a little while but i found this ultimately. I think > i have a good idea for a project that might further facilitate the use of > python. > My experience with python suggests that python is one the best all > rounders in market. Its a good and easy head start for newbies and a > powerful tool in good hands. 
> > Although i haven't used any IDE for python but i have serious complaints > against lack of good code editors and python ides in general. I was > suggested to use atom from my friends but i preferred rolling back to > plain text editor for the trouble it cost me to shift to atom. Not > degrading atom or python ides but lack of popularity of any pyhton ide (at > least best to my knowledge) suggests a need for improvement in current > ides. > I am aware of the default IDE (IDLE) which ships with the default python > package but I find it unsatisfactory in terms of debugging, aesthetics and > usability etc. The main idea is to improve it or rebase it from the grounds > up, and making it capable enough to ship with the python package. > (PS. i would love to develop a new IDE of my own even if that is > besides GSoc ) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- Ryan (????) Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else http://refi64.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sun Mar 26 19:21:53 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 26 Mar 2017 19:21:53 -0400 Subject: [Python-ideas] IDEA In-Reply-To: References: Message-ID: On 3/26/2017 3:08 PM, Faaiz Asim via Python-ideas wrote: To add to what I said on core_mentorship list, you should read https://docs.python.org/devguide/ keeping in mind that some details are in flux due to the transition from hg to git and github. > I am aware of the default IDE (IDLE) which ships with the default python > package but I find it unsatisfactory in terms of debugging, aesthetics > and usability etc. 
The main idea is to improve it

What specific ideas for improvement do you have that are not already
mentioned in one of the existing issues on bugs.python.org? Search on
component 'IDLE'.
https://bugs.python.org/issue?%40search_text=&ignore=file%3Acontent&title=&%40columns=title&id=&%40columns=id&stage=&creation=&creator=&activity=&%40columns=activity&%40sort=activity&actor=&nosy=&type=&components=6&versions=&dependencies=&assignee=&keywords=&priority=&status=1&%40columns=status&resolution=&nosy_count=&message_count=&%40group=&%40pagesize=50&%40startwith=0&%40sortdir=on&%40action=search

> or rebase it from the grounds up,

If you mean using a GUI framework other than tkinter, that is not an option
as long as tkinter is the GUI delivered with Python.

> and making it capable enough to ship with the python package.

It is intentionally focused on beginners, but is capable enough that I use
it to develop patches for cpython, including IDLE.

-- 
Terry Jan Reedy

From tjreedy at udel.edu  Sun Mar 26 21:26:40 2017
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 26 Mar 2017 21:26:40 -0400
Subject: [Python-ideas] Proposal: Query language extension to Python
	(PythonQL)
In-Reply-To: <9FDAF5C1-2871-4AF9-A3D2-0219279F65D1@icloud.com>
References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com>
	<416e2e44-5949-d33a-1cdd-2a3c50508efd@mozilla.com>
	<9FDAF5C1-2871-4AF9-A3D2-0219279F65D1@icloud.com>
Message-ID: 

On 3/26/2017 7:14 AM, Pavel Velikhov wrote:
> On 26 Mar 2017, at 07:23, Terry Reedy
>> Someone mentioned the problem of adding multiple new keywords. Even 1
>> requires a proposal to meet a high bar; I think we average less than 1
>> new keyword per release in the last 20 years.
>>
>> Searching '\bgroup\b' just in /lib (the 3.6 stdlib on Windows) gets
>> over 300 code hits in about 30 files. I think this makes it ineligible
>> to be a keyword; re's match.group() accounts for many. 'select' has a
>> fair number of code uses also. I also see 'where', 'let', and 'by' in
>> the above.
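[Terry's stdlib search can be reproduced with a short script. A rough sketch follows; the candidate word list and the per-run file limit are illustrative choices, not part of any proposal.]

```python
import re
import sysconfig
from pathlib import Path

# Count how often candidate PythonQL keywords already appear as plain
# identifiers in stdlib source files; high counts suggest a word cannot
# become a reserved keyword without breaking existing code.
CANDIDATES = ("group", "select", "where", "let", "by")

def keyword_hits(words=CANDIDATES, max_files=50):
    stdlib = Path(sysconfig.get_paths()["stdlib"])
    counts = dict.fromkeys(words, 0)
    for path in sorted(stdlib.glob("*.py"))[:max_files]:
        text = path.read_text(encoding="utf-8", errors="ignore")
        for word in words:
            counts[word] += len(re.findall(r"\b%s\b" % word, text))
    return counts

hits = keyword_hits()
```

The exact counts depend on the Python version and platform, which is why no numbers are asserted here.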
> Yes, we add quite a few keywords. If you look at the window clause we
> have, there are even more keywords there.
> This is definitely a huge concern and the main reason that the community
> would oppose the change, in my view.
>
> I'm not too experienced with the Python parser, but could we make all
> these keywords not be real keywords (only interpreted inside
> comprehensions as keywords, not breaking any other code)?

It might be possible (or not!) to make the clause-heading words like
'where' or 'groupby' (this would have to be one word) recognized as
special only in the context of starting a new comprehension clause. The
precedents for 'keyword in context' are 'as', 'async', and 'await'.
But these were > temporary and a nuisance (both to code and for syntax highlighting) and I > would not be in favor of repeating for this case. Apologies if it's already been mentioned, but is MacroPy able to do this without introducing actual language keywords? ChrisA From wes.turner at gmail.com Sun Mar 26 22:50:32 2017 From: wes.turner at gmail.com (Wes Turner) Date: Sun, 26 Mar 2017 21:50:32 -0500 Subject: [Python-ideas] IDEA In-Reply-To: References: Message-ID: On Sun, Mar 26, 2017 at 2:08 PM, Faaiz Asim via Python-ideas < python-ideas at python.org> wrote: > Hi, > I am currently enrolled in computer science program and am currently in > 6th semester. I am comfortable using c++,java,python in general. I know > this is a little late for proposing an idea but i was busy in exams. > Besides i wanted to get into something where i was determined to contribute > to the society. It took a little while but i found this ultimately. I think > i have a good idea for a project that might further facilitate the use of > python. > My experience with python suggests that python is one the best all > rounders in market. Its a good and easy head start for newbies and a > powerful tool in good hands. > Wikipedia describes Python as a general-purpose, multi-paradigm language: - https://en.wikipedia.org/wiki/Python_(programming_language) - https://github.com/bayandin/awesome-awesomeness - https://github.com/vinta/awesome-python - https://github.com/josephmisiti/awesome-machine-learning#python - https://www.scipy.org/topical-software.html > > Although i haven't used any IDE for python but i have serious complaints > against lack of good code editors and python ides in general. I was > suggested to use atom from my friends but i preferred rolling back to > plain text editor for the trouble it cost me to shift to atom. Not > degrading atom or python ides but lack of popularity of any pyhton ide (at > least best to my knowledge) suggests a need for improvement in current > ides. 
> I am aware of the default IDE (IDLE) which ships with the default python > package but I find it unsatisfactory in terms of debugging, aesthetics and > usability etc. The main idea is to improve it or rebase it from the grounds > up, and making it capable enough to ship with the python package. > (PS. i would love to develop a new IDE of my own even if that is > besides GSoc ) > https://en.wikipedia.org/wiki/Comparison_of_integrated_development_environments#Python https://wiki.python.org/moin/IntegratedDevelopmentEnvironments https://wiki.python.org/moin/PythonEditors >From https://github.com/westurner/wiki/wiki/bricklayer#spyder : ```md #### Spyder - | Src: https://github.com/spyder-ide/spyder - | Docs: https://pythonhosted.org/spyder/ - ``conda install spyder`` also installs {qt, pyqt, } - ``pip install spyder`` also installs {qt, pyqt, } - Docs: https://pythonhosted.org/spyder/editor.html#how-to-define-a-code-cell - BLD: .travis.yml: https://github.com/spyder-ide/spyder/blob/master/.travis.yml - https://github.com/spyder-ide/spyder/blob/master/create_exe.py (cx_freeze) - https://github.com/spyder-ide/spyder/blob/master/create_app.py (py2app) - Src: https://github.com/spyder-ide/spyder/blob/master/spyder/plugins/editor.py - Src: https://github.com/spyder-ide/spyder/blob/master/spyder/plugins/console.py - Src: https://github.com/spyder-ide/spyder/blob/master/spyder/plugins/ipythonconsole.py - Src: https://github.com/spyder-ide/spyder/tree/master/spyder/widgets - Src: https://github.com/spyder-ide/spyder/blob/master/spyder/widgets/editor.py - Src: https://github.com/spyder-ide/spyder/blob/master/spyder/utils/syntaxhighlighters.py > Spyder may also be used as a PyQt5/PyQt4 extension library (module > spyder). For example, the Python interactive shell widget used in > Spyder may be embedded in your own PyQt5/PyQt4 application. ``` ... 
https://github.com/spyder-ide/spyder/blob/master/setup.py - ``conda install spyder`` also installs {qt, pyqt, } - https://github.com/spyder-ide/spyder/search?q=ipython - IPython embed_kernel() - https://ipython.readthedocs.io/en/stable/interactive/reference.html#embedding - Docs: https://spyder-ide.github.io/docs/ipythonconsole/ - Docs: https://spyder-ide.github.io/docs/variableexplorer/ - https://github.com/spyder-ide/spyder/tree/master/spyder/widgets/variableexplorer - pandas.DataFrame, Series > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gerald.britton at gmail.com Sun Mar 26 23:00:21 2017 From: gerald.britton at gmail.com (Gerald Britton) Date: Sun, 26 Mar 2017 23:00:21 -0400 Subject: [Python-ideas] (no subject) Message-ID: >* On 25 Mar 2017, at 15:51, Gerald Britton > wrote: > *> >* On 25 March 2017 at 11:24, Pavel Velikhov >> wrote: > *>* > No, the current solution is temporary because we just don?t have the > *>* > manpower to > *>* > implement the full thing: a real system that will rewrite parts of PythonQL > *>* > queries and > *>* > ship them to underlying databases. We need a real query optimizer and smart > *>* > wrappers > *>* > for this purpose. But we?ll build one of these for demo purposes soon > *>* > (either a Spark > *>* > wrapper or a PostgreSQL wrapper). > *>* One thought, if you're lacking in manpower now, then proposing > *>* inclusion into core Python means that the core dev team will be taking > *>* on an additional chunk of code that is already under-resourced. That > *>* rings alarm bells for me - how would you imagine the work needed to > *>* merge PythonQL into the core Python grammar would be resourced? 
> *>* I should say that in practice, I think that the solution is relatively > *>* niche, and overlaps quite significantly with existing Python features, > *>* so I don't really see a compelling case for inclusion. The parallel > *>* with C# and LINQ is interesting here - LINQ is a pretty cool > *>* technology, but I don't see it in widespread use in general-purpose C# > *>* projects (disclaimer: I don't get to see much C# code, so my > *>* experience is limited). > *> >* I see lots of C# code, but (thankfully) not so much LINQ to SQL. Yes, it is a cool technology. But I sometimes have a problem with the SQL it generates. Since I'm also a SQL developer, I'm sensitive to how queries are constructed, for performance reasons, as well as how they look, for readability and aesthetic reasons. > *> >* LINQ queries can generate poorly-performing SQL, since LINQ is a basically a translator, but not an AI. As far as appearances go, LINQ queries can look pretty gnarly, especially if they include sub queries or a few joins. That makes it hard for the SQL dev (me!) to read and understand if there are performance problems (which there often are, in my experience) > *> We want to go beyond being a basic translator. Especially if the common use-case will be integrating multiple databases. We can also introduce decent-looking hints (maybe not always decent looking) to generate better plans. Not sure about asethetics though... >* So, I would tend to code the SQL separately and put it in a SQL view, function or stored procedure. I can still parse the results with LINQ (not LINQ to SQL), which is fine. > *> >* For similar reasons, I'm not a huge fan of ORMs either. Probably my bias towards designing the database first and building up queries to meet the business goals before writing a line of Python, C#, or the language de jour. 
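[Gerald's preference for hand-tuned SQL over ORM-generated SQL still allows safe value binding at the call site. A minimal sqlite3 sketch; the table and column names are invented for illustration.]

```python
import sqlite3

# Handcrafted SQL with DBAPI placeholders: the query text is written and
# tuned by hand, while values are bound by the driver (placeholders bind
# values only, never table or column names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 120.0)])
rows = conn.execute("SELECT id FROM orders WHERE total > ?", (100,)).fetchall()
print(rows)  # [(2,)]
```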
> * This sounds completely reasonable, but this means you?re tied to a specific DBMS (especially if you?re using a lot of built-in functions that are usually very specific to a database). PythonQL (when it has enough functionality) should give you independence. True though not always needed. e.g. at present I'm working for a large company with thousands of db servers in all the popular flavors. The probability of changing even one of them to a different vendor is essentially zero. The costs and dependencies far outweigh any hoped-for advantage. At the same time, I'm happy to optimize the SQL for different target environments. If I lack the specific expertise, I know where to go to find it. The Adapter pattern helps here. It's actually more important for me to build queries that can be used in multiple client languages. We're using Java, C++, C#, F#, VB, ... and Python, of course (and probably others that I don't know we use). I can optimize the query once and not worry about the clients messing it up. -- Gerald Britton, MCSE-DP, MVP LinkedIn Profile: http://ca.linkedin.com/in/geraldbritton -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve.dower at python.org Sun Mar 26 23:10:58 2017 From: steve.dower at python.org (Steve Dower) Date: Sun, 26 Mar 2017 20:10:58 -0700 Subject: [Python-ideas] Adding an 'errors' argument to print In-Reply-To: References: Message-ID: <6404f522-34e6-7ee9-4b39-d65deca31f1d@python.org> On 26Mar2017 0707, Nick Coghlan wrote: > Perhaps it would be worth noting in the table of error handlers at > https://docs.python.org/3/library/codecs.html#error-handlers that > backslashreplace is used by the `ascii()` builtin and the associated > format specifiers backslashreplace is also the default errors for stderr, which is arguably the right target for debugging output. Perhaps what we really want is a shorter way to send output to stderr? 
Though I guess it's an easy to invent one-liner, once you know about the difference: >>> printe = partial(print, file=sys.stderr) Also worth noting that Python 3.6 supports Unicode characters on the console by default on Windows. So unless sys.stdout was manually constructed (a possibility, given this was a GUI app, though I designed the change such that `open("CON", "w")` would get it right), there wouldn't have been an encoding issue in the first place. Cheers, Steve From wes.turner at gmail.com Sun Mar 26 23:42:35 2017 From: wes.turner at gmail.com (Wes Turner) Date: Sun, 26 Mar 2017 22:42:35 -0500 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: References: <2BF89181-9E13-47E7-9759-086A0617EA70@gmail.com> Message-ID: On Sun, Mar 26, 2017 at 10:02 AM, Nick Coghlan wrote: > On 26 March 2017 at 21:40, Pavel Velikhov > wrote: > > On 25 Mar 2017, at 19:40, Nick Coghlan wrote: > >> Right, the target audience here *isn't* folks who already know how to > >> construct their own relational queries in SQL, and it definitely isn't > >> folks that know how to tweak their queries to get optimal performance > >> from the specific database they're using. Rather, it's folks that > >> already know Python's comprehensions, and perhaps some of the > >> itertools features, and helping to provide them with a smoother > >> on-ramp into the world of relational data processing. > > > > > > > Actually I myself am a user of PythonQL, even though I?m an SQL expert. > I work in data science, so > > I do a lot of ad-hoc querying and we always get some new datasets we > need to check out and work with. > > Some things like nested data models are also much better handled by > PythonQL, and data like > > JSON or XML will also be easier to handle. 
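[As a baseline for the nested-data point above: this is the kind of JSON flattening that plain comprehensions already handle, and that PythonQL's query clauses aim to make more declarative. The document shape is invented for illustration.]

```python
import json

# Flatten a nested JSON document into (order id, sku) pairs using the
# same multi-generator comprehension style PythonQL builds on.
doc = json.loads(
    '{"orders": [{"id": 1, "items": [{"sku": "a", "qty": 2},'
    ' {"sku": "b", "qty": 1}]}, {"id": 2, "items": []}]}'
)
pairs = [(order["id"], item["sku"])
         for order in doc["orders"]
         for item in order["items"]]
print(pairs)  # [(1, 'a'), (1, 'b')]
```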
> > So perhaps a better way of framing it would be to say that PythonQL > aims to provide a middle ground between interfaces that are fully in > "Python mode" (e.g ORMs, pandas DataFrames), where the primary > interface is methods-on-objects, and those that are fully in "data > manipulation mode" (e.g. raw SQL, lower level XML and JSON APIs). > > At the Python level, success for PythonQL would look like people being > able to seamlessly transfer their data manipulation skills from a > Django ORM project to an SQL Alchemy project to a pandas analysis > project to a distributed data analysis project in dask, without their > data manipulation code really having to change - only the backing data > structures and the runtime performance characteristics would differ. > e.g. Django ORM to SQLAlchemy: - Does this necessarily imply a metamodel for relations? - Django: GenericForeignKey - SQLAlchemy: sqlalchemy_utils.generic_relationship ... > At the data manipulation layer, success for PythonQL would look like > people being able to easily get "good enough" performance for one-off > scripts, regardless of the backing data store, with closer attention > to detail only being needed for genuinely large data sets (where > efficiency matters even for one-off analyses), or for frequently > repeated operations (where wasted CPU hours show up as increased > infrastructure expenses). 
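[Nick's success criterion (same manipulation code, different backing structures) can be illustrated even without any framework: the query below runs unchanged over an in-memory list and over a lazy iterator standing in for, say, rows from a database cursor. The function and variable names are made up for the sketch.]

```python
# One query function, two backing data sources; only the source differs.
def top_even_squares(rows, limit=3):
    return sorted((x * x for x in rows if x % 2 == 0), reverse=True)[:limit]

in_memory = [1, 2, 3, 4, 5, 6]
streamed = (n for n in range(1, 7))  # lazy source, e.g. rows from a cursor

print(top_even_squares(in_memory))  # [36, 16, 4]
print(top_even_squares(streamed))   # [36, 16, 4]
```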
> http://pandas.pydata.org/pandas-docs/stable/ecosystem.html#out-of-core (dask, blaze, odo, ) http://blaze.pydata.org/ - blaze - | Src: https://github.com/blaze/blaze - | Docs: https://blaze.readthedocs.io/en/latest/rosetta-pandas.html - | Docs: https://blaze.readthedocs.io/en/latest/rosetta-sql.html - | Docs: https://blaze.readthedocs.io/en/latest/backends.html - "Python, Pandas, SQLAlchemy, MongoDB, PyTables, and Spark" - dask - | Src: https://github.com/dask/dask - | Docs: http://dask.pydata.org/en/latest/#familiar-user-interface ** - | Docs: http://dask.pydata.org/en/latest/scheduler-choice.html - http://xarray.pydata.org/en/stable/dask.html - odo - | Src: https://github.com/blaze/blaze#odo - | Docs: https://odo.readthedocs.io/en/latest/#formats - zero-copy - https://www.slideshare.net/wesm/memory-interoperability-in-analytics-and-machine-learning - https://github.com/alex/zero_buffer/blob/master/zero_buffer.py - ibis - | Src: https://github.com/cloudera/ibis - (a backend may compile to LLVM) - seeAlso: blaze, dask, "bulk synchronous parallel" - | Docs: http://docs.ibis-project.org/sql.html - | Docs: http://docs.ibis-project.org/tutorial.html "Expression tutortial" jupyter notebooks - | Docs: http://docs.ibis-project.org/ - Apache Impala (incubating) - Apache Kudu (incubating) - Hadoop Distributed File System (HDFS) - PostgreSQL (Experimental) - SQLite - [ SQLAlchemy: { ... } ] - | Src: https://github.com/cloudera/ibis/blob/master/ibis/sql/alchemy.py - apache beam - https://beam.apache.org/documentation/sdks/python/ - https://beam.apache.org/get-started/quickstart-py/ (pip install apache-beam) - https://beam.apache.org/documentation/sdks/pydoc/0.6.0/ - apache_beam.transforms - https://beam.apache.org/documentation/programming-guide/#transforms "Applying transforms" Somewhere in this list, these become big data tools. 
> >> There's no question that folks dealing with sufficiently large data > >> sets with sufficiently stringent performance requirements are > >> eventually going to want to reach for handcrafted SQL or a distributed > >> computation framework like dask, but that's not really any different > >> from our standard position that when folks are attempting to optimise > >> a hot loop, they're eventually going to have to switch to something > >> that can eliminate the interpreter's default runtime object management > >> overhead (whether that's Cython, PyPy's or Numba's JIT, or writing an > >> extension module in a different language entirely). It isn't an > >> argument against making it easier for folks to postpone the point > >> where they find it necessary to reach for the "something else" that > >> takes them beyond Python's default capabilities. > > > > Don?t know, for example one of the wrappers is going to be an Apache > Spark > > wrappers, so you could quickly hack up a PythonQL query that would be run > > on a distributed platform. > > Right, I meant this in the same sense that folks using an ORM like SQL > Alchemy may eventually hit a point where rather than trying to > convince the ORM to emit the SQL they want to run, it's easier to just > bypass the ORM layer and write the exact SQL they want. > At that point one can either: - reflect the tables/mappings at devtime - reflect the tables/mappings at runtime And then run the raw DBAPI query (using appropriate query interpolation (-> i-strings and scoped configuration state)): session.execute("SELECT dbapi_version FROM ?", "tbl;name") > It's worthwhile attempting to reduce the number of cases where folks > feel obliged to do that, but at the same time, abstraction layers need > to hide at least some lower level details if they're going to actually > work properly. > > >> = Option 1 = > >> > >> Fully commit to the model of allowing alternate syntactic dialects to > >> run atop Python interpreters. 
In Hylang and PythonQL we have at least > >> two genuinely interesting examples of that working through the text > >> encoding system, as well as other examples like Cython that work > >> through the extension module system. > >> > >> So that's an opportunity to take this from "Possible, but a bit hacky" > >> to "Pluggable source code translation is supported at all levels of > >> the interpreter, including debugger source maps, etc" (perhaps by > >> borrowing ideas from other ecosytems like Java, JavaScript, and .NET, > >> where this kind of thing is already a lot more common. > >> > >> The downside of this approach is that actually making it happen would > >> be getting pretty far afield from the original PythonQL goal of > >> "provide nicer data manipulation abstractions in Python", and it > >> wouldn't actually deliver anything new that can't already be done with > >> existing import and codec system features. > > > This would be great anyways, if we could rely on some preprocessor > directive, > > instead of hacking encodings, this could be nice. > > Victor Stinner wrote up some ideas about that in PEP 511: > https://www.python.org/dev/peps/pep-0511/ > > Preprocessing is one of the specific uses cases considered: > https://www.python.org/dev/peps/pep-0511/#usage-2-preprocessor > > >> = Option 2 = > >> > >> ... given optionally delayed > >> rendering of interpolated strings, PythonQL could be used in the form: > >> > >> result =pyql(i""" > >> (x,y) > >> for x in {range(1,8)} > >> for y in {range(1,7)} > >> if x % 2 == 0 and > >> y % 2 != 0 and > >> x > y > >> """) > >> > >> I personally like this idea (otherwise I wouldn't have written PEP 501 > >> in the first place), and the necessary technical underpinnings to > >> enable it are all largely already in place to support f-strings. 
If > >> the PEP were revised to show examples of using it to support > >> relatively seamless calling back and forth between Hylang, PythonQL > >> and regular Python code in the same process, that might be intriguing > >> enough to pique Guido's interest (and I'm open to adding co-authors > >> that are interested in pursuing that). > > > > What would be the difference between this and just executing a PythonQL > > string for us, getting local and global variables into PythonQL scope? > > The big new technical capability that f-strings introduced is that the > compiler can see the variable references in the embedded expressions, > so f-strings "just work" with closure references, whereas passing > locals() and globals() explicitly is: > > 1. slow (since you have to generate a full locals dict); > 2. incompatible with the use of closure variables (since they're not > visible in either locals() *or* globals()) > > The i-strings concept takes that closure-compatible interpolation > capability and separates it from the str.format based rendering step. > > From a speed perspective, the interpolation aspects of this approach > are so efficient they rival simple string concatenation: > > $ python -m perf timeit -s 'first = "Hello"; second = " World!"' > 'first + second' > ..................... > Mean +- std dev: 71.7 ns +- 2.1 ns > > $ python -m perf timeit -s 'first = "Hello"; second = " World!"' > 'f"{first}{second}"' > ..................... > Mean +- std dev: 77.8 ns +- 2.5 ns > > Something like pyql that did more than just concatenate the text > sections with the text values of the embedded expressions would still > need some form of regex-style caching strategy to avoid parsing the > same query string multiple times, but the Python interpreter would > handle the task of breaking up the string into the text sections and > the interpolated Python expressions. > > Cheers, > Nick. 
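Nick's point above about f-strings seeing closure variables while explicit locals()/globals() passing does not can be demonstrated directly. This is a small illustrative snippet (not from the thread): `name` is only a free variable of a function if it appears as a variable in the function body, so a `.format(**locals())` call cannot see it:

```python
def outer():
    name = "World"  # a closure variable of the two inner functions

    def with_fstring():
        # The compiler sees the `name` reference inside the f-string,
        # so a real closure cell is created for it.
        return f"Hello, {name}!"

    def with_locals():
        # `name` never appears as a variable here (only inside a string
        # literal), so it is not captured: locals() is empty and the
        # lookup fails with KeyError: 'name'.
        return "Hello, {name}!".format(**locals())

    return with_fstring, with_locals

with_fstring, with_locals = outer()
print(with_fstring())  # Hello, World!
try:
    with_locals()
except KeyError as exc:
    print("format(**locals()) misses:", exc)
```

This is exactly the gap the i-strings proposal closes: the compiler records the embedded expressions, so delayed rendering keeps working with closures.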
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Mon Mar 27 02:16:28 2017 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 27 Mar 2017 01:16:28 -0500 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: References: <2BF89181-9E13-47E7-9759-086A0617EA70@gmail.com> Message-ID: On Sun, Mar 26, 2017 at 10:42 PM, Wes Turner wrote: > > > On Sun, Mar 26, 2017 at 10:02 AM, Nick Coghlan wrote: > >> On 26 March 2017 at 21:40, Pavel Velikhov >> wrote: >> > On 25 Mar 2017, at 19:40, Nick Coghlan wrote: >> >> Right, the target audience here *isn't* folks who already know how to >> >> construct their own relational queries in SQL, and it definitely isn't >> >> folks that know how to tweak their queries to get optimal performance >> >> from the specific database they're using. Rather, it's folks that >> >> already know Python's comprehensions, and perhaps some of the >> >> itertools features, and helping to provide them with a smoother >> >> on-ramp into the world of relational data processing. >> > >> > > >> > >> > Actually I myself am a user of PythonQL, even though I?m an SQL expert. >> I work in data science, so >> > I do a lot of ad-hoc querying and we always get some new datasets we >> need to check out and work with. >> > Some things like nested data models are also much better handled by >> PythonQL, and data like >> > JSON or XML will also be easier to handle. 
>> >> So perhaps a better way of framing it would be to say that PythonQL >> aims to provide a middle ground between interfaces that are fully in >> "Python mode" (e.g. ORMs, pandas DataFrames), where the primary >> interface is methods-on-objects, and those that are fully in "data >> manipulation mode" (e.g. raw SQL, lower level XML and JSON APIs). >> >> At the Python level, success for PythonQL would look like people being >> able to seamlessly transfer their data manipulation skills from a >> Django ORM project to an SQL Alchemy project to a pandas analysis >> project to a distributed data analysis project in dask, without their >> data manipulation code really having to change - only the backing data >> structures and the runtime performance characteristics would differ. >> > > e.g. Django ORM to SQLAlchemy: > - Does this necessarily imply a metamodel for relations? > - Django: GenericForeignKey > - SQLAlchemy: sqlalchemy_utils.generic_relationship > > Does this necessarily imply a metamodel for relations? Edges are expressed differently in different frameworks; ultimately you're looking at a projection of a graph (a constructed subset of a graph). So solving this in the general case implies solving for graphs, which includes tree-based hierarchical data: SQL tables, arrays, documents, and key-value pairs. 1. Schema ("metamodel") 2. Query language Q: How can Linked Data help define a metamodel for expressing relations (in order to harmonize search of disparate datasets)? - It's a graph with schema constraints.
- Use URIs for Classes ("types"), Properties ("columns", "attributes"), and instances with @ids ("rows") - RDF, RDFS, OWL - Search n databases asynchronously with SPARQL federation - Native SPARQL database (adapt the data) - SPARQL facade/frontend (adapt to an interface) - Define/generate a schema representation for arbitrary data sources which {query language} {implementation A} can use to plan data-local queries and transformations - JSONLD @context for data sources ### Property Relations are expressed as properties of class instances. rdf:Property schema:Property https://meta.schema.org/Property - https://meta.schema.org/inverseOf owl:inverseOf https://www.w3.org/TR/owl-ref/#inverseOf-def Q: > "How can you provide documentation about the columns in a CSV file?" https://www.w3.org/TR/tabular-data-primer/#documentation-columns A: CSVW as [JSONLD,] A: https://wrdrd.com/docs/consulting/linkedreproducibility#csv-csvw-and-metadata-rows - How do we make these work with various stores? - How do we include columnar metadata like physical units and precision in databases without support for it? - JSON-LD manifest? AFAIU, these don't handle relations: - http://datashape.pydata.org/ - https://github.com/apache/arrow/blob/master/format/Metadata.md Q: "How can you describe the schema for multi-dimensional datasets (with complex relations)?" A: https://www.w3.org/TR/vocab-data-cube/#data-cubes The relations are otherwise defined as RDFS/OWL (e.g. as JSON-LD). ## Graph queries ### SPARQL - SPARQL is a W3C Web Standard query language. - SPARQL is not the only graph query language. ### Blueprints, Gremlin Blueprints is a graph traversal/query API. - There are many blueprints API implementations (e.g. Rexster, Neo4j , Blazegraph , Accumulo ) Gremlin implements the Blueprints API (also in Python); it's also generic like LINQ (like JDBC for graph databases): https://tinkerpop.apache.org/docs/current/reference/#gremlin-python ### GraphQL https://github.com/graphql-python/ ... 
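As a small illustration of the CSVW/JSON-LD idea above (documenting CSV columns by mapping them to shared vocabulary URIs), a `@context` can be carried as plain JSON alongside the data. The column names and property choices below are invented for the example, not taken from any real dataset:

```python
import json

# Hypothetical CSVW-style column documentation for a CSV with columns
# "name", "homepage", "birth_date". Each column is mapped to a
# schema.org property URI so a JSON-LD-aware consumer can interpret
# the rows; the column names and property choices are invented here.
context = {
    "@context": {
        "schema": "http://schema.org/",
        "name": "schema:name",
        "homepage": {"@id": "schema:url", "@type": "@id"},
        "birth_date": {"@id": "schema:birthDate",
                       "@type": "http://www.w3.org/2001/XMLSchema#date"},
    }
}

# One CSV row, as a dict; merging in the context yields a JSON-LD document.
row = {"name": "Ada Lovelace",
       "homepage": "https://example.org/ada",
       "birth_date": "1815-12-10"}
doc = {**context, **row}
print(json.dumps(doc, indent=2))
```

Two stores that both publish such a context can then be queried against the shared property URIs rather than their local column names.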
supporting relations across ORMs would be cool; with enough abstraction IDK why it wouldn't look like RDFS/OWL. > > ... > > >> At the data manipulation layer, success for PythonQL would look like >> people being able to easily get "good enough" performance for one-off >> scripts, regardless of the backing data store, with closer attention >> to detail only being needed for genuinely large data sets (where >> efficiency matters even for one-off analyses), or for frequently >> repeated operations (where wasted CPU hours show up as increased >> infrastructure expenses). >> > > > http://pandas.pydata.org/pandas-docs/stable/ecosystem.html#out-of-core > (dask, blaze, odo, ) > > http://blaze.pydata.org/ > > - blaze > - | Src: https://github.com/blaze/blaze > - | Docs: https://blaze.readthedocs.io/en/latest/rosetta-pandas.html > - | Docs: https://blaze.readthedocs.io/en/latest/rosetta-sql.html > - | Docs: https://blaze.readthedocs.io/en/latest/backends.html > - "Python, Pandas, SQLAlchemy, MongoDB, PyTables, and Spark" > *URIs* https://blaze.readthedocs.io/en/latest/uri.html#what-sorts-of-uris-does-blaze-support ``` What sorts of URIs does Blaze support? Paths to files on disk, including the following extensions - .csv - .json - .csv.gz/json.gz - .hdf5 (uses h5py) - .hdf5::/datapath - hdfstore://filename.hdf5 (uses special pandas.HDFStore format) - .bcolz - .xls(x) SQLAlchemy strings like the following - sqlite:////absolute/path/to/myfile.db::tablename - sqlite:////absolute/path/to/myfile.db (specify a particular table) - postgresql://username:password at hostname:port - impala://hostname (uses impyla) - anything supported by SQLAlchemy MongoDB Connection strings of the following form - mongodb://username:password at hostname:port/database_name::collection_name Blaze server strings of the following form - blaze://hostname:port (port defaults to 6363) In all cases when a location or table name is required in addition to the traditional URI (e.g. 
a data path within an HDF5 file or a Table/Collection name within a database) *then that information follows on the end of the URI after a separator of two colons ::.* How it works Blaze depends on the Odo library to handle URIs. URIs are managed through the resource function which is dispatched based on regular expressions. For example a simple resource function to handle .json files might look like the following (although Blaze's actual solution is a bit more comprehensive): from blaze import resource import json @resource.register('.+\.json') def resource_json(uri): with open(uri) as f: return json.load(f) Can I extend this to my own types? Absolutely. Import and extend *resource* as shown in the "How it works" section. The rest of Blaze will pick up your change automatically. ``` > > - dask > - | Src: https://github.com/dask/dask > - | Docs: http://dask.pydata.org/en/latest/#familiar-user-interface ** > - | Docs: http://dask.pydata.org/en/latest/scheduler-choice.html > - http://xarray.pydata.org/en/stable/dask.html > > - odo > - | Src: https://github.com/blaze/blaze#odo > - | Docs: https://odo.readthedocs.io/en/latest/#formats > > - zero-copy > - https://www.slideshare.net/wesm/memory-interoperability- > in-analytics-and-machine-learning > - https://github.com/alex/zero_buffer/blob/master/zero_buffer.py > > > - ibis > - | Src: https://github.com/cloudera/ibis > - (a backend may compile to LLVM) > - seeAlso: blaze, dask, "bulk synchronous parallel" > - | Docs: http://docs.ibis-project.org/sql.html > - | Docs: http://docs.ibis-project.org/tutorial.html "Expression > tutorial" jupyter notebooks > - | Docs: http://docs.ibis-project.org/ > - Apache Impala (incubating) > - Apache Kudu (incubating) > - Hadoop Distributed File System (HDFS) > - PostgreSQL (Experimental) > - SQLite > - [ SQLAlchemy: { ...
} ] > - | Src: https://github.com/cloudera/ibis/blob/master/ibis/sql/ > alchemy.py > > - apache beam > - https://beam.apache.org/documentation/sdks/python/ > - https://beam.apache.org/get-started/quickstart-py/ (pip install > apache-beam) > - https://beam.apache.org/documentation/sdks/pydoc/0.6.0/ > - apache_beam.transforms > - https://beam.apache.org/documentation/programming-guide/#transforms > "Applying transforms" > > Somewhere in this list, > these become big data tools. > > >> >> There's no question that folks dealing with sufficiently large data >> >> sets with sufficiently stringent performance requirements are >> >> eventually going to want to reach for handcrafted SQL or a distributed >> >> computation framework like dask, but that's not really any different >> >> from our standard position that when folks are attempting to optimise >> >> a hot loop, they're eventually going to have to switch to something >> >> that can eliminate the interpreter's default runtime object management >> >> overhead (whether that's Cython, PyPy's or Numba's JIT, or writing an >> >> extension module in a different language entirely). It isn't an >> >> argument against making it easier for folks to postpone the point >> >> where they find it necessary to reach for the "something else" that >> >> takes them beyond Python's default capabilities. >> > >> > Don?t know, for example one of the wrappers is going to be an Apache >> Spark >> > wrappers, so you could quickly hack up a PythonQL query that would be >> run >> > on a distributed platform. >> >> Right, I meant this in the same sense that folks using an ORM like SQL >> Alchemy may eventually hit a point where rather than trying to >> convince the ORM to emit the SQL they want to run, it's easier to just >> bypass the ORM layer and write the exact SQL they want. 
>> > > At that point one can either: > - reflect the tables/mappings at devtime > - reflect the tables/mappings at runtime > > And then run the raw DBAPI query > (using appropriate query interpolation > (-> i-strings and scoped configuration state)): > > session.execute("SELECT dbapi_version FROM ?", "tbl;name") > > > >> It's worthwhile attempting to reduce the number of cases where folks >> feel obliged to do that, but at the same time, abstraction layers need >> to hide at least some lower level details if they're going to actually >> work properly. >> >> >> = Option 1 = >> >> >> >> Fully commit to the model of allowing alternate syntactic dialects to >> >> run atop Python interpreters. In Hylang and PythonQL we have at least >> >> two genuinely interesting examples of that working through the text >> >> encoding system, as well as other examples like Cython that work >> >> through the extension module system. >> >> >> >> So that's an opportunity to take this from "Possible, but a bit hacky" >> >> to "Pluggable source code translation is supported at all levels of >> >> the interpreter, including debugger source maps, etc" (perhaps by >> >> borrowing ideas from other ecosytems like Java, JavaScript, and .NET, >> >> where this kind of thing is already a lot more common. >> >> >> >> The downside of this approach is that actually making it happen would >> >> be getting pretty far afield from the original PythonQL goal of >> >> "provide nicer data manipulation abstractions in Python", and it >> >> wouldn't actually deliver anything new that can't already be done with >> >> existing import and codec system features. > > > >> > This would be great anyways, if we could rely on some preprocessor >> directive, >> > instead of hacking encodings, this could be nice. 
>> >> Victor Stinner wrote up some ideas about that in PEP 511: >> https://www.python.org/dev/peps/pep-0511/ >> >> Preprocessing is one of the specific uses cases considered: >> https://www.python.org/dev/peps/pep-0511/#usage-2-preprocessor >> >> >> = Option 2 = >> >> >> >> ... given optionally delayed >> >> rendering of interpolated strings, PythonQL could be used in the form: >> >> >> >> result =pyql(i""" >> >> (x,y) >> >> for x in {range(1,8)} >> >> for y in {range(1,7)} >> >> if x % 2 == 0 and >> >> y % 2 != 0 and >> >> x > y >> >> """) >> >> >> >> I personally like this idea (otherwise I wouldn't have written PEP 501 >> >> in the first place), and the necessary technical underpinnings to >> >> enable it are all largely already in place to support f-strings. If >> >> the PEP were revised to show examples of using it to support >> >> relatively seamless calling back and forth between Hylang, PythonQL >> >> and regular Python code in the same process, that might be intriguing >> >> enough to pique Guido's interest (and I'm open to adding co-authors >> >> that are interested in pursuing that). >> > >> > What would be the difference between this and just executing a PythonQL >> > string for us, getting local and global variables into PythonQL scope? >> >> The big new technical capability that f-strings introduced is that the >> compiler can see the variable references in the embedded expressions, >> so f-strings "just work" with closure references, whereas passing >> locals() and globals() explicitly is: >> >> 1. slow (since you have to generate a full locals dict); >> 2. incompatible with the use of closure variables (since they're not >> visible in either locals() *or* globals()) >> >> The i-strings concept takes that closure-compatible interpolation >> capability and separates it from the str.format based rendering step. 
>> >> From a speed perspective, the interpolation aspects of this approach >> are so efficient they rival simple string concatenation: >> >> $ python -m perf timeit -s 'first = "Hello"; second = " World!"' >> 'first + second' >> ..................... >> Mean +- std dev: 71.7 ns +- 2.1 ns >> >> $ python -m perf timeit -s 'first = "Hello"; second = " World!"' >> 'f"{first}{second}"' >> ..................... >> Mean +- std dev: 77.8 ns +- 2.5 ns >> >> Something like pyql that did more than just concatenate the text >> sections with the text values of the embedded expressions would still >> need some form of regex-style caching strategy to avoid parsing the >> same query string multiple times, but the Python interpreter would >> handle the task of breaking up the string into the text sections and >> the interpolated Python expressions. >> >> Cheers, >> Nick. >> >> -- >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Mon Mar 27 02:54:04 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 27 Mar 2017 16:54:04 +1000 Subject: [Python-ideas] Adding an 'errors' argument to print In-Reply-To: <6404f522-34e6-7ee9-4b39-d65deca31f1d@python.org> References: <6404f522-34e6-7ee9-4b39-d65deca31f1d@python.org> Message-ID: On 27 March 2017 at 13:10, Steve Dower wrote: > On 26Mar2017 0707, Nick Coghlan wrote: >> >> Perhaps it would be worth noting in the table of error handlers at >> https://docs.python.org/3/library/codecs.html#error-handlers that >> backslashreplace is used by the `ascii()` builtin and the associated >> format specifiers > > backslashreplace is also the default errors for stderr, which is arguably > the right target for debugging output. Perhaps what we really want is a > shorter way to send output to stderr? Though I guess it's an easy to invent > one-liner, once you know about the difference: > >>>> printe = partial(print, file=sys.stderr) If there was a printerror builtin that used sys.stderr as its default output stream, it could also special case BaseException instances to show their traceback. At the moment, we do force people to learn a few additional concepts in order to do error display "right": - processes have two standard output streams, stdout and stderr - Python makes those available in the sys module - the print() builtin function lets you specify a stream with "file" - so errors should be printed with "print(arg, file=sys.stderr)" - to get exception tracebacks like those at the interactive prompt, look at the traceback module As opposed to "for normal output, use 'print', for error output, use 'printerror', for temporary debugging output also use 'printerror', otherwise use the logging module". Cheers, Nick. 
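A minimal sketch of what such a helper could look like (`printerror` is hypothetical, not an actual builtin; this just combines `print`, `sys.stderr`, and the `traceback` module as described above):

```python
import sys
import traceback
from functools import partial

def printerror(*args, file=None, **kwargs):
    """Sketch of the hypothetical `printerror` discussed above:
    defaults to sys.stderr, and shows a full traceback when an
    argument is an exception instance."""
    out = file if file is not None else sys.stderr
    for arg in args:
        if isinstance(arg, BaseException):
            traceback.print_exception(type(arg), arg, arg.__traceback__,
                                      file=out)
        else:
            print(arg, file=out, **kwargs)

# The one-liner from the thread, for plain error output:
printe = partial(print, file=sys.stderr)
```

With this, `printerror("warning: no config found")` goes to stderr, and `printerror(exc)` inside an `except` block prints the same traceback the interactive prompt would show.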
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From contact at brice.xyz Mon Mar 27 03:17:46 2017 From: contact at brice.xyz (Brice PARENT) Date: Mon, 27 Mar 2017 09:17:46 +0200 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: <416e2e44-5949-d33a-1cdd-2a3c50508efd@mozilla.com> References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> <416e2e44-5949-d33a-1cdd-2a3c50508efd@mozilla.com> Message-ID: <4b77573a-94f0-13bb-b7b8-20ef89eb5642@brice.xyz> I prefer this a lot to the original syntax, and I really think this has much better chances to be integrated (if such an integration had to be done, and not kept as a separate module). Also, maybe managing this with classes instead of syntax could also be done easily (without any change to Python), like this: from pyql import PQL, Select, For, Where, GroupBy, Let result = PQL( Select("x", "sum_y"), For("x", range(1, 8)), For("y", range(1, 7)), Where(lambda x, y: x % 2 == 0 and y % 2 != 0 and x > y, "x", "y"), # function, *[arguments to pass to the function] Where("sum_y", lambda sum_y: sum_y % 2 != 0), GroupBy("x"), Let("sum_y", lambda y: sum(y), "y") ) (to be defined more precisely; I don't really like to rely on case to differentiate the "for" keyword and the "For" class, which by the way could be inherited from a more general "From" class, allowing to get the data from a database, pure Python, a JSON/csv/xml file/object, or anything else.) With nice lazy evaluation, in the order of the arguments of the constructor, I suppose you could achieve everything you need, and have an easily extendable syntax (create new objects between versions of the module, without ever having to create new keywords). There is no new parsing, and you already have the autocompletion of your IDE if you annotate your code correctly.
You could even have this: query = PQL( Select("x", "sum_y"), Where("x", "y", lambda x, y: x % 2 == 0 and y % 2 != 0 and x > y), Where("sum_y", lambda sum_y: sum_y % 2 != 0), GroupBy("x"), Let("sum_y", lambda y: sum(y), "y") # [name of the new var], function, *[arguments to pass to the function] ) query.execute(x=range(1, 8), y=range(1, 7)) or query.execute(PgDatabase(**dbsettings)) -Brice On 25/03/17 at 16:40, Kyle Lahnakoski wrote: > > > Pavel, > > I like PythonQL. I perform a lot of data transformation, and often > find Python's list comprehensions too limiting; leaving me wishing for > LINQ-like language features. > > As an alternative to extending Python with PythonQL, Terry Reedy > suggested interpreting a DSL string, and Pavel Velikhov alluded to > using magic method tricks found in ORM libraries. I can see how both > these are not satisfactory. > > A third alternative could be to encode the query clauses as JSON > objects. For example: > result = [ select (x, sum_y) > for x in range(1,8), > y in range(1,7) > where x % 2 == 0 and y % 2 != 0 and x > y > group by x > let sum_y = sum(y) > where sum_y % 2 != 0 > ] > result = pq([ > {"select": ["x", "sum_y"]}, > {"for": {"x": range(1,8), "y": range(1,7)}}, > {"where": lambda x, y: x % 2 == 0 and y % 2 != 0 and x > y}, > {"groupby": "x"}, > {"with": {"sum_y": {"SUM": "y"}}}, > {"where": {"neq": [{"mod": ["sum_y", 2]}, 0]}} > ]) > This representation does look a little lispy, and it may resemble > PythonQL's parse tree. I think the benefits are: > > 1) no Python language change > 2) easier to parse > 3) better than string-based DSL for catching syntax errors > 4) {"clause": parameters} format is flexible for handling common query > patterns ** > 5) works in javascript too > 6) easy to compose with automation (my favorite) > > It is probably easy for you to see the drawbacks. > > > ** The `where` clause can accept a native lambda function, or an > expression tree > > > > > > "If you are writing a loop, you are doing it wrong!"
:) > > On 2017-03-24 11:10, Pavel Velikhov wrote: >> Hi folks! >> >> We started a project to extend Python with a full-blown query >> language about a year ago. The project is called PythonQL; the links >> are given below in the references section. We have implemented what >> is kind of an alpha version now, and gained some experience and >> insights about why and where this is really useful. So I'd like to >> share those with you and gather some opinions on whether you think we >> should try to include these extensions in the Python core. >> >> *Intro* >> >> What we have done is (mostly) extended Python's comprehensions with >> group by, order by, let and window clauses, which can come in any >> order, thus comprehensions become a query language a bit cleaner and >> more powerful than SQL. And we added a couple of small convenience >> extensions, like a We have identified three top motivations for >> folks to use these extensions: >> >> *Our Motivations* >> >> 1. This can become a standard for running queries against database >> systems. Instead of learning a large number of different SQL dialects >> (the pain point here is the libraries of functions and operators that >> are different for each vendor), the Python developer needs only to >> learn PythonQL and can query any SQL and NoSQL database. >> >> 2. A single PythonQL expression can integrate a number of >> databases/files/memory structures seamlessly, with the PythonQL >> optimizer figuring out which pieces of plans to ship to which >> databases. This is a cool virtual database integration story that can >> be very convenient, especially now, when a lot of data scientists use >> Python to wrangle the data all day long. >> >> 3. Querying data structures inside Python with the full power of SQL >> (and a bit more) is also really convenient on its own. Usually folks >> that are well-versed in SQL have to resort to completely different >> means when they need to run a query in Python on top of some data >> structures.
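For contrast with motivation 3 above, the even/odd grouping query discussed in this thread can already be written with today's stdlib alone. It works, but the query structure is scattered across several statements rather than being one declarative expression:

```python
from collections import defaultdict

# Build the (x, y) pairs and filter them, as in the PythonQL example.
pairs = [(x, y)
         for x in range(1, 8)
         for y in range(1, 7)
         if x % 2 == 0 and y % 2 != 0 and x > y]

# "group by x": collect the y values for each x.
groups = defaultdict(list)
for x, y in pairs:
    groups[x].append(y)

# "let sum_y = sum(y) where sum_y % 2 != 0": aggregate and filter groups.
result = [(x, sum(ys)) for x, ys in groups.items() if sum(ys) % 2 != 0]
print(result)  # [(2, 1), (6, 9)]
```

The group-by/let/where clauses collapse these three steps back into a single comprehension-like expression.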
>> >> *Current Status* >> >> We have PythonQL running; it's installed via pip and an encoding hack >> that runs our preprocessor. We currently compile PythonQL into Python >> using our executor functions and execute Python subexpressions via >> eval. We don't do any optimization / rewriting of queries into the >> languages of the underlying systems. And the query processor is basic >> too, with naive implementations of operators. But we've built DBMS >> systems before, so if there is a good amount of support for this >> project, we'll be able to build a real system here. >> >> *Your take on this* >> >> Extending Python's grammar is surely a painful thing for the >> community. We're now convinced that it is well worth it, because of >> all the wonderful functionality and convenience this extension >> offers. We'd like to get your feedback on this, and maybe you'll >> suggest some next steps for us. >> >> *References* >> >> PythonQL GitHub page: https://github.com/pythonql/pythonql >> PythonQL Intro and Tutorial (this is all the User Documentation we have >> right now): >> https://github.com/pythonql/pythonql/wiki/PythonQL-Intro-and-Tutorial >> A use-case of querying Event Logs and doing Process Mining with >> PythonQL: >> https://github.com/pythonql/pythonql/wiki/Event-Log-Querying-and-Process-Mining-with-PythonQL >> PythonQL demo site: www.pythonql.org >> >> Best regards, >> PythonQL Team >> >> >> >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed...
URL: From pavel.velikhov at gmail.com Mon Mar 27 04:55:32 2017 From: pavel.velikhov at gmail.com (Pavel Velikhov) Date: Mon, 27 Mar 2017 11:55:32 +0300 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: <4b77573a-94f0-13bb-b7b8-20ef89eb5642@brice.xyz> References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> <416e2e44-5949-d33a-1cdd-2a3c50508efd@mozilla.com> <4b77573a-94f0-13bb-b7b8-20ef89eb5642@brice.xyz> Message-ID: <8716ACD7-C609-4F9D-B9AF-9DBDA8369381@gmail.com> Hi Brice, > On 27 Mar 2017, at 10:17, Brice PARENT wrote: > > I prefer this a lot to the original syntax, and I really think this has much better chances to be integrated (if such an integration had to be done, and not kept as a separate module). > > Also, maybe managing this with classes instead of syntax could also be done easily (without any change to Python), like this: > from pyql import PQL, Select, For, Where, GroupBy, Let > > result = PQL( > Select("x", "sum_y"), > For("x", range(1, 8)), > For("y", range(1, 7)), > Where(lambda x, y: x % 2 == 0 and y % 2 != 0 and x > y, "x", "y"), # function, *[arguments to pass to the function] > Where("sum_y", lambda sum_y: sum_y % 2 != 0), > GroupBy("x"), > Let("sum_y", lambda y: sum(y), "y") > ) > > So here's the deal: small queries will look pretty decent in pretty much all paradigms - ORMs, PythonQL, or your proposal. Once they get bigger and combine multiple pain points (say outer joins, grouping and nested data), then unless you have a really clear and minimal language, folks will get confused and lost. We've gone through a few query languages that failed, including XQuery and others, and the main reason was the need to learn a whole new language and a bunch of libraries; nobody wanted to do it. So the main selling point behind PythonQL is: it's Python that folks hopefully know already, with just a few extensions.
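To make the comparison concrete, here is a hypothetical, minimal sketch of how a class-based PQL(...) API like the one quoted above could be evaluated in pure Python. The clause names follow the proposal, but the semantics are my own guesses, not an actual pyql implementation: clauses are applied in the order given, For builds a cross product, and Where consistently takes the predicate first (the quoted emails use two different argument orders):

```python
# Hypothetical sketch; clause names follow the proposal above, the
# evaluation semantics are invented for illustration.
class For:
    def __init__(self, name, iterable):
        self.name, self.iterable = name, list(iterable)

class Where:
    def __init__(self, func, *names):
        self.func, self.names = func, names

class GroupBy:
    def __init__(self, *keys):
        self.keys = keys

class Let:
    def __init__(self, name, func, *names):
        self.name, self.func, self.names = name, func, names

class Select:
    def __init__(self, *names):
        self.names = names

def PQL(*clauses):
    rows, selected = [{}], None
    for c in clauses:
        if isinstance(c, For):
            # cross product with the new variable
            rows = [dict(r, **{c.name: v}) for r in rows for v in c.iterable]
        elif isinstance(c, Where):
            rows = [r for r in rows if c.func(*(r[n] for n in c.names))]
        elif isinstance(c, GroupBy):
            groups = {}
            for r in rows:
                groups.setdefault(tuple(r[k] for k in c.keys), []).append(r)
            rows = []
            for key, members in groups.items():
                row = dict(zip(c.keys, key))
                for name in members[0]:  # non-key variables become lists
                    if name not in c.keys:
                        row[name] = [m[name] for m in members]
                rows.append(row)
        elif isinstance(c, Let):
            for r in rows:
                r[c.name] = c.func(*(r[n] for n in c.names))
        elif isinstance(c, Select):
            selected = c.names
    return [tuple(r[n] for n in selected) for r in rows] if selected else rows

result = PQL(
    For("x", range(1, 8)),
    For("y", range(1, 7)),
    Where(lambda x, y: x % 2 == 0 and y % 2 != 0 and x > y, "x", "y"),
    GroupBy("x"),
    Let("sum_y", lambda y: sum(y), "y"),
    Where(lambda sum_y: sum_y % 2 != 0, "sum_y"),
    Select("x", "sum_y"),
)
print(result)  # [(2, 1), (6, 9)]
```

Even this toy version shows Pavel's point: the clause objects work, but once grouping and aggregation enter, the reader has to know the evaluation order of the clauses, which a dedicated syntax makes explicit.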
> (to be defined more precisely; I don't really like to rely on case to differentiate the "for" keyword and the "For" class, which by the way could be inherited from a more general "From" class, allowing to get the data from a database, pure Python, a JSON/csv/xml file/object, or anything else.) > > With nice lazy evaluation, in the order of the arguments of the constructor, I suppose you could achieve everything you need, and have an easily extendable syntax (create new objects between versions of the module, without ever having to create new keywords). There is no new parsing, and you already have the autocompletion of your IDE if you annotate your code correctly. > > You could even have this: > > query = PQL( > Select("x", "sum_y"), > Where("x", "y", lambda x, y: x % 2 == 0 and y % 2 != 0 and x > y), > Where("sum_y", lambda sum_y: sum_y % 2 != 0), > GroupBy("x"), > Let("sum_y", lambda y: sum(y), "y") # [name of the new var], function, *[arguments to pass to the function] > ) > query.execute(x=range(1, 8), y=range(1, 7)) > or > query.execute(PgDatabase(**dbsettings)) > > -Brice > > On 25/03/17 at 16:40, Kyle Lahnakoski wrote: >> >> Pavel, >> >> I like PythonQL. I perform a lot of data transformation, and often find Python's list comprehensions too limiting; leaving me wishing for LINQ-like language features. >> As an alternative to extending Python with PythonQL, Terry Reedy suggested interpreting a DSL string, and Pavel Velikhov alluded to using magic method tricks found in ORM libraries. I can see how both these are not satisfactory. >> A third alternative could be to encode the query clauses as JSON objects.
For example: >> result = [ select (x, sum_y) >> for x in range(1,8), >> y in range(1,7) >> where x % 2 == 0 and y % 2 != 0 and x > y >> group by x >> let sum_y = sum(y) >> where sum_y % 2 != 0 >> ] >> result = pq([ >> {"select":["x", "sum_y"]}, >> {"for":{"x": range(1,8), "y": range(1,7)}}, >> {"where": lambda x,y: x % 2 == 0 and y % 2 != 0 and x > y}, >> {"groupby": "x"}, >> {"with":{"sum_y":{"SUM":"y"}}}, >> {"where": {"neq":[{"mod":["sum_y", 2]}, 0]}} >> ]) >> This representation does look a little lispy, and it may resemble PythonQL's parse tree. I think the benefits are: >> >> 1) no Python language change >> 2) easier to parse >> 3) better than a string-based DSL for catching syntax errors >> 4) {"clause": parameters} format is flexible for handling common query patterns ** >> 5) works in JavaScript too >> 6) easy to compose with automation (my favorite) >> >> It is probably easy for you to see the drawbacks. >> >> >> ** The `where` clause can accept a native lambda function, or an expression tree >> >> >> >> >> >> "If you are writing a loop, you are doing it wrong!" :) >> >> >> On 2017-03-24 11:10, Pavel Velikhov wrote: >>> Hi folks! >>> >>> We started a project to extend Python with a full-blown query language about a year ago. The project is called PythonQL; the links are given below in the references section. We have implemented what is kind of an alpha version now, and gained some experience and insights about why and where this is really useful. So I'd like to share those with you and gather some opinions on whether you think we should try to include these extensions in the Python core. >>> >>> Intro >>> >>> What we have done is (mostly) extended Python's comprehensions with group by, order by, let and window clauses, which can come in any order; thus comprehensions become a query language a bit cleaner and more powerful than SQL.
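[Editorial note: for comparison, the example query above can be spelled in today's Python. This is only a sketch using itertools.groupby, not the code PythonQL actually generates.]

```python
from itertools import groupby

# The same query as the PythonQL example above, in plain Python.
# First the cross product with the row-level filter:
pairs = [(x, y)
         for x in range(1, 8)
         for y in range(1, 7)
         if x % 2 == 0 and y % 2 != 0 and x > y]

# Then "group by x", "let sum_y = sum(y)" and the post-grouping filter.
# pairs is already sorted by x, which groupby requires:
result = []
for x, group in groupby(pairs, key=lambda p: p[0]):
    sum_y = sum(y for _, y in group)
    if sum_y % 2 != 0:
        result.append((x, sum_y))

print(result)  # [(2, 1), (6, 9)]
```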
And we added a couple of small convenience extensions. We have identified three top motivations for folks to use these extensions: >>> >>> Our Motivations >>> >>> 1. This can become a standard for running queries against database systems. Instead of learning a large number of different SQL dialects (the pain point here is the libraries of functions and operators that are different for each vendor), the Python developer needs only to learn PythonQL and he can query any SQL and NoSQL database. >>> >>> 2. A single PythonQL expression can integrate a number of databases/files/memory structures seamlessly, with the PythonQL optimizer figuring out which pieces of plans to ship to which databases. This is a cool virtual database integration story that can be very convenient, especially now, when a lot of data scientists use Python to wrangle the data all day long. >>> >>> 3. Querying data structures inside Python with the full power of SQL (and a bit more) is also really convenient on its own. Usually folks that are well-versed in SQL have to resort to completely different means when they need to run a query in Python on top of some data structures. >>> >>> Current Status >>> >>> We have PythonQL running; it's installed via pip and an encoding hack that runs our preprocessor. We currently compile PythonQL into Python using our executor functions and execute Python subexpressions via eval. We don't do any optimization / rewriting of queries into the languages of underlying systems. And the query processor is basic too, with naive implementations of operators. But we've built DBMS systems before, so if there is a good amount of support for this project, we'll be able to build a real system here. >>> >>> Your take on this >>> >>> Extending Python's grammar is surely a painful thing for the community. We're now convinced that it is well worth it, because of all the wonderful functionality and convenience this extension offers.
We'd like to get your feedback on this and maybe you'll suggest some next steps for us. >>> >>> References >>> >>> PythonQL GitHub page: https://github.com/pythonql/pythonql >>> PythonQL Intro and Tutorial (this is all User Documentation we have right now): https://github.com/pythonql/pythonql/wiki/PythonQL-Intro-and-Tutorial >>> A use-case of querying Event Logs and doing Process Mining with PythonQL: https://github.com/pythonql/pythonql/wiki/Event-Log-Querying-and-Process-Mining-with-PythonQL >>> PythonQL demo site: www.pythonql.org >>> >>> Best regards, >>> PythonQL Team >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From contact at brice.xyz Mon Mar 27 05:54:58 2017 From: contact at brice.xyz (Brice PARENT) Date: Mon, 27 Mar 2017 11:54:58 +0200 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: <8716ACD7-C609-4F9D-B9AF-9DBDA8369381@gmail.com> References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> <416e2e44-5949-d33a-1cdd-2a3c50508efd@mozilla.com> <4b77573a-94f0-13bb-b7b8-20ef89eb5642@brice.xyz> <8716ACD7-C609-4F9D-B9AF-9DBDA8369381@gmail.com> Message-ID: <151eae4e-3961-811c-d837-edceabd42ea7@brice.xyz> On 27/03/17 at
10:55, Pavel Velikhov wrote: > Hi Brice, > >> On 27 Mar 2017, at 10:17, Brice PARENT > > wrote: >> >> I prefer this a lot to the original syntax, and I really think this >> has much better chances to be integrated (if such an integration had >> to be done, and not kept as a separate module). >> >> Also, maybe managing this with classes instead of syntax could also >> be done easily (without any change to Python), like this: >> >> from pyql import PQL, Select, For, Where, GroupBy, Let >> >> result = PQL( >> Select("x", "sum_y"), >> For("x", range(1, 8)), >> For("y", range(1, 7)), >> Where(lambda x, y: x % 2 == 0 and y % 2 != 0 and x > y, "x", "y"), # >> function, *[arguments to pass to the function] >> Where("sum_y", lambda sum_y: sum_y % 2 != 0), >> GroupBy("x"), >> Let("sum_y", lambda y: sum(y), "y") >> ) >> >> > > So here's the deal: small queries will look pretty decent in pretty > much all paradigms, ORM, or PythonQL or your proposal. > Once they get bigger and combine multiple pain points (say outer joins, > grouping and nested data) - then unless you have a > really clear and minimal language, folks will get confused and lost. > > We've gone through a few query languages that failed, including XQuery > and others, and the main reason was the need to learn > a whole new language and a bunch of libraries; nobody wanted to do it. > So the main selling point behind PythonQL is: it's Python > that folks hopefully know already, with just a few extensions. I get it, but it's more a matter of perception. To me, the version I described is just Python, while yours is Python + specific syntax. As this syntax is only used in the PyQL sub-language, it's not really Python any more... Also, what I like with what I used is that it is object-based, which allows any part of the query to be reusable or built dynamically.
We might also extend such a PQL object's constructor to embed automatically whatever default parameters or database connection we want, or shared behaviours, like:

class MyPQL(PQL):
    def get_limit(self):
        if self.limit is not None:
            return self.limit
        return 10

    def __init__(self, *args):
        args = list(args)
        args.append(Let("sum_y", lambda y: sum(y), "y"))
        args.append(GroupBy("x"))
        super().__init__(*args)

result = MyPQL(
    Select("x", "sum_y"),
    For("x", range(1, 8)),
    For("y", range(1, 7)),
    Where(lambda x, y: x % 2 == 0 and y % 2 != 0 and x > y, "x", "y"),
    Where("sum_y", lambda sum_y: sum_y % 2 != 0)
)

Big queries, this way, may be split into smaller parts. And it allows you to do the following in a single query, instead of having to write one big query for each condition:

where_from = [For("x", range(1, 8)), For("y", range(1, 7))]
where = [Where(lambda x, y: x % 2 == 0 and y % 2 != 0 and x > y, "x", "y")]
if filter_sum_y:
    where.append(Where("sum_y", lambda sum_y: sum_y % 2 != 0))
grouping = []
if group_by is not None:
    grouping = [GroupBy("x")]
result = MyPQL(Select("x", "sum_y"), *where_from, *where, *grouping)

Side note: I'm not a big database user; I mostly use ORMs (Django's and PonyORM, depending on the project) to access PgSQL and SQLite (for unit testing), so I might not even have use cases for what you're trying to solve. I just give my point of view here to explain what I think could be more easily integrated and (re)used. And as I'm a big fan of the DRY mentality, I'm not a fan of the syntax-chaining things (just as I don't really like big nested comprehensions). -Brice -------------- next part -------------- An HTML attachment was scrubbed...
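[Editorial note: the class-based idea in this email can be made concrete in a few dozen lines. The classes below are hypothetical — no published pyql package is implied — and grouping/Let are left out; the sketch only shows that the approach needs no new syntax.]

```python
from itertools import product

class Select:
    def __init__(self, *names):
        self.names = names

class For:
    def __init__(self, name, iterable):
        self.name, self.iterable = name, list(iterable)

class Where:
    # function first, then the names of the variables it consumes
    def __init__(self, func, *names):
        self.func, self.names = func, names

class PQL:
    def __init__(self, select, *clauses):
        self.select = select
        self.fors = [c for c in clauses if isinstance(c, For)]
        self.wheres = [c for c in clauses if isinstance(c, Where)]

    def __iter__(self):
        # Cross product of all For clauses, filtered by all Where clauses.
        names = [f.name for f in self.fors]
        for values in product(*(f.iterable for f in self.fors)):
            row = dict(zip(names, values))
            if all(w.func(*(row[n] for n in w.names)) for w in self.wheres):
                yield tuple(row[n] for n in self.select.names)

result = list(PQL(
    Select("x", "y"),
    For("x", range(1, 8)),
    For("y", range(1, 7)),
    Where(lambda x, y: x % 2 == 0 and y % 2 != 0 and x > y, "x", "y"),
))
print(result)  # [(2, 1), (4, 1), (4, 3), (6, 1), (6, 3), (6, 5)]
```

Because clauses are plain objects, the dynamic-composition pattern shown above (building `where` lists conditionally) falls out for free.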
URL: From p.f.moore at gmail.com Mon Mar 27 06:18:03 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 27 Mar 2017 11:18:03 +0100 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: <151eae4e-3961-811c-d837-edceabd42ea7@brice.xyz> References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> <416e2e44-5949-d33a-1cdd-2a3c50508efd@mozilla.com> <4b77573a-94f0-13bb-b7b8-20ef89eb5642@brice.xyz> <8716ACD7-C609-4F9D-B9AF-9DBDA8369381@gmail.com> <151eae4e-3961-811c-d837-edceabd42ea7@brice.xyz> Message-ID: On 27 March 2017 at 10:54, Brice PARENT wrote: > I get it, but it's more a matter of perception. To me, the version I > described is just Python, while yours is Python + specific syntax. As this > syntax is only used in PyQL sub-language, it's not really Python any more... ... which is why I suspect that this discussion would be better expressed as a suggestion that Python provide better support for domain specific languages like the one PythonQL offers. In that context, the "extended comprehension" format *would* be Python, specifically it would simply be a DSL embedded in Python using Python's standard features for doing that. Of course, that's just a re-framing of the perception, and the people who don't like sub-languages will be just as uncomfortable with DSLs. However, it does put this request into the context of DSL support, which is something that many languages provide, to a greater or lesser extent. For Python, Guido's traditionally been against allowing the language to be mutable in the way that DSLs permit, so in the first instance it's likely that the PythonQL proposal will face a lot of resistance. It's possible that PythonQL could provide a use case that shows the benefits of allowing DSLs to such an extent that Guido changes his mind, but that's not yet proven (and it's not really something that's been argued here yet). 
And it does change the discussion from being about who prefers which syntax, to being about where we want the language to go in terms of DSLs. Personally, I quite like limited DSL support (things like allowing no-parenthesis function calls can make it possible to write code that uses functions as if they were keywords). But it does impose a burden on people supporting the code because they have to understand the non-standard syntax. So I'm happy with Python's current choice to not go down that route, even though I do find it occasionally limiting. If I needed PythonQL features, I'd personally find Brice's class-based approach quite readable/acceptable. I find the PythonQL form nice also, but not enough of an advantage to warrant all the extra keywords/syntax etc. Paul From ram at rachum.com Mon Mar 27 08:50:38 2017 From: ram at rachum.com (Ram Rachum) Date: Mon, 27 Mar 2017 14:50:38 +0200 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json Message-ID: Hi guys, What do you think about adding methods pathlib.Path.write_json and pathlib.Path.read_json, similar to write_text, write_bytes, read_text, read_bytes? This would make writing / reading JSON to a file a one liner instead of a two-line with clause. Thanks, Ram.
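[Editorial note: for scale, this is the two-line with-clause the proposal wants to shorten, wrapped once as project-local helpers. read_json/write_json here are hypothetical names, not an existing pathlib or stdlib API.]

```python
import json
import tempfile
from pathlib import Path

# A project can wrap the two-liner once instead of extending pathlib.
def write_json(path, obj):
    with Path(path).open("w", encoding="utf-8") as f:
        json.dump(obj, f)

def read_json(path):
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)

demo = Path(tempfile.mkdtemp()) / "demo.json"
write_json(demo, {"answer": 42})
print(read_json(demo))  # {'answer': 42}
```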
> -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Mar 27 08:56:35 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 27 Mar 2017 13:56:35 +0100 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: Message-ID: On 27 March 2017 at 13:50, Ram Rachum wrote: > This would make writing / reading JSON to a file a one liner instead of a > two-line with clause. That hardly seems like a significant benefit... Paul From steve.dower at python.org Mon Mar 27 10:04:58 2017 From: steve.dower at python.org (Steve Dower) Date: Mon, 27 Mar 2017 07:04:58 -0700 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: Message-ID: It was enough of a benefit for text (and I never forget the argument order for writing text to a file, unlike json.dump(file_or_data?, data_or_file?) ) +1 Top-posted from my Windows Phone -----Original Message----- From: "Paul Moore" Sent: 3/27/2017 5:57 To: "Ram Rachum" Cc: "python-ideas" Subject: Re: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json On 27 March 2017 at 13:50, Ram Rachum wrote: > This would make writing / reading JSON to a file a one liner instead of a > two-line with clause. That hardly seems like a significant benefit... Paul _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From markusmeskanen at gmail.com Mon Mar 27 10:08:58 2017 From: markusmeskanen at gmail.com (Markus Meskanen) Date: Mon, 27 Mar 2017 17:08:58 +0300 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: Message-ID: -1, should we also include write_ini, write_yaml, etc?
A class cannot account for everyone who wants to use it in different ways. On Mar 27, 2017 17:07, "Steve Dower" wrote: > It was enough of a benefit for text (and I never forget the argument order > for writing text to a file, unlike json.dump(file_or_data?, data_or_file?) ) > > +1 > > Top-posted from my Windows Phone > ------------------------------ > From: Paul Moore > Sent: 3/27/2017 5:57 > To: Ram Rachum > Cc: python-ideas > Subject: Re: [Python-ideas] Add pathlib.Path.write_json > and pathlib.Path.read_json > > On 27 March 2017 at 13:50, Ram Rachum wrote: > > This would make writing / reading JSON to a file a one liner instead of a > > two-line with clause. > > That hardly seems like a significant benefit... > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From donald at stufft.io Mon Mar 27 10:33:24 2017 From: donald at stufft.io (Donald Stufft) Date: Mon, 27 Mar 2017 10:33:24 -0400 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: Message-ID: <7962D629-8722-409A-904F-1D93F97E56ED@stufft.io> > On Mar 27, 2017, at 8:50 AM, Ram Rachum wrote: > > What do you think about adding methods pathlib.Path.write_json and pathlib.Path.read_json, similar to write_text, write_bytes, read_text, read_bytes? > -1, I also think that write_* and read_* were mistakes to begin with. -- Donald Stufft -------------- next part -------------- An HTML attachment was scrubbed...
URL: From p.f.moore at gmail.com Mon Mar 27 10:36:15 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 27 Mar 2017 15:36:15 +0100 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: <7962D629-8722-409A-904F-1D93F97E56ED@stufft.io> References: <7962D629-8722-409A-904F-1D93F97E56ED@stufft.io> Message-ID: On 27 March 2017 at 15:33, Donald Stufft wrote: > What do you think about adding methods pathlib.Path.write_json and > pathlib.Path.read_json , similar to write_text, write_bytes, read_text, > read_bytes? > > > > -1, I also think that write_* and read_* were mistakes to begin with. Text is (much) more general-use than JSON. Paul From ram at rachum.com Mon Mar 27 10:40:52 2017 From: ram at rachum.com (Ram Rachum) Date: Mon, 27 Mar 2017 16:40:52 +0200 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: <7962D629-8722-409A-904F-1D93F97E56ED@stufft.io> Message-ID: Another idea: Maybe make json.load and json.dump support Path objects? On Mon, Mar 27, 2017 at 4:36 PM, Paul Moore wrote: > On 27 March 2017 at 15:33, Donald Stufft wrote: > > What do you think about adding methods pathlib.Path.write_json and > > pathlib.Path.read_json , similar to write_text, write_bytes, read_text, > > read_bytes? > > > > > > > > -1, I also think that write_* and read_* were mistakes to begin with. > > Text is (much) more general-use than JSON. > Paul > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From donald at stufft.io Mon Mar 27 10:42:39 2017 From: donald at stufft.io (Donald Stufft) Date: Mon, 27 Mar 2017 10:42:39 -0400 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: <7962D629-8722-409A-904F-1D93F97E56ED@stufft.io> Message-ID: > On Mar 27, 2017, at 10:36 AM, Paul Moore wrote: > > On 27 March 2017 at 15:33, Donald Stufft > wrote: >> What do you think about adding methods pathlib.Path.write_json and >> pathlib.Path.read_json, similar to write_text, write_bytes, read_text, >> read_bytes? >> >> >> >> -1, I also think that write_* and read_* were mistakes to begin with. > > Text is (much) more general-use than JSON. Sure. I also think touch() and all the others are the same :) I think they're just an unfortunate detritus of a time before PathLike, and that it's super weird to have some operations you do to a file path (compared to things you do to generate, modify, or resolve a path) be hung off of the Path object while every other one is an independent thing that takes it as an input. I'd find it equally weird if dictionary objects supported a print() or a .json() method. -- Donald Stufft -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Mon Mar 27 10:43:19 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 27 Mar 2017 17:43:19 +0300 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: Message-ID: On 27.03.17 15:50, Ram Rachum wrote: > Hi guys, > > What do you think about adding methods pathlib.Path.write_json and > pathlib.Path.read_json, similar to write_text, write_bytes, read_text, > read_bytes? > > This would make writing / reading JSON to a file a one liner instead of > a two-line with clause. Good try, but you have published this idea 5 days ahead of schedule.
From markusmeskanen at gmail.com Mon Mar 27 10:44:07 2017 From: markusmeskanen at gmail.com (Markus Meskanen) Date: Mon, 27 Mar 2017 17:44:07 +0300 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: <7962D629-8722-409A-904F-1D93F97E56ED@stufft.io> Message-ID: Another idea: Maybe make json.load and json.dump support Path objects? Much better. Or maybe add json.load_path and dump_path -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Mar 27 10:59:54 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 27 Mar 2017 15:59:54 +0100 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: <7962D629-8722-409A-904F-1D93F97E56ED@stufft.io> Message-ID: On 27 March 2017 at 15:40, Ram Rachum wrote: > Another idea: Maybe make json.load and json.dump support Path objects? If they currently supported filenames, I'd say that's a reasonable extension. Given that they don't, it still seems like more effort than it's worth to save a few characters:

with path.open('w') as f:
    json.dump(obj, f)

with path.open() as f:
    obj = json.load(f)

Paul From steve at pearwood.info Mon Mar 27 11:04:29 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 28 Mar 2017 02:04:29 +1100 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: Message-ID: <20170327150428.GD27969@ando.pearwood.info> On Mon, Mar 27, 2017 at 02:50:38PM +0200, Ram Rachum wrote: > Hi guys, > > What do you think about adding methods pathlib.Path.write_json and > pathlib.Path.read_json, similar to write_text, write_bytes, read_text, > read_bytes? > > This would make writing / reading JSON to a file a one liner instead of a > two-line with clause.
Reading/writing JSON is already a one liner, for people who care about writing one liners:

obj = json.load(open("foo.json"))
json.dump(obj, open("foo.json", "w"))

Pathlib exists as an OO interface to low-level path and file operations. It understands how to read and write to files, but it doesn't understand the content of those files. I don't think it should. Of course pathlib can already read JSON, or for that matter ReST text or JPG binary files. It can read anything as text or bytes, including JSON:

some_path.write_text(json.dumps(obj))
json.loads(some_path.read_text())

I don't think it should be pathlib's responsibility to deal with the file format (besides text). Today you want to add JSON support. What about XML and plists and ini files? Tomorrow you'll ask for HTML support, next week someone will want pathlib to support .wav files as a one liner, and before you know it pathlib is responsible for a hundred different file formats with separate read_* and write_* methods. That's not pathlib's responsibility, and there is nothing wrong with writing two lines of code. -- Steve From simon at acoeuro.com Mon Mar 27 11:17:40 2017 From: simon at acoeuro.com (Simon D.) Date: Mon, 27 Mar 2017 17:17:40 +0200 Subject: [Python-ideas] What about regexp string litterals : re".*" ? Message-ID: <20170327151740.GJ6883@tabr.siet.ch> Hello, After some French discussions about this idea, I subscribed here to suggest adding a new string literal, for regexps, inspired by other types like: u"", r"", b"", br"", f"", etc. The regexp string literal could be represented by: re"" It would ease the use of regexps in Python, allowing to have some regexp literals, like in Perl or JavaScript. We may end up with an integration like:

>>> import re
>>> if re".k" in 'ok':
...     print "ok"
ok
>>>

Regexps are part of the language in Perl, and the rather complicated integration of regexps in other languages, especially in Python, is something that comes up easily in language-comparison discussions.
I've always felt JavaScript's integration goes only half the way it should, and the new string literal types in Python (like f"") looked like a good compromise to get a tight integration of regexps without asking to make them part of the language (as I imagine this has already been discussed years ago, and obviously denied?). As per the XKCD illustration, using a regexp may be a problem on its own, but really, the "each language a new and complicated approach" situation is another difficulty, on the level of writing regexps themselves I think. And then, even when you know the trick for Python, it still feels to me like too much to type given the numerous problems one can solve with regexps. I know regexps are slower than string-based workflows (like .startswith) but regexps can do the most and the least, so they are quick to come up with, once you start to think with them. As the Python philosophy is to spare brain cycles by sacrificing CPU cycles, making regexps easy to use is a brain-cycle saver. What do you think? -- Simon Descarpentries +336 769 702 53 http://acoeuro.com From chris.barker at noaa.gov Mon Mar 27 11:34:07 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 27 Mar 2017 08:34:07 -0700 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: <7962D629-8722-409A-904F-1D93F97E56ED@stufft.io> Message-ID: On Mon, Mar 27, 2017 at 7:59 AM, Paul Moore wrote: > On 27 March 2017 at 15:40, Ram Rachum wrote: > > Another idea: Maybe make json.load and json.dump support Path objects? > > If they currently supported filenames, I'd say that's a reasonable > extension. Given that they don't, it still seems like more effort than > it's worth to save a few characters > Sure, but they probably should -- it's a REALLY common (most common) use-case to read and write JSON from a file. And many APIs support "filename or open file-like object". I'd love to see that added, and, of course, support for Path objects as well.
-CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Mon Mar 27 10:48:17 2017 From: eric at trueblade.com (Eric V. Smith) Date: Mon, 27 Mar 2017 10:48:17 -0400 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: <7962D629-8722-409A-904F-1D93F97E56ED@stufft.io> Message-ID: <308b7c0e-d657-d413-de1d-d3ae2b4b3a7a@trueblade.com> On 3/27/17 10:40 AM, Ram Rachum wrote: > Another idea: Maybe make json.load and json.dump support Path objects? json.dump requires open file objects, not strings or Paths representing filenames. But does this not already do what you want: Path('foo.json').write_text(json.dumps(obj)) ? Eric. From storchaka at gmail.com Mon Mar 27 11:39:19 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 27 Mar 2017 18:39:19 +0300 Subject: [Python-ideas] What about regexp string litterals : re".*" ? In-Reply-To: <20170327151740.GJ6883@tabr.siet.ch> References: <20170327151740.GJ6883@tabr.siet.ch> Message-ID: On 27.03.17 18:17, Simon D. wrote: > After some French discussions about this idea, I subscribed here to > suggest adding a new string literal, for regexps, inspired by other > types like: u"", r"", b"", br"", f"", etc. > > The regexp string literal could be represented by: re"" > > It would ease the use of regexps in Python, allowing to have some regexp > literals, like in Perl or JavaScript. There are several regular expression libraries for Python. One of them is included in the stdlib, but this is not the first regular expression library in the stdlib and may not be the last.
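[Editorial note: the ergonomics the proposed re"" literal is after can be approximated today with module-level compilation; re.compile also caches patterns internally, so the cost is paid once. A minimal sketch of the earlier `if re".k" in 'ok'` example:]

```python
import re

# Compile once at import time, then reuse the pattern object.
DOT_K = re.compile(".k")  # any character followed by 'k'

if DOT_K.search("ok"):
    print("ok")  # prints: ok
```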
A particular project can choose to use an alternative regular expression library (because it has additional features or is faster for particular cases). From p.f.moore at gmail.com Mon Mar 27 11:41:34 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 27 Mar 2017 16:41:34 +0100 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: <308b7c0e-d657-d413-de1d-d3ae2b4b3a7a@trueblade.com> References: <7962D629-8722-409A-904F-1D93F97E56ED@stufft.io> <308b7c0e-d657-d413-de1d-d3ae2b4b3a7a@trueblade.com> Message-ID: On 27 March 2017 at 15:48, Eric V. Smith wrote: > On 3/27/17 10:40 AM, Ram Rachum wrote: >> >> Another idea: Maybe make json.load and json.dump support Path objects? > > > json.dump requires open file objects, not strings or Paths representing > filenames. > > But does this not already do what you want: > > Path('foo.json').write_text(json.dumps(obj)) > ? Indeed. There have now been a few posts quoting ways of reading and writing JSON, all of which are pretty short (if that matters). Do we *really* need another way? Paul From ethan at stoneleaf.us Mon Mar 27 11:50:06 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 27 Mar 2017 08:50:06 -0700 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: <20170327150428.GD27969@ando.pearwood.info> References: <20170327150428.GD27969@ando.pearwood.info> Message-ID: <58D934AE.6080706@stoneleaf.us> On 03/27/2017 08:04 AM, Steven D'Aprano wrote: > On Mon, Mar 27, 2017 at 02:50:38PM +0200, Ram Rachum wrote: >> What do you think about adding methods pathlib.Path.write_json and >> pathlib.Path.read_json, similar to write_text, write_bytes, read_text, >> read_bytes? > > That's not pathlib's responsibility, and there is nothing wrong with > writing two lines of code.
+1 From bruce at leban.us Mon Mar 27 12:43:53 2017 From: bruce at leban.us (Bruce Leban) Date: Mon, 27 Mar 2017 09:43:53 -0700 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: Message-ID: I'm not in favor of this idea for the reason mentioned by many of the other posters. BUT ... this does bring up something missing from json readers: *the ability to read one json object from the input rather than reading the entire input* and attempting to interpret it as one object. For my use case, it would be sufficient to read whole lines only but I can imagine other use cases. The basic rule would be to read as much of the input as necessary (and no more) to read a single json object, ignoring leading white space. In practical terms:

- if the first character is [ or { or " read to the matching ] or } or "
- otherwise if the first character is a digit or '-' read as many characters as possible to parse a number
- otherwise attempt to match 'true', 'false' or 'null'
- otherwise fail

--- Bruce Check out my puzzle book and get it free here: http://J.mp/ingToConclusionsFree (available on iOS) On Mon, Mar 27, 2017 at 5:50 AM, Ram Rachum wrote: > Hi guys, > > What do you think about adding methods pathlib.Path.write_json and > pathlib.Path.read_json, similar to write_text, write_bytes, read_text, > read_bytes?
URL: From chris.barker at noaa.gov Mon Mar 27 16:33:11 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 27 Mar 2017 13:33:11 -0700 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: Message-ID: On Mon, Mar 27, 2017 at 9:43 AM, Bruce Leban wrote: > I'm not in favor of this idea for the reason mentioned by many of the > other posters. BUT ... this does bring up something missing from json > readers: *the ability to read one json object from the input rather than > reading the entire input* and attempting to interpret it as one object. > I can't tell from the JSON spec (at least not quickly), but it is possible to have more than one object at the top level? Experimenting with the python json module seems to indicate that it is not -- you can only have one "thing" in a JSON file -- either an "object" or an array. then, of course you can arbitrarily nest stuff inside that top-level container. Since the nesting is arbitrary, I'm not sure it's clear how a one-object-at-a-time reader would work in the general case? -CHB > For my use case, it would be sufficient to read whole lines only but I can > imagine other use cases. > > The basic rule would be to read as much of the input as necessary (and no > more) to read a single json object, ignoring leading white space. > > In practical terms: > > - if the first character is [ or { or " read to the matching ] or } or > " > - otherwise if the first character is a digit or '-' read as many > characters as possible to parse a number > - otherwise attempt to match 'true', 'false' or 'null' > - otherwise fail > > > --- Bruce > Check out my puzzle book and get it free here: > http://J.mp/ingToConclusionsFree (available on iOS) > > > > On Mon, Mar 27, 2017 at 5:50 AM, Ram Rachum wrote: > >> Hi guys, >> >> What do you think about adding methods pathlib.Path.write_json and >> pathlib.Path.read_json , similar to write_text, write_bytes, read_text, >> read_bytes? 
>> >> This would make writing / reading JSON to a file a one liner instead of a >> two-line with clause. >> >> >> Thanks, >> Ram. >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Mar 27 16:35:15 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 27 Mar 2017 21:35:15 +0100 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: Message-ID: On 27 March 2017 at 17:43, Bruce Leban wrote: > the ability to read one json object from the input rather than reading the > entire input Is this a well-defined idea? From a quick read of the JSON spec (which is remarkably short on details of how JSON is stored in files, etc) the only reference I can see is to a "JSON text" which is a JSON representation of a single value. There's nothing describing how multiple values would be stored in the same file/transmitted in the same stream. It's not unreasonable to assume "read one object, then read another" but without an analysis of the grammar, it's not 100% clear if the grammar supports that (you sort of have to assume that when you hit "the end of the object" you skip some whitespace then start on the next - but the spec doesn't say anything like that. 
Alternatively, it's just as reasonable to assume that json.load/json.loads expect to be passed a single "JSON text" as defined by the spec. If the spec was clear on how multiple objects in a single stream should be handled, then yes the json module should support that. But without anything explicit in the spec, it's not as obvious. What do other languages do? Paul From mertz at gnosis.cx Mon Mar 27 16:45:01 2017 From: mertz at gnosis.cx (David Mertz) Date: Mon, 27 Mar 2017 13:45:01 -0700 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: Message-ID: The format JSON lines (http://jsonlines.org/) is pretty widely used, but is an extension of JSON itself. Basically, it's the idea that you can put one object per physical line to allow incremental reading or spending of objects. It's a good idea, and I think the `json` module should support it. But it definitely doesn't belong in `pathlib`. On Mar 27, 2017 3:36 PM, "Paul Moore" wrote: > On 27 March 2017 at 17:43, Bruce Leban wrote: > > the ability to read one json object from the input rather than reading > the > > entire input > > Is this a well-defined idea? From a quick read of the JSON spec (which > is remarkably short on details of how JSON is stored in files, etc) > the only reference I can see is to a "JSON text" which is a JSON > representation of a single value. There's nothing describing how > multiple values would be stored in the same file/transmitted in the > same stream. It's not unreasonable to assume "read one object, then > read another" but without an analysis of the grammar, it's not 100% > clear if the grammar supports that (you sort of have to assume that > when you hit "the end of the object" you skip some whitespace then > start on the next - but the spec doesn't say anything like that. > Alternatively, it's just as reasonable to assume that > json.load/json.loads expect to be passed a single "JSON text" as > defined by the spec. 
> > If the spec was clear on how multiple objects in a single stream > should be handled, then yes the json module should support that. But > without anything explicit in the spec, it's not as obvious. What do > other languages do? > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Mon Mar 27 16:46:39 2017 From: mertz at gnosis.cx (David Mertz) Date: Mon, 27 Mar 2017 13:46:39 -0700 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: Message-ID: This is a better link: https://en.m.wikipedia.org/wiki/JSON_Streaming On Mar 27, 2017 3:45 PM, "David Mertz" wrote: > The format JSON lines (http://jsonlines.org/) is pretty widely used, but > is an extension of JSON itself. Basically, it's the idea that you can put > one object per physical line to allow incremental reading or spending of > objects. > > It's a good idea, and I think the `json` module should support it. But it > definitely doesn't belong in `pathlib`. > > On Mar 27, 2017 3:36 PM, "Paul Moore" wrote: > >> On 27 March 2017 at 17:43, Bruce Leban wrote: >> > the ability to read one json object from the input rather than reading >> the >> > entire input >> >> Is this a well-defined idea? From a quick read of the JSON spec (which >> is remarkably short on details of how JSON is stored in files, etc) >> the only reference I can see is to a "JSON text" which is a JSON >> representation of a single value. There's nothing describing how >> multiple values would be stored in the same file/transmitted in the >> same stream. 
It's not unreasonable to assume "read one object, then >> read another" but without an analysis of the grammar, it's not 100% >> clear if the grammar supports that (you sort of have to assume that >> when you hit "the end of the object" you skip some whitespace then >> start on the next - but the spec doesn't say anything like that. >> Alternatively, it's just as reasonable to assume that >> json.load/json.loads expect to be passed a single "JSON text" as >> defined by the spec. >> >> If the spec was clear on how multiple objects in a single stream >> should be handled, then yes the json module should support that. But >> without anything explicit in the spec, it's not as obvious. What do >> other languages do? >> >> Paul >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at barrys-emacs.org Mon Mar 27 16:52:02 2017 From: barry at barrys-emacs.org (Barry) Date: Mon, 27 Mar 2017 21:52:02 +0100 Subject: [Python-ideas] Adding an 'errors' argument to print In-Reply-To: References: Message-ID: <9109925F-FD5B-45CF-97F9-D7F27C7959D5@barrys-emacs.org> I took to using chcp 65001 This puts cmd.exe into unicode mode. Of course the python 3.6 make this uneccesary i understand. Barry > On 24 Mar 2017, at 15:41, Ryan Gonzalez wrote: > > Recently, I was working on a Windows GUI application that ends up running ffmpeg, and I wanted to see the command that was being run. However, the file name had a Unicode character in it (it's a Sawano song), and when I tried to print it to the console, it crashed during the encode/decode. (The encoding used in cmd doesn't support Unicode characters.) 
> > The workaround was to do: > > > print(mystring.encode(sys.stdout.encoding, errors='replace).decode(sys.stdout.encoding)) > > > Not fun, especially since this was *just* a debug print. > > The proposal: why not add an 'errors' argument to print? That way, I could've just done: > > > print(mystring, errors='replace') > > > without having to worry about it crashing. > > -- > Ryan (????) > Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else > http://refi64.com > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Mon Mar 27 17:27:33 2017 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 27 Mar 2017 16:27:33 -0500 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: <7962D629-8722-409A-904F-1D93F97E56ED@stufft.io> Message-ID: On Mon, Mar 27, 2017 at 10:34 AM, Chris Barker wrote: > On Mon, Mar 27, 2017 at 7:59 AM, Paul Moore wrote: > >> On 27 March 2017 at 15:40, Ram Rachum wrote: >> > Another idea: Maybe make json.load and json.dump support Path objects? >> >> If they currently supported filenames, I'd say that's a reasonable >> extension. Given that they don't, it still seems like more effort than >> it's worth to save a few characters >> > > Sure, but they probably should -- it's a REALLY common (most common) > use-case to read and write JSON from a file. And many APIs support > "filename or open file-like object". > > I'd love to see that added, and, or course, support for Path objects as > well. 
> https://docs.python.org/2/library/json.html#encoders-and-decoders # https://docs.python.org/2/library/json.html#json.JSONEncoder class PathJSONEncoder(json.JSONEncoder): def default(self, obj): if isinstance(obj, pathlib.Path): return unicode(obj) # ? (what about bytes) return OrderedDict(( ('@type', 'pydatatypes:pathlib.Path'), # JSON-LD ('path', unicode(obj)), ) return json.JSONEncoder.default(self, obj) # https://docs.python.org/2/library/json.html#json.JSONDecoder def as_pathlib_Path(obj): if obj.get('@type') == 'pydatatypes:pathlib.Path': return pathlib.Path(obj.get('path')) return obj def read_json(self, **kwargs): object_pairs_hook = kwargs.pop('object_pairs_hook', collections.OrderedDict) # OrderedDefaultDict object_hook = kwargs.pop('object_hook', as_pathlib_Path) encoding = kwargs.pop('encoding', 'utf8') with codecs.open(self, 'r ', encoding=encoding) as _file: return json.load(_file, object_pairs_hook=object_pairs_hook, object_hook=object_hook, **kwargs) def write_json(self, obj, **kwargs): kwargs['cls'] = kwargs.pop('cls', PathJSONEncoder) encoding = kwargs.pop('encoding', 'utf8') with codecs.open(self, 'w', encoding=encoding) as _file: return json.dump(obj, _file, **kwargs) def test_pathlib_json_encoder_decoder(): p = pathlib.Path('./test.json') obj = dict(path=p, _path=str(unicode(p))) p.write_json(obj) obj2 = p.read_json() assert obj['path'] == obj2['path'] assert isinstance(obj['path'], pathlib.Path) https://github.com/jaraco/path.py/blob/master/path.py#L735 open() bytes() chunks() write_bytes() text() def write_text(self, text, encoding=None, errors='strict', linesep=os.linesep, append=False): lines() write_lines() read_hash() read_md5() read_hexhash() > > -CHB > > > > > -- > > Christopher Barker, Ph.D. 
> Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Mon Mar 27 17:41:20 2017 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 27 Mar 2017 16:41:20 -0500 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: Message-ID: FWIW, pyline could produce streaming JSON w/ json.dumps(indent=0), but because indent>0, there are newlines. pydoc json | pyline '{"a":l} if "json" in l.lower() else None' -O json pydoc json | pyline -r '.*JSON.*' 'rgx and line' -O json It's a similar issue: what are good default JSON encoding/decoding settings? # loads/JSONDecoder file.encoding # UTF-8 object_pairs_hook object_hook # dumps/JSONEncoder file.encoding # UTF-8 cls separators indent - [ ] ENH: pyline: add 'jsonlines' as an {output,} format >From https://twitter.com/raymondh/status/842777864193769472 : #python tip: Set separators=(',', ':') to dump JSON more compactly. > >>> json.dumps({'a':1, 'b':2}, separators=(',',':')) > '{"a":1,"b":2}' On Mon, Mar 27, 2017 at 3:46 PM, David Mertz wrote: > This is a better link: https://en.m.wikipedia.org/wiki/JSON_Streaming > > On Mar 27, 2017 3:45 PM, "David Mertz" wrote: > >> The format JSON lines (http://jsonlines.org/) is pretty widely used, but >> is an extension of JSON itself. Basically, it's the idea that you can put >> one object per physical line to allow incremental reading or spending of >> objects. >> >> It's a good idea, and I think the `json` module should support it. 
But it >> definitely doesn't belong in `pathlib`. >> >> On Mar 27, 2017 3:36 PM, "Paul Moore" wrote: >> >>> On 27 March 2017 at 17:43, Bruce Leban wrote: >>> > the ability to read one json object from the input rather than reading >>> the >>> > entire input >>> >>> Is this a well-defined idea? From a quick read of the JSON spec (which >>> is remarkably short on details of how JSON is stored in files, etc) >>> the only reference I can see is to a "JSON text" which is a JSON >>> representation of a single value. There's nothing describing how >>> multiple values would be stored in the same file/transmitted in the >>> same stream. It's not unreasonable to assume "read one object, then >>> read another" but without an analysis of the grammar, it's not 100% >>> clear if the grammar supports that (you sort of have to assume that >>> when you hit "the end of the object" you skip some whitespace then >>> start on the next - but the spec doesn't say anything like that. >>> Alternatively, it's just as reasonable to assume that >>> json.load/json.loads expect to be passed a single "JSON text" as >>> defined by the spec. >>> >>> If the spec was clear on how multiple objects in a single stream >>> should be handled, then yes the json module should support that. But >>> without anything explicit in the spec, it's not as obvious. What do >>> other languages do? >>> >>> Paul >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
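[Editor's note: the "one object per physical line" convention David Mertz describes needs no new machinery at all; combined with the compact `separators=(',', ':')` tip quoted above, a minimal JSON-lines round trip looks something like this -- file handling elided, a StringIO stands in for the file:]

```python
import io
import json

records = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]

# Write: one compact JSON object per physical line ("JSON lines").
buf = io.StringIO()
for rec in records:
    buf.write(json.dumps(rec, separators=(",", ":")) + "\n")

# Read: incremental by construction -- one json.loads() call per line.
buf.seek(0)
decoded = [json.loads(line) for line in buf if line.strip()]
assert decoded == records
```

[Because every record ends at a newline, a reader can stop after any line -- exactly the incremental behaviour being asked for -- at the cost of requiring that no record span multiple lines; json.dumps never emits a raw newline unless `indent` is set, so compact output satisfies that.]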
URL: From flying-sheep at web.de Mon Mar 27 11:51:19 2017 From: flying-sheep at web.de (Philipp A.) Date: Mon, 27 Mar 2017 15:51:19 +0000 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: <7962D629-8722-409A-904F-1D93F97E56ED@stufft.io> Message-ID: Ram Rachum schrieb am Mo., 27. M?rz 2017 um 16:42 Uhr: > Another idea: Maybe make json.load and json.dump support Path objects? > yes, all string-path expecting stdlib APIs should support PEP 519 https://www.python.org/dev/peps/pep-0519/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Mon Mar 27 19:35:20 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 28 Mar 2017 12:35:20 +1300 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: Message-ID: <58D9A1B8.5020101@canterbury.ac.nz> Paul Moore wrote: > Is this a well-defined idea? ... There's nothing describing how > multiple values would be stored in the same file/transmitted in the > same stream. I think this is something that's outside the scope of the spec. But since the grammar makes it clear when you've reached the end of a value, it seems entirely reasonable for a parser to just stop reading from the stream at that point, and leave whatever remains for the application to deal with as it sees fit. The application can then choose to immediately read another value from the same stream if it wants. 
-- Greg From victor.stinner at gmail.com Mon Mar 27 19:42:19 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 28 Mar 2017 01:42:19 +0200 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: <20170327150428.GD27969@ando.pearwood.info> References: <20170327150428.GD27969@ando.pearwood.info> Message-ID: 2017-03-27 17:04 GMT+02:00 Steven D'Aprano : > Of course pathlib can already read JSON, or for that matter ReST text > or JPG binary files. It can read anything as text or bytes, including > JSON: > > some_path.write_text(json.dumps(obj)) > json.loads(some_path.read_text()) Note: You should specify the encoding: some_path.write_text(json.dumps(obj), encoding='utf8') json.loads(some_path.read_text(encoding='utf8')) > I don't think it should be pathlib's responsibility to deal with the > file format (besides text). Right. Victor From eryksun at gmail.com Mon Mar 27 21:09:08 2017 From: eryksun at gmail.com (eryk sun) Date: Tue, 28 Mar 2017 01:09:08 +0000 Subject: [Python-ideas] Adding an 'errors' argument to print In-Reply-To: <9109925F-FD5B-45CF-97F9-D7F27C7959D5@barrys-emacs.org> References: <9109925F-FD5B-45CF-97F9-D7F27C7959D5@barrys-emacs.org> Message-ID: On Mon, Mar 27, 2017 at 8:52 PM, Barry wrote: > I took to using > > chcp 65001 > > This puts cmd.exe into unicode mode. conhost.exe hosts the console, and chcp.com is a console app that calls GetConsoleCP, SetConsoleCP and SetConsoleOutputCP to show or modify the console's input and output codepages. It doesn't support changing them separately. cmd.exe is just another console client, no different from python.exe or powershell.exe in this regard. Also, it's unrelated to how Python uses the console, but for the record, cmd has used the console's wide-character API since it was ported from OS/2 in the early 90s. Back then the console was hosted using threads in the csrss.exe system process, which made sense because the windowing system was hosted there. 
When they moved most of the window manager to kernel mode in NT 4 (1996), the console was mostly left behind in csrss.exe. It wasn't until Windows 7 that it found a new home in conhost.exe. In Windows 8 it got a real device driver instead of using fake file handles. In Windows 10 it was updated to be less of a franken-window -- e.g. now it has line-wrapped selection and text reflowing. Using codepage 65001 (UTF-8) in a console app has a couple of annoying bugs in the console itself, and another due to flushing of C FILE streams. For example, reading text that has even a single non-ASCII character will fail because conhost's encoding buffer is too small. It handles the error by returning a read of 0 bytes. That's EOF, so Python's REPL quits; input() raises EOFError; and stdin.read() returns an empty string. Microsoft should fix this in Windows 10, and probably will eventually. The Linux subsystem needs UTF-8, and it's silly that the console doesn't allow entering non-ASCII text in Linux programs. As was already recommended, I suggest using the wide-character API via win_unicode_console in 2.7 and 3.5. In 3.6 we get the wide-character API automatically thanks to Steve Dower's io._WindowsConsoleIO class. From steve at pearwood.info Mon Mar 27 23:02:05 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 28 Mar 2017 14:02:05 +1100 Subject: [Python-ideas] What about regexp string litterals : re".*" ? In-Reply-To: <20170327151740.GJ6883@tabr.siet.ch> References: <20170327151740.GJ6883@tabr.siet.ch> Message-ID: <20170328030205.GE27969@ando.pearwood.info> On Mon, Mar 27, 2017 at 05:17:40PM +0200, Simon D. wrote: > The regexp string litteral could be represented by : re"" > > It would ease the use of regexps in Python, allowing to have some regexp > litterals, like in Perl or JavaScript. > > We may end up with an integration like : > > >>> import re > >>> if re".k" in 'ok': > ... print "ok" > ok I dislike the suggested syntax re".k". 
It looks ugly and not different enough from a raw string. I can easily see people accidentally writing: if r".k" in 'ok': ... and wondering why their regex isn't working. Javascript uses /regex/ as a literal syntax for creating RegExp objects. That's the closest equivalent to the way Python would have to operate, although I don't think we can use the /.../ syntax without breaking the rule that Python's parser will not be more complex than LL(1). So I think /.../ is definitely out. Perl 6 uses m/regex/ and a number of other variations: https://docs.perl6.org/language/regexes I doubt that this will actually be useful. It *seems* useful if you just write trivial regexes like your example, but without Perl's rich set of terse (cryptic?) operators, I don't know that literal regexes makes enough difference to be worth the trouble. There's not very much difference between (say) these: mo = re.search(r'.k', mystring) if mo: print(mo.group()) mo = re.'.k'.search(mystring) if mo: print(mo.group()) You effectively save two parentheses, that's all. That doesn't seem like much of a win for introducing new syntax. Can you show some example code where a regex literal will have a worthwhile advantage? > Regexps are part of the language in Perl, and the rather complicated > integration of regexp in other languages, especially in Python, is > something that comes up easily in language comparing discussion. Surely you are joking? Regex integration in Python is simple. Regular expression objects are ordinary objects, like lists and dicts and floats. The only difference is that you don't call the Regex object constructor directly, you either pass a string to a module level function re.match(r'my regex', mystring) or you create a regex object: regex = re.compile(r'my regex') regex.match(mystring) That's very neat, Pythonic and simple. 
The regex itself is very close to the same syntax used by Perl, Javascript or other variations, the only complication is that due to Python's escaping rules you should use a raw string r'' instead of doubling up all backslashes. I wouldn't call that "rather complicated" -- it is a lot less complicated than Perl: - m// can be abbreviated // - when do you use // directly and when do you use qr// ? - s/// operator implicitly defines a regex In Perl 6, I *think* they use rx// instead of qr//, or are they different things? Both m// and the s/// operator can use arbitrary delimiters, e.g. ! or , (but not : or parentheses) instead of the slashes, and m// regexes will implicitly match against $_ if you don't explicitly match against something else. Compared to Perl, I don't think Python's regexes are complicated. -- Steve From markusmeskanen at gmail.com Mon Mar 27 23:24:22 2017 From: markusmeskanen at gmail.com (Markus Meskanen) Date: Tue, 28 Mar 2017 06:24:22 +0300 Subject: [Python-ideas] What about regexp string litterals : re".*" ? In-Reply-To: <20170328030205.GE27969@ando.pearwood.info> References: <20170327151740.GJ6883@tabr.siet.ch> <20170328030205.GE27969@ando.pearwood.info> Message-ID: On Mar 28, 2017 06:08, "Steven D'Aprano" wrote: On Mon, Mar 27, 2017 at 05:17:40PM +0200, Simon D. wrote: > The regexp string litteral could be represented by : re"" > > It would ease the use of regexps in Python, allowing to have some regexp > litterals, like in Perl or JavaScript. > > We may end up with an integration like : > > >>> import re > >>> if re".k" in 'ok': > ... print "ok" > ok I dislike the suggested syntax re".k". It looks ugly and not different enough from a raw string. I can easily see people accidentally writing: if r".k" in 'ok': ... and wondering why their regex isn't working. While I agree with most of your arguments, surely you must be the one joking here? "Ugly" is obviously a matter of opinion, I personally find the proposed syntax more beautiful than the // used in many other languages. But claiming it's bad because people would mix it up with raw strings and people not realizing is nonsense. Not only does it look very different, but attempting to call match() or any other regex method on it would surely give out a reasonable error: AttributeError: 'str' object has no attribute 'match' Which _in the worst case scenario_ results in googling where the top rated StackOverflow question clearly explains the difference between r'' and re''
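[Editor's note: the two failure modes being debated here can be made concrete with today's semantics -- nothing below uses the proposed re"" syntax. The membership test Steven warns about runs without error but quietly does substring matching, while the method call Markus describes at least raises:]

```python
import re

# Steven's trap: a raw string is just a string, so "in" is a plain
# substring test and "." is not a wildcard.
assert (r".k" in "ok") is False

# With an explicit regex call, the same pattern matches.
assert re.search(r".k", "ok").group() == "ok"

# Markus's case: a plain (raw) string has no regex methods, so this
# mistake surfaces immediately as an AttributeError.
try:
    r".k".match("ok")
except AttributeError as exc:
    print(exc)
```

[So which failure you get depends on which operation is reached first: membership tests silently return the wrong answer, method calls fail loudly.]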
"Ugly" is obviously a matter of opinion, I personally find the proposed syntax more beautiful than the // used in many other languages. But claiming it's bad because people would mix it up with raw strings and people not realizing is nonsense. Not only does it look very different, but attempting to call match() or any other regex method on it would surely give out a reasonable error: AttributeError: 'str' object has no attribute 'match' Which _in the worst case scenario_ results into googling where the top rated StackOverflow question clearly explains the difference between r'' and re'' -------------- next part -------------- An HTML attachment was scrubbed... URL: From pavol.lisy at gmail.com Mon Mar 27 23:59:22 2017 From: pavol.lisy at gmail.com (Pavol Lisy) Date: Tue, 28 Mar 2017 05:59:22 +0200 Subject: [Python-ideas] Proposal: Query language extension to Python (PythonQL) In-Reply-To: References: <629E7A2A-7668-4A9B-A812-F846A2785E3D@gmail.com> <416e2e44-5949-d33a-1cdd-2a3c50508efd@mozilla.com> <9FDAF5C1-2871-4AF9-A3D2-0219279F65D1@icloud.com> Message-ID: On 3/27/17, Chris Angelico wrote: > On Mon, Mar 27, 2017 at 12:26 PM, Terry Reedy wrote: >> It might be possible (or not!) to make the clause-heading words like >> 'where' >> or 'groupby' (this would have to be one word) recognized as special only >> in >> the context of starting a new comprehension clause. The precedents for >> 'keyword in context' are 'as', 'async', and 'await'. But these were >> temporary and a nuisance (both to code and for syntax highlighting) and I >> would not be in favor of repeating for this case. > > Apologies if it's already been mentioned, but is MacroPy able to do > this without introducing actual language keywords? > > ChrisA In very first mail (section "Current Status") in this thread was mentioned that they use "encoding hack" now. 
PL From rosuav at gmail.com Tue Mar 28 01:37:16 2017 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 28 Mar 2017 16:37:16 +1100 Subject: [Python-ideas] What about regexp string litterals : re".*" ? In-Reply-To: References: <20170327151740.GJ6883@tabr.siet.ch> <20170328030205.GE27969@ando.pearwood.info> Message-ID: On Tue, Mar 28, 2017 at 2:24 PM, Markus Meskanen wrote: > While I agree with most of your arguments, surely you must be the one joking > here? "Ugly" is obviously a matter of opinion, I personally find the > proposed syntax more beautiful than the // used in many other languages. But > claiming it's bad because people would mix it up with raw strings and people > not realizing is nonsense. Not only does it look very different, but > attempting to call match() or any other regex method on it would surely give > out a reasonable error: > > AttributeError: 'str' object has no attribute 'match' > > Which _in the worst case scenario_ results into googling where the top rated > StackOverflow question clearly explains the difference between r'' and re'' Yes, but if the "in" operator is used, it would still work, because r"..." is a str, and "str" in "string" is meaningful. But I think a better solution will be for regex literals to be syntax-highlighted differently. If they're a truly-supported syntactic feature, they can be made visually different in your editor, making the distinction blatantly obvious. That said, though, I'm -1 on this. Currently, every prefix letter has its own meaning, and broadly speaking, combining them combines their meanings. An re"..." literal should be a raw "e-string", whatever that is, so I would expect that e"..." is the same kind of thing but with different backslash handling. ChrisA From markusmeskanen at gmail.com Tue Mar 28 01:45:21 2017 From: markusmeskanen at gmail.com (Markus Meskanen) Date: Tue, 28 Mar 2017 08:45:21 +0300 Subject: [Python-ideas] What about regexp string litterals : re".*" ? 
In-Reply-To: References: <20170327151740.GJ6883@tabr.siet.ch> <20170328030205.GE27969@ando.pearwood.info> Message-ID: On Tue, Mar 28, 2017 at 8:37 AM, Chris Angelico wrote: > > Yes, but if the "in" operator is used, it would still work, because > r"..." is a str, and "str" in "string" is meaningful. > > But I think a better solution will be for regex literals to be > syntax-highlighted differently. If they're a truly-supported syntactic > feature, they can be made visually different in your editor, making > the distinction blatantly obvious. > > That said, though, I'm -1 on this. Currently, every prefix letter has > its own meaning, and broadly speaking, combining them combines their > meanings. An re"..." literal should be a raw "e-string", whatever that > is, so I would expect that e"..." is the same kind of thing but with > different backslash handling. > > Fair enough, I haven't followed this thread too closely and didn't consider the "in" operator being used. Even then I find it unlikely that confusing re'...' with r'...' and not noticing would turn out to be an issue. That being said, I'm also -1 on this, especially now after your point on "e-string". Adding these re-strings would straight out prevent e-string from ever being implemented. -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Tue Mar 28 02:57:50 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 28 Mar 2017 09:57:50 +0300 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: <58D9A1B8.5020101@canterbury.ac.nz> References: <58D9A1B8.5020101@canterbury.ac.nz> Message-ID: On 28.03.17 02:35, Greg Ewing wrote: > Paul Moore wrote: >> Is this a well-defined idea? ... There's nothing describing how >> multiple values would be stored in the same file/transmitted in the >> same stream. > > I think this is something that's outside the scope of the spec. 
> > But since the grammar makes it clear when you've reached the end > of a value, it seems entirely reasonable for a parser to just > stop reading from the stream at that point, and leave whatever > remains for the application to deal with as it sees fit. The > application can then choose to immediately read another value > from the same stream if it wants. You can determine the end of an integer literal only after reading a character past the end of the integer literal. Since there is no way to put back a character, it will be lost for following readers. And currently json.load() is implemented by reading all file content at once and passing it to json.loads(). A different implementation would be much more complex (if we don't want to lose performance). From simon at acoeuro.com Tue Mar 28 03:54:34 2017 From: simon at acoeuro.com (Simon D.) Date: Tue, 28 Mar 2017 09:54:34 +0200 Subject: [Python-ideas] What about regexp string litterals : re".*" ? In-Reply-To: References: <20170327151740.GJ6883@tabr.siet.ch> Message-ID: <20170328075434.GL2757@tabr.siet.ch> * Serhiy Storchaka [2017-03-27 18:39:19 +0300]: > There are several regular expression libraries for Python. One of them is > included in the stdlib, but this is not the first regular expression library > in the stdlib and may be not the last. Particular project can choose using > an alternative regular expression library (because it has additional > features or is faster for particular cases). > I believe that the u"" notation in Python 2.7 is defined by while importing the unicode_litterals module. Each regexp lib could provide its instanciation of regexp litteral notation. And if only the default one does, it would still be won for the beginers, and the majority of persons using the stdlib. -- Simon Descarpentries +336 769 702 53 http://s.d12s.fr From simon at acoeuro.com Tue Mar 28 03:56:05 2017 From: simon at acoeuro.com (Simon D.)
Date: Tue, 28 Mar 2017 09:56:05 +0200 Subject: [Python-ideas] What about regexp string litterals : m".*" ? In-Reply-To: References: <20170327151740.GJ6883@tabr.siet.ch> <20170328030205.GE27969@ando.pearwood.info> Message-ID: <20170328075605.GM2757@tabr.siet.ch> * Chris Angelico [2017-03-28 16:37:16 +1100]: > But I think a better solution will be for regex literals to be > syntax-highlighted differently. If they're a truly-supported syntactic > feature, they can be made visually different in your editor, making > the distinction blatantly obvious. > > That said, though, I'm -1 on this. Currently, every prefix letter has > its own meaning, and broadly speaking, combining them combines their > meanings. An re"..." literal should be a raw "e-string", whatever that > is, so I would expect that e"..." is the same kind of thing but with > different backslash handling. First, I would like to state that the "module-static" version of regexp functions, avoiding the compile step, are a great idea. (e.g. : mo = re.search(r'.k', mystring) ) The str integrated one also, but maybe confusing, which regexp lib is used ? (must be the default one). Then, re"" being two letters looks like a real problem. Lets pick one amongs the 22 remaining free alphabet letters. What about : - g"", x"" (like in regex) ? - m"" (like shawn for Perl, meaming Match ?) - q"" (for Query ?) - k"" (in memory of Stephen Cole Kleene ? https://en.wikipedia.org/wiki/Regular_expression) - /"" (to be half the way toward /regexp/ syntax) - ~"" ?"" (other symbols, I avoid regexp-starting symbols, would be ugly in real usage) And what about an approach with flag firsts ? (or where to put them ?) : i"" (regexp with ignorecase flag on) AILMSX"" (regexp with all flags on) It would consume a lot of letters, but would use it for a good reason :-) Personnally, I think a JavaScript-like syntaxe would be great, and I feel it as asking too much? 
:
- it would naturally be highlighted differently ;
- it would not be the first (happy) similarity (https://hackernoon.com/javascript-vs-python-in-2017-d31efbb641b4#.ky9it5hph)
- it's a working integration, including flag matters.
-- Simon Descarpentries +336 769 702 53 http://s.d12s.fr From p.f.moore at gmail.com Tue Mar 28 04:31:04 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 28 Mar 2017 09:31:04 +0100 Subject: [Python-ideas] What about regexp string litterals : re".*" ? In-Reply-To: <20170328075434.GL2757@tabr.siet.ch> References: <20170327151740.GJ6883@tabr.siet.ch> <20170328075434.GL2757@tabr.siet.ch> Message-ID: On 28 March 2017 at 08:54, Simon D. wrote: > I believe that the u"" notation in Python 2.7 is defined by while > importing the unicode_litterals module. That's not true. The u"..." syntax is part of the language. from __future__ import unicode_literals is something completely different. > Each regexp lib could provide its instanciation of regexp litteral > notation. The Python language has no way of doing that - user (or library) defined literals are not possible. > And if only the default one does, it would still be won for the > beginers, and the majority of persons using the stdlib. How? You've yet to prove that having a regex literal form is an improvement over re.compile(r'put your regex here'). You've asserted it, but that's a matter of opinion. We'd need evidence of real-life code that was clearly improved by the existence of your proposed construct. Paul From abedillon at gmail.com Wed Mar 29 16:30:18 2017 From: abedillon at gmail.com (Abe Dillon) Date: Wed, 29 Mar 2017 15:30:18 -0500 Subject: [Python-ideas] What about regexp string litterals : re".*" ? In-Reply-To: References: <20170327151740.GJ6883@tabr.siet.ch> <20170328075434.GL2757@tabr.siet.ch> Message-ID: My 2 cents is that regular expressions are pretty un-pythonic because of their horrible readability. 
I would much rather see Python adopt something like Verbal Expressions ( https://github.com/VerbalExpressions/PythonVerbalExpressions ) into the standard library than add special syntax support for normal REs. On Tue, Mar 28, 2017 at 3:31 AM, Paul Moore wrote: > On 28 March 2017 at 08:54, Simon D. wrote: > > I believe that the u"" notation in Python 2.7 is defined by while > > importing the unicode_litterals module. > > That's not true. The u"..." syntax is part of the language. from > future import unicode_literals is something completely different. > > > Each regexp lib could provide its instanciation of regexp litteral > > notation. > > The Python language has no way of doing that - user (or library) > defined literals are not possible. > > > And if only the default one does, it would still be won for the > > beginers, and the majority of persons using the stdlib. > > How? You've yet to prove that having a regex literal form is an > improvement over re.compile(r'put your regex here'). You've asserted > it, but that's a matter of opinion. We'd need evidence of real-life > code that was clearly improved by the existence of your proposed > construct. > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at barrys-emacs.org Wed Mar 29 16:38:07 2017 From: barry at barrys-emacs.org (Barry Scott) Date: Wed, 29 Mar 2017 21:38:07 +0100 Subject: [Python-ideas] Add pathlib.Path.write_json andpathlib.Path.read_json In-Reply-To: References: Message-ID: > On 27 Mar 2017, at 15:08, Markus Meskanen wrote: > > -1, should we also include write_ini, write_yaml, etc? > Markus, You illustrate why this is a bad design pattern to implement. It does not scale. 
I attended a talk at PYCON UK that talked to the point of using object composition rather than rich interfaces. I cannot recall the term that was used to cover this idea. I also think that it's a mistake to open a text file from pathlib. -1 A pattern that allows pathlib.Path to be composed with content handling is an interesting idea. Maybe that should be explored? But that should be a separate topic. Barry > A class cannot account for everyone who wants to use it in different ways. > > On Mar 27, 2017 17:07, "Steve Dower" > wrote: > It was enough of a benefit for text (and I never forget the argument order for writing text to a file, unlike json.dump(file_or_data?, data_or_file?) ) > > +1 > > Top-posted from my Windows Phone > From: Paul Moore > Sent: 3/27/2017 5:57 > To: Ram Rachum > Cc: python-ideas > Subject: Re: [Python-ideas] Add pathlib.Path.write_json andpathlib.Path.read_json > > On 27 March 2017 at 13:50, Ram Rachum > wrote: > > This would make writing / reading JSON to a file a one liner instead of a > > two-line with clause. > > That hardly seems like a significant benefit... > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From markusmeskanen at gmail.com Wed Mar 29 16:59:10 2017 From: markusmeskanen at gmail.com (Markus Meskanen) Date: Wed, 29 Mar 2017 23:59:10 +0300 Subject: [Python-ideas] What about regexp string litterals : re".*" ? In-Reply-To: References: <20170327151740.GJ6883@tabr.siet.ch> <20170328075434.GL2757@tabr.siet.ch> Message-ID: On Mar 29, 2017 23:31, "Abe Dillon" wrote: My 2 cents is that regular expressions are pretty un-pythonic because of their horrible readability. I would much rather see Python adopt something like Verbal Expressions ( https://github.com/VerbalExpressions/ PythonVerbalExpressions ) into the standard library than add special syntax support for normal REs. I've never heard of this before, looks *awesome*. Thanks, if it's as good as it sounds, I too would love something like this added to the standard library. -------------- next part -------------- An HTML attachment was scrubbed... URL: From prometheus235 at gmail.com Wed Mar 29 17:03:19 2017 From: prometheus235 at gmail.com (Nick Timkovich) Date: Wed, 29 Mar 2017 16:03:19 -0500 Subject: [Python-ideas] Add pathlib.Path.write_json andpathlib.Path.read_json In-Reply-To: References: Message-ID: > > I attended a talk at PYCON UK that talked to the point of using object > composition > rather then rich interfaces. I cannot recall the term that was used to > cover this idea. > > Separating things by concern/abstraction (the storage vs. the serialization) results in easier-to-learn code, *especially* incrementally, as you can (for example) plug reading from a file, a socket, a database into the same JSON, INI, XML... functions. Learn N ways to read data, M ways to transform the data, and you can do N*M things with N+M knowledge. If the libraries start tightly coupling everything, you need to start going through N*M methods, then do it yourself anyways, because reader X doesn't support new-hotness-format Y directly. 
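For instance, json.load() already takes any file-like object, so the same "transform" composes with an in-memory buffer, a plain file, or a gzip stream without changes. A quick sketch (the file name here is made up):

```python
import gzip
import io
import json
import os
import tempfile

def parse_json(fp):
    """One 'transform' that works with any file-like 'reader'."""
    return json.load(fp)

# Reader 1: an in-memory text stream.
assert parse_json(io.StringIO('{"answer": 42}')) == {"answer": 42}

# Reader 2: a gzip-compressed file, opened in text mode.
path = os.path.join(tempfile.mkdtemp(), "data.json.gz")
with gzip.open(path, "wt") as fp:
    json.dump({"answer": 42}, fp)
with gzip.open(path, "rt") as fp:
    assert parse_json(fp) == {"answer": 42}
```

Same parser, two readers: N+M knowledge, N*M combinations.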
Perhaps less code could result from making objects "quack" alike, so instead of you doing the plumbing, the libraries themselves would. I recently was satisfied by being able to exchange with open('dump.txt') as f: for line in f:... with import gzip with gzip.open('dump.gz', 'rt') as f: for line in f:... and it just worked through the magic of file-like objects and context managers. Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Wed Mar 29 17:29:47 2017 From: wes.turner at gmail.com (Wes Turner) Date: Wed, 29 Mar 2017 16:29:47 -0500 Subject: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json In-Reply-To: References: <7962D629-8722-409A-904F-1D93F97E56ED@stufft.io> Message-ID: On Mon, Mar 27, 2017 at 4:27 PM, Wes Turner wrote: > > > On Mon, Mar 27, 2017 at 10:34 AM, Chris Barker > wrote: > >> On Mon, Mar 27, 2017 at 7:59 AM, Paul Moore wrote: >> >>> On 27 March 2017 at 15:40, Ram Rachum wrote: >>> > Another idea: Maybe make json.load and json.dump support Path objects? >>> >>> If they currently supported filenames, I'd say that's a reasonable >>> extension. Given that they don't, it still seems like more effort than >>> it's worth to save a few characters >>> >> >> Sure, but they probably should -- it's a REALLY common (most common) >> use-case to read and write JSON from a file. And many APIs support >> "filename or open file-like object". >> >> I'd love to see that added, and, or course, support for Path objects as >> well. >> > > > > > > https://docs.python.org/2/library/json.html#encoders-and-decoders > > # https://docs.python.org/2/library/json.html#json.JSONEncoder > > class PathJSONEncoder(json.JSONEncoder): > def default(self, obj): > if isinstance(obj, pathlib.Path): > return unicode(obj) # ? 
(what about bytes) > return OrderedDict(( > ('@type', 'pydatatypes:pathlib.Path'), # JSON-LD > ('path', unicode(obj)), ) > return json.JSONEncoder.default(self, obj) > > > # https://docs.python.org/2/library/json.html#json.JSONDecoder > def as_pathlib_Path(obj): > if obj.get('@type') == 'pydatatypes:pathlib.Path': > return pathlib.Path(obj.get('path')) > return obj > def as_pathlib_Path(obj): if hasattr(obj, 'get') and obj.get('@type') == 'pydatatypes:pathlib.Path': return pathlib.Path(obj.get('path')) return obj > > > def read_json(self, **kwargs): > object_pairs_hook = kwargs.pop('object_pairs_hook', > collections.OrderedDict) # OrderedDefaultDict > object_hook = kwargs.pop('object_hook', as_pathlib_Path) > encoding = kwargs.pop('encoding', 'utf8') > with codecs.open(self, 'r ', encoding=encoding) as _file: > return json.load(_file, > object_pairs_hook=object_pairs_hook, > object_hook=object_hook, > **kwargs) > > def write_json(self, obj, **kwargs): > kwargs['cls'] = kwargs.pop('cls', PathJSONEncoder) > encoding = kwargs.pop('encoding', 'utf8') > with codecs.open(self, 'w', encoding=encoding) as _file: > return json.dump(obj, _file, **kwargs) > > > def test_pathlib_json_encoder_decoder(): > p = pathlib.Path('./test.json') > obj = dict(path=p, _path=str(unicode(p))) > p.write_json(obj) > obj2 = p.read_json() > assert obj['path'] == obj2['path'] > assert isinstance(obj['path'], pathlib.Path) > should it be 'self' or 'obj'? > > > > https://github.com/jaraco/path.py/blob/master/path.py#L735 > open() > bytes() > chunks() > write_bytes() > text() > def write_text(self, text, encoding=None, errors='strict', > linesep=os.linesep, append=False): > lines() > write_lines() > > read_hash() > read_md5() > read_hexhash() > > > > >> >> -CHB >> >> >> >> >> -- >> >> Christopher Barker, Ph.D. 
>> Oceanographer >> >> Emergency Response Division >> NOAA/NOS/OR&R (206) 526-6959 voice >> 7600 Sand Point Way NE (206) 526-6329 fax >> Seattle, WA 98115 (206) 526-6317 main reception >> >> Chris.Barker at noaa.gov >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Wed Mar 29 21:16:48 2017 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Wed, 29 Mar 2017 20:16:48 -0500 Subject: [Python-ideas] What about regexp string litterals : re".*" ? In-Reply-To: References: <20170327151740.GJ6883@tabr.siet.ch> <20170328075434.GL2757@tabr.siet.ch> Message-ID: I feel like that borders on a bit too wordy... Personally, I'd like to see something like Felix's regular definitions: http://felix-lang.org/share/src/web/tut/regexp_01.fdoc#Regular_definitions._h -- Ryan (????) Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else http://refi64.com On Mar 29, 2017 3:30 PM, "Abe Dillon" wrote: My 2 cents is that regular expressions are pretty un-pythonic because of their horrible readability. I would much rather see Python adopt something like Verbal Expressions ( https://github.com/VerbalExpressions/ PythonVerbalExpressions ) into the standard library than add special syntax support for normal REs. On Tue, Mar 28, 2017 at 3:31 AM, Paul Moore wrote: > On 28 March 2017 at 08:54, Simon D. wrote: > > I believe that the u"" notation in Python 2.7 is defined by while > > importing the unicode_litterals module. > > That's not true. The u"..." syntax is part of the language. from > future import unicode_literals is something completely different. > > > Each regexp lib could provide its instanciation of regexp litteral > > notation. 
> > The Python language has no way of doing that - user (or library) > defined literals are not possible. > > > And if only the default one does, it would still be won for the > > beginers, and the majority of persons using the stdlib. > > How? You've yet to prove that having a regex literal form is an > improvement over re.compile(r'put your regex here'). You've asserted > it, but that's a matter of opinion. We'd need evidence of real-life > code that was clearly improved by the existence of your proposed > construct. > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From abedillon at gmail.com Wed Mar 29 21:47:11 2017 From: abedillon at gmail.com (Abe Dillon) Date: Wed, 29 Mar 2017 20:47:11 -0500 Subject: [Python-ideas] What about regexp string litterals : re".*" ? In-Reply-To: References: <20170327151740.GJ6883@tabr.siet.ch> <20170328075434.GL2757@tabr.siet.ch> Message-ID: > > I feel like that borders on a bit too wordy... I think the use of words instead of symbols is one of the things that makes Python so readable. The ternary operator is done with words: value = option1 if condition else option2 reads almost like English, while: value = condition ? option1: option2; Is just weird. I can read Verbal Expressions very quickly and understand exactly what's going on. If I have a decent IDE, I can write them almost as easily. 
I see no problem with wordiness if it means I don't have to stare at the code and scratch my head longer, or worse, open a reference to help me translate it (which is invariably the case when I look at regular expressions). On Wed, Mar 29, 2017 at 8:16 PM, Ryan Gonzalez wrote: > I feel like that borders on a bit too wordy... > > Personally, I'd like to see something like Felix's regular definitions: > > > http://felix-lang.org/share/src/web/tut/regexp_01.fdoc# > Regular_definitions._h > > > -- > Ryan (????) > Yoko Shimomura > ryo (supercell/EGOIST) > Hiroyuki Sawano >> everyone else > http://refi64.com > > On Mar 29, 2017 3:30 PM, "Abe Dillon" wrote: > > My 2 cents is that regular expressions are pretty un-pythonic because of > their horrible readability. I would much rather see Python adopt something > like Verbal Expressions ( https://github.com/VerbalExp > ressions/PythonVerbalExpressions ) into the standard library than add > special syntax support for normal REs. > > On Tue, Mar 28, 2017 at 3:31 AM, Paul Moore wrote: > >> On 28 March 2017 at 08:54, Simon D. wrote: >> > I believe that the u"" notation in Python 2.7 is defined by while >> > importing the unicode_litterals module. >> >> That's not true. The u"..." syntax is part of the language. from >> future import unicode_literals is something completely different. >> >> > Each regexp lib could provide its instanciation of regexp litteral >> > notation. >> >> The Python language has no way of doing that - user (or library) >> defined literals are not possible. >> >> > And if only the default one does, it would still be won for the >> > beginers, and the majority of persons using the stdlib. >> >> How? You've yet to prove that having a regex literal form is an >> improvement over re.compile(r'put your regex here'). You've asserted >> it, but that's a matter of opinion. We'd need evidence of real-life >> code that was clearly improved by the existence of your proposed >> construct. 
>> >> Paul >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed Mar 29 21:51:43 2017 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 30 Mar 2017 12:51:43 +1100 Subject: [Python-ideas] What about regexp string litterals : re".*" ? In-Reply-To: References: <20170327151740.GJ6883@tabr.siet.ch> <20170328075434.GL2757@tabr.siet.ch> Message-ID: On Thu, Mar 30, 2017 at 12:47 PM, Abe Dillon wrote: >> I feel like that borders on a bit too wordy... > > > I think the use of words instead of symbols is one of the things that makes > Python so readable. The ternary operator is done with words: > > value = option1 if condition else option2 > > reads almost like English, while: > > value = condition ? option1: option2; > > Is just weird. > > I can read Verbal Expressions very quickly and understand exactly what's > going on. If I have a decent IDE, I can write them almost as easily. I see > no problem with wordiness if it means I don't have to stare at the code and > scratch my head longer, or worse, open a reference to help me translate it > (which is invariably the case when I look at regular expressions). However, a huge advantage of REs is that they are common to many languages. You can take a regex from grep to Perl to your editor to Python. They're not absolutely identical, of course, but the basics are all the same. Creating a new search language means everyone has to learn anew. 
ChrisA From turnbull.stephen.fw at u.tsukuba.ac.jp Wed Mar 29 23:56:30 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Thu, 30 Mar 2017 12:56:30 +0900 Subject: [Python-ideas] What about regexp string litterals : re".*" ? In-Reply-To: References: <20170327151740.GJ6883@tabr.siet.ch> <20170328075434.GL2757@tabr.siet.ch> Message-ID: <22748.33262.919987.808494@turnbull.sk.tsukuba.ac.jp> Abe Dillon writes: > My 2 cents is that regular expressions are pretty un-pythonic because of > their horrible readability. I would much rather see Python adopt something > like Verbal Expressions ( > https://github.com/VerbalExpressions/PythonVerbalExpressions ) into the > standard library than add special syntax support for normal REs. You think that example is more readable than the proposed translation

    ^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$

which is better written

    ^https?://(www\.)?[^ ]*$

or even

    ^https?://[^ ]*$

which makes it obvious that the regexp is not very useful from the word "^"? (It matches only URLs which are the only thing, including whitespace, on the line, probably not what was intended.) Are those groups capturing in Verbal Expressions? The use of "find" (~ "search") rather than "match" is disconcerting to the experienced user. What does alternation look like? How about alternation of non-trivial regular expressions? Etc, etc. As far as I can see, Verbal Expressions are basically a way of making it so painful to write regular expressions that people will restrict themselves to regular expressions that would be quite readable in traditional notation! I don't think that this failure to respect the developer's taste is restricted to this particular implementation, either. They *are* regular expressions, just with a verbose, obstructive notation. Far more important than "more readable" regular expressions would be a parsing library in the stdlib, reducing the developer's temptation to parse using complex regular expressions. IMHO YMMV etc. 
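For what it's worth, re.VERBOSE already lets you annotate the traditional notation without inventing a new one. A sketch using the simplified pattern above (the group names are made up for illustration):

```python
import re

# The simplified URL pattern, annotated with re.VERBOSE.
URL = re.compile(r"""
    ^https?://         # scheme, with optional 's'
    (?P<www>www\.)?    # optional leading 'www.'
    (?P<rest>[^ ]*)$   # everything else, up to end of line
    """, re.VERBOSE)

m = URL.match("https://www.python.org/dev/peps/")
assert m is not None
assert m.group("www") == "www."
assert m.group("rest") == "python.org/dev/peps/"
```

The notation stays portable to grep, Perl, editors, etc.; only the comments are Python-side.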
Steve From ncoghlan at gmail.com Thu Mar 30 00:49:34 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 30 Mar 2017 14:49:34 +1000 Subject: [Python-ideas] What about regexp string litterals : re".*" ? In-Reply-To: <20170327151740.GJ6883@tabr.siet.ch> References: <20170327151740.GJ6883@tabr.siet.ch> Message-ID: On 28 March 2017 at 01:17, Simon D. wrote: > It would ease the use of regexps in Python We don't really want to ease the use of regexps in Python - while they're an incredibly useful tool in a programmer's toolkit, they're so cryptic that they're almost inevitably a maintainability nightmare. Baking them directly into the language runtime also locks people in to a particular regex engine implementation, rather than being able to swap in a third party one if they choose to do so (as many folks currently do with the `regex` PyPI module). So it's appropriate to keep them as a string-based library level capability, and hence on a relatively level playing field with less comprehensive, but typically easier to maintain, options like string methods and third party text parsing libraries (such as https://pypi.python.org/pypi/parse for something close to the inverse of str.format) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From simon at acoeuro.com Thu Mar 30 02:38:31 2017 From: simon at acoeuro.com (Simon D.) Date: Thu, 30 Mar 2017 08:38:31 +0200 Subject: [Python-ideas] What about regexp string litterals : m".*" ? In-Reply-To: <20170328075605.GM2757@tabr.siet.ch> References: <20170327151740.GJ6883@tabr.siet.ch> <20170328030205.GE27969@ando.pearwood.info> <20170328075605.GM2757@tabr.siet.ch> Message-ID: <20170330063831.GO6535@tabr.siet.ch> * Simon D. [2017-03-28 09:56:05 +0200]: > The str integrated one also, but maybe confusing, which regexp lib is > used ? (must be the default one). > Ok, this was a mistake, based on JavaScript memories... There are no regexp-aware functions around str, but some hint to go find your happiness in the re module. 
There is no regexp aware functions around str, but some hint to go find your happiness in the re module. -- Simon Descarpentries +336 769 702 53 http://acoeuro.com From markusmeskanen at gmail.com Thu Mar 30 05:18:53 2017 From: markusmeskanen at gmail.com (Markus Meskanen) Date: Thu, 30 Mar 2017 12:18:53 +0300 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" Message-ID: Hi Pythonistas, yet again today I ended up writing: d = [[0] * 5 for _ in range(10)] And wondered, why don't we have a way to repeat other than looping over range() and using a dummy variable? This seems like a rather common thing to do, and while the benefit doesn't seem much, something like this would be much prettier and more pythonic than using underscore variable: d = [[0] * 5 repeat_for 10] And could obviously be used outside of comprehensions too: repeat_for 3: print('Attempting to reconnect...') if reconnect(): break else: print('Unable to reconnect :(') sys.exit(0) I chose to use repeat_for instead of repeat because it's way less likely to be used as a variable name, but obviously other names could be used like loop_for or repeat_times etc. Thoughts? -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Mar 30 05:53:15 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 30 Mar 2017 19:53:15 +1000 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On 30 March 2017 at 19:18, Markus Meskanen wrote: > Hi Pythonistas, > > yet again today I ended up writing: > > d = [[0] * 5 for _ in range(10)] > > And wondered, why don't we have a way to repeat other than looping over > range() and using a dummy variable? 
Because it's relatively rare to not use the loop variable for anything (even if it's just a debug message), and in the cases where you genuinely don't use it, a standard idiom can be applied (using a single or double underscore as a dummy variable), rather than all future users of the language needing to learn a special case syntax. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From markusmeskanen at gmail.com Thu Mar 30 05:57:12 2017 From: markusmeskanen at gmail.com (Markus Meskanen) Date: Thu, 30 Mar 2017 12:57:12 +0300 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On Mar 30, 2017 12:53, "Nick Coghlan" wrote: Because it's relatively rare to not use the loop variable for anything (even if it's just a debug message), and in the cases where you genuinely don't use it, a standard idiom can be applied (using a single or double underscore as a dummy variable), rather than all future users of the language needing to learn a special case syntax. Cheers, Nick. I -------------- next part -------------- An HTML attachment was scrubbed... URL: From markusmeskanen at gmail.com Thu Mar 30 05:59:57 2017 From: markusmeskanen at gmail.com (Markus Meskanen) Date: Thu, 30 Mar 2017 12:59:57 +0300 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On Thu, Mar 30, 2017 at 12:53 PM, Nick Coghlan wrote: > > Because it's relatively rare to not use the loop variable for anything > (even if it's just a debug message), and in the cases where you > genuinely don't use it, a standard idiom can be applied (using a > single or double underscore as a dummy variable), rather than all > future users of the language needing to learn a special case syntax. > I think "relatively rare" is rather subjective, it's surely not everyday stuff but that doesn't mean it's not done often. 
And instead of learning a special syntax, which is simple and easy to understand when they google "repeat many times python", they now end up learning a special semantic by naming the variable with an underscore. If and when someone asks "how to repeat many times in Python", I'd rather answer "use repeat_for X" instead of "use for _ in range(X)" -------------- next part -------------- An HTML attachment was scrubbed... URL: From allan.clark at gmail.com Thu Mar 30 06:24:52 2017 From: allan.clark at gmail.com (Allan Clark) Date: Thu, 30 Mar 2017 11:24:52 +0100 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: If there were to be special syntax for this case, I'd just allow an empty pattern, such as: d = [[0] * 5 for in 10] This is exactly the same as your 'repeat_for' except that it is spelt 'for in', which means there are no new keywords. It would also be allowed in for-loops in the same way as your example. I believe this would even be relatively simple to implement (but don't know). But I'm afraid, I'd be -1 on this, two reasons: 1. Subjective it may be, but my subjective opinion is that this does not come up often enough to warrant this change. 2. When it does come up for those learning the language, they learn a useful idiom of using '_' or '__' for variables that you intend not be used. Thanks, Allan. On 30 March 2017 at 10:59, Markus Meskanen wrote: > > > On Thu, Mar 30, 2017 at 12:53 PM, Nick Coghlan wrote: >> >> Because it's relatively rare to not use the loop variable for anything >> (even if it's just a debug message), and in the cases where you >> genuinely don't use it, a standard idiom can be applied (using a >> single or double underscore as a dummy variable), rather than all >> future users of the language needing to learn a special case syntax. > > > I think "relatively rare" is rather subjective, it's surely not everyday > stuff but that doesn't mean it's not done often. 
> And instead of learning a special syntax, which is simple and easy to > understand when they google "repeat many times python", they now end up > learning a special semantic by naming the variable with an underscore. If > and when someone asks "how to repeat many times in Python", I'd rather > answer "use repeat_for X" instead of "use for _ in range(X)" > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From allan.clark at gmail.com Thu Mar 30 06:26:25 2017 From: allan.clark at gmail.com (Allan Clark) Date: Thu, 30 Mar 2017 11:26:25 +0100 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: Sorry, that obviously should have been: d = [[0] * 5 for in range(10)] So it's not quite exactly the same as your example. On 30 March 2017 at 11:24, Allan Clark wrote: > If there were to be special syntax for this case, I'd just allow an > empty pattern, such as: > > d = [[0] * 5 for in 10] > > This is exactly the same as your 'repeat_for' except that it is spelt > 'for in', which means there are no new keywords. It would also be > allowed in for-loops in the same way as your example. I believe this > would even be relatively simple to implement (but don't know). > But I'm afraid, I'd be -1 on this, two reasons: > > 1. Subjective it may be, but my subjective opinion is that this does > not come up often enough to warrant this change. > 2. When it does come up for those learning the language, they learn a > useful idiom of using '_' or '__' for variables that you intend not be > used. > > Thanks, > Allan. 
> > > On 30 March 2017 at 10:59, Markus Meskanen wrote: >> >> >> On Thu, Mar 30, 2017 at 12:53 PM, Nick Coghlan wrote: >>> >>> Because it's relatively rare to not use the loop variable for anything >>> (even if it's just a debug message), and in the cases where you >>> genuinely don't use it, a standard idiom can be applied (using a >>> single or double underscore as a dummy variable), rather than all >>> future users of the language needing to learn a special case syntax. >> >> >> I think "relatively rare" is rather subjective, it's surely not everyday >> stuff but that doesn't mean it's not done often. >> And instead of learning a special syntax, which is simple and easy to >> understand when they google "repeat many times python", they now end up >> learning a special semantic by naming the variable with an underscore. If >> and when someone asks "how to repeat many times in Python", I'd rather >> answer "use repeat_for X" instead of "use for _ in range(X)" >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> From mehaase at gmail.com Thu Mar 30 09:51:08 2017 From: mehaase at gmail.com (Mark E. Haase) Date: Thu, 30 Mar 2017 09:51:08 -0400 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: Your example is really repeating two things:

    d = [ [0 for _ in range(5)] for _ in range(10) ]

But since list() uses * for repetition, you could write it more concisely as:

    d = [[0] * 5] * 10

I'm not picking on your specific example. I am only pointing out that Python gives you the tools you need to build nice APIs. If repetition is an important part of something you're working on, then consider using itertools.repeat, writing your own domain-specific repeat() method, or even override * like list() does.
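A quick sketch of the itertools.repeat spelling mentioned above:

```python
from itertools import repeat

# repeat(None, n) is the itertools way to say "do this n times":
for _ in repeat(None, 3):
    print("Attempting to reconnect...")

# Or repeat a value directly:
assert list(repeat(0, 5)) == [0, 0, 0, 0, 0]

# Ten rows, with a fresh inner list built on each iteration:
d = [[0] * 5 for _ in repeat(None, 10)]
assert len(d) == 10
assert d[0] is not d[1]
```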
One of the coolest aspects of Python is how a relatively small set of abstractions can be combined to create lots of useful behaviors. For students, the lack of a "repeat" block might be confusing at first, but once the student understands for loops in general, it's an easy mental jump from "using the loop variable in the body" to "not using the loop variable in the body" to "underscore is the convention for an unused loop variable". In the long run, having one syntax that does many things is simpler than having many syntaxes that each do one little thing. On Thu, Mar 30, 2017 at 5:18 AM, Markus Meskanen wrote: > Hi Pythonistas, > > yet again today I ended up writing: > > d = [[0] * 5 for _ in range(10)] > > And wondered, why don't we have a way to repeat other than looping over > range() and using a dummy variable? This seems like a rather common thing > to do, and while the benefit doesn't seem much, something like this would > be much prettier and more pythonic than using underscore variable: > > d = [[0] * 5 repeat_for 10] > > And could obviously be used outside of comprehensions too: > > repeat_for 3: > print('Attempting to reconnect...') > if reconnect(): > break > else: > print('Unable to reconnect :(') > sys.exit(0) > > I chose to use repeat_for instead of repeat because it's way less likely > to be used as a variable name, but obviously other names could be used like > loop_for or repeat_times etc. > > Thoughts? > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dmoisset at machinalis.com Thu Mar 30 09:55:54 2017 From: dmoisset at machinalis.com (Daniel Moisset) Date: Thu, 30 Mar 2017 14:55:54 +0100 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: That's not the same: in the OP example you can assert d[0] is not d[1], while in your code that assertion fails (the list comprehension evaluates the expression each time, creating a new list; your code makes 10 references to a single 5-element list) On 30 March 2017 at 14:51, Mark E. Haase wrote: > Your example is really repeating two things: > > d = [ [0 for _ in range(5)] for _ in range(10) ] > > But since list() uses * for repetition, you could write it more concisely > as: > > d = [[0] * 5] * 10 > > I'm not picking on your specific example. I am only pointing out that > Python gives you the tools you need to build nice APIs. If repetition is an > important part of something you're working on, then consider using > itertools.repeat, writing your own domain-specific repeat() method, or even > override * like list() does. One of the coolest aspects of Python is how a > relatively small set of abstractions can be combined to create lots of > useful behaviors. > > For students, the lack of a "repeat" block might be confusing at first, > but once the student understands for loops in general, it's an easy mental > jump from "using the loop variable in the body" to "not using the loop > variable in the body" to "underscore is the convention for an unused loop > variable". In the long run, having one syntax that does many things is > simpler than having many syntaxes that each do one little thing. > > On Thu, Mar 30, 2017 at 5:18 AM, Markus Meskanen > wrote: > >> Hi Pythonistas, >> >> yet again today I ended up writing: >> >> d = [[0] * 5 for _ in range(10)] >> >> And wondered, why don't we have a way to repeat other than looping over >> range() and using a dummy variable?
This seems like a rather common thing >> to do, and while the benefit doesn't seem much, something like this would >> be much prettier and more pythonic than using underscore variable: >> >> d = [[0] * 5 repeat_for 10] >> >> And could obviously be used outside of comprehensions too: >> >> repeat_for 3: >> print('Attempting to reconnect...') >> if reconnect(): >> break >> else: >> print('Unable to reconnect :(') >> sys.exit(0) >> >> I chose to use repeat_for instead of repeat because it's way less likely >> to be used as a variable name, but obviously other names could be used like >> loop_for or repeat_times etc. >> >> Thoughts? >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- Daniel F. Moisset - UK Country Manager www.machinalis.com Skype: @dmoisset -------------- next part -------------- An HTML attachment was scrubbed... URL: From contact at brice.xyz Thu Mar 30 10:10:57 2017 From: contact at brice.xyz (Brice PARENT) Date: Thu, 30 Mar 2017 16:10:57 +0200 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: <696fee0b-da54-b40c-ddb5-aafffd0b8b44@brice.xyz> On 30/03/17 at 15:51, Mark E. Haase wrote: > I'm not picking on your specific example. I am only pointing out that > Python gives you the tools you need to build nice APIs. If repetition > is an important part of something you're working on, then consider > using itertools.repeat, writing your own domain-specific repeat() > method, or even override * like list() does.
One of the coolest > aspects of Python is how a relatively small set of abstractions can be > combined to create lots of useful behaviors. > > For students, the lack of a "repeat" block might be confusing at > first, but once the student understands for loops in general, it's an > easy mental jump from "using the loop variable in the body" to "not > using the loop variable in the body" to "underscore is the convention > for an unused loop variable". In the long run, having one syntax that > does many things is simpler than having many syntaxes that each do one > little thing. +1 I would add that it is even the convention for all unused variables, not only in loops, as it is also used in other cases, like this for example : key, _, value = "foo:date:bar".split(":") From pavol.lisy at gmail.com Thu Mar 30 10:23:05 2017 From: pavol.lisy at gmail.com (Pavol Lisy) Date: Thu, 30 Mar 2017 16:23:05 +0200 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On 3/30/17, Nick Coghlan wrote: > On 30 March 2017 at 19:18, Markus Meskanen > wrote: >> Hi Pythonistas, >> >> yet again today I ended up writing: >> >> d = [[0] * 5 for _ in range(10)] d = [[0]*5]*10 # what about this? >> And wondered, why don't we have a way to repeat other than looping over >> range() and using a dummy variable? > > Because it's relatively rare to not use the loop variable for anything > (even if it's just a debug message), and in the cases where you > genuinely don't use it, a standard idiom can be applied (using a > single or double underscore as a dummy variable), rather than all > future users of the language needing to learn a special case syntax. > > Cheers, > Nick. Simplified repeating could probably be useful in interactive mode. Just for curiosity - if PEP-501 will be accepted then how many times could be fnc called in next code? eval(i'{fnc()}, ' *3) PL.
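A minimal sketch of the itertools.repeat() alternative suggested earlier in the thread: it still leans on the underscore convention, but the comprehension body runs once per yielded value, so every inner list is a distinct object, unlike with [[0] * 5] * 10.

```python
from itertools import repeat

# repeat(None, 10) yields None ten times; each pass builds a fresh
# inner list, so mutating one row leaves the others untouched.
d = [[0] * 5 for _ in repeat(None, 10)]

d[0].append(1)
assert d[0] == [0, 0, 0, 0, 0, 1]
assert d[1] == [0, 0, 0, 0, 0]
assert d[1] is not d[2]
```

Nothing here is new syntax or API; it only trades range(10) for the plain stdlib itertools.repeat(None, 10).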
From ncoghlan at gmail.com Thu Mar 30 10:53:16 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 31 Mar 2017 00:53:16 +1000 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On 31 March 2017 at 00:23, Pavol Lisy wrote: > Just for curiosity - if PEP-501 will be accepted then how many times > could be fnc called in next code? > > eval(i'{fnc()}, ' *3) Once (the same as f-strings), but then it would throw TypeError, as unlike strings and other sequences, InterpolationTemplate wouldn't define a multiplication operator. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From klahnakoski at mozilla.com Thu Mar 30 11:06:21 2017 From: klahnakoski at mozilla.com (Kyle Lahnakoski) Date: Thu, 30 Mar 2017 11:06:21 -0400 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On 2017-03-30 05:18, Markus Meskanen wrote: > Hi Pythonistas, > > yet again today I ended up writing: > > d = [[0] * 5 for _ in range(10)] > > > Thoughts? It looks like you are initializing matrices. Can you make a helper function? d = matrix(shape=(5, 10), default=0) or maybe use NumPy? From wolfgang.maier at biologie.uni-freiburg.de Thu Mar 30 11:06:12 2017 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Thu, 30 Mar 2017 17:06:12 +0200 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On 03/30/2017 04:23 PM, Pavol Lisy wrote: > On 3/30/17, Nick Coghlan wrote: >> On 30 March 2017 at 19:18, Markus Meskanen >> wrote: >>> >>> d = [[0] * 5 for _ in range(10)] > > d = [[0]*5]*10 # what about this? > These are not quite the same when the repeated object is mutable. 
Compare: >>> matrix1 = [[0] * 5 for _ in range(10)] >>> matrix1[0].append(1) >>> matrix1 [[0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]] >>> matrix2=[[0]*5]*10 >>> matrix2[0].append(1) >>> matrix2 [[0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 1]] so the comprehension is usually necessary. Wolfgang From markusmeskanen at gmail.com Thu Mar 30 11:08:31 2017 From: markusmeskanen at gmail.com (Markus Meskanen) Date: Thu, 30 Mar 2017 18:08:31 +0300 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: Just an example among many :) On Mar 30, 2017 6:07 PM, "Kyle Lahnakoski" wrote: > > > On 2017-03-30 05:18, Markus Meskanen wrote: > > Hi Pythonistas, > > > > yet again today I ended up writing: > > > > d = [[0] * 5 for _ in range(10)] > > > > > > Thoughts? > > It looks like you are initializing matrices. Can you make a helper > function? > > d = matrix(shape=(5, 10), default=0) > > or maybe use NumPy? > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsbueno at python.org.br Thu Mar 30 12:04:09 2017 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Thu, 30 Mar 2017 13:04:09 -0300 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On 30 March 2017 at 10:51, Mark E. 
Haase wrote: > Your example is really repeating two things: > > d = [ [0 for _ in range(5)] for _ in range(10) ] > > But since list() uses * for repetition, you could write it more concisely > as: > > d = [[0] * 5] * 10] > > I'm not picking on your specific example. I am only pointing out that Python > gives you the tools you need to build nice APIs. If repetition is an > important part of something you're working on, then consider using > itertools.repeat, writing your own domain-specific repeat() method, or even > override * like list() does. One of the coolest aspects of Python is how a > relatively small set of abstractions can be combined to create lots of > useful behaviors. I find it weird that not the author, neither the previous repliers noticed that "a repetition other than a for with dummy variable" was already in plain sight, in the very example given. Of course one is also free to write [ [0 for _ in range(5)] for _ in range(10)] if he wishes so. From markusmeskanen at gmail.com Thu Mar 30 12:10:06 2017 From: markusmeskanen at gmail.com (Markus Meskanen) Date: Thu, 30 Mar 2017 19:10:06 +0300 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On Mar 30, 2017 19:04, "Joao S. O. Bueno" wrote: On 30 March 2017 at 10:51, Mark E. Haase wrote: > Your example is really repeating two things: > > d = [ [0 for _ in range(5)] for _ in range(10) ] > > But since list() uses * for repetition, you could write it more concisely > as: > > d = [[0] * 5] * 10] > > I'm not picking on your specific example. I am only pointing out that Python > gives you the tools you need to build nice APIs. If repetition is an > important part of something you're working on, then consider using > itertools.repeat, writing your own domain-specific repeat() method, or even > override * like list() does. 
One of the coolest aspects of Python is how a > relatively small set of abstractions can be combined to create lots of > useful behaviors. I find it weird that not the author, neither the previous repliers noticed that "a repetition other than a for with dummy variable" was already in plain sight, in the very example given. Of course one is also free to write [ [0 for _ in range(5)] for _ in range(10)] if he wishes so. Had you read all the replies, you'd see people (including me, OP) repeating this multiple times: d = [[0] * 5] * 10 Creates a list of ten references *to the same list*. This means that if I mutate any of the sub lists in d, all of the sub lists get mutated. There would only be one sub list, just ten references to it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsbueno at python.org.br Thu Mar 30 12:54:51 2017 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Thu, 30 Mar 2017 13:54:51 -0300 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On 30 March 2017 at 13:10, Markus Meskanen wrote: > > > On Mar 30, 2017 19:04, "Joao S. O. Bueno" wrote: > > On 30 March 2017 at 10:51, Mark E. Haase wrote: >> Your example is really repeating two things: >> >> d = [ [0 for _ in range(5)] for _ in range(10) ] >> >> But since list() uses * for repetition, you could write it more concisely >> as: >> >> d = [[0] * 5] * 10] >> >> I'm not picking on your specific example. I am only pointing out that >> Python >> gives you the tools you need to build nice APIs. If repetition is an >> important part of something you're working on, then consider using >> itertools.repeat, writing your own domain-specific repeat() method, or >> even >> override * like list() does. One of the coolest aspects of Python is how a >> relatively small set of abstractions can be combined to create lots of >> useful behaviors. 
> > I find it weird that not the author, neither the previous repliers > noticed > that > "a repetition other than a for with dummy variable" was already in plain > sight, > in the very example given. > Of course one is also free to write [ [0 for _ in range(5)] for _ in > range(10)] if he wishes so. > > > Had you read all the replies, you'd see people (including me, OP) repeating > this multiple times: > > d = [[0] * 5] * 10 > > Creates a list of ten references *to the same list*. This means that if I > mutate any of the sub lists in d, all of the sub lists get mutated. There > would only be one sub list, just ten references to it. Yes. Nonetheless, it is still repeating. Accepting a new way for doing this would go from 2 ways with 2 semantics to 3 ways with two different semantics. And, all you need is to create a special class to actually duplicate the list on multiplying - not a big deal: In [76]: class D: ...: def __init__(self, value): ...: self.value = value ...: def __rmul__(self, other): ...: if hasattr(other, '__len__') and hasattr(other, '__add__'): ...: result = deepcopy(other) ...: for _ in range(1, self.value): ...: result += deepcopy(other) ...: return result ...: return NotImplemented ...: In [77]: from copy import deepcopy In [78]: a = [[0] * 5] * D(10) In [79]: a[5][2] = "*" In [80]: a Out[80]: [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, '*', 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]] From markusmeskanen at gmail.com Thu Mar 30 13:06:50 2017 From: markusmeskanen at gmail.com (Markus Meskanen) Date: Thu, 30 Mar 2017 20:06:50 +0300 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: And like I said before, for loop is just another way of doing while loop, yet nobody's complaining. There's nothing wrong with having two different ways of doing the same thing, as long as one of them is never the better way.
If we add `repeat`, there's never a reason to use `for _ in range` anymore. What comes to your custom class solution, it's uglier, harder to follow, and way slower than just doing: d = [[0]*5 for _ in range(10)] While the proposed method would be faster, shorter, and cleaner. And like I said many times, the matrix example is just one of many. On Mar 30, 2017 19:54, "Joao S. O. Bueno" wrote: > On 30 March 2017 at 13:10, Markus Meskanen > wrote: > > > > > > On Mar 30, 2017 19:04, "Joao S. O. Bueno" wrote: > > > > On 30 March 2017 at 10:51, Mark E. Haase wrote: > >> Your example is really repeating two things: > >> > >> d = [ [0 for _ in range(5)] for _ in range(10) ] > >> > >> But since list() uses * for repetition, you could write it more > concisely > >> as: > >> > >> d = [[0] * 5] * 10] > >> > >> I'm not picking on your specific example. I am only pointing out that > >> Python > >> gives you the tools you need to build nice APIs. If repetition is an > >> important part of something you're working on, then consider using > >> itertools.repeat, writing your own domain-specific repeat() method, or > >> even > >> override * like list() does. One of the coolest aspects of Python is > how a > >> relatively small set of abstractions can be combined to create lots of > >> useful behaviors. > > > > I find it weird that not the author, neither the previous repliers > noticed > > that > > "a repetition other than a for with dummy variable" was already in plain > > sight, > > in the very example given. > > Of course one is also free to write [ [0 for _ in range(5)] for _ in > > range(10)] if he wishes so. > > > > > > Had you read all the replies, you'd see people (including me, OP) > repeating > > this multiple times: > > > > d = [[0] * 5] * 10 > > > > Creates a list of ten references *to the same list*. This means that if I > > mutate any of the sub lists in d, all of the sub lists get mutated. There > > would only be one sub list, just ten references to it. > > > Yes. 
Nonetheless, it is still repeating. Accepting a new way for doing > this would go from 2 ways with 2 semantics to 3 ways with two > different semantics. > > And, all you need is to create a special class to actually dupicate > the list on multiplying - not a big deal: > > In [76]: class D: > ...: def __init__(self, value): > ...: self.value = value > ...: def __rmul__(self, other): > ...: if hasattr(other, '__len__') and hasattr(other, '__add__'): > ...: result = deepcopy(other) > ...: for _ in range(1, self.value): > ...: result += deepcopy(other) > ...: return result > ...: return NotImplemented > ...: > > In [77]: from copy import deepcopy > > In [78]: a = [[0] * 5] * D(10) > > In [79]: a[5][2] = "*" > > In [80]: a > Out[80]: > [[0, 0, 0, 0, 0], > [0, 0, 0, 0, 0], > [0, 0, 0, 0, 0], > [0, 0, 0, 0, 0], > [0, 0, 0, 0, 0], > [0, 0, '*', 0, 0], > [0, 0, 0, 0, 0], > [0, 0, 0, 0, 0], > [0, 0, 0, 0, 0], > [0, 0, 0, 0, 0]] > -------------- next part -------------- An HTML attachment was scrubbed... URL: From contact at brice.xyz Thu Mar 30 13:23:45 2017 From: contact at brice.xyz (Brice PARENT) Date: Thu, 30 Mar 2017 19:23:45 +0200 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: Le 30/03/17 ? 19:06, Markus Meskanen a ?crit : > And like I said before, for loop is just another way of doing while > loop, yet nobody's complaining. There's nothing wrong with having two > different ways of doing the same thing, as long as one of them is > never the better way. If we add `repeat`, there's never a reason to > use `for _ in range` anymore. 
It doesn't always create something easier to use, like for example: `for _ in range(x, y, z)` (fixed or variable parameters) `for _ in one_list` (saves a call to len() with your solution) `for _ in any_other_kind_of_iterable` (we don't need to know the length here, we may even use a generator) From markusmeskanen at gmail.com Thu Mar 30 13:38:57 2017 From: markusmeskanen at gmail.com (Markus Meskanen) Date: Thu, 30 Mar 2017 20:38:57 +0300 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On Thu, Mar 30, 2017 at 8:23 PM, Brice PARENT wrote: > > On 30/03/17 at 19:06, Markus Meskanen wrote: > >> And like I said before, for loop is just another way of doing while loop, >> yet nobody's complaining. There's nothing wrong with having two different >> ways of doing the same thing, as long as one of them is never the better >> way. If we add `repeat`, there's never a reason to use `for _ in range` >> anymore. >> > It doesn't always create something easier to use, like for example: > `for _ in range(x, y, z)` (fixed or variable parameters) > `for _ in one_list` (saves a call to len() with your solution) > `for _ in any_other_kind_of_iterable` (we don't need to know the length > here, we may even use a generator) Your first example: > `for _ in range(x, y, z)` Makes little sense, since there's still a fixed amount of steps and normal range(x) could just be used instead. As a matter of fact, it can be replaced with either of these, arguably `repeat_for` version being cleaner: for _ in range((y - x + z - 1) // z) repeat_for (y - x + z - 1) // z And in that one *extremely* unlikely and rare scenario where someone really does need range() with variable start, stop, and step, and doesn't need the returned variable, he can freely still use `for _ in range`. This won't remove it.
Your other two examples: > `for _ in one_list` > `for _ in any_other_kind_of_iterable` Aren't related to the syntax I'm proposing, you even quoted this part yourself: > there's never a reason to use `for _ in range` anymore. But your examples don't use range() to begin with. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rhodri at kynesim.co.uk Thu Mar 30 13:49:12 2017 From: rhodri at kynesim.co.uk (Rhodri James) Date: Thu, 30 Mar 2017 18:49:12 +0100 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: <144ee7e6-1087-981d-3d2c-a3eb1f3487fa@kynesim.co.uk> On 30/03/17 18:06, Markus Meskanen wrote: > And like I said before, for loop is just another way of doing while loop, > yet nobody's complaining. There's nothing wrong with having two different > ways of doing the same thing, as long as one of them is never the better > way. If we add `repeat`, there's never a reason to use `for _ in range` > anymore. That's a C-programmer point of view. In C, it's true; for (init(); cond(); inc()) { ... } is just a convenient form of init(); while (cond()) { ...; inc(); } (ignoring breaks and continues) In Python, the two types of loop are conceptually rather more different. A while loop loops based on a condition; a for loop iterates through an iterable. Doing simple repetition as "for _ in range(x)" is a bit artificial really, but less ugly than doing it with a while loop. Your proposed "repeat" (however it is spelt) is a special case, and a pretty limited one at that. I'm not sure I've needed it, certainly not for a while, and I have to say I don't find array initialisation a compelling use-case. I really don't like the idea of finding it in comprehensions. 
-- Rhodri James *-* Kynesim Ltd From joshua.morton13 at gmail.com Thu Mar 30 14:05:33 2017 From: joshua.morton13 at gmail.com (Joshua Morton) Date: Thu, 30 Mar 2017 18:05:33 +0000 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: A for loop in python saves an enormous amount of boilerplate code though (I would post an example, but I'd likely mess up a while loop over an iterator from memory if I posted it here). The `for x in y` construct saves multiple lines and an enormous amount of boilerplate and mental strain in the majority of loops. Your suggestion occasionally saves single digit characters. I'd be curious to know whether implementing this change and then applying the new construct would be a net increase or decrease in the size of the python interpreter and stdlib. Alternatively, writing def repeat_for(func, iters): return func() for _ in range(iters) does what you want without any required syntax changes. On Thu, Mar 30, 2017 at 10:07 AM Markus Meskanen wrote: And like I said before, for loop is just another way of doing while loop, yet nobody's complaining. There's nothing wrong with having two different ways of doing the same thing, as long as one of them is never the better way. If we add `repeat`, there's never a reason to use `for _ in range` anymore.
Haase wrote: >> Your example is really repeating two things: >> >> d = [ [0 for _ in range(5)] for _ in range(10) ] >> >> But since list() uses * for repetition, you could write it more concisely >> as: >> >> d = [[0] * 5] * 10] >> >> I'm not picking on your specific example. I am only pointing out that >> Python >> gives you the tools you need to build nice APIs. If repetition is an >> important part of something you're working on, then consider using >> itertools.repeat, writing your own domain-specific repeat() method, or >> even >> override * like list() does. One of the coolest aspects of Python is how a >> relatively small set of abstractions can be combined to create lots of >> useful behaviors. > > I find it weird that not the author, neither the previous repliers noticed > that > "a repetition other than a for with dummy variable" was already in plain > sight, > in the very example given. > Of course one is also free to write [ [0 for _ in range(5)] for _ in > range(10)] if he wishes so. > > > Had you read all the replies, you'd see people (including me, OP) repeating > this multiple times: > > d = [[0] * 5] * 10 > > Creates a list of ten references *to the same list*. This means that if I > mutate any of the sub lists in d, all of the sub lists get mutated. There > would only be one sub list, just ten references to it. Yes. Nonetheless, it is still repeating. Accepting a new way for doing this would go from 2 ways with 2 semantics to 3 ways with two different semantics. 
And, all you need is to create a special class to actually duplicate the list on multiplying - not a big deal: In [76]: class D: ...: def __init__(self, value): ...: self.value = value ...: def __rmul__(self, other): ...: if hasattr(other, '__len__') and hasattr(other, '__add__'): ...: result = deepcopy(other) ...: for _ in range(1, self.value): ...: result += deepcopy(other) ...: return result ...: return NotImplemented ...: In [77]: from copy import deepcopy In [78]: a = [[0] * 5] * D(10) In [79]: a[5][2] = "*" In [80]: a Out[80]: [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, '*', 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]] _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pavol.lisy at gmail.com Thu Mar 30 14:08:04 2017 From: pavol.lisy at gmail.com (Pavol Lisy) Date: Thu, 30 Mar 2017 20:08:04 +0200 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On 3/30/17, Nick Coghlan wrote: > On 31 March 2017 at 00:23, Pavol Lisy wrote: >> Just for curiosity - if PEP-501 will be accepted then how many times >> could be fnc called in next code? >> >> eval(i'{fnc()}, ' *3) > > Once (the same as f-strings), but then it would throw TypeError, as > unlike strings and other sequences, InterpolationTemplate wouldn't > define a multiplication operator. > > Cheers, > Nick. Could you explain the reason behind not implementing it, please?
PL From pavol.lisy at gmail.com Thu Mar 30 14:49:29 2017 From: pavol.lisy at gmail.com (Pavol Lisy) Date: Thu, 30 Mar 2017 20:49:29 +0200 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On 3/30/17, Joshua Morton wrote: > def repeat_for(func, iters): > return func() for _ in range(iters) > > does what you want without any required syntax changes. I've got a "SyntaxError: invalid syntax" PL. From contact at brice.xyz Thu Mar 30 17:39:08 2017 From: contact at brice.xyz (Brice PARENT) Date: Thu, 30 Mar 2017 23:39:08 +0200 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: <8584f11a-33cd-e05f-837e-ac4c87ba6f06@brice.xyz> > > > `for _ in range(x, y, z)` > > Makes little sense, since there's still a fixed amount of steps and > normal range(x) could just be used instead. As a matter of fact, it > can be replaced with either of these, arguably `repeat_for` version > being cleaner: > > for _ in range((y-x+1)//z) > repeat_for (y - x + 1) // z > > And in that one *extremely* unlikely and rare scenario where someone > really does need range() with variable start, stop, and step, and > doesn't need the returned variable, he can freely still use `for _ in > range`. This won't remove it. > > Your other two examples: > > > `for _ in one_list` > > `for _ in any_other_kind_of_iterable` > > Aren't related to the syntax I'm proposing, you even quoted this part > yourself: > > > there's never a reason to use `for _ in range` anymore. > > But your examples don't use range() to begin with. That's exactly my point ! What you propose is exaggeratedly specific. It will only work to replace exactly `for _ in range(x)`, nothing more. Every Python developer needs to know about `for i in range(x)`, as it is a really common pattern. 
It feels really strange to switch to a completely different syntax just for the case we don't care about `i`, a syntax that'll never be used for anything else. And to lose its readability if your `range` requires more that one argument. The Zen of Python tells that way better than I do; - There should be one-- and preferably only one --obvious way to do it. -> As we won't forbid the syntax with range, why add a second obvious way to do it? - Special cases aren't special enough to break the rules. -> The new syntax aims at a very specific special case, don't break the rules for it. -Brice -------------- next part -------------- An HTML attachment was scrubbed... URL: From abedillon at gmail.com Thu Mar 30 21:38:10 2017 From: abedillon at gmail.com (Abe Dillon) Date: Thu, 30 Mar 2017 20:38:10 -0500 Subject: [Python-ideas] What about regexp string litterals : re".*" ? In-Reply-To: References: <20170327151740.GJ6883@tabr.siet.ch> Message-ID: > a huge advantage of REs is that they are common to many > languages. You can take a regex from grep to Perl to your editor to > Python. They're not absolutely identical, of course, but the basics > are all the same. Creating a new search language means everyone has to > learn anew. > ChrisA 1) I'm not suggesting we get rid of the re module (the VE implementation I linked requires it) 2) You can easily output regex from verbal expressions 3) verbal expressions are implemented in many different languages too: https://verbalexpressions.github.io/ 4) It even has a generic interface that all implementations are meant to follow: https://github.com/VerbalExpressions/implementation/wiki/List-of-methods-to-implement Note that the entire documentation is 250 words while just the syntax portion of Python docs for the re module is over 3000 words. 
> You think that example is more readable than the proposed translation > ^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$ > which is better written > ^https?://(www\.)?[^ ]*$ > or even > ^https?://[^ ]*$ Yes. I find it *far* more readable. It's not a soup of symbols like Perl code. I can only surmise that you're fluent in regex because it seems difficult for you to see how the above could be less readable than English words. > which makes it obvious that the regexp is not very useful from the > word "^"? (It matches only URLs which are the only thing, including > whitespace, on the line, probably not what was intended.) I could tell it only matches URLs that are the only thing inside the string because it clearly says: start_of_line() and end_of_line(). I would have had to refer to a reference to know that "^" doesn't always mean "not", it sometimes means "start of string" and probably other things. I would also have to check a reference to know that "$" can mean "end of string" (and probably other things). > Are those groups capturing in Verbal Expressions? The use of "find" > (~ "search") rather than "match" is disconcerting to the experienced > user. You can alternatively use the word "then". The source code is just one Python file. It's very easy to read. I actually like "then" over "find" for the example: verbal_expression.start_of_line() .then('http') .maybe('s') .then('://') .maybe('www.') .anything_but(' ') .end_of_line() > What does alternation look like? .OR(option1).OR(option2).OR(option3)... > How about alternation of > non-trivial regular expressions? .OR(other_verbal_expression) > As far as I can see, Verbal Expressions are basically a way of making > it so painful to write regular expressions that people will restrict > themselves to regular expressions What's so painful to write about them? Does your IDE not have autocompletion? I find REs so painful to write that I usually just use string methods if at all feasible.
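For readers unfamiliar with the library, here is a minimal illustrative sketch (not the actual VerbalExpressions API) of how such a fluent builder can compile down to an ordinary re pattern:

```python
import re

class Verbal:
    """Tiny illustrative fluent builder that accumulates a regex pattern."""

    def __init__(self):
        self._parts = []

    def start_of_line(self):
        self._parts.append("^")
        return self

    def then(self, text):
        # Escape the literal text so regex metacharacters match themselves.
        self._parts.append("(?:%s)" % re.escape(text))
        return self

    def maybe(self, text):
        self._parts.append("(?:%s)?" % re.escape(text))
        return self

    def anything_but(self, chars):
        self._parts.append("[^%s]*" % re.escape(chars))
        return self

    def end_of_line(self):
        self._parts.append("$")
        return self

    def regex(self):
        return re.compile("".join(self._parts))

url = (Verbal().start_of_line().then("http").maybe("s").then("://")
       .maybe("www.").anything_but(" ").end_of_line().regex())
```

The builder does nothing but string concatenation; the readability argument is entirely about which surface syntax the reader prefers.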
> I don't think that this failure to respect the > developer's taste is restricted to this particular implementation, > either. I generally find it distasteful to write a pseudolanguage in strings inside of other languages (this applies to SQL as well). Especially when the design principles of that pseudolanguage are *diametrically opposed* to the design principles of the host language. A key principle of Python's design is: "you read code a lot more often than you write code, so emphasize readability". Regex seems to be based on: "Do the most with the fewest key-strokes. Readability be damned!". It makes a lot more sense to wrap the pseudolanguage in constructs that bring it in line with the host language than to take on the mental burden of trying to comprehend two different languages at the same time. If you disagree, nothing's stopping you from continuing to write REs the old-fashioned way. Can we at least agree that baking special re syntax directly into the language is a bad idea? On Wed, Mar 29, 2017 at 11:49 PM, Nick Coghlan wrote: > On 28 March 2017 at 01:17, Simon D. wrote: > > It would ease the use of regexps in Python > > We don't really want to ease the use of regexps in Python - while > they're an incredibly useful tool in a programmer's toolkit, they're > so cryptic that they're almost inevitably a maintainability nightmare. > > Baking them directly into the language runtime also locks people in to > a particular regex engine implementation, rather than being able to > swap in a third party one if they choose to do so (as many folks > currently do with the `regex` PyPI module). > > So it's appropriate to keep them as a string-based library level > capability, and hence on a relatively level playing field with less > comprehensive, but typically easier to maintain, options like string > methods and third party text parsing libraries (such as > https://pypi.python.org/pypi/parse for something close to the inverse > of str.format) > > Cheers, > Nick.
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Mar 31 00:13:26 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 31 Mar 2017 14:13:26 +1000 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On 31 March 2017 at 04:08, Pavol Lisy wrote: > On 3/30/17, Nick Coghlan wrote: >> On 31 March 2017 at 00:23, Pavol Lisy wrote: >>> Just for curiosity - if PEP-501 will be accepted then how many times >>> could be fnc called in next code? >>> >>> eval(i'{fnc()}, ' *3) >> >> Once (the same as f-strings), but then it would throw TypeError, as >> unlike strings and other sequences, InterpolationTemplate wouldn't >> define a multiplication operator. >> >> Cheers, >> Nick. > > Could you explain the reason behind not implementing it, please? For the same reason dictionaries don't implement it: it doesn't make any sense in the general case. Repeating the *rendered* template might make sense, but that will depend on the specific renderer and the kinds of objects it produces (e.g. an SQL or shell renderer probably wouldn't produce output that supported repetition). Cheers, Nick.
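Nick's rationale, that repetition is a sequence operation rather than a general one, can be checked against existing types; a trivial sketch:

```python
# Sequences implement * as repetition:
repeated = "ab" * 3

# Mappings (like the proposed InterpolationTemplate) simply
# don't define __mul__, so * raises TypeError:
try:
    {"k": 1} * 3
    dict_supports_mul = True
except TypeError:
    dict_supports_mul = False
```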
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Fri Mar 31 00:36:42 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 31 Mar 2017 15:36:42 +1100 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: <20170331043641.GB9464@ando.pearwood.info> On Thu, Mar 30, 2017 at 04:23:05PM +0200, Pavol Lisy wrote: > On 3/30/17, Nick Coghlan wrote: > > On 30 March 2017 at 19:18, Markus Meskanen > > wrote: > >> Hi Pythonistas, > >> > >> yet again today I ended up writing: > >> > >> d = [[0] * 5 for _ in range(10)] > > d = [[0]*5]*10 # what about this? That doesn't do what you want. It's actually a common "gotcha", since it makes ten repetitions of the same five element list, not ten *copies*. py> d = [[0]*5]*10 py> d[0][0] = 9999 py> print(d) [[9999, 0, 0, 0, 0], [9999, 0, 0, 0, 0], [9999, 0, 0, 0, 0], [9999, 0, 0, 0, 0], [9999, 0, 0, 0, 0], [9999, 0, 0, 0, 0], [9999, 0, 0, 0, 0], [9999, 0, 0, 0, 0], [9999, 0, 0, 0, 0], [9999, 0, 0, 0, 0]] A slightly unfortunate conflict between desires: on the one hand, we definitely don't want * making copies of its arguments; on the other hand, that makes it less useful for initialising multi-dimensional (nested) lists. But then, nested lists ought to be rare. "Flat is better than nested." > Simplified repeating could be probably useful in interactive mode. I'm sorry, did you just suggest that language features should behave differently in interactive mode than non-interactive mode? If so, that's a TERRIBLE idea. The point of interactive mode is to try out syntax and code and see what it does, before using it in non-interactive scripts. If things behave differently, people will be left confused why the *exact same line of code* works differently in a script and when they try it interactively. It is bad enough that the interactive interpreter includes a handful of features that make it different from non-interactive.
I've been caught out repeatedly by the "last evaluated result" variable _ changing when the destructor method __del__ runs. That's another unavoidable case: adding extra, necessary functionality to the interactive interpreter, namely the ability to access the last evaluated result, necessarily holds onto a reference to that last result. But that's easy enough to reason about, once you remember what's going on. But changing the behaviour of the language itself is just a bad idea. -- Steve From ncoghlan at gmail.com Fri Mar 31 01:12:20 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 31 Mar 2017 15:12:20 +1000 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On 31 March 2017 at 03:06, Markus Meskanen wrote: > And like I said before, for loop is just another way of doing while loop, > yet nobody's complaining. There's nothing wrong with having two different > ways of doing the same thing, as long as one of them is never the better > way. If we add `repeat`, there's never a reason to use `for _ in range` > anymore. > > What comes to your custom class solution, it's uglier, harder to follow, and > way slower than just doing: > > d = [[0]*5 for _ in range(10)] > > While the proposed method would be faster, shorter, and cleaner. Well, no, as regularly doing this suggests someone is attempting to write C-in-Python rather than writing Python-in-Python. While C is certainly Python's heritage (especially back in the days before the iterator protocol, when "for i in range(len(container)):" was still a recommended idiom), writing Python code using C idioms isn't even close to being the recommended way of doing things today. So when you say "I use the 'expr for __ in range(count)' pattern a lot", we hear "I don't typically exploit first class functions and the iterator protocol to their full power".
And that's fine as far as it goes - 'expr for __ in range(count)' is perfectly acceptable code, and there's nothing wrong with it. However, what it *doesn't* provide is adequate justification for adding an entirely new construct to the language - given other iterator protocol and first class function based tools like enumerate(), itertools.repeat(), zip(), map(), etc, we don't want to add a new non-composable form of iteration purely for the "repeat this operation a known number of times" case. To elaborate on that point, note that any comprehension can always be reformulated as an iteration over a sequence of callables, in this case: init = ([0]*5).copy d = [init() for init in (init,)*10] (This is actually ~20% faster on my machine than the original version with the dummy variable, since it moves the sequence repetition step outside the loop and hence only does it once rather than 10 times) And that can be factored out into a "repeat_call" helper function, with itertools.repeat making it easy to avoid actually creating a tuple: from itertools import repeat def repeat_call(callable, n): for c in repeat(callable, n): yield c() (You can also avoid the itertools dependency by using the dummy variable formulation inside "repeat_call" without the difference being visible to external code) At that point, regardless of the internal implementation details of `repeat_call`, the original example would just look like: d = list(repeat_call(([0]*5).copy, 10)) To say "give me a list containing 10 distinct lists, each containing 5 zeroes."
So *if* we were to add anything to the language here, it would be to add `itertools.repeat_call` as a new iteration primitive, since it isn't entirely straightforward to construct that operation out of the existing primitives, with itertools.starmap coming closest: def repeat_call(callable, n): yield from starmap(callable, repeat((), n)) But the explicit for loop being clearest: def repeat_call(callable, n): for __ in range(n): yield callable() Cheers, Nick. P.S. The common problems shared by all of the `repeat_call` formulations in this post are that they don't set __length_hint__ appropriately, and hence lose efficiency when using them to build containers, and also they don't have nice representations they way other itertools objects do. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Mar 31 01:22:58 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 31 Mar 2017 15:22:58 +1000 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On 31 March 2017 at 15:12, Nick Coghlan wrote: > So *if* we were to add anything to the language here, it would be to > add `itertools.repeat_call` as a new iteration primitive, since it > isn't entirely straightforward to construct that operation out of the > existing primitives, with itertools.starmap coming closest: > > def repeat_call(callable, n): > yield from starmap(callable, repeat((), n)) > > But the explicit for loop being clearest: > > def repeat_call(callable, n): > for __ in range(n): > yield callable() It occurred to me to check whether or not `more_itertools` already had a suitable operation here, and it does: https://more-itertools.readthedocs.io/en/latest/api.html#more_itertools.repeatfunc This is the `repeatfunc` recipe from the itertools documentation: https://docs.python.org/3/library/itertools.html#itertools-recipes Cheers, Nick. 
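Putting Nick's clearest formulation next to his earlier usage example gives a self-contained sketch (`callable_` renamed here only to avoid shadowing the builtin):

```python
def repeat_call(callable_, n):
    # Call `callable_` n times, yielding each fresh result.
    for _ in range(n):
        yield callable_()

# Ten *distinct* five-element rows of zeroes:
d = list(repeat_call(([0] * 5).copy, 10))
```

Because each row comes from a separate call to `.copy()`, mutating one row leaves the others untouched, which is the whole point of the comprehension idiom being discussed.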
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Fri Mar 31 01:38:39 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 31 Mar 2017 16:38:39 +1100 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: <20170331053838.GC9464@ando.pearwood.info> On Thu, Mar 30, 2017 at 12:59:57PM +0300, Markus Meskanen wrote: > And instead of learning a special syntax, which is simple and easy to > understand when they google "repeat many times python", they now end up > learning a special semantic by naming the variable with an underscore. Let me preface this by saying that, *in principle*, I don't mind the idea of a repeat-without-dummy-variable syntax. I spent a lot of time programming in Apple's Hypertalk back in the 80s and 90s, and it had no fewer than five different loop styles: repeat [forever] repeat until condition repeat while condition repeat with i = start [to|down to] end repeat [for] N [times] where words in [] are optional. That last case is exactly the sort of thing you are talking about: it repeats N times, without creating a loop variable. In Hypertalk's case, this worked really well, and I liked it. So I can say that in principle this is not a bad idea, and it really suits some language styles. But not Python. It works for Hypertalk because it had an already *extremely* verbose and English-like syntax. A typical piece of Hypertalk code might be something like this: add one to total put the value of num after word 4 of item 3 of line 2 of field 1 For Hypertalk's target demographic (non-programmers who happen to be doing a bit of programming) it really is better for the language to provide a close match to their mental concept "repeat five times". The whole language is designed to work like people think, even when that's inefficient. 
I miss it a lot :-) But Python is a much more general purpose language, and beginners and non-programmers are only a tiny fraction of Python's demographic. Python is only English-like compared to languages like Perl or C which look like line-noise to the uninitiated. So what works for Hypertalk doesn't necessarily work for Python. "repeat 5 times" matches the philosophy and style of Hypertalk, but it clashes with the philosophy and style of Python. Python does not generally go into special-purpose syntax useful only in a specialised situation, preferring instead *general* syntax that can be adapted to a wide-range of situations. Using _ to mean "I don't care about this name" is general purpose. You can use it *anywhere*, not just in loops: # I only care about the side-effects, not the return result _ = call_function() # unpack seven items, but I only care about three of them a, _, b, _, _, _, c = items And it is optional too! If you don't like the name _ you can use anything you like: d = [[0]*5 for whocares in [None]*10] will create your nested lists for you, and probably ever so slightly more efficiently than using range(). So even if there's nothing overtly or especially *bad* about adding specialist syntax to the language, it isn't a great fit to the rest of Python. And of course any change has some cost: - more complexity in the parser and compiler; - more code to implement it; - more features to be tested; - more documentation; - more things for people to learn; - tutorials have more to cover; - people writing a loop have one extra decision to think about; etc. It might not be a *big* cost (it's just one small feature, not an entire Javascript interpreter added to the language!) but it is still a cost. Does the feature's usefulness outweigh its cost? Probably not. It's only a tiny feature, of limited usefulness. At *best* it is a marginal improvement, and it probably isn't even that. I'm not saying it isn't useful. I'm saying it isn't useful *enough*. 
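The don't-care examples above, gathered into one runnable sketch; the starred `*_` form is an extra spelling not in the original mail:

```python
# Unpack seven items, keeping only three of them (as in the mail):
items = [10, 11, 12, 13, 14, 15, 16]
a, _, b, _, _, _, c = items

# Extended unpacking gives another "don't care" spelling, with *_
# swallowing however many middle items there happen to be:
a2, _, b2, *_, c2 = items

# Nested-list initialisation without range(), as in the last example:
d = [[0] * 5 for _unused in [None] * 10]
```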
-- Steve From suresh_vv at yahoo.com Fri Mar 31 02:49:04 2017 From: suresh_vv at yahoo.com (Suresh V.) Date: Fri, 31 Mar 2017 12:19:04 +0530 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On Thursday 30 March 2017 02:48 PM, Markus Meskanen wrote: > Hi Pythonistas, > > yet again today I ended up writing: > > d = [[0] * 5 for _ in range(10)] > > And wondered, why don't we have a way to repeat other than looping over > range() and using a dummy variable? This seems like a rather common > thing to do, and while the benefit doesn't seem much, something like > this would be much prettier and more pythonic than using underscore > variable: > > d = [[0] * 5 repeat_for 10] Why not: d = [[0] * 5 ] * 10 From jcgoble3 at gmail.com Fri Mar 31 03:20:32 2017 From: jcgoble3 at gmail.com (Jonathan Goble) Date: Fri, 31 Mar 2017 07:20:32 +0000 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On Fri, Mar 31, 2017 at 2:49 AM Suresh V. via Python-ideas < python-ideas at python.org> wrote: > On Thursday 30 March 2017 02:48 PM, Markus Meskanen wrote: > > Hi Pythonistas, > > > > yet again today I ended up writing: > > > > d = [[0] * 5 for _ in range(10)] > > > > And wondered, why don't we have a way to repeat other than looping over > > range() and using a dummy variable? This seems like a rather common > > thing to do, and while the benefit doesn't seem much, something like > > this would be much prettier and more pythonic than using underscore > > variable: > > > > d = [[0] * 5 repeat_for 10] > > Why not: > > d = [[0] * 5 ] * 10 > If you had read the thread before replying, you would have seen that several people have suggested this, and several others have pointed out why it won't work: because that creates a list of 10 references to the SAME five-element list, meaning that mutating d[0] also affects d[1] through d[9] (since each is the same list). 
The comprehension is necessary to ensure that each element of d is a distinct list. From turnbull.stephen.fw at u.tsukuba.ac.jp Fri Mar 31 03:23:47 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 31 Mar 2017 16:23:47 +0900 Subject: [Python-ideas] What about regexp string litterals : re".*" ? In-Reply-To: References: <20170327151740.GJ6883@tabr.siet.ch> Message-ID: <22750.1027.251684.274189@turnbull.sk.tsukuba.ac.jp> Abe Dillon writes: > Note that the entire documentation is 250 words while just the syntax > portion of Python docs for the re module is over 3000 words. Since Verbal Expressions (below, VEs, indicating notation) "compile" to regular expressions (spelling out indicates the internal matching implementation), the documentation of VEs presumably ignores everything except the limited language it's useful for. To actually understand VEs, you need to refer to the RE docs. Not a win IMO. > > You think that example is more readable than the proposed translation > > ^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$ > > which is better written > > ^https?://(www\.)?[^ ]*$ > > or even > > ^https?://[^ ]*$ > > > Yes. I find it *far* more readable. It's not a soup of symbols like Perl > code. I can only surmise that you're fluent in regex because it seems > difficult for you to see how the above could be less readable than English > words. Yes, I'm fairly fluent in regular expression notation (below, REs). I've maintained a compiler for one dialect. I'm not interested in the difference between words and punctuation though. The reason I find the middle RE most readable is that it "looks like" what it's supposed to match, in a contiguous string as the object it will match will be contiguous. If I need to parse it to figure out *exactly* what it matches, yes, that takes more effort.
But to understand a VE's semantics correctly, I'd have to look it up as often as you have to look up REs because many words chosen to notate VEs have English meanings that are (a) ambiguous, as in all natural language, and (b) only approximate matches to RE semantics. > I could tell it only matches URLs that are the only thing inside > the string because it clearly says: start_of_line() and > end_of_line(). That's not the problem. The problem is the semantics of the method "find". "then" would indeed read better, although it doesn't exactly match the semantics of concatenation in REs. > I would have had to refer to a reference to know that "^" doesn't > always mean "not", it sometimes means "start of string" and > probably other things. I would also have to check a reference to > know that "$" can mean "end of string" (and probably other things). And you'll still have to do that when reading other people's REs. > > Are those groups capturing in Verbal Expressions? The use of > > "find" (~ "search") rather than "match" is disconcerting to the > > experienced user. > > You can alternately use the word "then". The source code is just > one python file. It's very easy to read. I actually like "then" > over "find" for the example: You're missing the point. The reader does not get to choose the notation, the author does. I do understand what several varieties of RE mean, but the variations are of two kinds: basic versus extended (ie, what tokens need to be escaped to be taken literally, which ones have special meaning if escaped), and extensions (which can be ignored). Modern RE facilities are essentially all of the extended variety. Once you've learned that, you're in good shape for almost any RE that should be written outside of an obfuscated code contest. This is a fundamental principle of Python design: don't make readers of code learn new things. That includes using notation developed elsewhere in many cases. > What does alternation look like? 
> > .OR(option1).OR(option2).OR(option3)... > > How about alternation of > > non-trivial regular expressions? > > .OR(other_verbal_expression) Real examples, rather than pseudo code, would be nice. I think you, too, will find that examples of even fairly simple nested alternations containing other constructs become quite hard to read, as they fall off the bottom of the screen. For example, the VE equivalent of scheme = "(https?|ftp|file):" would be (AFAICT): scheme = VerEx().then(VerEx().then("http") .maybe("s") .OR("ftp") .OR("file")) .then(":") which is pretty hideous, I think. And the colon is captured by a group. If perversely I wanted to extract that group from a match, what would its index be? I guess you could keep the linear arrangement with scheme = (VerEx().add("(") .then("http") .maybe("s") .OR("ftp") .OR("file") .add(")") .then(":")) but is that really an improvement over scheme = VerEx().add("(https?|ftp|file):") ;-) > > As far as I can see, Verbal Expressions are basically a way of > > making it so painful to write regular expressions that people > > will restrict themselves to regular expressions > > What's so painful to write about them? One thing that's painful is that VEs "look like" context-free grammars, but clumsy and without the powerful semantics. You can get the readability you want with greater power using grammars, which is why I would prefer we work on getting a parser module into the stdlib. But if one doesn't know about grammars, it's still not great. The main pains about writing VEs for me are (1) reading what I just wrote, (2) accessing capturing groups, and (3) verbosity. Even a VE to accurately match what is normally a fairly short string, such as the scheme, credentials, authority, and port portions of a "standard" URL, is going to be hundreds of characters long and likely dozens of lines if folded as in the examples. Another issue is that we already have a perfectly good poor man's matching library: glob. 
The URL example becomes http{,s}://{,www.}* Granted you lose the anchors, but how often does that matter? You apparently don't use them often enough to remember them. > Does your IDE not have autocompletion? I don't want an IDE. I have Emacs. > I find REs so painful to write that I usually just use string > methods if at all feasible. Guess what? That's the right thing to do anyway. They're a lot more readable and efficient when partitioning a string into two or three parts, or recognizing a short list of affixes. But chaining many methods, as VEs do, is not a very Pythonic way to write a program. > > I don't think that this failure to respect the developer's taste > > is restricted to this particular implementation, either. > > I generally find it distasteful to write a pseudolanguage in > strings inside of other languages (this applies to SQL as well). You mean like arithmetic operators? (Lisp does this right, right? Only one kind of expression, the function call!) It's a matter of what you're used to. I understand that people new to text-processing, or who don't do so much of it, don't find REs easy to read. So how is this a huge loss? They don't use regular expressions very often! In fact, they're far more likely to encounter, and possibly need to understand, REs written by others! > Especially when the design principals of that pseudolanguage are > *diametrically opposed* to the design principals of the host > language. A key principal of Python's design is: "you read code a > lot more often than you write code, so emphasize > readability". Regex seems to be based on: "Do the most with the > fewest key-strokes. So is all of mathematics. There's nothing wrong with concise expression for use in special cases. > Readability be dammed!". It makes a lot more sense to wrap the > psudolanguage in constructs that bring it in-line with the host > language than to take on the mental burden of trying to comprehend > two different languages at the same time. 
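A sketch of the glob-style "poor man's matching" mentioned above, using the stdlib's fnmatch; note that fnmatch, unlike shell globbing, has no {,s} brace alternation, so the http/https alternatives are spelled out explicitly:

```python
from fnmatch import fnmatch

def looks_like_url(s):
    # Shell-style wildcards instead of a regex; '*' matches anything,
    # including spaces (one reason the RE version used [^ ]*).
    return any(fnmatch(s, pat) for pat in ("http://*", "https://*"))

hit = looks_like_url("https://www.python.org")
miss = looks_like_url("mailto:guido@example.com")
```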
> > If you disagree, nothing's stopping you from continuing to write > res the old-fashion way. I don't think that RE and SQL are "pseudo" languages, no. And I, and most developers, will continue to write regular expressions using the much more compact and expressive RE notation. (In fact with the exception of the "word" method, in VEs you still need to use RE notion to express most of the Python extensions.) So what you're saying is that you don't read much code, except maybe your own. Isn't that your problem? Those of us who cooperate widely on applications using regular expressions will continue to communicate using REs. If that leaves you out, that's not good. But adding VEs to the stdlib (and thus encouraging their use) will split the community into RE users and VE users, if VEs are at all useful. That's a bad. I don't see that the potential usefulness of VEs to infrequent users of regular expressions outweighing the downsides of "many ways to do it" in the stdlib. > Can we at least agree that baking special re syntax directly into > the language is a bad idea? I agree that there's no particular need for RE literals. If one wants to mark an RE as some special kind of object, re.compile() does that very well both by converting to a different type internally and as a marker syntactically. > On Wed, Mar 29, 2017 at 11:49 PM, Nick Coghlan wrote: > > > We don't really want to ease the use of regexps in Python - while > > they're an incredibly useful tool in a programmer's toolkit, > > they're so cryptic that they're almost inevitably a > > maintainability nightmare. I agree with Nick. Regular expressions, whatever the notation, are a useful tool (no suspension of disbelief necessary for me, though!). But they are cryptic, and it's not just the notation. People (even experienced RE users) are often surprised by what fairly simple regular expression match in a given text, because people want to read a regexp as instructions to a one-pass greedy parser, and it isn't. 
For example, above I wrote scheme = "(https?|ftp|file):" rather than scheme = "(\w+):" because it's not unlikely that I would want to treat those differently from other schemes such as mailto, news, and doi. In many applications of regular expressions (such as tokenization for a parser) you need many expressions. Compactness really is a virtue in REs. Steve From rosuav at gmail.com Fri Mar 31 03:58:21 2017 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 31 Mar 2017 18:58:21 +1100 Subject: [Python-ideas] Construct a matrix from a list: matrix multiplication Message-ID: This keeps on coming up in one form or another - either someone multiplies a list of lists and ends up surprised that they're all the same, or is frustrated with the verbosity of the alternatives. Can we use the matmul operator for this? class List(list): def __matmul__(self, other): return [copy.copy(x) for x in self for _ in range(other)] >>> x = List([[0]*4]) @ 2 >>> x [[0, 0, 0, 0], [0, 0, 0, 0]] >>> x[0][0] = 1 >>> x [[1, 0, 0, 0], [0, 0, 0, 0]] If this were supported by the built-in list type, it would be either of these: >>> x = [[0] * 4] @ 2 >>> x = [[0] @ 4] @ 4 (identical functionality, as copying an integer has no effect). The semantics could be either as shown above (copy.copy()), or something very simple and narrow like "lists get shallow-copied, other objects get referenced". Thoughts? ChrisA From stephanh42 at gmail.com Fri Mar 31 04:20:48 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Fri, 31 Mar 2017 10:20:48 +0200 Subject: [Python-ideas] What about regexp string litterals : re".*" ? In-Reply-To: <22750.1027.251684.274189@turnbull.sk.tsukuba.ac.jp> References: <20170327151740.GJ6883@tabr.siet.ch> <22750.1027.251684.274189@turnbull.sk.tsukuba.ac.jp> Message-ID: Hi all, FWIW, I also strongly prefer the Verbal Expression style and consider "normal" regular expressions to become quickly unreadable and unmaintainable. Verbal Expressions are also much more composable. 
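Returning to Chris's List sketch from earlier in the digest: filled out with the import it needs, it runs as posted (illustrative only, not a proposal for the builtin list type):

```python
import copy

class List(list):
    def __matmul__(self, other):
        # Shallow-copy each element `other` times, so the resulting
        # rows are independent objects rather than aliases.
        return [copy.copy(x) for x in self for _ in range(other)]

x = List([[0] * 4]) @ 2
x[0][0] = 1  # only the first row changes
```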
Stephan 2017-03-31 9:23 GMT+02:00 Stephen J. Turnbull : > Abe Dillon writes: > > > Note that the entire documentation is 250 words while just the syntax > > portion of Python docs for the re module is over 3000 words. > > Since Verbal Expressions (below, VEs, indicating notation) "compile" > to regular expressions (spelling out indicates the internal matching > implementation), the documentation of VEs presumably ignores > everything except the limited language it's useful for. To actually > understand VEs, you need to refer to the RE docs. Not a win IMO. > > > > You think that example is more readable than the proposed transalation > > > ^(http)(s)?(\:\/\/)(www\.)?([^\ ]*)$ > > > which is better written > > > ^https?://(www\.)?[^ ]*$ > > > or even > > > ^https?://[^ ]*$ > > > > > > Yes. I find it *far* more readable. It's not a soup of symbols like Perl > > code. I can only surmise that you're fluent in regex because it seems > > difficult for you to see how the above could be less readable than English > > words. > > Yes, I'm fairly fluent in regular expression notation (below, REs). > I've maintained a compiler for one dialect. > > I'm not interested in the difference between words and punctuation > though. The reason I find the middle RE most readable is that it > "looks like" what it's supposed to match, in a contiguous string as > the object it will match will be contiguous. If I need to parse it to > figure out *exactly* what it matches, yes, that takes more effort. > But to understand a VE's semantics correctly, I'd have to look it up > as often as you have to look up REs because many words chosen to notate > VEs have English meanings that are (a) ambiguous, as in all natural > language, and (b) only approximate matches to RE semantics. > > > I could tell it only matches URLs that are the only thing inside > > the string because it clearly says: start_of_line() and > > end_of_line(). > > That's not the problem. 
The problem is the semantics of the method > "find". "then" would indeed read better, although it doesn't exactly > match the semantics of concatenation in REs. > > > I would have had to refer to a reference to know that "^" doesn't > > always mean "not", it sometimes means "start of string" and > > probably other things. I would also have to check a reference to > > know that "$" can mean "end of string" (and probably other things). > > And you'll still have to do that when reading other people's REs. > > > > Are those groups capturing in Verbal Expressions? The use of > > > "find" (~ "search") rather than "match" is disconcerting to the > > > experienced user. > > > > You can alternately use the word "then". The source code is just > > one python file. It's very easy to read. I actually like "then" > > over "find" for the example: > > You're missing the point. The reader does not get to choose the > notation, the author does. I do understand what several varieties of > RE mean, but the variations are of two kinds: basic versus extended > (ie, what tokens need to be escaped to be taken literally, which ones > have special meaning if escaped), and extensions (which can be > ignored). Modern RE facilities are essentially all of the extended > variety. Once you've learned that, you're in good shape for almost > any RE that should be written outside of an obfuscated code contest. > > This is a fundamental principle of Python design: don't make readers > of code learn new things. That includes using notation developed > elsewhere in many cases. > > > What does alternation look like? > > > > .OR(option1).OR(option2).OR(option3)... > > > > How about alternation of > > > non-trivial regular expressions? > > > > .OR(other_verbal_expression) > > Real examples, rather than pseudo code, would be nice. 
I think you, > too, will find that examples of even fairly simple nested alternations > containing other constructs become quite hard to read, as they fall > off the bottom of the screen. > > For example, the VE equivalent of
>
>     scheme = "(https?|ftp|file):"
>
> would be (AFAICT):
>
>     scheme = VerEx().then(VerEx().then("http")
>                           .maybe("s")
>                           .OR("ftp")
>                           .OR("file"))
>                     .then(":")
>
> which is pretty hideous, I think. And the colon is captured by a > group. If perversely I wanted to extract that group from a match, > what would its index be? > > I guess you could keep the linear arrangement with
>
>     scheme = (VerEx().add("(")
>               .then("http")
>               .maybe("s")
>               .OR("ftp")
>               .OR("file")
>               .add(")")
>               .then(":"))
>
> but is that really an improvement over
>
>     scheme = VerEx().add("(https?|ftp|file):")
>
> ;-) > > > As far as I can see, Verbal Expressions are basically a way of > > > making it so painful to write regular expressions that people > > > will restrict themselves to regular expressions > > > > What's so painful to write about them? > > One thing that's painful is that VEs "look like" context-free > grammars, but clumsy and without the powerful semantics. You can get > the readability you want with greater power using grammars, which is > why I would prefer we work on getting a parser module into the stdlib. > > But if one doesn't know about grammars, it's still not great. The > main pains about writing VEs for me are (1) reading what I just wrote, > (2) accessing capturing groups, and (3) verbosity. Even a VE to > accurately match what is normally a fairly short string, such as the > scheme, credentials, authority, and port portions of a "standard" URL, > is going to be hundreds of characters long and likely dozens of lines > if folded as in the examples. > > Another issue is that we already have a perfectly good poor man's > matching library: glob.
The URL example becomes > > http{,s}://{,www.}* > > Granted you lose the anchors, but how often does that matter? You > apparently don't use them often enough to remember them. > > > Does your IDE not have autocompletion? > > I don't want an IDE. I have Emacs. > > > I find REs so painful to write that I usually just use string > > methods if at all feasible. > > Guess what? That's the right thing to do anyway. They're a lot more > readable and efficient when partitioning a string into two or three > parts, or recognizing a short list of affixes. But chaining many > methods, as VEs do, is not a very Pythonic way to write a program. > > > > I don't think that this failure to respect the developer's taste > > > is restricted to this particular implementation, either. > > > > I generally find it distasteful to write a pseudolanguage in > > strings inside of other languages (this applies to SQL as well). > > You mean like arithmetic operators? (Lisp does this right, right? > Only one kind of expression, the function call!) It's a matter of > what you're used to. I understand that people new to text-processing, > or who don't do so much of it, don't find REs easy to read. So how is > this a huge loss? They don't use regular expressions very often! In > fact, they're far more likely to encounter, and possibly need to > understand, REs written by others! > > > Especially when the design principles of that pseudolanguage are > > *diametrically opposed* to the design principles of the host > > language. A key principle of Python's design is: "you read code a > > lot more often than you write code, so emphasize > > readability". Regex seems to be based on: "Do the most with the > > fewest key-strokes. > > So is all of mathematics. There's nothing wrong with concise > expression for use in special cases. > > > Readability be damned!".
It makes a lot more sense to wrap the > > pseudolanguage in constructs that bring it in-line with the host > > language than to take on the mental burden of trying to comprehend > > two different languages at the same time. > > > > If you disagree, nothing's stopping you from continuing to write > > res the old-fashioned way. > > I don't think that RE and SQL are "pseudo" languages, no. And I, and > most developers, will continue to write regular expressions using the > much more compact and expressive RE notation. (In fact with the > exception of the "word" method, in VEs you still need to use RE notation > to express most of the Python extensions.) So what you're saying is > that you don't read much code, except maybe your own. Isn't that your > problem? Those of us who cooperate widely on applications using > regular expressions will continue to communicate using REs. If that > leaves you out, that's not good. But adding VEs to the stdlib (and > thus encouraging their use) will split the community into RE users and > VE users, if VEs are at all useful. That's bad. I don't see the > potential usefulness of VEs to infrequent users of regular > expressions outweighing the downsides of "many ways to do it" in the > stdlib. > > > Can we at least agree that baking special re syntax directly into > > the language is a bad idea? > > I agree that there's no particular need for RE literals. If one wants > to mark an RE as some special kind of object, re.compile() does that > very well both by converting to a different type internally and as a > marker syntactically. > > > On Wed, Mar 29, 2017 at 11:49 PM, Nick Coghlan wrote: > > > > > We don't really want to ease the use of regexps in Python - while > > > they're an incredibly useful tool in a programmer's toolkit, > > > they're so cryptic that they're almost inevitably a > > > maintainability nightmare. > > I agree with Nick.
Regular expressions, whatever the notation, are a > useful tool (no suspension of disbelief necessary for me, though!). > But they are cryptic, and it's not just the notation. People (even > experienced RE users) are often surprised by what fairly simple > regular expressions match in a given text, because people want to read > a regexp as instructions to a one-pass greedy parser, and it isn't. > > For example, above I wrote > > scheme = "(https?|ftp|file):" > > rather than > > scheme = "(\w+):" > > because it's not unlikely that I would want to treat those differently > from other schemes such as mailto, news, and doi. In many > applications of regular expressions (such as tokenization for a > parser) you need many expressions. Compactness really is a virtue in > REs. > > Steve > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From pavol.lisy at gmail.com Fri Mar 31 04:21:02 2017 From: pavol.lisy at gmail.com (Pavol Lisy) Date: Fri, 31 Mar 2017 10:21:02 +0200 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: <20170331043641.GB9464@ando.pearwood.info> References: <20170331043641.GB9464@ando.pearwood.info> Message-ID: On 3/31/17, Steven D'Aprano wrote: > On Thu, Mar 30, 2017 at 04:23:05PM +0200, Pavol Lisy wrote: >> On 3/30/17, Nick Coghlan wrote: >> > On 30 March 2017 at 19:18, Markus Meskanen >> > wrote: >> >> Hi Pythonistas, >> >> >> >> yet again today I ended up writing: >> >> >> >> d = [[0] * 5 for _ in range(10)] >> >> d = [[0]*5]*10 # what about this? > > That doesn't do what you want. > > It's actually a common "gotcha", since it makes ten repetitions of the > same five element list, not ten *copies*. Yes. It is clear that I made a mistake here (sorry for that!). >> Simplified repeating could probably be useful in interactive mode.
> I'm sorry, did you just suggest that language features should behave > differently in interactive mode than non-interactive mode? No. I did not suggest it. On the contrary: if it is really useful for interactive Python (*) then I suggest implementing it generally! (*) - this is a matter of discussion where I don't think this proposal will be accepted... Python is a multipurpose language, which means that something which is not useful in A may be useful in B. Interactive Python could be something like discussing with a computer, where we don't like to talk too much to say a little. > The point of interactive mode is to try > out syntax and code and see what it does, before using it in non-interactive scripts. If things behave differently, people will be left > confused why the *exact same line of code* works differently in a script > and when they try it interactively. Sorry, this is not true. Not everybody uses interactive Python only for testing Python constructs. (For example see http://ipython.readthedocs.io/en/stable/interactive/shell.html?highlight=shell ) PL From dickinsm at gmail.com Fri Mar 31 04:23:26 2017 From: dickinsm at gmail.com (Mark Dickinson) Date: Fri, 31 Mar 2017 09:23:26 +0100 Subject: [Python-ideas] Way to repeat other than "for _ in range(x)" In-Reply-To: References: Message-ID: On Thu, Mar 30, 2017 at 10:18 AM, Markus Meskanen wrote: > And wondered, why don't we have a way to repeat other than looping over > range() and using a dummy variable? If it's the assignment to a dummy variable that bothers you, the language already has a way around this:

Python 3.6.0 (default, Jan 9 2017, 12:18:47)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import functools, itertools
>>> times = functools.partial(itertools.repeat, ())
>>> ["spam" for () in times(4)]
['spam', 'spam', 'spam', 'spam']

Look Ma, no dummy variables!
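Spelled out as a stand-alone snippet (equivalent to the interpreter session above), the trick relies on the empty tuple being a valid assignment target:

```python
import functools
import itertools

# repeat(()) yields the empty tuple; partial() pre-binds it so that
# times(n) yields () exactly n times.
times = functools.partial(itertools.repeat, ())

# Each yielded () is unpacked into zero names, so no dummy variable
# is ever bound inside the comprehension.
result = ["spam" for () in times(4)]

# The conventional spelling, for comparison:
conventional = ["spam" for _ in range(4)]
```

Both lists come out as four copies of "spam"; the only difference is whether a throwaway name appears.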
-- Mark From p.f.moore at gmail.com Fri Mar 31 04:26:36 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 31 Mar 2017 09:26:36 +0100 Subject: [Python-ideas] What about regexp string litterals : re".*" ? In-Reply-To: References: <20170327151740.GJ6883@tabr.siet.ch> <22750.1027.251684.274189@turnbull.sk.tsukuba.ac.jp> Message-ID: On 31 March 2017 at 09:20, Stephan Houben wrote: > FWIW, I also strongly prefer the Verbal Expression style and consider > "normal" regular expressions to become quickly unreadable and > unmaintainable. Do you publish your code widely? What's the view of 3rd party users of your code? Until this thread, I'd never even heard of the Verbal Expression style, and I read a *lot* of open source Python code. While it's purely anecdotal, that suggests to me that the style isn't particularly commonly used. (OTOH, there's also a lot less use of REs in Python code than in other languages. Much string manipulation in Python avoids using regular languages at all, in my experience. I think that's a good thing - use simpler tools when appropriate and keep the power tools for the hard cases where they justify their complexity). Paul From steve at pearwood.info Fri Mar 31 21:44:17 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 1 Apr 2017 12:44:17 +1100 Subject: [Python-ideas] Construct a matrix from a list: matrix multiplication In-Reply-To: References: Message-ID: <20170401014415.GD9464@ando.pearwood.info> On Fri, Mar 31, 2017 at 06:58:21PM +1100, Chris Angelico wrote: > This keeps on coming up in one form or another - either someone > multiplies a list of lists and ends up surprised that they're all the > same, or is frustrated with the verbosity of the alternatives. > > Can we use the matmul operator for this? I like the idea of using * for repetition without copying, and @ for repetition with shallow copying. 
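The difference between the two can be sketched with a plain function standing in for the proposed @ operator (the name repeat_with_copies is hypothetical, chosen only for this illustration):

```python
import copy

def repeat_with_copies(seq, n):
    # Stand-in for the proposed "seq @ n": each repetition is a
    # shallow copy, not another reference to the same object.
    return [copy.copy(x) for _ in range(n) for x in seq]

aliased = [[0] * 4] * 2                      # today's *: two references to one row
distinct = repeat_with_copies([[0] * 4], 2)  # proposed @: two separate rows

aliased[0][0] = 1
distinct[0][0] = 1
# aliased  -> both "rows" change, because they are the same list
# distinct -> only the first row changes
```

This is exactly the list-of-lists gotcha from the other thread: with * the mutation shows up in every row, with the copying version it stays local.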
That does mean that now you have a built-in operator which relies on the copy module, since it has to work with arbitrary objects. Isn't copy written in Python? [...] > If this were supported by the built-in list type, it would be either of these: > > >>> x = [[0] * 4] @ 2 > >>> x = [[0] @ 4] @ 4 > > (identical functionality, as copying an integer has no effect). I think that's an implementation detail: copying immutable objects *might* return a reference to the original immutable object, or it might return a new object. For ints, any sane implementation would surely behave as we say, but let's not specify that as part of the behaviour of @ itself. > The semantics could be either as shown above (copy.copy()), or > something very simple and narrow like "lists get shallow-copied, other > objects get referenced". I prefer the distinction copy versus non-copy. That makes it simple to understand, and means that it works if somebody wants a list of dicts instead of a list of lists: data = [{'a': 1, 'b': 2}] @ 5 -- Steve From rosuav at gmail.com Fri Mar 31 22:11:17 2017 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 1 Apr 2017 13:11:17 +1100 Subject: [Python-ideas] Construct a matrix from a list: matrix multiplication In-Reply-To: <20170401014415.GD9464@ando.pearwood.info> References: <20170401014415.GD9464@ando.pearwood.info> Message-ID: On Sat, Apr 1, 2017 at 12:44 PM, Steven D'Aprano wrote: > On Fri, Mar 31, 2017 at 06:58:21PM +1100, Chris Angelico wrote: >> This keeps on coming up in one form or another - either someone >> multiplies a list of lists and ends up surprised that they're all the >> same, or is frustrated with the verbosity of the alternatives. >> >> Can we use the matmul operator for this? > > I like the idea of using * for repetition without copying, and @ for > repetition with shallow copying. > > That does mean that now you have a built-in operator which relies on the > copy module, since it has to work with arbitrary objects. 
Isn't copy > written in Python? Yes it is, but I'm not entirely sure of all its semantics. >> If this were supported by the built-in list type, it would be either of these: >> >> >>> x = [[0] * 4] @ 2 >> >>> x = [[0] @ 4] @ 4 >> >> (identical functionality, as copying an integer has no effect). > > I think that's an implementation detail: copying immutable objects > *might* return a reference to the original immutable object, or it might > return a new object. For ints, any sane implementation would surely > behave as we say, but let's not specify that as part of the > behaviour of @ itself. Right, right. What I meant was that people would be free to build the matrix either way, since it wouldn't have any significant difference. Of course it'd be legal to copy the integer too, but you shouldn't care one way or the other. >> The semantics could be either as shown above (copy.copy()), or >> something very simple and narrow like "lists get shallow-copied, other >> objects get referenced". > > I prefer the distinction copy versus non-copy. That makes it simple to > understand, and means that it works if somebody wants a list of dicts > instead of a list of lists: > > data = [{'a': 1, 'b': 2}] @ 5 Sounds like a plan. Probably, though, the @ operator should be defined in concrete terms that are similar to the implementation of copy.copy(), rather than actually being copy.copy(), in case someone shadows or monkey-patches the module. But normal usage should have them behave the same way. ChrisA
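Putting the thread's pieces together, the semantics under discussion might look like the following sketch (a restatement of the List subclass from earlier in the thread, exercised on the list-of-dicts case; this is a proposal illustration, not an implemented feature):

```python
import copy

class List(list):
    # Sketch of the proposed semantics: repetition with shallow copies,
    # so mutable elements do not alias each other.
    def __matmul__(self, n):
        return List(copy.copy(x) for x in self for _ in range(n))

# The list-of-dicts example: five independent shallow copies.
data = List([{'a': 1, 'b': 2}]) @ 5

# Mutating one element leaves the other four untouched.
data[0]['a'] = 99
```

Because each element is shallow-copied, data[0] is a different dict object from data[1], which is the whole point of using @ rather than *.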