From joshua.landau.ws at gmail.com Mon Jul 1 00:00:08 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Sun, 30 Jun 2013 23:00:08 +0100
Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:]
In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com> <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com> <1372577188.85901.YahooMailNeo@web184701.mail.ne1.yahoo.com>
Message-ID:

On 30 June 2013 21:52, Guido van Rossum wrote:
> I apologize, this thread was too long for me to follow. Is the issue the following?
>
> >>> def stopif(x):
> ...     if x: raise StopIteration
> ...     return True
> ...
> >>> [i for i in range(10) if stopif(i==3)]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<stdin>", line 1, in <listcomp>
>   File "<stdin>", line 2, in stopif
> StopIteration
> >>> list(i for i in range(10) if stopif(i==3))
> [0, 1, 2]
>
> I.e. the difference between list() and [] is that if <expr> raises StopIteration, list(...) returns the elements up to that point but [...] passes the exception out?
>
> That seems a bug to me inherited from the Python 2 implementation of list comprehensions and I'm fine with fixing it in 3.4. The intention of the changes to comprehensions in Python 3 was that these two forms would be completely equivalent. The difficulty has always been that CPython comprehensions were traditionally faster than generator expressions and we're reluctant to give that up. But it's still a bug.

But which way is the bug? Personally, I think the list comprehension has it right. I'd prefer it if (raise_stopiteration() for _ in [0]) actually had the StopIteration fall through.

From greg.ewing at canterbury.ac.nz Mon Jul 1 00:08:28 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 01 Jul 2013 10:08:28 +1200
Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:]
In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com> <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com> <1372577188.85901.YahooMailNeo@web184701.mail.ne1.yahoo.com>
Message-ID: <51D0AC5C.6050109@canterbury.ac.nz>

Ron Adam wrote:
> It's the same as inlining the generator parts into the iterator that is driving it.
>
> We don't need to do that because we already have an optimised version of that. It just needs to catch the StopIteration to be the same.
>
> I think that it's not uncommon for people to think this is how list comps work. And I think it is surprising for them that the StopIteration isn't caught.

I tend to feel that the fact that raising StopIteration in a generator has the same effect as returning from the generator is a quirk of the implementation that shouldn't be relied on.
I'm not sure we should be giving it official status by going out of our way to make listcomps behave the same.

-- 
Greg

From abarnert at yahoo.com Mon Jul 1 00:24:48 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 30 Jun 2013 15:24:48 -0700
Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:]
In-Reply-To: <51D0AC5C.6050109@canterbury.ac.nz>
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com> <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com> <1372577188.85901.YahooMailNeo@web184701.mail.ne1.yahoo.com> <51D0AC5C.6050109@canterbury.ac.nz>
Message-ID: <1B7E7893-438E-47EC-888E-CED39F86B431@yahoo.com>

On Jun 30, 2013, at 15:08, Greg Ewing wrote:
> Ron Adam wrote:
>> It's the same as inlining the generator parts into the iterator that is driving it.
>> We don't need to do that because we already have an optimised version of that. It just needs to catch the StopIteration to be the same.
>> I think that it's not uncommon for people to think this is how list comps work. And I think it is surprising for them that the StopIteration isn't caught.
>
> I tend to feel that the fact that raising StopIteration in a generator has the same effect as returning from the generator is a quirk of the implementation that shouldn't be relied on. I'm not sure we should be giving it official status by going out of our way to make listcomps behave the same.

What other effect could it possibly have? The genexp doesn't do anything special with a StopIteration--it passes it through like any other exception. And this is the same as for an explicit generator function. The calling code--whether it's calling next(), using a for loop, or iterating in C--sees StopIteration for a generator return by definition, and sees StopIteration if explicitly raised because that's how exceptions and generators work. Unless you add an extra new rule that says raising StopIteration inside a generator is illegal and will raise a TypeError or something, there's no other way a valid implementation of Python could possibly work.

I think a lot of people are looking at this wrong. In list(genexp), it's not the genexp part that's doing anything here; it's the list function. There's no way that it could distinguish between an explicit StopIteration and an implicit one, so it treats them the same way. If a comprehension is supposed to be the same as list(genexp) it should act like list--or like any other iteration mechanism you can write in Python or even in C--rather than acting magically as it currently does.

From ncoghlan at gmail.com Mon Jul 1 00:28:50 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 1 Jul 2013 08:28:50 +1000
Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:]
In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com> <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com> <1372577188.85901.YahooMailNeo@web184701.mail.ne1.yahoo.com>
Message-ID:

On 1 Jul 2013 07:01, "Guido van Rossum" wrote:
>
> I apologize, this thread was too long for me to follow. Is the issue the following?
>
> >>> def stopif(x):
> ...     if x: raise StopIteration
> ...     return True
> ...
> >>> [i for i in range(10) if stopif(i==3)]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<stdin>", line 1, in <listcomp>
>   File "<stdin>", line 2, in stopif
> StopIteration
> >>> list(i for i in range(10) if stopif(i==3))
> [0, 1, 2]
>
> I.e. the difference between list() and [] is that if <expr> raises StopIteration, list(...) returns the elements up to that point but [...] passes the exception out?
>
> That seems a bug to me inherited from the Python 2 implementation of list comprehensions and I'm fine with fixing it in 3.4. The intention of the changes to comprehensions in Python 3 was that these two forms would be completely equivalent. The difficulty has always been that CPython comprehensions were traditionally faster than generator expressions and we're reluctant to give that up. But it's still a bug.

Yep, and Andrew pointed out the overhead of fixing it is actually quite low - we just have to tweak comprehensions to wrap the entire loop in a try/except that ignores the StopIteration exception. That brings them into line with the generator form, where it doesn't matter if the exception comes directly from the generator code or is raised by the interpreter due to the frame terminating: the loop implicit in the list call will treat it as indicating the end of the generator.

Cheers,
Nick.

> --
> --Guido van Rossum (python.org/~guido)

From joshua.landau.ws at gmail.com Mon Jul 1 00:40:29 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Sun, 30 Jun 2013 23:40:29 +0100
Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:]
In-Reply-To: <1B7E7893-438E-47EC-888E-CED39F86B431@yahoo.com>
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com> <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com> <1372577188.85901.YahooMailNeo@web184701.mail.ne1.yahoo.com> <51D0AC5C.6050109@canterbury.ac.nz> <1B7E7893-438E-47EC-888E-CED39F86B431@yahoo.com>
Message-ID:

On 30 June 2013 23:24, Andrew Barnert wrote:
> I think a lot of people are looking at this wrong.
> In list(genexp), it's not the genexp part that's doing anything here; it's the list function. There's no way that it could distinguish between an explicit StopIteration and an implicit one, so it treats them the same way. If a comprehension is supposed to be the same as list(genexp) it should act like list--or like any other iteration mechanism you can write in Python or even in C--rather than acting magically as it currently does.

Damnit, you're so obviously right. Yeah, fine. Whatever.

From guido at python.org Mon Jul 1 01:24:38 2013
From: guido at python.org (Guido van Rossum)
Date: Sun, 30 Jun 2013 16:24:38 -0700
Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:]
In-Reply-To: References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com> <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com> <1372577188.85901.YahooMailNeo@web184701.mail.ne1.yahoo.com>
Message-ID:

On Sun, Jun 30, 2013 at 3:28 PM, Nick Coghlan wrote:
> On 1 Jul 2013 07:01, "Guido van Rossum" wrote:
>> I apologize, this thread was too long for me to follow. Is the issue the following?
>>
>> >>> def stopif(x):
>> ...     if x: raise StopIteration
>> ...     return True
>> ...
>> >>> [i for i in range(10) if stopif(i==3)]
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "<stdin>", line 1, in <listcomp>
>>   File "<stdin>", line 2, in stopif
>> StopIteration
>> >>> list(i for i in range(10) if stopif(i==3))
>> [0, 1, 2]
>>
>> I.e. the difference between list() and [] is that if <expr> raises StopIteration, list(...) returns the elements up to that point but [...] passes the exception out?
>>
>> That seems a bug to me inherited from the Python 2 implementation of list comprehensions and I'm fine with fixing it in 3.4. The intention of the changes to comprehensions in Python 3 was that these two forms would be completely equivalent. The difficulty has always been that CPython comprehensions were traditionally faster than generator expressions and we're reluctant to give that up. But it's still a bug.
>
> Yep, and Andrew pointed out the overhead of fixing it is actually quite low - we just have to tweak comprehensions to wrap the entire loop in a try/except that ignores the StopIteration exception. That brings them into line with the generator form, where it doesn't matter if the exception comes directly from the generator code or is raised by the interpreter due to the frame terminating: the loop implicit in the list call will treat it as indicating the end of the generator.

Thanks, sounds good.

-- 
--Guido van Rossum (python.org/~guido)

From shane at umbrellacode.com Mon Jul 1 01:44:56 2013
From: shane at umbrellacode.com (Shane Green)
Date: Sun, 30 Jun 2013 16:44:56 -0700
Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:]
In-Reply-To: <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com>
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <8D03FBA9D18C192-1864-1B74F@webmail-m103.sysops.aol.com> <8D03FC00B987DF8-1864-1BA67@webmail-m103.sysops.aol.com> <8D03FC42830FCEA-1864-1BD1A@webmail-m103.sysops.aol.com> <8D03FC9DB9F3EC0-1864-1C17F@webmail-m103.sysops.aol.com> <51C9CE75.5030300@nedbatchelder.com> <44DAA987-C02D-4A8F-9E22-B88F9FEA4C7E@umbrellacode.com> <87y59yys8n.fsf@uwakimon.sk.tsukuba.ac.jp> <51CA4F8A.303@pearwood.info> <1372310008.50112.YahooMailNeo@web184705.mail.ne1.yahoo.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com> <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com>
Message-ID:

On Jun 29, 2013, at 9:53 PM, Andrew Barnert wrote:
> From: Shane Green
> Sent: Saturday, June 29, 2013 5:10 AM
>
>> Thanks Andrew. My knee jerk reaction was to strongly prefer option two, which sounds like--if I understood correctly, and I'm not sure I do--it keeps both comprehensions and expressions. Rereading your points again, I must admit I didn't see much to justify the knee jerk reaction.
>
> Sorry, it's my fault for conflating two independent choices. Let me refactor things:
>
> Nobody's talking about getting rid of comprehensions. Today, you have the choice to write [comp] instead of list(comp), and that won't change. However, today these differ in that the latter lets you exit early with StopIteration, while the former doesn't. That's what's proposed to change.
>
> Choice 1 is about the language definition. With this change, there is no remaining difference between comprehensions and genexps except for returning a list vs. an iterator. That means we could simplify things by defining the two concepts together (or defining one in terms of the other). I don't see any downside to doing that.
>
> Choice 2 is about the CPython implementation. We can reimplement comprehensions as wrappers around genexps, or we can just add a try/except into comprehensions. The former would simplify the compiler and the interpreter, but at a cost of up to 40% for comprehensions. The latter would leave things no simpler than they are today, but also no slower.
>
> Once put this way, I think the choices are obvious: Simplify the language, don't simplify the implementation.
>
>> I do commonly use list comprehensions precisely *because* of the performance impact, and can think of a few places the 40% would be problematic.
>
> Usually that's a premature optimization.

This kind of implies that it's likely my use of comprehensions was premature, and therefore detracts from the validity of my usage. I've been releasing building management frameworks built using Python since 1.5.2. When you implement things like the BACnet stack, caches, schedulers, property-level access control, etc.--and many things we would not have needed to develop if we'd needed them now--in Python to run on embedded devices, you learn exactly where your bottlenecks are. Exhibiting performance benefits similar to map and filter, a well-placed loop replacement could sometimes be the difference between needing to migrate some implementation to C or not.
In our processor-bound system it is likely the gains were often greater than 40%, but the bottom line is that comprehensions are sometimes introduced during performance-enhancing refactors, and those examples would be particularly hard hit by a performance loss, just as they benefited from the enhancement. So it's not a decision that should be made lightly.

> For anything simple enough that the iteration cost isn't swamped by your real work, the performance usually doesn't matter anyway.
>
> But "usually" isn't always, and there definitely are real-world cases where it would hurt.
>
>> Was there a measurable performance difference with approach 2?
>
> Once I realized that the right place to put the try is just outside the loop -- that makes it obvious that there is no per-iteration cost, only a constant cost.

Oh yeah, you don't want to set up the try/except on every iteration.

> If you don't raise an exception through a listcomp, that cost is basically running one more opcode and loading a few more bytes into memory. It adds less than 1% for even a trivial comp that loops 10 times, or for a realistic but still simple comp that loops 3 times.

That's excellent.

> I'll post actual numbers for local tests and for benchmarks once I get things finished (hopefully Monday).
>
>> On Jun 28, 2013, at 8:16 PM, Andrew Barnert wrote:
>>> On Jun 28, 2013, at 18:50, Shane Green wrote:
>>>
>>> Yes, but it only works for generator expressions and not comprehensions.
>>>
>>> This is the point of options #1 and 2: make StopIteration work in comps either (1) by redefining comprehensions in terms of genexps or (2) by fiat.
>>>
>>> After some research, it turns out that these are equivalent. Replacing any [comprehension] with list(comprehension) is guaranteed by the language (and the CPython implementation) to give you exactly the same value unless (a) something in the comp raises StopIteration, or (b) something in the comp relies on reflective properties (e.g., sys._getframe().f_code.co_flags) that aren't guaranteed anyway.
>>>
>>> So, other than being 4 characters more verbose and 40% slower, there's already an answer for comprehensions.
>>>
>>> And if either of those problems is unacceptable, a patch for #1 or #2 is actually pretty easy.
>>>
>>> I've got two different proofs of concept: one actually implements the comp as passing the genexp to list, the other just wraps everything after the BUILD_LIST and before the RETURN_VALUE in the equivalent of try: ... except StopIteration: pass. I need to add some error handling to the C code, and for #2 write sufficient tests that verify that it really does work exactly like #1, but I should have working patches to play with in a couple days.
>>>
>>> My opinion of that workaround is that it's also a step backward in terms of readability, I suspect.
>>>
>>>> if i < 50 else stop() would probably also work, since it throws an exception. That's better, IMHO.
>>>>
>>>> On Jun 28, 2013, at 6:38 PM, Andrew Carter wrote:
>>>>> Digging through the archives (with a quick Google search) http://mail.python.org/pipermail/python-ideas/2013-January/019051.html, if you really want an expression it seems you can just do
>>>>>
>>>>> def stop():
>>>>>     raise StopIteration
>>>>> list(i for i in range(100) if i < 50 or stop())
>>>>>
>>>>> it seems to me that this would provide syntax that doesn't require lambdas.
>>>>> On Fri, Jun 28, 2013 at 4:50 PM, Alexander Belopolsky wrote:
>>>>>> On Fri, Jun 28, 2013 at 7:38 PM, Shane Green wrote:
>>>>>> ..
>>>>>>> [x until condition for x in l ...] or
>>>>>>> [x for x in l until condition]
>>>>>>
>>>>>> Just to throw in one more variation:
>>>>>>
>>>>>> [expr for item in iterable break if condition]
>>>>>>
>>>>>> (inversion of "if" and "break" reinforces the idea that we are dealing with an expression rather than a statement - compare with "a if cond else b")

From nbvfour at gmail.com Mon Jul 1 02:32:07 2013
From: nbvfour at gmail.com (nbv4)
Date: Sun, 30 Jun 2013 17:32:07 -0700 (PDT)
Subject: [Python-ideas] Idea for new multi-line triple quote literal
Message-ID:

The triple quote string literal is a great feature, but there is one problem. When you use them, it forces you to break out of your current indentation, which makes code look ugly. I propose a new way to define a triple back quote that works the same way regular triple quotes work, but instead does some simple parsing of the data within the quotes to preserve the flow of the code. Due to the brittle and sometimes ambiguous nature of anything 'automatic', this feature is obviously not meant for data where exact white space is needed. It would be great for docstrings, exception messages and other such text.

Here is a short example of its usage: https://gist.github.com/priestc/5897602

From steve at pearwood.info Mon Jul 1 02:59:05 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 01 Jul 2013 10:59:05 +1000
Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:]
In-Reply-To: <51D0AC5C.6050109@canterbury.ac.nz>
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com> <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com> <1372577188.85901.YahooMailNeo@web184701.mail.ne1.yahoo.com> <51D0AC5C.6050109@canterbury.ac.nz>
Message-ID: <51D0D459.3090400@pearwood.info>

On 01/07/13 08:08, Greg Ewing wrote:
> Ron Adam wrote:
>> It's the same as inlining the generator parts into the iterator that is driving it.
>>
>> We don't need to do that because we already have an optimised version of that. It just needs to catch the StopIteration to be the same.
>>
>> I think that it's not uncommon for people to think this is how list comps work. And I think it is surprising for them that the StopIteration isn't caught.
> I tend to feel that the fact that raising StopIteration in a generator has the same effect as returning from the generator is a quirk of the implementation that shouldn't be relied on. I'm not sure we should be giving it official status by going out of our way to make listcomps behave the same.

Raising StopIteration has been one of the two official ways to halt a generator, and has been since they were introduced. I nearly wrote "has been documented as..." except it seems to me that it has never been explicitly stated in the docs or the PEP. The closest I can find is this part of the PEP that *implies*, without actually coming right out and saying so, that raising StopIteration is the official way to halt a generator:

[quote]
Q. Why allow "return" at all? Why not force termination to be spelled "raise StopIteration"?

A. The mechanics of StopIteration are low-level details, much like the mechanics of IndexError in Python 2.1: the implementation needs to do *something* well-defined under the covers, and Python exposes these mechanisms for advanced users. That's not an argument for forcing everyone to work at that level, though. "return" means "I'm done" in any kind of function, and that's easy to explain and to use. Note that "return" isn't always equivalent to "raise StopIteration" in try/except construct, either (see the "Specification: Return" section).
[end quote]

http://www.python.org/dev/peps/pep-0255/

Generally the PEP talks about how generators which have already been stopped (due to some unhandled exception, or a return) will raise StopIteration, but otherwise emphasizes using return rather than StopIteration. The What's New for 2.2 does explicitly say you can raise StopIteration to halt a generator, but almost as an afterthought. In contrast, StopIteration is explicitly stated to be the way to signal that an iterator is halted.

http://docs.python.org/3.4/whatsnew/2.2.html#pep-234-iterators

So I think the missing piece is that generators are actually iterators. Since raising StopIteration is the official way to halt an iterator, it's also the (or at least, an) official way to halt a generator, and not a quirk of the implementation.

-- 
Steven

From steve at pearwood.info Mon Jul 1 03:09:23 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 01 Jul 2013 11:09:23 +1000
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: References: Message-ID: <51D0D6C3.2020502@pearwood.info>

Hi nbv4, and welcome.

On 01/07/13 10:32, nbv4 wrote:
> The triple quote string literal is a great feature, but there is one problem. When you use them, it forces you to break out of your current indentation, which makes code look ugly. I propose a new way to define a triple back quote that works the same way regular triple quotes work, but instead does some simple parsing of the data within the quotes to preserve the flow of the code. Due to the brittle and sometimes ambiguous nature of anything 'automatic', this feature is obviously not meant for data where exact white space is needed. It would be great for docstrings, exception messages and other such text.
>
> Here is a short example of its usage: https://gist.github.com/priestc/5897602

For something as trivial as the example you give, there is no need to send people off to a website, which they may not have access to.
Here's the simplified version:

# Proposed syntax
def func():
    s = """line 1
    line 2
    line 3"""
    t = ---line 1
    line 2
    line 3---
    return s, t

The difference being, lines 2 and 3 of s will begin with four spaces, while t reduces the whitespace between lines to a single space:

s == 'line 1\n    line 2\n    line 3'
t == 'line 1 line 2 line 3'

I don't think this is particularly useful. I would be more interested in it if it kept the newlines but got rid of the leading spaces:

t == 'line 1\nline 2\nline 3'

but in either case, I think the choice of --- as delimiter is ugly and arbitrary, and very likely is ambiguous (currently, x = ---1 is legal code). Similar suggestions to this have been made many times before; you should search the archives:

http://mail.python.org/mailman/listinfo/python-ideas

Regards,

-- 
Steven

From ncoghlan at gmail.com Mon Jul 1 03:30:30 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 1 Jul 2013 11:30:30 +1000
Subject: [Python-ideas] "Iteration stopping" syntax [Was: Is this PEP-able? for X in ListY while conditionZ:]
In-Reply-To: <51D0D459.3090400@pearwood.info>
References: <8D03F2B8CF0E7BE-1864-1796B@webmail-m103.sysops.aol.com> <1372318104.59061.YahooMailNeo@web184704.mail.ne1.yahoo.com> <1372333793.61595.YahooMailNeo@web184701.mail.ne1.yahoo.com> <1b16ba5603a7a995f4a4256720b2532b@chopin.edu.pl> <7BFCA631-0086-411E-8655-43762817FD35@umbrellacode.com> <25CD59DA-6DBD-430D-8B87-A8707A9BE0D2@yahoo.com> <1FAC27E4-B4C0-463F-A5F2-53EC33F2D89A@umbrellacode.com> <1372568014.75211.YahooMailNeo@web184702.mail.ne1.yahoo.com> <1372577188.85901.YahooMailNeo@web184701.mail.ne1.yahoo.com> <51D0AC5C.6050109@canterbury.ac.nz> <51D0D459.3090400@pearwood.info>
Message-ID:

On 1 July 2013 10:59, Steven D'Aprano wrote:
> So I think the missing piece is that generators are actually iterators. Since raising StopIteration is the official way to halt an iterator, it's also the (or at least, an) official way to halt a generator, and not a quirk of the implementation.

Yeah, the reason Andrew's proposed fix to the comprehension semantics makes sense is the fact that exactly *where* StopIteration gets raised during a "__next__" invocation is supposed to be completely opaque from the point of view of the iterator protocol:

while True:
    try:
        x = next(itr)
    except StopIteration:
        break
    # Process x
    ...

The caught "StopIteration" could come from:
- a generator iterator frame terminating
- a generator iterator explicitly raising StopIteration
- a sequence iterator triggering IndexError
- a sentinel iterator noticing the sentinel value
- any other __next__ method raising StopIteration

When I did the "make [x for x in y] merely an optimised version of list(x for x in y)" change for Python 3, I know I missed the fact that part of that change involved moving the evaluation of all of the subexpressions inside the implicit try/except that is part of the iterator protocol, and I don't recall anyone else bringing it up either. Even if it did come up, we must have dismissed it as introducing too much overhead to set up the almost-certainly-unnecessary try/except for each iteration. Fortunately, Andrew is right that we can avoid that overhead and use a single try/except to cover the whole comprehension, which is a nice and cheap change.

Cheers,
Nick.
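[Editor's note: a minimal runnable sketch of the difference discussed in the two messages above, under Python 3.3-era semantics. The helper name "stop" is illustrative only. PEP 479 (enforced from Python 3.7) later changed the generator case: a StopIteration escaping a generator frame is turned into RuntimeError, so the genexp line below no longer returns a partial list on modern interpreters.]

    def stop():
        raise StopIteration

    # Generator expression: list() drives the iterator protocol, so the
    # StopIteration raised by stop() looks exactly like normal exhaustion
    # and the partial result is returned.
    print(list(i for i in range(10) if i < 3 or stop()))  # [0, 1, 2]

    # List comprehension, before the fix described above: the same
    # StopIteration escapes, because no implicit next() call wraps the
    # loop body.
    try:
        [i for i in range(10) if i < 3 or stop()]
    except StopIteration:
        print("StopIteration escaped the comprehension")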
From ncoghlan at gmail.com Mon Jul 1 03:47:29 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 1 Jul 2013 11:47:29 +1000
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: <51D0D6C3.2020502@pearwood.info>
References: <51D0D6C3.2020502@pearwood.info>
Message-ID:

On 1 July 2013 11:09, Steven D'Aprano wrote:
> but in either case, I think the choice of --- as delimiter is ugly and arbitrary, and very likely is ambiguous (currently, x = ---1 is legal code). Similar suggestions to this have been made many times before; you should search the archives:
>
> http://mail.python.org/mailman/listinfo/python-ideas

I'm still partial to the idea of offering textwrap.indent() and textwrap.dedent() as string methods.

1. You could add a ".dedent()" at the end of a triple quoted string for this kind of problem. For a lot of code, the runtime cost isn't an issue.
2. A JIT would definitely be able to avoid recalculating the result every time
3. Even CPython may eventually gain constant folding for that kind of method applied directly to a string literal
4. I dedent and indent long strings more often than I capitalize, center, tab expand, or perform various other operations which already grace the str type as methods.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From guido at python.org Mon Jul 1 03:57:43 2013
From: guido at python.org (Guido van Rossum)
Date: Sun, 30 Jun 2013 18:57:43 -0700
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: References: <51D0D6C3.2020502@pearwood.info>
Message-ID:

On Sun, Jun 30, 2013 at 6:47 PM, Nick Coghlan wrote:
> On 1 July 2013 11:09, Steven D'Aprano wrote:
>> but in either case, I think the choice of --- as delimiter is ugly and arbitrary, and very likely is ambiguous (currently, x = ---1 is legal code). Similar suggestions to this have been made many times before; you should search the archives:
>>
>> http://mail.python.org/mailman/listinfo/python-ideas
>
> I'm still partial to the idea of offering textwrap.indent() and textwrap.dedent() as string methods.
>
> 1. You could add a ".dedent()" at the end of a triple quoted string for this kind of problem. For a lot of code, the runtime cost isn't an issue.
> 2. A JIT would definitely be able to avoid recalculating the result every time
> 3. Even CPython may eventually gain constant folding for that kind of method applied directly to a string literal
> 4. I dedent and indent long strings more often than I capitalize, center, tab expand, or perform various other operations which already grace the str type as methods.

That's a compelling argument. Let's do it. (Assuming the definition of exactly how to indent or dedent is not up for discussion -- if there are good reasons to disagree with textwrap now's the time to bring it up.)

-- 
--Guido van Rossum (python.org/~guido)

From joshua.landau.ws at gmail.com Mon Jul 1 03:58:53 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Mon, 1 Jul 2013 02:58:53 +0100
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: References: <51D0D6C3.2020502@pearwood.info>
Message-ID:

On 1 July 2013 02:47, Nick Coghlan wrote:
> I'm still partial to the idea of offering textwrap.indent() and textwrap.dedent() as string methods.

If that's an option, I'm all for it.
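[Editor's note: for concreteness, a sketch of the module-level spelling that already exists and that the proposed methods would mirror. Only textwrap.dedent() and textwrap.indent() are real APIs here; the str.dedent() spelling under discussion is a proposal, not an implemented method.]

    import textwrap

    def usage():
        # Today: the module function strips the common leading whitespace.
        return textwrap.dedent("""\
            usage: frobnicate [options] FILE

            options:
              -v  verbose output
            """)

    # Proposed (hypothetical) method spelling of the same operation:
    #     """...""".dedent()
    print(usage())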
From abarnert at yahoo.com Mon Jul 1 04:14:16 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 30 Jun 2013 19:14:16 -0700
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: <51D0D6C3.2020502@pearwood.info>
References: <51D0D6C3.2020502@pearwood.info>
Message-ID: <346B812F-9987-4259-9550-DC752CF48D4A@yahoo.com>

On Jun 30, 2013, at 18:09, Steven D'Aprano wrote:
> Hi nbv4, and welcome.
>
> On 01/07/13 10:32, nbv4 wrote:
>> The triple quote string literal is a great feature, but there is one problem. When you use them, it forces you to break out of your current indentation, which makes code look ugly. I propose a new way to define a triple back quote that works the same way regular triple quotes work, but instead does some simple parsing of the data within the quotes to preserve the flow of the code. Due to the brittle and sometimes ambiguous nature of anything 'automatic', this feature is obviously not meant for data where exact white space is needed. It would be great for docstrings, exception messages and other such text.
>>
>> Here is a short example of its usage: https://gist.github.com/priestc/5897602
>
> For something as trivial as the example you give, there is no need to send people off to a website, which they may not have access to. Here's the simplified version:
>
> # Proposed syntax
> def func():
>     s = """line 1
>     line 2
>     line 3"""
>     t = ---line 1
>     line 2
>     line 3---
>     return s, t
>
> The difference being, lines 2 and 3 of s will begin with four spaces, while t reduces the whitespace between lines to a single space:
>
> s == 'line 1\n    line 2\n    line 3'
> t == 'line 1 line 2 line 3'
>
> I don't think this is particularly useful. I would be more interested in it if it kept the newlines but got rid of the leading spaces:
>
> t == 'line 1\nline 2\nline 3'
>
> but in either case, I think the choice of --- as delimiter is ugly and arbitrary, and very likely is ambiguous (currently, x = ---1 is legal code).

The proposal used backticks, not hyphens:

print ```
      Now my code looks much better.
      The output of this function is as if you had written it in the same way it's written in the code above, except with all newlines replaced with spaces and big swaths of spaces removed.
      ```

So, both the ambiguity and a lot of the ugliness you complain about aren't in the proposal at all.

That being said, I don't really like the proposal, and at least half of the alternatives suggested last time this came up a few months ago were better.

From steve at pearwood.info Mon Jul 1 04:28:21 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 01 Jul 2013 12:28:21 +1000
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: <346B812F-9987-4259-9550-DC752CF48D4A@yahoo.com>
References: <51D0D6C3.2020502@pearwood.info> <346B812F-9987-4259-9550-DC752CF48D4A@yahoo.com>
Message-ID: <51D0E945.90609@pearwood.info>

On 01/07/13 12:14, Andrew Barnert wrote:
> On Jun 30, 2013, at 18:09, Steven D'Aprano wrote:
>> Hi nbv4, and welcome.
>>
>> On 01/07/13 10:32, nbv4 wrote:
>>> Here is a short example of its usage: https://gist.github.com/priestc/5897602
[...]
> The proposal used backticks, not hyphens:

Wow, that's weird. They look exactly like hyphens in my browser.
-- 
Steven

From haoyi.sg at gmail.com Mon Jul 1 04:35:37 2013
From: haoyi.sg at gmail.com (Haoyi Li)
Date: Mon, 1 Jul 2013 10:35:37 +0800
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: <51D0E945.90609@pearwood.info>
References: <51D0D6C3.2020502@pearwood.info> <346B812F-9987-4259-9550-DC752CF48D4A@yahoo.com> <51D0E945.90609@pearwood.info>
Message-ID:

+1 for just offering .dedent() on strings.

On Mon, Jul 1, 2013 at 10:28 AM, Steven D'Aprano wrote:
> On 01/07/13 12:14, Andrew Barnert wrote:
>> On Jun 30, 2013, at 18:09, Steven D'Aprano wrote:
>>> Hi nbv4, and welcome.
>>>
>>> On 01/07/13 10:32, nbv4 wrote:
>>>> Here is a short example of its usage: https://gist.github.com/priestc/5897602
[...]
>> The proposal used backticks, not hyphens:
>
> Wow, that's weird. They look exactly like hyphens in my browser.
>
> -- 
> Steven

From nbvfour at gmail.com Mon Jul 1 04:52:17 2013
From: nbvfour at gmail.com (nbv4)
Date: Sun, 30 Jun 2013 19:52:17 -0700 (PDT)
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: References: <51D0D6C3.2020502@pearwood.info>
Message-ID:

I wanted to make sure the text I was posting was being rendered in a fixed-width font, hence the gist.

On Sunday, June 30, 2013 6:47:29 PM UTC-7, Nick Coghlan wrote:
> On 1 July 2013 11:09, Steven D'Aprano wrote:
>> but in either case, I think the choice of --- as delimiter is ugly and arbitrary, and very likely is ambiguous (currently, x = ---1 is legal code). Similar suggestions to this have been made many times before; you should search the archives:
>>
>> http://mail.python.org/mailman/listinfo/python-ideas
>
> I'm still partial to the idea of offering textwrap.indent() and textwrap.dedent() as string methods.
>
> 1. You could add a ".dedent()" at the end of a triple quoted string for this kind of problem. For a lot of code, the runtime cost isn't an issue.
> 2. A JIT would definitely be able to avoid recalculating the result every time
> 3. Even CPython may eventually gain constant folding for that kind of method applied directly to a string literal
> 4. I dedent and indent long strings more often than I capitalize, center, tab expand, or perform various other operations which already grace the str type as methods.
>
> Cheers,
> Nick.
>
> -- 
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From nbvfour at gmail.com Mon Jul 1 04:49:04 2013
From: nbvfour at gmail.com (nbv4)
Date: Sun, 30 Jun 2013 19:49:04 -0700 (PDT)
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: References: <51D0D6C3.2020502@pearwood.info> <346B812F-9987-4259-9550-DC752CF48D4A@yahoo.com> <51D0E945.90609@pearwood.info>
Message-ID: <2ae7fe65-db62-4178-bf72-da06a787cae3@googlegroups.com>

"dedent" is a weird word, maybe "unindent" would be better?

On Sunday, June 30, 2013 7:35:37 PM UTC-7, Haoyi Li wrote:
> +1 for just offering .dedent() on strings.
> On Mon, Jul 1, 2013 at 10:28 AM, Steven D'Aprano wrote:
>> On 01/07/13 12:14, Andrew Barnert wrote:
>>> On Jun 30, 2013, at 18:09, Steven D'Aprano wrote:
>>>> Hi nbv4, and welcome.
>>>>
>>>> On 01/07/13 10:32, nbv4 wrote:
>>>>> Here is a short example of its usage: https://gist.github.com/priestc/5897602
[...]
>>> The proposal used backticks, not hyphens:
>>
>> Wow, that's weird. They look exactly like hyphens in my browser.
>>
>> -- 
>> Steven

From ben+python at benfinney.id.au Mon Jul 1 05:15:39 2013
From: ben+python at benfinney.id.au (Ben Finney)
Date: Mon, 01 Jul 2013 13:15:39 +1000
Subject: [Python-ideas] Idea for new multi-line triple quote literal
References: Message-ID: <7w38rznf6s.fsf@benfinney.id.au>

nbv4 writes:
> The triple quote string literal is a great feature, but there is one problem. When you use them, it forces you to break out of your current indentation, which makes code look ugly.

We have the "textwrap.dedent" function in the standard library. Here is a StackOverflow answer that shows how it is used for exactly the situation you describe.

> I propose a new way to define a triple back quote that works the same way regular triple quotes work, but instead does some simple parsing of the data within the quotes to preserve the flow of the code.

-1. This problem already has a standard-library solution, it is not common enough to need a change to the language syntax.

If anything, a "dedent" method on strings would be good. But not adding more complexities to syntax for this, please.

-- 
 \     "Following fashion and the status quo is easy. Thinking about |
  `\   your users' lives and creating something practical is much |
_o__)  harder." --Ryan Singer, 2008-07-09 |
Ben Finney

From ben+python at benfinney.id.au Mon Jul 1 05:19:57 2013
From: ben+python at benfinney.id.au (Ben Finney)
Date: Mon, 01 Jul 2013 13:19:57 +1000
Subject: [Python-ideas] Idea for new multi-line triple quote literal
References: <51D0D6C3.2020502@pearwood.info> <346B812F-9987-4259-9550-DC752CF48D4A@yahoo.com> <51D0E945.90609@pearwood.info> <2ae7fe65-db62-4178-bf72-da06a787cae3@googlegroups.com>
Message-ID: <7wy59rm0f6.fsf@benfinney.id.au>

nbv4 writes:
> "dedent" is a weird word, maybe "unindent" would be better?

That ship has sailed, since the name "dedent" is already established usage in Python (from "textwrap.dedent" in the standard library). A better term, by symmetry with "indent", would have been "outdent". But that would be needlessly confusing since we already refer to it in Python by the neologism "dedent".

-- 
 \     "I love to go down to the schoolyard and watch all the little |
  `\   children jump up and down and run around yelling and screaming. |
_o__)  They don't know I'm only using blanks." --Emo Philips |
Ben Finney

From tjreedy at udel.edu Mon Jul 1 07:34:13 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 01 Jul 2013 01:34:13 -0400
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: References: <51D0D6C3.2020502@pearwood.info>
Message-ID:

On 6/30/2013 9:57 PM, Guido van Rossum wrote:
> That's a compelling argument. Let's do it.
http://bugs.python.org/issue18335

-- 
Terry Jan Reedy

From ron3200 at gmail.com Mon Jul 1 07:56:26 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Mon, 01 Jul 2013 00:56:26 -0500
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: References: <51D0D6C3.2020502@pearwood.info>
Message-ID:

On 06/30/2013 08:57 PM, Guido van Rossum wrote:
> On Sun, Jun 30, 2013 at 6:47 PM, Nick Coghlan wrote:
>> On 1 July 2013 11:09, Steven D'Aprano wrote:
>>> but in either case, I think the choice of --- as delimiter is ugly and arbitrary, and very likely is ambiguous (currently, x = ---1 is legal code). Similar suggestions to this have been made many times before; you should search the archives:
>>>
>>> http://mail.python.org/mailman/listinfo/python-ideas
>>
>> I'm still partial to the idea of offering textwrap.indent() and textwrap.dedent() as string methods.
>>
>> 1. You could add a ".dedent()" at the end of a triple quoted string for this kind of problem. For a lot of code, the runtime cost isn't an issue.
>> 2. A JIT would definitely be able to avoid recalculating the result every time
>> 3. Even CPython may eventually gain constant folding for that kind of method applied directly to a string literal
>> 4. I dedent and indent long strings more often than I capitalize, center, tab expand, or perform various other operations which already grace the str type as methods.
>
> That's a compelling argument. Let's do it. (Assuming the definition of exactly how to indent or dedent is not up for discussion -- if there are good reasons to disagree with textwrap now's the time to bring it up.)

It would be an improvement to have them as methods, but I'd actually like a str.indent(n) method that takes a value for the leading white space. The value to this method would always be a positive number, and any common leading white space would be replaced by the new indent amount. S.indent(0) would be the same as S.dedent().

s = """\
    A multi-line string
    with 4 leading spaces.
    """.indent(4)

    s = """\
        A multi-line string
        with 4 leading spaces.
        """.indent(4)

    if cond:
        s = """\
            Another multi-line string
            with 4 leading spaces.
            """.indent(4)

The reason I prefer this is ... It's more relevant to what I'm going to use the string for, and is not just compensating for the block indentation level, which has nothing to do with how I'm going to use the string. It explicitly specifies the amount of leading white space I want in the resulting string object. If I want a different indent level, I can just change the value. Or call the indent method again with the new value. I don't need to know what the current leading white space is on the string, just what I want for my output.

Strangely, the online docs for textwrap include an indent function that works a bit differently, but it is no longer present in textwrap. Looks like an oversight to me.
Cheers,
Ron

From markus at unterwaditzer.net Mon Jul 1 07:50:30 2013
From: markus at unterwaditzer.net (Markus Unterwaditzer)
Date: Mon, 1 Jul 2013 07:50:30 +0200
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: References: <51D0D6C3.2020502@pearwood.info>
Message-ID: <20130701055030.GA934@untibox.unti>

I think this could cause problems with multi-line strings that contain additional indentation:

def get_yaml():
    x = """
    foo: Bar
    user:
        fname: Hans
        lname: Gans
    """.dedent()
    return x

While I don't see many arguments why somebody would want to store configuration files inside a string, I am sure many beginners who try to use this method will be surprised by its behavior.

-- Markus

On Mon, Jul 01, 2013 at 11:47:29AM +1000, Nick Coghlan wrote:
> On 1 July 2013 11:09, Steven D'Aprano wrote:
>> but in either case, I think the choice of --- as delimiter is ugly and arbitrary, and very likely is ambiguous (currently, x = ---1 is legal code). Similar suggestions to this have been made many times before; you should search the archives:
>>
>> http://mail.python.org/mailman/listinfo/python-ideas
>
> I'm still partial to the idea of offering textwrap.indent() and textwrap.dedent() as string methods.
>
> 1. You could add a ".dedent()" at the end of a triple quoted string for this kind of problem. For a lot of code, the runtime cost isn't an issue.
> 2. A JIT would definitely be able to avoid recalculating the result every time
> 3. Even CPython may eventually gain constant folding for that kind of method applied directly to a string literal
> 4. I dedent and indent long strings more often than I capitalize, center, tab expand, or perform various other operations which already grace the str type as methods.
>
> Cheers,
> Nick.

From markus at unterwaditzer.net Mon Jul 1 08:23:26 2013
From: markus at unterwaditzer.net (Markus Unterwaditzer)
Date: Mon, 01 Jul 2013 08:23:26 +0200
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: References: <51D0D6C3.2020502@pearwood.info> <20130701055030.GA934@untibox.unti>
Message-ID: <10ac1148-411e-4a0b-b6cc-f3706fa75b10@email.android.com>

Thanks for pointing it out, I have to admit that I didn't read the documentation carefully, and yes, that indeed makes my example invalid, but then there are still strings that have indentation on all lines, which would be completely truncated with dedent().

-- Markus (from phone)

Daniel Robinson wrote:
> Have you tried using textwrap.dedent with this string? It gives what I think is the expected result:
>
> '\nfoo: Bar\nuser:\n    fname: Hans\n    lname: Gans\n'
>
> since it only removes common leading whitespace.
>
> On Mon, Jul 1, 2013 at 1:50 AM, Markus Unterwaditzer <markus at unterwaditzer.net> wrote:
>> I think this could cause problems with multi-line strings that contain additional indentation:
>>
>> def get_yaml():
>>     x = """
>>     foo: Bar
>>     user:
>>         fname: Hans
>>         lname: Gans
>>     """.dedent()
>>     return x
>>
>> While I don't see many arguments why somebody would want to store configuration files inside a string, I am sure many beginners who try to use this method will be surprised by its behavior.
>> -- Markus
>>
>> On Mon, Jul 01, 2013 at 11:47:29AM +1000, Nick Coghlan wrote:
>>> On 1 July 2013 11:09, Steven D'Aprano wrote:
>>>> but in either case, I think the choice of --- as delimiter is ugly and arbitrary, and very likely is ambiguous (currently, x = ---1 is legal code). Similar suggestions to this have been made many times before; you should search the archives:
>>>>
>>>> http://mail.python.org/mailman/listinfo/python-ideas
>>>
>>> I'm still partial to the idea of offering textwrap.indent() and textwrap.dedent() as string methods.
>>> [...]
>>>
>>> Cheers,
>>> Nick.

From ron3200 at gmail.com Mon Jul 1 08:45:14 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Mon, 01 Jul 2013 01:45:14 -0500
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: References: <51D0D6C3.2020502@pearwood.info>
Message-ID: <51D1257A.5010001@gmail.com>

On 07/01/2013 01:18 AM, Daniel Robinson wrote:
> Textwrap.indent already exists, but was added in Python 3.3. Maybe you're using an earlier interpreter but looking at the Python 3.3 docs?

Fixed now. :-)

Ron

From ncoghlan at gmail.com Mon Jul 1 10:31:32 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 1 Jul 2013 18:31:32 +1000
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: <10ac1148-411e-4a0b-b6cc-f3706fa75b10@email.android.com>
References: <51D0D6C3.2020502@pearwood.info> <20130701055030.GA934@untibox.unti> <10ac1148-411e-4a0b-b6cc-f3706fa75b10@email.android.com>
Message-ID:

On 1 July 2013 16:23, Markus Unterwaditzer wrote:
> Thanks for pointing it out, I have to admit that I didn't read the documentation carefully, and yes, that indeed makes my example invalid, but then there are still strings that have indentation on all lines, which would be completely truncated with dedent().

If you call a function or method that has the purpose of stripping common leading whitespace, stripping common leading whitespace is hardly a surprising outcome.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
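[Editor's note: a small demonstration of the point Daniel and Nick make above -- textwrap.dedent() removes only the whitespace common to all lines, so relative indentation inside the block survives.]

    import textwrap

    text = """
        foo: Bar
        user:
            fname: Hans
            lname: Gans
    """
    print(repr(textwrap.dedent(text)))
    # '\nfoo: Bar\nuser:\n    fname: Hans\n    lname: Gans\n'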
From ncoghlan at gmail.com Mon Jul 1 11:05:48 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 1 Jul 2013 19:05:48 +1000
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: References: <51D0D6C3.2020502@pearwood.info>
Message-ID:

On 1 July 2013 11:57, Guido van Rossum wrote:
> On Sun, Jun 30, 2013 at 6:47 PM, Nick Coghlan wrote:
>> I'm still partial to the idea of offering textwrap.indent() and textwrap.dedent() as string methods.
>> [...]
>
> That's a compelling argument. Let's do it. (Assuming the definition of exactly how to indent or dedent is not up for discussion -- if there are good reasons to disagree with textwrap now's the time to bring it up.)

The only slight quirk that occurred to me is that if dedent is a method, people will probably want to use it with docstrings, and the compiler currently doesn't allow that. There are then two options for changing the compiler (if we decide we want to allow for "neat" docstrings):

1. Implicitly call dedent on docstrings at compilation time (feasible with dedent as a method).
2. Allow method calls on docstrings without breaking docstring detection

It's technically a separate question from the decision on whether or not to add the methods, but I figured it was worth bringing up. Touching the methods of a builtin *and* possibly the compiler behaviour as well is likely enough to nudge the idea into PEP territory.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From storchaka at gmail.com Mon Jul 1 12:00:05 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 01 Jul 2013 13:00:05 +0300
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: References: <51D0D6C3.2020502@pearwood.info>
Message-ID:

01.07.13 04:47, Nick Coghlan wrote:
> 4. I dedent and indent long strings more often than I capitalize, center, tab expand, or perform various other operations which already grace the str type as methods.

The str type already has many rarely used methods, so let's add a couple more. It doesn't look like a good argument.

From ncoghlan at gmail.com Mon Jul 1 12:16:02 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 1 Jul 2013 20:16:02 +1000
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: References: <51D0D6C3.2020502@pearwood.info>
Message-ID:

On 1 Jul 2013 20:01, "Serhiy Storchaka" wrote:
> 01.07.13 04:47, Nick Coghlan wrote:
I dedent and indent long strings more often than I capitalize, >> center, tab expand, or perform various other operations which already >> grace the str type as methods. > > > The str type already has many rarely used methods, so let's add a couple more. It doesn't look good argument. I almost left that point out, as I kinda feel the same way. OTOH, I want indent pretty much every time I write a script that invokes other command line tools, and I'm serious about the possibility of having the compiler implicitly dedent docstrings once the method support exists to make it practical. Cheers, Nick. > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Mon Jul 1 12:21:17 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 1 Jul 2013 12:21:17 +0200 Subject: [Python-ideas] Idea for new multi-line triple quote literal References: <51D0D6C3.2020502@pearwood.info> Message-ID: <20130701122117.31e8c0cb@pitrou.net> Le Mon, 1 Jul 2013 20:16:02 +1000, Nick Coghlan a ?crit : > On 1 Jul 2013 20:01, "Serhiy Storchaka" > wrote: > > > > 01.07.13 04:47, Nick Coghlan ???????(??): > > > >> 4. I dedent and indent long strings more often than I capitalize, > >> center, tab expand, or perform various other operations which > >> already grace the str type as methods. > > > > > > The str type already has many rarely used methods, so let's add a > > couple > more. It doesn't look good argument. > > I almost left that point out, as I kinda feel the same way. OTOH, I > want indent pretty much every time I write a script that invokes > other command line tools, and I'm serious about the possibility of > having the compiler implicitly dedent docstrings once the method > support exists to make it practical. I think dedent() is quite useful too. OTOH, I don't think there's much point in indent() as a str method (except for consistency). Also, indent() may want more options and knobs than dedent() has, and therefore would be better implemented in pure Python. Regards Antoine. From markus at unterwaditzer.net Mon Jul 1 12:52:32 2013 From: markus at unterwaditzer.net (Markus Unterwaditzer) Date: Mon, 1 Jul 2013 12:52:32 +0200 Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: <10ac1148-411e-4a0b-b6cc-f3706fa75b10@email.android.com> References: <51D0D6C3.2020502@pearwood.info> <20130701055030.GA934@untibox.unti> <10ac1148-411e-4a0b-b6cc-f3706fa75b10@email.android.com> Message-ID: <20130701105232.GA1246@untibox.unti> I realize how unclear i've been. What i am asking for is to have a way to handle strings like this: def foo(): return """ look at me i am cool i can do indentation\n""" I rather want to write code that looks a lot more like this: def foo(): return """ look at me i am cool i can do indentation """ So the first line would have four spaces at the beginning, the second eight, the third four again. The "\n" character after "indentation" would be the last character in the string. With a new string method i would probably try this: def foo(): """ look at me i am cool i can do indentation """.dedent() But, by this string method behaving like textwrap.dedent, this would also remove the first four spaces of each line, which is not the behavior i wanted. 
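For what it's worth, the result I'm after can be approximated today by
chaining the two textwrap functions -- a rough sketch, assuming
textwrap.indent() from 3.3 is available:

    import textwrap

    def foo():
        # dedent() strips the indentation that comes from the source
        # code; indent() then re-adds the four spaces the string itself
        # is supposed to carry.
        return textwrap.indent(textwrap.dedent("""\
            look at me
                i am cool
            i can do indentation
            """), " " * 4)

but that is two calls plus a prefix argument for what feels to me like a
single operation.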
-- Markus On Mon, Jul 01, 2013 at 08:23:26AM +0200, Markus Unterwaditzer wrote: > Thanks for pointing it out, i have to admit that i didn't read the documentation carefully, and yes, that indeed makes my example invalid, but then there are still strings that have indentation on all lines, which would be completely trunchated with dedent(). > > -- Markus (from phone) > > Daniel Robinson wrote: > >Have you tried using textwrap.dedent with this string? It gives what I > >think is the expected result: > > > >'\nfoo: Bar\nuser:\n fname: Hans\n lname: Gans\n' > > > >since it only removes common leading whitespace. > > > >On Mon, Jul 1, 2013 at 1:50 AM, Markus Unterwaditzer < > >markus at unterwaditzer.net> wrote: > > > >> I think this could cause problems with multi-line strings that > >contain > >> additional indentation: > >> > >> def get_yaml(): > >> x = """ > >> foo: Bar > >> user: > >> fname: Hans > >> lname: Gans > >> """.dedent() > >> return x > >> > >> > >> While i don't see many arguments why somebody would want to store > >> configuration > >> files inside a string, i am sure many beginners who try to use this > >method > >> will > >> be surprised by its behavior. > >> > >> -- Markus > >> > >> On Mon, Jul 01, 2013 at 11:47:29AM +1000, Nick Coghlan wrote: > >> > On 1 July 2013 11:09, Steven D'Aprano wrote: > >> > > but in either case, I think the choice of --- as delimiter is > >ugly and > >> > > arbitrary, and very likely is ambiguous (currently, x = ---1 is > >legal > >> code). > >> > > Similar suggestions to this have been made many times before, you > >> should > >> > > search the archives: > >> > > > >> > > http://mail.python.org/mailman/listinfo/python-ideas > >> > > >> > I'm still partial to the idea of offering textwrap.indent() and > >> > textwrap.dedent() as string methods. > >> > > >> > 1. You could add a ".dedent()" at the end of a triple quoted string > >> > for this kind of problem. For a lot of code, the runtime cost isn't > >an > >> > issue. > >> > 2. A JIT would definitely be able to avoid recalculating the result > >> every time > >> > 3. Even CPython may eventually gain constant folding for that kind > >of > >> > method applied directly to a string literal > >> > 4. I dedent and indent long strings more often than I capitalize, > >> > center, tab expand, or perform various other operations which > >already > >> > grace the str type as methods. > >> > > >> > Cheers, > >> > Nick. > >> > > >> > -- > >> > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > >> > _______________________________________________ > >> > Python-ideas mailing list > >> > Python-ideas at python.org > >> > http://mail.python.org/mailman/listinfo/python-ideas > >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > >> http://mail.python.org/mailman/listinfo/python-ideas > >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From spaghettitoastbook at gmail.com Mon Jul 1 07:56:27 2013 From: spaghettitoastbook at gmail.com (SpaghettiToastBook .) Date: Mon, 1 Jul 2013 01:56:27 -0400 Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: References: <51D0D6C3.2020502@pearwood.info> Message-ID: Maybe "dedent" should be replaced with "outdent", while keeping the old names for compatibility. 
On Mon, Jul 1, 2013 at 1:34 AM, Terry Reedy wrote: > On 6/30/2013 9:57 PM, Guido van Rossum wrote: >> >> That's a compelling argument. Let's do it. > > http://bugs.python.org/issue18335 > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From gottagetmac at gmail.com Mon Jul 1 08:15:46 2013 From: gottagetmac at gmail.com (Daniel Robinson) Date: Mon, 1 Jul 2013 02:15:46 -0400 Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: <20130701055030.GA934@untibox.unti> References: <51D0D6C3.2020502@pearwood.info> <20130701055030.GA934@untibox.unti> Message-ID: Have you tried using textwrap.dedent with this string? It gives what I think is the expected result: '\nfoo: Bar\nuser:\n fname: Hans\n lname: Gans\n' since it only removes common leading whitespace. On Mon, Jul 1, 2013 at 1:50 AM, Markus Unterwaditzer < markus at unterwaditzer.net> wrote: > I think this could cause problems with multi-line strings that contain > additional indentation: > > def get_yaml(): > x = """ > foo: Bar > user: > fname: Hans > lname: Gans > """.dedent() > return x > > > While i don't see many arguments why somebody would want to store > configuration > files inside a string, i am sure many beginners who try to use this method > will > be surprised by its behavior. > > -- Markus > > On Mon, Jul 01, 2013 at 11:47:29AM +1000, Nick Coghlan wrote: > > On 1 July 2013 11:09, Steven D'Aprano wrote: > > > but in either case, I think the choice of --- as delimiter is ugly and > > > arbitrary, and very likely is ambiguous (currently, x = ---1 is legal > code). > > > Similar suggestions to this have been made many times before, you > should > > > search the archives: > > > > > > http://mail.python.org/mailman/listinfo/python-ideas > > > > I'm still partial to the idea of offering textwrap.indent() and > > textwrap.dedent() as string methods. > > > > 1. You could add a ".dedent()" at the end of a triple quoted string > > for this kind of problem. For a lot of code, the runtime cost isn't an > > issue. > > 2. A JIT would definitely be able to avoid recalculating the result > every time > > 3. Even CPython may eventually gain constant folding for that kind of > > method applied directly to a string literal > > 4. I dedent and indent long strings more often than I capitalize, > > center, tab expand, or perform various other operations which already > > grace the str type as methods. > > > > Cheers, > > Nick. > > > > -- > > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gottagetmac at gmail.com Mon Jul 1 08:18:07 2013 From: gottagetmac at gmail.com (Daniel Robinson) Date: Mon, 1 Jul 2013 02:18:07 -0400 Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: References: <51D0D6C3.2020502@pearwood.info> Message-ID: Textwrap.indent already exists, but was added in Python 3.3. Maybe you're using an earlier interpreter but looking at the Python 3.3 docs? 
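For reference, a quick check of the 3.3 behaviour (session abridged):

    >>> import textwrap
    >>> textwrap.indent('spam\neggs\n', '    ')
    '    spam\n    eggs\n'
    >>> textwrap.dedent('    spam\n      eggs\n')
    'spam\n  eggs\n'

By default textwrap.indent() only adds the prefix to lines that contain
something other than whitespace.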
On Mon, Jul 1, 2013 at 1:56 AM, Ron Adam wrote: > > > On 06/30/2013 08:57 PM, Guido van Rossum wrote: > >> On Sun, Jun 30, 2013 at 6:47 PM, Nick Coghlan wrote: >> >>> >On 1 July 2013 11:09, Steven D'Aprano wrote: >>> >>>> >>but in either case, I think the choice of --- as delimiter is ugly and >>>> >>arbitrary, and very likely is ambiguous (currently, x = ---1 is legal >>>> code). >>>> >>Similar suggestions to this have been made many times before, you >>>> should >>>> >>search the archives: >>>> >> >>>> >>http://mail.python.org/**mailman/listinfo/python-ideas >>>> >>> > >>> >I'm still partial to the idea of offering textwrap.indent() and >>> >textwrap.dedent() as string methods. >>> > >>> >1. You could add a ".dedent()" at the end of a triple quoted string >>> >for this kind of problem. For a lot of code, the runtime cost isn't an >>> >issue. >>> >2. A JIT would definitely be able to avoid recalculating the result >>> every time >>> >3. Even CPython may eventually gain constant folding for that kind of >>> >method applied directly to a string literal >>> >4. I dedent and indent long strings more often than I capitalize, >>> >center, tab expand, or perform various other operations which already >>> >grace the str type as methods. >>> >> That's a compelling argument. Let's do it. (Assuming the definition of >> exactly how to indent or dedent is not up for discussion -- if there >> are good reasons to disagree with textwrap now's the time to bring it >> up.) >> > > It would be an improvement to have them as methods, but I'd actually like > to have Str.indent(n) method that takes a value for the leading white space. > > The value to this method would always be a positive number, and any common > leading white space would be replaced by the new indent amount. > > S.indent(0) would be the same as S.dedent(). > > s = """\ > A multi-line string > with 4 leading spaces. > """.indent(4) > > > s = """\ > A multi-line string > with 4 leading spaces. > """.indent(4) > > > if cond: > s = """\ > Another multi-line string > with 4 leading spaces. > """.indent(4) > > > > The reason I prefer this is ... > > It's more relevant to what I'm going to use the string for and is not just > compensating for the block indention level, which has nothing to do with > how I'm going to use the string. > > It explicitly specifies the amount of leading white space I want in the > resulting string object. If I want a different indent level, I can just > change the value. Or call the indent method again with the new value. > > I don't need to know what the current leading white space is on the > string, just what I want for my output. > > > > Strangely, the online docs for textwrap include an indent function that > works a bit different, but it is no longer present in textwrap. Looks like > an over site to me. > > Cheers, > Ron > > > > > > > > > > > > > > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From spaghettitoastbook at gmail.com Mon Jul 1 09:13:33 2013 From: spaghettitoastbook at gmail.com (SpaghettiToastBook .) 
Date: Mon, 1 Jul 2013 03:13:33 -0400
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: <7wy59rm0f6.fsf@benfinney.id.au>
References: <51D0D6C3.2020502@pearwood.info>
 <346B812F-9987-4259-9550-DC752CF48D4A@yahoo.com>
 <51D0E945.90609@pearwood.info>
 <2ae7fe65-db62-4178-bf72-da06a787cae3@googlegroups.com>
 <7wy59rm0f6.fsf@benfinney.id.au>
Message-ID: 

Maybe "dedent" should be replaced with "outdent", while keeping the
old names for compatibility.

On Sun, Jun 30, 2013 at 11:19 PM, Ben Finney wrote:
> nbv4 writes:
>
>> "dedent" is a weird word, maybe "unindent" would be better?
>
> That ship has sailed, since the name "dedent" is already established
> usage in Python (from "textwrap.dedent" in the standard library).
>
> A better term, by symmetry with "indent", would have been "outdent".
> But that would be needlessly confusing since we already refer to it
> in Python by the neologism "dedent".
>
> --
>  \        "I love to go down to the schoolyard and watch all the little |
>   `\   children jump up and down and run around yelling and screaming. |
> _o__)               They don't know I'm only using blanks." --Emo Philips |
> Ben Finney

From oscar.j.benjamin at gmail.com  Mon Jul 1 13:57:00 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 1 Jul 2013 12:57:00 +0100
Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ:
In-Reply-To: 
References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de>
 <51CF03B0.8080508@pearwood.info>
Message-ID: 

On 30 June 2013 00:51, Nick Coghlan wrote:
> [x for x in iterable; break if x is None]
> [x for x in data if x; break if x is None]
>
> One nice advantage of that notation is that:
>
> 1. The statement after the ";" is exactly the statement that would
> appear in the expanded loop
> 2. It can be combined unambiguously with a filtering clause
> 3. It clearly disallows its use with nested loops in the comprehension

It has the significant disadvantage that Steven pointed out, which is
that it doesn't read very well. The most important aspect of a
comprehension is its comprehensibility. Consider getting the prime
numbers less than 100:

    primes100 = {p for p in primes(); break if p >= 100}

You need to invert the if condition to understand which primes are in
the resulting set. With for/while it reads properly and the condition
at the right expresses a true property of the elements in the
resulting set:

    primes100 = {p for p in primes() while p < 100}

At the moment the obvious way to get the prime numbers less than 100
would be to do something like:

    from math import sqrt

    def isprime(N):
        # trial divisors must run up to and including sqrt(N)
        return N > 1 and all(N % n for n in range(2, int(sqrt(N)) + 1))

    primes100 = [p for p in range(1, 100) if isprime(p)]

However this is a suboptimal algorithm. At the point when we want to
determine if the number N is prime we have already found all the
primes less than N. We only need to check modulo division against
those, but this construction doesn't give us an easy way to do that.
It's better to have a primes() generator that can keep track of this
information:

    from itertools import count

    def primes():
        primes_seen = []
        for n in count(2):
            if all(n % p for p in primes_seen):
                yield n
                primes_seen.append(n)

This algorithm is actually even poorer as it doesn't stop at sqrt(n).
We can fix that with takewhile:

    from itertools import count, takewhile

    def primes():
        primes_seen = []
        for n in count(2):
            if all(n % p for p in takewhile(lambda p: p**2 <= n, primes_seen)):
                yield n
                primes_seen.append(n)

    primes100 = {p for p in takewhile(lambda p: p < 100, primes())}

Using for/while this becomes significantly clearer (in my opinion):

    from itertools import count

    def primes():
        primes_seen = []
        for n in count(2):
            if all(n % p for p in primes_seen while p**2 <= n):
                yield n
                primes_seen.append(n)

    primes100 = {p for p in primes() while p < 100}

The main objection to for/while seems to be that it doesn't unroll in
the same way as current comprehensions. I think that for/while is just
as useful for an ordinary for loop as it is for a comprehension. In C
you can easily add anything to the termination condition for a loop,
e.g. something along the lines of

    for (i = 0; i < N && p < 100; i++)

but with iterators in Python you end up writing

    for n, p in enumerate(primes()):
        if p > 100:
            break
        print('%sth prime is %s' % (n, p))

or perhaps

    for n, p in enumerate(takewhile(lambda p: p < 100, primes())):
        print('%sth prime is %s' % (n, p))

or even worse

    for n, p in enumerate(takewhile((100).__gt__, primes())):
        print('%sth prime is %s' % (n, p))

I think that it would be better if this could be spelled as

    for n, p in enumerate(primes()) while p < 100:
        print('%sth prime is %s' % (n, p))

If that were the case then a for/while comprehension could unroll into
a for/while loop just as with current comprehensions:

    result = [x for y in stuff while z]

becomes:

    result = []
    for y in stuff while z:
        result.append(x)

Oscar

From gottagetmac at gmail.com  Mon Jul 1 14:39:13 2013
From: gottagetmac at gmail.com (Daniel Robinson)
Date: Mon, 1 Jul 2013 08:39:13 -0400
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: 
References: <51D0D6C3.2020502@pearwood.info>
Message-ID: 

There is another problem with running dedent on docstrings, I believe:
a PEP 257 compliant docstring with a summary line won't dedent at all,
since the first line lacks indentation.
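A minimal illustration, assuming the method simply mirrors
textwrap.dedent():

    >>> import textwrap
    >>> doc = "Summary line.\n\n    Indented details.\n"
    >>> textwrap.dedent(doc) == doc
    True

The unindented summary line makes the common leading whitespace empty,
so there is nothing to strip.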
If you wanted to automatically clean docstrings, I think you would want to use the trim(docstring) function from PEP 257, rather than dedent. But I'm guessing there was a reason this has not been done before. On Mon, Jul 1, 2013 at 5:05 AM, Nick Coghlan wrote: > On 1 July 2013 11:57, Guido van Rossum wrote: > >> On Sun, Jun 30, 2013 at 6:47 PM, Nick Coghlan wrote: >> > On 1 July 2013 11:09, Steven D'Aprano wrote: >> >> but in either case, I think the choice of --- as delimiter is ugly and >> >> arbitrary, and very likely is ambiguous (currently, x = ---1 is legal >> code). >> >> Similar suggestions to this have been made many times before, you >> should >> >> search the archives: >> >> >> >> http://mail.python.org/mailman/listinfo/python-ideas >> > >> > I'm still partial to the idea of offering textwrap.indent() and >> > textwrap.dedent() as string methods. >> > >> > 1. You could add a ".dedent()" at the end of a triple quoted string >> > for this kind of problem. For a lot of code, the runtime cost isn't an >> > issue. >> > 2. A JIT would definitely be able to avoid recalculating the result >> every time >> > 3. Even CPython may eventually gain constant folding for that kind of >> > method applied directly to a string literal >> > 4. I dedent and indent long strings more often than I capitalize, >> > center, tab expand, or perform various other operations which already >> > grace the str type as methods. >> >> That's a compelling argument. Let's do it. (Assuming the definition of >> exactly how to indent or dedent is not up for discussion -- if there >> are good reasons to disagree with textwrap now's the time to bring it >> up.) >> > > The only slight quirk that occurred to me is that if dedent is a method, > people will probably want to use them with docstrings, and the compiler > currently doesn't allow that. > > There are then two options for changing the compiler (if we decide we want > to allow for "neat" docstrings): > > 1. Implicitly call dedent on docstrings at compilation time (feasible with > dedent as a method). > 2. Allow method calls on docstrings without breaking docstring detection > > It's technically a separate question from the decision on whether or not > to add the methods, but I figured it was worth bringing up. Touching the > methods of a builtin *and* possibly the compiler behaviour as well is > likely enough to nudge the idea into PEP territory. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Jul 1 14:44:33 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 1 Jul 2013 22:44:33 +1000 Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: References: <51D0D6C3.2020502@pearwood.info> Message-ID: On 1 Jul 2013 22:39, "Daniel Robinson" wrote: > > There is another problem with running dedent on docstrings, I believe: a PEP 257 compliant docstring with a summary line won't dedent at all, since the first line lacks indentation. > > If you wanted to automatically clean docstrings, I think you would want to use the trim(docstring) function from PEP 257, rather than dedent. But I'm guessing there was a reason this has not been done before. 
I did think of that, and considered the fact existing docstrings would generally be left alone to be a feature rather than a bug. Regardless, this is all idle speculation unless/until someone comes up with a draft patch to add at least dedent, and perhaps indent, as string methods. It shouldn't be too tricky, but it's still C code. Cheers, Nick. > > On Mon, Jul 1, 2013 at 5:05 AM, Nick Coghlan wrote: >> >> On 1 July 2013 11:57, Guido van Rossum wrote: >>> >>> On Sun, Jun 30, 2013 at 6:47 PM, Nick Coghlan wrote: >>> > On 1 July 2013 11:09, Steven D'Aprano wrote: >>> >> but in either case, I think the choice of --- as delimiter is ugly and >>> >> arbitrary, and very likely is ambiguous (currently, x = ---1 is legal code). >>> >> Similar suggestions to this have been made many times before, you should >>> >> search the archives: >>> >> >>> >> http://mail.python.org/mailman/listinfo/python-ideas >>> > >>> > I'm still partial to the idea of offering textwrap.indent() and >>> > textwrap.dedent() as string methods. >>> > >>> > 1. You could add a ".dedent()" at the end of a triple quoted string >>> > for this kind of problem. For a lot of code, the runtime cost isn't an >>> > issue. >>> > 2. A JIT would definitely be able to avoid recalculating the result every time >>> > 3. Even CPython may eventually gain constant folding for that kind of >>> > method applied directly to a string literal >>> > 4. I dedent and indent long strings more often than I capitalize, >>> > center, tab expand, or perform various other operations which already >>> > grace the str type as methods. >>> >>> That's a compelling argument. Let's do it. (Assuming the definition of >>> exactly how to indent or dedent is not up for discussion -- if there >>> are good reasons to disagree with textwrap now's the time to bring it >>> up.) >> >> >> The only slight quirk that occurred to me is that if dedent is a method, people will probably want to use them with docstrings, and the compiler currently doesn't allow that. >> >> There are then two options for changing the compiler (if we decide we want to allow for "neat" docstrings): >> >> 1. Implicitly call dedent on docstrings at compilation time (feasible with dedent as a method). >> 2. Allow method calls on docstrings without breaking docstring detection >> >> It's technically a separate question from the decision on whether or not to add the methods, but I figured it was worth bringing up. Touching the methods of a builtin *and* possibly the compiler behaviour as well is likely enough to nudge the idea into PEP territory. >> >> Cheers, >> Nick. >> >> -- >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Jul 1 15:11:39 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 01 Jul 2013 23:11:39 +1000 Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: References: <51D0D6C3.2020502@pearwood.info> Message-ID: <51D1800B.4050209@pearwood.info> On 01/07/13 22:44, Nick Coghlan wrote: > On 1 Jul 2013 22:39, "Daniel Robinson" wrote: >> >> There is another problem with running dedent on docstrings, I believe: a > PEP 257 compliant docstring with a summary line won't dedent at all, since > the first line lacks indentation. 
>> >> If you wanted to automatically clean docstrings, I think you would want > to use the trim(docstring) function from PEP 257, rather than dedent. But > I'm guessing there was a reason this has not been done before. > > I did think of that, and considered the fact existing docstrings would > generally be left alone to be a feature rather than a bug. +1 There's little reason to manually call dedent on a docstring, since pydoc will reformat it for display: py> def factory(): ... def inner(): ... """Doc string. ... ... Note the indentation. ... """ ... return inner ... py> help(factory()) Help on function inner in module __main__: inner() Doc string. Note the indentation. -- Steven From steve at pearwood.info Mon Jul 1 15:12:59 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 01 Jul 2013 23:12:59 +1000 Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: <2ae7fe65-db62-4178-bf72-da06a787cae3@googlegroups.com> References: <51D0D6C3.2020502@pearwood.info> <346B812F-9987-4259-9550-DC752CF48D4A@yahoo.com> <51D0E945.90609@pearwood.info> <2ae7fe65-db62-4178-bf72-da06a787cae3@googlegroups.com> Message-ID: <51D1805B.20309@pearwood.info> On 01/07/13 12:49, nbv4 wrote: > "dedent" is a weird word, maybe "unindent" would be better? The de- prefix is a standard English prefix meaning removal, negation or reversal: http://dictionary.reference.com/browse/de- Neologism or not, I think that dedent is sufficiently understandable and widespread that there's no need to deprecate it in favour of "outdent". It's being used in the F# and Ruby communities, as well as Python: http://en.wiktionary.org/wiki/dedent https://github.com/cespare/ruby-dedent -- Steven From jimjhb at aol.com Mon Jul 1 15:44:46 2013 From: jimjhb at aol.com (jimjhb at aol.com) Date: Mon, 1 Jul 2013 09:44:46 -0400 (EDT) Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> <1372574633.43291.YahooMailNeo@web184702.mail.ne1.yahoo.com> Message-ID: <8D044784872EF37-1864-3E09F@webmail-m103.sysops.aol.com> Nick wrote: >PEP 315 has now been >explicitly rejected: the official syntax for terminating a loop early >is the existing break statement, thus any proposal for terminating a (Nick means terminating a 'for' loop; there is a place to shove a conditional in the 'while' loop syntax.) I think the (informational) PEP should reflect that this is contrary to early notions of structured programming (MISRA-1998) but is in accord with updated notions (MISRA-2004, MISRA-2012). I understand and appreciate that many python programmers think this may be silly, but structured programming had a big impact on the computer science committee and some acknowledgement of this is, in my view, warranted. Plus, by citing the later MISRA-2004, it leaves the Python implementation in the free and clear. >I think the idea of early termination of comprehensions has a *much* >better chance of getting Guido's interest if it helps make the >behaviour of else clauses on loops more comprehensible without needing It looks like this has happened. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ron3200 at gmail.com Mon Jul 1 15:58:33 2013 From: ron3200 at gmail.com (Ron Adam) Date: Mon, 01 Jul 2013 08:58:33 -0500 Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: <20130701105232.GA1246@untibox.unti> References: <51D0D6C3.2020502@pearwood.info> <20130701055030.GA934@untibox.unti> <10ac1148-411e-4a0b-b6cc-f3706fa75b10@email.android.com> <20130701105232.GA1246@untibox.unti> Message-ID: On 07/01/2013 05:52 AM, Markus Unterwaditzer wrote: > I rather want to write code that looks a lot more like this: > > def foo(): > return """ > look at me > i am cool > i can do indentation > """ > > So the first line would have four spaces at the beginning, the second eight, > the third four again. The "\n" character after "indentation" would be the > last character in the string. Currently to get what you want you need to use both dedent and indent. >>> print(indent(dedent(""" ... look at me ... I am cool ... I can do indentation ... """), prefix=" ")) look at me I am cool I can do indentation Since indent is already in textwrap and works differently, I think 'margin' would be a good method name to set the left margin. def foo(): return """\ look at me i am cool i can do dedent. """.margin(4) I think this fits a much more common usage pattern, and we won't need to add two methods to do what you want here. A dedent method could take a 'margin' argument.. s.dedent(margin=4) But I prefer just calling it margin. Cheers, Ron From flying-sheep at web.de Mon Jul 1 18:22:25 2013 From: flying-sheep at web.de (Philipp A.) Date: Mon, 1 Jul 2013 18:22:25 +0200 Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: References: <51D0D6C3.2020502@pearwood.info> Message-ID: 2013/7/1 Guido van Rossum guido at python.org That?s a compelling argument. Let?s do it. (Assuming the definition of exactly how to indent or dedent is not up for discussion ? if there are good reasons to disagree with textwrap now's the time to bring it up.) I don?t know if it?s a good reason, but I?m of the opinion that the required backslash at the beginning of to-be-dedented string is strange: We try to eliminate escaped newlines elsewhere (e.g. bracing import statements, conditions and tuple values is preferred to escaping newlines) I think dedent (or however it?s going to be called) should remove common whitespace and, if the first line is completely empty, that first line as well. Also for your consideration would be scala?s way: """|spam |eggs""".stripMargin() == "spam\neggs" """#spam #eggs""".stripMargin('#') == "spam\neggs" i.e. removal of all leading whitespace up to and including a margin character/prefix. It could be a kwarg to ?dedent?? -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbvfour at gmail.com Mon Jul 1 19:08:34 2013 From: nbvfour at gmail.com (nbv4) Date: Mon, 1 Jul 2013 10:08:34 -0700 (PDT) Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: References: <51D0D6C3.2020502@pearwood.info> Message-ID: <04cbad9f-5d8c-4274-8a56-b9cc7665042d@googlegroups.com> The only problem I see with explicitly passing in the number of characters to the dedent function is that you couple your code with th source of that code. What happens when you copy+paste that function to a class where the indention level does not match. You then will have to change that number, or else your code will break. Also, if you run your code through pylent or something and it changes your indenting from tabs to spaces. 
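The margin() idea described earlier in the thread avoids exactly this
coupling, since its argument describes the output rather than the
source. It is also easy to prototype with the existing textwrap
functions -- a sketch only, with margin() as a hypothetical helper
rather than a real string method:

    import textwrap

    def margin(text, n):
        # Strip whatever indentation the source happened to have, then
        # give every non-blank line exactly n leading spaces.
        return textwrap.indent(textwrap.dedent(text), " " * n)

With n == 0 this reduces to plain textwrap.dedent().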
On Sunday, June 30, 2013 10:56:26 PM UTC-7, Ron Adam wrote: > > > > On 06/30/2013 08:57 PM, Guido van Rossum wrote: > > On Sun, Jun 30, 2013 at 6:47 PM, Nick Coghlan> > wrote: > >> >On 1 July 2013 11:09, Steven D'Aprano> > wrote: > >>> >>but in either case, I think the choice of --- as delimiter is ugly > and > >>> >>arbitrary, and very likely is ambiguous (currently, x = ---1 is > legal code). > >>> >>Similar suggestions to this have been made many times before, you > should > >>> >>search the archives: > >>> >> > >>> >>http://mail.python.org/mailman/listinfo/python-ideas > >> > > >> >I'm still partial to the idea of offering textwrap.indent() and > >> >textwrap.dedent() as string methods. > >> > > >> >1. You could add a ".dedent()" at the end of a triple quoted string > >> >for this kind of problem. For a lot of code, the runtime cost isn't an > >> >issue. > >> >2. A JIT would definitely be able to avoid recalculating the result > every time > >> >3. Even CPython may eventually gain constant folding for that kind of > >> >method applied directly to a string literal > >> >4. I dedent and indent long strings more often than I capitalize, > >> >center, tab expand, or perform various other operations which already > >> >grace the str type as methods. > > That's a compelling argument. Let's do it. (Assuming the definition of > > exactly how to indent or dedent is not up for discussion -- if there > > are good reasons to disagree with textwrap now's the time to bring it > > up.) > > It would be an improvement to have them as methods, but I'd actually like > to have Str.indent(n) method that takes a value for the leading white > space. > > The value to this method would always be a positive number, and any common > leading white space would be replaced by the new indent amount. > > S.indent(0) would be the same as S.dedent(). > > s = """\ > A multi-line string > with 4 leading spaces. > """.indent(4) > > > s = """\ > A multi-line string > with 4 leading spaces. > """.indent(4) > > > if cond: > s = """\ > Another multi-line string > with 4 leading spaces. > """.indent(4) > > > > The reason I prefer this is ... > > It's more relevant to what I'm going to use the string for and is not just > compensating for the block indention level, which has nothing to do with > how I'm going to use the string. > > It explicitly specifies the amount of leading white space I want in the > resulting string object. If I want a different indent level, I can just > change the value. Or call the indent method again with the new value. > > I don't need to know what the current leading white space is on the > string, > just what I want for my output. > > > > Strangely, the online docs for textwrap include an indent function that > works a bit different, but it is no longer present in textwrap. Looks like > an over site to me. > > Cheers, > Ron > > > > > > > > > > > > > > _______________________________________________ > Python-ideas mailing list > Python... at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Mon Jul 1 19:18:00 2013 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 01 Jul 2013 18:18:00 +0100 Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: References: <51D0D6C3.2020502@pearwood.info> Message-ID: <51D1B9C8.7040500@mrabarnett.plus.com> On 01/07/2013 17:22, Philipp A. 
wrote: > 2013/7/1 Guido van Rossum guido at python.org > > That?s a compelling argument. Let?s do it. (Assuming the definition of > exactly how to indent or dedent is not up for discussion ? if there > are good reasons to disagree with textwrap now's the time to bring it > up.) > > I don?t know if it?s a good reason, but I?m of the opinion that the > required backslash at the beginning of to-be-dedented string is strange: > > We try to eliminate escaped newlines elsewhere (e.g. bracing import > statements, conditions and tuple values is preferred to escaping newlines) > > I think dedent (or however it?s going to be called) should remove common > whitespace and, if the first line is completely empty, that first line > as well. > > Also for your consideration would be scala?s way: > > """|spam > |eggs""".stripMargin() =="spam\neggs" > > """#spam > #eggs""".stripMargin('#') =="spam\neggs" > > i.e. removal of all leading whitespace up to and including a margin > character/prefix. It could be a kwarg to ?dedent?? > Or strip the common indent, and the initial empty line if present, and optionally add an indent: """ spam eggs """.reindent() == "spam\neggs\n" """ spam eggs """.reindent(4) == " spam\n eggs\n" From abarnert at yahoo.com Mon Jul 1 19:17:29 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 1 Jul 2013 10:17:29 -0700 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> Message-ID: <31C2E72F-1BBA-4C84-BB25-35E49D756F6E@yahoo.com> On Jul 1, 2013, at 4:57, Oscar Benjamin wrote: > This algorithm is actually even poorer as it doesn't stop at sqrt(n). > We can fix that with takewhile: > > from itertools import count, takewhile > > def primes(): > primes_seen = [] > for n in count(2): > if all(n % p for p in takewhile(lambda p: p**2 < n, primes_seen)): > yield n > primes_seen.append(n) > > primes100 = {p for p in takewhile(lambda p: p < 100, primes()} > > Using for/while this becomes significantly clearer (in my opinion): > > from itertools import count > > def primes(): > primes_seen = [] > for n in count(2): > if all(n % p for p in primes_seen while p**2 <= n): > yield n > primes_seen.append(n) > > primes100 = {p for p in primes() while p < 100} There are already ways to improve the readability of that line: candidates = takewhile(lambda p: p**2 < n, primes_seen) if all(n % p for p in candidates): Or, better: def candidate(p): return p**2 < n if all(n % p for p in takewhile(candidate, primes_seen)): Yes, this means splitting the line in two just so you can avoid lambda, and it's exactly parallel to the case for if clauses vs. filter. I think the benefit to all of these solutions is pretty firmly established by this point. But, as Nick pointed out earlier, there's probably a reason that filter is a builtin and takewhile is not. It's not _as much_ benefit. And meanwhile, the cost is higher because there's no familiar, pre-existing syntax to borrow. I'll grant that it's entirely possible that the problem is just that I'm much more familiar with while statements than with for...while statements for historical reason, and after a bit of exposure the problem will go away (like ternary statements, which everyone gets pretty quickly, as opposed to for...else, which many stay confused by). 
But still, we can't expect that python programmers will ever be as familiar with for...while as they are with if (which works the same as in almost every other language, is one of the first constructs every novice is taught, etc.). From joshua.landau.ws at gmail.com Mon Jul 1 19:25:12 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Mon, 1 Jul 2013 18:25:12 +0100 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> Message-ID: On 1 July 2013 12:57, Oscar Benjamin wrote: > On 30 June 2013 00:51, Nick Coghlan wrote: >> [x for x in iterable; break if x is None] >> [x for x in data if x; break if x is None] >> >> One nice advantage of that notation is that: >> >> 1. The statement after the ";" is exactly the statement that would >> appear in the expanded loop >> 2. It can be combined unambiguously with a filtering clause >> 3. It clearly disallows its use with nested loops in the comprehension > > It has the significant disadvantage that Steven pointed out which is > that it doesn't read very well. The most important aspect of a > comprehension is its comprehensibility. Consider getting the prime > numbers less than 100: > > primes100 = {p for p in primes(); break if p >= 100} > > You need to invert the if condition to understand which primes are in > the resulting set. With for/while it reads properly and the condition > at the right expresses a true property of the elements in the > resulting set: > > primes100 = {p for p in primes() while p < 100} If you're telling me that "{p for p in primes() while p < 100}" reads better than "{p for p in primes(); break if p >= 100}" I have to disagree strongly. The "break if" form looks beautiful. I know that this involves my not-suggestion, so I might be biased, but that's what I think. > At the moment the obvious way to get the prime numbers less than 100 > would be to do something like: > So you have these: {p for p in takewhile(lambda p: p < 100, primes())} set(takewhile(lambda p: p < 100, primes()) {p for p in primes() while p < 100} {p for p in primes(); break if p >= 100} I like them most bottom-to-top, but I don't think new syntax is a cost worth having. If it is, I'm only accepting of the "break if" form, which adds the least grammar. But not that either, because srsly; a semicolon [youtu.be/M94ii6MVilw, contains profanity] in a comprehension? From ron3200 at gmail.com Mon Jul 1 20:12:30 2013 From: ron3200 at gmail.com (Ron Adam) Date: Mon, 01 Jul 2013 13:12:30 -0500 Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: <04cbad9f-5d8c-4274-8a56-b9cc7665042d@googlegroups.com> References: <51D0D6C3.2020502@pearwood.info> <04cbad9f-5d8c-4274-8a56-b9cc7665042d@googlegroups.com> Message-ID: On 07/01/2013 12:08 PM, nbv4 wrote: > The only problem I see with explicitly passing in the number of characters > to the dedent function is that you couple your code with th source of that > code. What happens when you copy+paste that function to a class where the > indention level does not match. You then will have to change that number, > or else your code will break. Also, if you run your code through pylent or > something and it changes your indenting from tabs to spaces. The number isn't how much to dedent. But how much white space each non-blank line should have at the beginning. s.margin(0) # remove common leading space like dedent(). 
s.margin(4) # specifies it to have 4 spaces. No matter how much (or little) it's indented in your code, it will still be exactly and explicitly what you specify in the string... s1 = """ This string is indented 4 spaces in the source code, but will have a margin of 24 spaces when printed.""".margin(24) s2 = """ This string is indented 12 spaces in the source code, but will have a margin of 4 spaces when it's printed.""".margin(4) Use .margin(0), and it removes all the common leading white space just like dedent. (Makes dedent redundant) Lets say you want to use a string in different places. For example on the screen you might want it to be 4 spaces, and on a printed out log you would like it to be 8 spaces. And in your source code it is indented 12 spaces. s3 = \ """ This is an output string I'm going to use in several places with different margins. """ print(s3.margin(4)) # to the screen with a margin of 4. print(s3.margin(8), file=logfile) # with a margin of 8. Notice that the indention level of the initial string doesn't matter. Cheers, Ron What you are thinking of is a relative indent/dedent method. That would be completely different. You would have to specify how much to remove or add, and each time you use it on the string it would remove or add that amount again. So no, that isn't what I'm suggesting. Cheers, Ron From nbvfour at gmail.com Mon Jul 1 21:34:13 2013 From: nbvfour at gmail.com (chris priest) Date: Mon, 1 Jul 2013 12:34:13 -0700 (PDT) Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: References: <51D0D6C3.2020502@pearwood.info> <04cbad9f-5d8c-4274-8a56-b9cc7665042d@googlegroups.com> Message-ID: If you want specific indentation, then use the regular triple quote and write it exactly how you want it. On the other hand, the 99% use case (in my experience at least) you don't care about eh indentation. All you care about is the words. This is the usecase for triple backtick. The most common example is exceptions. When I raise exceptions, I like to include long, verbose error messages. Right now I have three options: raise TypeError(Live with a really long string going past the 80 char limit, thus breaking PEP8 and being annoying") or raise TypeError(""" Use a triple string But then live with extra newlines and indentation. The suggestion to have a dedent method makes things a little better But imo still not ideal. """.strip().dedent().replace("/n', ' ')) raise TypeError("Another Option is to " "do it like this, but if you do it this way " "you run the risk of forgetting a trailing" "space at the end of each line and other errors." "Not to mention it's annoying when you want to " "add words because you have to rewrap and its " "generally really annoying to have to deal with") raise TypeError(``` The best solution is a new type of string literal that just deals with the text so I don't have to worry about a thing. This is only meant for text where I don't need explicit indenting and/or whitespace requirements. ```) On Monday, July 1, 2013 11:12:30 AM UTC-7, Ron Adam wrote: > > > > On 07/01/2013 12:08 PM, nbv4 wrote: > > The only problem I see with explicitly passing in the number of > characters > > to the dedent function is that you couple your code with th source of > that > > code. What happens when you copy+paste that function to a class where > the > > indention level does not match. You then will have to change that > number, > > or else your code will break. 
Also, if you run your code through pylent > or > > something and it changes your indenting from tabs to spaces. > > The number isn't how much to dedent. But how much white space each > non-blank line should have at the beginning. > > s.margin(0) # remove common leading space like dedent(). > > s.margin(4) # specifies it to have 4 spaces. > > No matter how much (or little) it's indented in your code, it will still > be > exactly and explicitly what you specify in the string... > > > s1 = """ > This string is indented > 4 spaces in the source code, > but will have a margin of > 24 spaces when printed.""".margin(24) > > > s2 = """ > This string is indented > 12 spaces in the source code, > but will have a margin of > 4 spaces when it's printed.""".margin(4) > > > Use .margin(0), and it removes all the common leading white space just > like > dedent. (Makes dedent redundant) > > > Lets say you want to use a string in different places. For example on the > screen you might want it to be 4 spaces, and on a printed out log you > would > like it to be 8 spaces. And in your source code it is indented 12 spaces. > > s3 = \ > """ > This is an output string I'm > going to use in several places with different margins. > """ > > print(s3.margin(4)) # to the screen with a margin of 4. > print(s3.margin(8), file=logfile) # with a margin of 8. > > Notice that the indention level of the initial string doesn't matter. > > Cheers, > Ron > > > > > > > > > > > > > What you are thinking of is a relative indent/dedent method. That would > be > completely different. You would have to specify how much to remove or > add, > and each time you use it on the string it would remove or add that amount > again. So no, that isn't what I'm suggesting. > > Cheers, > Ron > > > _______________________________________________ > Python-ideas mailing list > Python... at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Jul 1 22:29:51 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 02 Jul 2013 06:29:51 +1000 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> Message-ID: <51D1E6BF.3010506@pearwood.info> On 02/07/13 03:25, Joshua Landau wrote: > If you're telling me that "{p for p in primes() while p < 100}" reads > better than "{p for p in primes(); break if p >= 100}" I have to > disagree strongly. The "break if" form looks beautiful. Beautiful it may be, but you have something which explicitly looks like *two statements* which actually represents *one expression*. When it comes to mimicry and camouflage, I think that it is wonderful and astonishing when insects look like sticks, plants look like insects, and harmless snakes look like deadly ones. But when it comes to Python code, I think that mimicry is a terrible idea. I am truly not looking forward to fielding questions from confused programmers who extrapolate from the above to multiple statements in a comprehension: {p for p in primes(); print(p); break if p >= 100} and other variations: {x for x in values(); raise ValueError if not condition(x)} And I'm even *less* looking forward to the debates on python-ideas from people who will want to introduce such syntax :-) "Clever" syntax is rarely a good idea. 
Good syntax should allow the reader to correctly extrapolate to other examples, not mislead them into making errors. -- Steven From mertz at gnosis.cx Mon Jul 1 22:29:09 2013 From: mertz at gnosis.cx (David Mertz) Date: Mon, 1 Jul 2013 13:29:09 -0700 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> Message-ID: On Mon, Jul 1, 2013 at 10:25 AM, Joshua Landau wrote: > > primes100 = {p for p in primes(); break if p >= 100} > > primes100 = {p for p in primes() while p < 100} > > If you're telling me that "{p for p in primes() while p < 100}" reads > better than "{p for p in primes(); break if p >= 100}" I have to > disagree strongly. The "break if" form looks beautiful. > My own taste is in strong contrast to Joshua's. I think the semi-colon/break-if looks absolutely horrendous and ugly. The 'while' clause looks obvious, beautiful, and intuitive. However, I see the point made by a number of people that the 'while' clause has no straightforward translation into an unrolled loop, and is probably ruled out on that basis. That said, I think the version using Guido's suggestion (and already supported for generator comprehensions) looks fairly nice. I.e. given a support function: def stopif(x): if x: raise StopIteration return True We can express it as: primes100 = set(p for p in primes() if stopif(p >= 100)) Except I'm not sure I like the spelling of 'stopif()', since the repeating the 'if' in the function name reads oddly. I guess I might like this spelling better: primes100 = set(p for p in primes() if still(p < 100)) # Obvious implementation of still() The only change we need is the one Guido has declared as "do it" which is to make other comprehensions act the same as the instantiation with the generator. I.e. this should work: primes100 = {p for p in primes() if still(p < 100)} It doesn't *really* matter whether someone likes my spelling 'still()' or Guido's 'stopif()' better, since either one is trivially implementable by end users if they wish to do so. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Tue Jul 2 00:44:22 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 1 Jul 2013 23:44:22 +0100 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> Message-ID: On 1 July 2013 21:29, David Mertz wrote: > However, I see the point made by a number of people that the 'while' clause > has no straightforward translation into an unrolled loop, and is probably > ruled out on that basis. My thought (in keeping with the title of the thread) is that the comprehension data = [x for y in stuff while z] would unroll as the loop for y in stuff while z: data.append(x) which would also be valid syntax and have the obvious meaning. 
This is similar to Nick's suggestion that 'break if' be usable in the body of the loop so that data = [x for y in stuff; break if not z] would unroll as for y in stuff: break if not z data.append(y) Having a while clause on for loops is not just good because it saves a couple of lines but because it clearly separates the flow control from the body of the loop (another reason I dislike 'break if'). In other words I find the flow of the loop for p in primes() while p < 100: print(p) easier to understand (immediately) than for p in primes(): if p >= 100: break print(p) These are just trivially small examples. As the body of the loop grows in complexity the readability benefit of moving 'if not z: break' into the top line becomes more significant. You can get the same separation of concerns using takewhile at the expense of a different kind of readability for p in takewhile(lambda p: p < 100, primes()): print(p) However there is another problem with using takewhile in for loops which is that it discards an item from the iterable. Imagine parsing a file such as: csvfile = '''# data.csv # This file begins with an unspecified number of header lines. # Each header line begins with '#'. # I want to keep these lines but need to parse the separately. # The first non-comment line contains the column headers x y z 1 2 3 4 5 6 7 8 9'''.splitlines() You can do csvfile = iter(csvfile) headers = [] for line in csvfile: if not line.startswith('#'): break headers.append(line[1:].strip()) fieldnames = line.split() for line in csvfile: yield {name: int(val) for name, val in zip(fieldnames, line.split())} However if you use takewhile like for line in takewhile(lambda line: line.startswith('#'), csvfile): headers.append(line[1:].split()) then after the loop 'line' holds the last comment line. The discarded column header line is gone and cannot be recovered; takewhile is normally only used when the entire remainder of the iterator is to be discarded. I would propose that for line in csvfile while line.startwith('#'): headers.append(line) would result in 'line' referencing the item that failed the while predicate. Oscar From abarnert at yahoo.com Tue Jul 2 00:49:08 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 1 Jul 2013 15:49:08 -0700 (PDT) Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> Message-ID: <1372718948.17525.YahooMailNeo@web184702.mail.ne1.yahoo.com> From: David Mertz Sent: Monday, July 1, 2013 1:29 PM >On Mon, Jul 1, 2013 at 10:25 AM, Joshua Landau wrote: > >>> primes100 = {p for p in primes(); break if p >= 100} >>> primes100 = {p for p in primes() while p < 100} >> >>If you're telling me that "{p for p in primes() while p < 100}" reads >>better than "{p for p in primes(); break if p >= 100}" I have to >>disagree strongly. The "break if" form looks beautiful. > >My own taste is in strong contrast to Joshua's.? I think the semi-colon/break-if looks absolutely horrendous and ugly.? The 'while' clause looks obvious, beautiful, and intuitive. It isn't really that the semicolon is inherently ugly, but that it doesn't fit in with today's comprehension syntax. And the problem we're running into here is that, given today's comprehension syntax, there simply _is_ no clear way to extend it any further. Consider that if you try to translate a non-trivial comprehension into English, it's going to have at least commas in it, and likely semicolons. 
Also, consider that you really never want to nest arbitrary clauses; you just want to nest loops, and attach modifiers to them. So, Haskell-style comprehensions, as we have today, can only be taken so far. But I don't think that's a problem. See http://stupidpythonideas.blogspot.com/2013/07/syntactic-takewhile.html for more details and ramblings. From python at mrabarnett.plus.com Tue Jul 2 01:34:32 2013 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 02 Jul 2013 00:34:32 +0100 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> Message-ID: <51D21208.8050702@mrabarnett.plus.com> On 01/07/2013 23:44, Oscar Benjamin wrote: > On 1 July 2013 21:29, David Mertz wrote: >> However, I see the point made by a number of people that the 'while' clause >> has no straightforward translation into an unrolled loop, and is probably >> ruled out on that basis. > > My thought (in keeping with the title of the thread) is that the comprehension > > data = [x for y in stuff while z] > > would unroll as the loop > > for y in stuff while z: > data.append(x) > > which would also be valid syntax and have the obvious meaning. This is > similar to Nick's suggestion that 'break if' be usable in the body of > the loop so that > > data = [x for y in stuff; break if not z] > > would unroll as > > for y in stuff: > break if not z > data.append(y) > > Having a while clause on for loops is not just good because it saves a > couple of lines but because it clearly separates the flow control from > the body of the loop (another reason I dislike 'break if'). In other > words I find the flow of the loop > > for p in primes() while p < 100: > print(p) > > easier to understand (immediately) than > > for p in primes(): > if p >= 100: > break > print(p) > > These are just trivially small examples. As the body of the loop grows > in complexity the readability benefit of moving 'if not z: break' into > the top line becomes more significant. > > You can get the same separation of concerns using takewhile at the > expense of a different kind of readability > > for p in takewhile(lambda p: p < 100, primes()): > print(p) > > However there is another problem with using takewhile in for loops > which is that it discards an item from the iterable. Imagine parsing a > file such as: > > csvfile = '''# data.csv > # This file begins with an unspecified number of header lines. > # Each header line begins with '#'. > # I want to keep these lines but need to parse them separately. > # The first non-comment line contains the column headers > x y z > 1 2 3 > 4 5 6 > 7 8 9'''.splitlines() > > You can do > > csvfile = iter(csvfile) > headers = [] > for line in csvfile: > if not line.startswith('#'): > break > headers.append(line[1:].strip()) > fieldnames = line.split() > for line in csvfile: > yield {name: int(val) for name, val in zip(fieldnames, line.split())} > > However if you use takewhile like > > for line in takewhile(lambda line: line.startswith('#'), csvfile): > headers.append(line[1:].strip()) > > then after the loop 'line' holds the last comment line. The discarded > column header line is gone and cannot be recovered; takewhile is > normally only used when the entire remainder of the iterator is to be > discarded. > > I would propose that > > for line in csvfile while line.startswith('#'): > headers.append(line) > > would result in 'line' referencing the item that failed the while predicate.
> So: for item in generator while is_true(item): ... is equivalent to: for item in generator: if not is_true(item): break ... By similar reasoning(?): for item in generator if is_true(item): ... is equivalent to: for item in generator: if not is_true(item): continue ... If we have one, shouldn't we also have the other? If only comprehensions have the 'if' form (IIRC, it has already been rejected for multi-line 'for' loops), then shouldn't only comprehensions have the 'while' form? From zuo at chopin.edu.pl Tue Jul 2 04:03:50 2013 From: zuo at chopin.edu.pl (Jan Kaliszewski) Date: Tue, 02 Jul 2013 04:03:50 +0200 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> Message-ID: <29ce8c7ede4f8e92738ceea54713f001@chopin.edu.pl> 2013-07-02 00:44, Oscar Benjamin wrote: [...] > Having a while clause on for loops is not just good because it saves > a > couple of lines but because it clearly separates the flow control > from > the body of the loop (another reason I dislike 'break if'). In other > words I find the flow of the loop > > for p in primes() while p < 100: > print(p) > > easier to understand (immediately) than > > for p in primes(): > if p >= 100: > break > print(p) +1 Cheers. *j From steve at pearwood.info Tue Jul 2 04:04:30 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 2 Jul 2013 12:04:30 +1000 Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: References: <51D0D6C3.2020502@pearwood.info> Message-ID: <20130702020429.GA18929@ando> On Mon, Jul 01, 2013 at 06:22:25PM +0200, Philipp A. wrote: > 2013/7/1 Guido van Rossum guido at python.org > > That's a compelling argument. Let's do it. (Assuming the definition of > exactly how to indent or dedent is not up for discussion -- if there > are good reasons to disagree with textwrap now's the time to bring it > up.) > > I don't know if it's a good reason, but I'm of the opinion that the > required backslash at the beginning of to-be-dedented string is strange: It's not *required*. It's optional. You can live with an extra blank line: s = """ first line second line ... """ or you can manually indent the first line: s = """ first line second line ... """ or you can slice the string before calling dedent, or, in my opinion the neatest and best solution, just escape the opening newline with a backslash: s = """\ first line second line ... """ But in any case, I don't like the idea of making the proposed dedent method be a DWIM "format strings the way I want them to be formatted" method. It's called "dedent", not "dedent and strip leading newlines". I'm okay in principle with dedent taking additional arguments to customize the behaviour, such as amount of whitespace to keep or a margin character, so long as the default with no args matches textwrap.dedent. -- Steven From gottagetmac at gmail.com Tue Jul 2 04:28:24 2013 From: gottagetmac at gmail.com (Daniel Robinson) Date: Mon, 1 Jul 2013 22:28:24 -0400 Subject: [Python-ideas] Is this PEP-able?
for X in ListY while conditionZ: In-Reply-To: <29ce8c7ede4f8e92738ceea54713f001@chopin.edu.pl> References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> <29ce8c7ede4f8e92738ceea54713f001@chopin.edu.pl> Message-ID: While this looks attractive to me, and it's definitely better to change statement and comprehension syntax at the same time, this makes the comprehension ambiguous to human parsing. [f(x) for x in list if x > 10] basically can be read as for x in list: if x > 10: f(x) This kind of interpretation becomes critical if you nest more than two levels. But [f(x) for x in list while x < 10] could read either as for x in list while x < 10: f(x) which is how you want it to be read, or (more in line with earlier list comp habits): for x in list: while x < 10: f(x) which would be totally wrong. I don't think this is a very serious problem (certainly not for the interpreter), but it's a stumbling block. On Mon, Jul 1, 2013 at 10:03 PM, Jan Kaliszewski wrote: > 2013-07-02 00:44, Oscar Benjamin wrote: > [...] > > Having a while clause on for loops is not just good because it saves a >> couple of lines but because it clearly separates the flow control from >> the body of the loop (another reason I dislike 'break if'). In other >> words I find the flow of the loop >> >> for p in primes() while p < 100: >> print(p) >> >> easier to understand (immediately) than >> >> for p in primes(): >> if p >= 100: >> break >> print(p) >> > > +1 > > Cheers. > *j > > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > From spaghettitoastbook at gmail.com Tue Jul 2 04:53:55 2013 From: spaghettitoastbook at gmail.com (SpaghettiToastBook .) Date: Mon, 1 Jul 2013 22:53:55 -0400 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> <29ce8c7ede4f8e92738ceea54713f001@chopin.edu.pl> Message-ID: What if the while clause went after the rest of the comprehension, preceded by a comma? [f(x) for x in list, while x < 10] -- SpaghettiToastBook On Mon, Jul 1, 2013 at 10:28 PM, Daniel Robinson wrote: > While this looks attractive to me, and it's definitely better to change > statement and comprehension syntax at the same time, this makes the > comprehension ambiguous to human parsing. > > [f(x) for x in list if x > 10] basically can be read as > > for x in list: > if x > 10: > f(x) > > This kind of interpretation becomes critical if you nest more than two > levels. But [f(x) for x in list while x < 10] could read either as > > for x in list while x < 10: > f(x) > > which is how you want it to be read, or (more in line with earlier list comp > habits): > > for x in list: > while x < 10: > f(x) > > which would be totally wrong. > > I don't think this is a very serious problem (certainly not for the > interpreter), but it's a stumbling block. > > On Mon, Jul 1, 2013 at 10:03 PM, Jan Kaliszewski wrote: >> >> 2013-07-02 00:44, Oscar Benjamin wrote: >> [...] >> >>> Having a while clause on for loops is not just good because it saves a >>> couple of lines but because it clearly separates the flow control from >>> the body of the loop (another reason I dislike 'break if').
In other >>> words I find the flow of the loop >>> >>> for p in primes() while p < 100: >>> print(p) >>> >>> easier to understand (immediately) than >>> >>> for p in primes(): >>> if p >= 100: >>> break >>> print(p) >> >> >> +1 >> >> Cheers. >> *j >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From gottagetmac at gmail.com Tue Jul 2 05:27:51 2013 From: gottagetmac at gmail.com (Daniel Robinson) Date: Mon, 1 Jul 2013 23:27:51 -0400 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> <29ce8c7ede4f8e92738ceea54713f001@chopin.edu.pl> Message-ID: In addition to defying the ordinary order of the syntax, that would make this statement totally ambiguous: [f(x), f(y) for x in list1 for y in list2, while x < y] And I definitely don't think commas (other than the f(x), f(y) usage shown above) or semicolons are good ideas. On Mon, Jul 1, 2013 at 10:53 PM, SpaghettiToastBook . < spaghettitoastbook at gmail.com> wrote: > What if the while clause went after the rest of the comprehension, > preceded by a comma? > > [f(x) for x in list, while x < 10] > > -- SpaghettiToastBook > > > On Mon, Jul 1, 2013 at 10:28 PM, Daniel Robinson > wrote: > > While this looks attractive to me, and it's definitely better to change > > statement and comprehension syntax at the same time, this makes the > > comprehension ambiguous to human parsing. > > > > [f(x) for x in list if x > 10] basically can be read as > > > > for x in list: > > if x > 10: > > f(x) > > > > This kind of interpretation becomes critical if you nest more than two > > levels. But [f(x) for x in list while x < 10] could read either as > > > > for x in list while x < 10: > > f(x) > > > > which is how you want it to be read, or (more in line with earlier list > comp > > habits): > > > > for x in list: > > while x < 10: > > f(x) > > > > which would be totally wrong. > > > > I don't think this is a very serious problem (certainly not for the > > interpreter), but it's a stumbling block. > > > > On Mon, Jul 1, 2013 at 10:03 PM, Jan Kaliszewski > wrote: > >> > >> 2013-07-02 00:44, Oscar Benjamin wrote: > >> [...] > >> > >>> Having a while clause on for loops is not just good because it saves a > >>> couple of lines but because it clearly separates the flow control from > >>> the body of the loop (another reason I dislike 'break if'). In other > >>> words I find the flow of the loop > >>> > >>> for p in primes() while p < 100: > >>> print(p) > >>> > >>> easier to understand (immediately) than > >>> > >>> for p in primes(): > >>> if p >= 100: > >>> break > >>> print(p) > >> > >> > >> +1 > >> > >> Cheers. > >> *j > >> > >> > >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > >> http://mail.python.org/mailman/listinfo/python-ideas > > > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > >
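(To make the two readings quoted above concrete, here is how they differ as ordinary loops; the data and the threshold are invented for illustration:)

    data = [3, 1, 4, 1, 5]

    # Reading 1: the while-clause terminates the whole for loop.
    out = []
    for x in data:
        if not (x < 4):
            break
        out.append(x)
    assert out == [3, 1]

    # Reading 2 would nest "while x < 4:" inside the loop body; since x
    # never changes there, it would spin forever -- hence "totally wrong".
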
From ron3200 at gmail.com Tue Jul 2 08:03:38 2013 From: ron3200 at gmail.com (Ron Adam) Date: Tue, 02 Jul 2013 01:03:38 -0500 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> <29ce8c7ede4f8e92738ceea54713f001@chopin.edu.pl> Message-ID: On 07/01/2013 10:27 PM, Daniel Robinson wrote: > In addition to defying the ordinary order of the syntax, that would make > this statement totally ambiguous: > > [f(x), f(y) for x in list1 for y in list2, while x < y] > > And I definitely don't think commas (other than the f(x), f(y) usage shown > above) or semicolons are good ideas. I agree. There is also the problem of how to spell it with both a filter test and the end test at the same time. You can write it as one or the other with a stop() function, but they don't work independently. [x for x in values if x<50 or stop()] # will work in 3.4 And of course this works now. [x for x in values if x%2==0] To do both is tricky and took me a while to come up with... [x for x in values if stop(x>50) or x%2==0] Where stop(x>50) raises StopIteration if the test is True, and returns False otherwise. Which makes the 'or expression' needed even if it's just an 'or True'. It should work, but seems awkward to me. Cheers, Ron From robert.kern at gmail.com Tue Jul 2 12:32:22 2013 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 02 Jul 2013 11:32:22 +0100 Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: <20130702020429.GA18929@ando> References: <51D0D6C3.2020502@pearwood.info> <20130702020429.GA18929@ando> Message-ID: On 2013-07-02 03:04, Steven D'Aprano wrote: > On Mon, Jul 01, 2013 at 06:22:25PM +0200, Philipp A. wrote: >> 2013/7/1 Guido van Rossum guido at python.org >> >> That's a compelling argument. Let's do it. (Assuming the definition of >> exactly how to indent or dedent is not up for discussion -- if there >> are good reasons to disagree with textwrap now's the time to bring it >> up.) >> >> I don't know if it's a good reason, but I'm of the opinion that the >> required backslash at the beginning of to-be-dedented string is strange: > > It's not *required*. It's optional. You can live with an extra blank > line: > > s = """ > first line > second line ... """ > > or you can manually indent the first line: > > s = """ first line > second line ... """ > > or you can slice the string before calling dedent, or, in my > opinion the neatest and best solution, just escape the opening > newline with a backslash: > > s = """\ > first line > second line ... """ > > But in any case, I don't like the idea of making the proposed dedent > method be a DWIM "format strings the way I want them to be formatted" > method. It's called "dedent", not "dedent and strip leading newlines". > > I'm okay in principle with dedent taking additional arguments to > customize the behaviour, such as amount of whitespace to keep or a > margin character, so long as the default with no args matches > textwrap.dedent. How about an option to ignore the first line? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From ronaldoussoren at mac.com Tue Jul 2 14:08:29 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Tue, 2 Jul 2013 14:08:29 +0200 Subject: [Python-ideas] Is this PEP-able?
for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> Message-ID: <04628C39-5B95-4809-A75C-C5557650DE91@mac.com> On 29 Jun, 2013, at 12:09, Nick Coghlan wrote: ... > > > Rather than adding a new keyword, we could simply expand the syntax > for the existing break statement to be this: > > break [if <condition>] That saves typing two characters as compared with current status-quo ;-) > > This would simplify the above two standard idioms to the following: > > while True: > # Iteration setup > break if termination_condition > # Remainder of iteration > > for x in data: > break if desired_value(x) > else: > raise ValueError("Value not found in {:100!r}".format(data)) > > A "bare" break would then be equivalent to "break if True". The "else" > clause on the loop could then be *explicitly* documented as associated > with the "break if <condition>" form - the else only executes if the break > clause is never true. (That also becomes the justification for only > allowing this for break, and not for continue or return: those have no > corresponding "else" clause) > > Once the break statement has been redefined this way, it *then* > becomes reasonable to allow the following in comprehensions: > > data = [x for x in iterable break if x is None] > > As with other proposals, I would suggest limiting this truncating form > to disallow combination with the filtering and nested loop forms (at > least initially). The dual use of "if" would make the filtering > combination quite hard to read, and the nested loop form would be > quite ambiguous as to which loop was being broken. If we start with > the syntax restricted, we can relax those restrictions later if we > find them too limiting, while if we start off being permissive, > backwards compatibility would prevent us from adding restrictions > later. Yikes. I don't like restricting this to loops without filtering and without nesting at all, that makes the language more complex. Has anyone tried to inventory how often this new syntax would be appropriate (with and without the restriction on filtering and nested loops)? > > I'd be very keen to see this written up as a PEP - it's the first > proposal that I feel actually *simplifies* the language in any way > (mostly by doing something about those perplexing-to-many else clauses > on for and while loops). That's very optimistic; I don't think this helps with the for:else: confusion at all. The else part of a for loop already has a straightforward explanation, and that doesn't seem to help at all. An if-component on a 'break' statement has a pretty loose connection to the corresponding 'else', especially when the 'if' part is optional. Ronald From oscar.j.benjamin at gmail.com Tue Jul 2 14:28:53 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 2 Jul 2013 13:28:53 +0100 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <51D21208.8050702@mrabarnett.plus.com> References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> <51D21208.8050702@mrabarnett.plus.com> Message-ID: On 2 July 2013 00:34, MRAB wrote: > > So: > > for item in generator while is_true(item): > ... > > is equivalent to: > > for item in generator: > if not is_true(item): > break > ... > > By similar reasoning(?): > > for item in generator if is_true(item): > ... > > is equivalent to: > > for item in generator: > if not is_true(item): > continue > ... > > If we have one, shouldn't we also have the other?
> > If only comprehensions have the 'if' form (IIRC, it has already been > rejected for multi-line 'for' loops), then shouldn't only > comprehensions have the 'while' form? <if> is allowed in comprehensions and loops. I'm only suggesting that for/while wouldn't unroll the same way that for/if does. To me it seems natural that <while> "binds" more tightly to <for> than <if> does. The <if> clause in a comprehension is about per item logic so it belongs in the body of the loop. The proposed <while> clause would affect the flow of the loop globally so it does not. Also I wouldn't propose that for/while is equivalent to for/if/break in the case that there is an else clause. This is one area where takewhile is better than for/if/break since you can do for item in takewhile(lambda _: keep_going, iterable): if acceptable(item): break else: raise Error('No acceptable items') instead of for item in iterable: if not keep_going: raise Error('No acceptable items') elif acceptable(item): break else: raise Error('No acceptable items') I would want to be able to write for item in iterable while keep_going: if acceptable(item): break else: raise Error('No acceptable items') and know that either an error was raised or item is bound to an acceptable object. Oscar From ron3200 at gmail.com Tue Jul 2 15:27:57 2013 From: ron3200 at gmail.com (Ron Adam) Date: Tue, 02 Jul 2013 08:27:57 -0500 Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: References: <51D0D6C3.2020502@pearwood.info> <20130702020429.GA18929@ando> Message-ID: On 07/02/2013 05:32 AM, Robert Kern wrote: >> s = """\ >> first line >> second line ... """ >> >> But in any case, I don't like the idea of making the proposed dedent >> method be a DWIM "format strings the way I want them to be formatted" >> method. It's called "dedent", not "dedent and strip leading newlines". >> >> I'm okay in principle with dedent taking additional arguments to >> customize the behaviour, such as amount of whitespace to keep or a >> margin character, so long as the default with no args matches >> textwrap.dedent. > > How about an option to ignore the first line? A method that only adjusts leading space should only do that. I wrote a function to split a text string into paragraphs, and I found that it made sense for that to strip leading space and trailing white space. The reason is, it is not clear if the leading or trailing white space is an empty paragraph or not. If you treat those as empty paragraphs at the beginning and end, then you end up with too much white space when you join them back together after re-flowing the paragraphs. It seems the wrap and fill functions in textwrap have some issues too. There is an issue on the bug tracker, but the exact nature of the problems, and whether or not anyone is depending on the current behavior, is unclear. The textwrap module doesn't remove extra white space inside the string, but only does a strip, removing leading and trailing white space. So leading white space on lines gets folded into the reflowed text. That can't be what was originally intended. from textwrap import * s = """ This is a multi-line string that has no meaningful formatting, to see what fill and wrap do to it.
""" print('"""' + fill(s) + '"""') """ This is a multi-line string that has no meaningful formatting, to see what fill and wrap do to it.""" print for x in wrap(s, 40): print('"""'+x+'"""') """ This is a multi-line string that has no meaningful formatting, to see what fill and wrap do to it.""" """ This is a multi-line string""" """that has no""" """meaningful formatting, to see what""" """fill and wrap do to it.""" Cheers, Ron From ron3200 at gmail.com Tue Jul 2 15:38:04 2013 From: ron3200 at gmail.com (Ron Adam) Date: Tue, 02 Jul 2013 08:38:04 -0500 Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: References: <51D0D6C3.2020502@pearwood.info> <20130702020429.GA18929@ando> Message-ID: Ooops,, please pardon the few too many editing glitches due to not yet being fully awake. Getting that second cup of coffee, Ron From greg.ewing at canterbury.ac.nz Wed Jul 3 00:35:13 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 03 Jul 2013 10:35:13 +1200 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> Message-ID: <51D355A1.1050708@canterbury.ac.nz> Oscar Benjamin wrote: > def primes(): > primes_seen = [] > for n in count(2): > if all(n % p for p in primes_seen): > yield n > primes_seen.append(n) > > This algorithm is actually even poorer as it doesn't stop at sqrt(n). Nor should it! When you're only dividing by primes, you can't stop at sqrt(n), you have to divide by *all* the primes less than n. Otherise you could miss a prime factor greater than sqrt(n) whose cofactor is not prime. (Not relevant to the original disussion, I know, but my inner mathematician couldn't restrain himself.) -- Greg From rob.cliffe at btinternet.com Wed Jul 3 01:17:39 2013 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Wed, 03 Jul 2013 00:17:39 +0100 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <51D355A1.1050708@canterbury.ac.nz> References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> <51D355A1.1050708@canterbury.ac.nz> Message-ID: <51D35F93.7030209@btinternet.com> On 02/07/2013 23:35, Greg Ewing wrote: > Oscar Benjamin wrote: >> def primes(): >> primes_seen = [] >> for n in count(2): >> if all(n % p for p in primes_seen): >> yield n >> primes_seen.append(n) >> >> This algorithm is actually even poorer as it doesn't stop at sqrt(n). > > Nor should it! When you're only dividing by primes, you > can't stop at sqrt(n), you have to divide by *all* the > primes less than n. Otherise you could miss a prime > factor greater than sqrt(n) whose cofactor is not prime. > > (Not relevant to the original disussion, I know, but > my inner mathematician couldn't restrain himself.) > But you could at least stop at n/4. Rob Cliffe From joshua.landau.ws at gmail.com Wed Jul 3 01:20:54 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Wed, 3 Jul 2013 00:20:54 +0100 Subject: [Python-ideas] Is this PEP-able? 
for X in ListY while conditionZ: In-Reply-To: <51D355A1.1050708@canterbury.ac.nz> References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> <51D355A1.1050708@canterbury.ac.nz> Message-ID: On 2 July 2013 23:35, Greg Ewing wrote: > Oscar Benjamin wrote: >> >> def primes(): >> primes_seen = [] >> for n in count(2): >> if all(n % p for p in primes_seen): >> yield n >> primes_seen.append(n) >> >> This algorithm is actually even poorer as it doesn't stop at sqrt(n). > > > Nor should it! When you're only dividing by primes, you > can't stop at sqrt(n), you have to divide by *all* the > primes less than n. Otherwise you could miss a prime > factor greater than sqrt(n) whose cofactor is not prime. I'm not convinced. Say you have 7 * 6. That is 7 * 3 * 2, so if 7 has a cofactor of 6 then 2 is a factor. The proof can be generalised. From ckaynor at zindagigames.com Wed Jul 3 01:22:28 2013 From: ckaynor at zindagigames.com (Chris Kaynor) Date: Tue, 2 Jul 2013 16:22:28 -0700 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <51D355A1.1050708@canterbury.ac.nz> References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> <51D355A1.1050708@canterbury.ac.nz> Message-ID: On Tue, Jul 2, 2013 at 3:35 PM, Greg Ewing wrote: > Oscar Benjamin wrote: > >> def primes(): >> primes_seen = [] >> for n in count(2): >> if all(n % p for p in primes_seen): >> yield n >> primes_seen.append(n) >> >> This algorithm is actually even poorer as it doesn't stop at sqrt(n). >> > > Nor should it! When you're only dividing by primes, you > can't stop at sqrt(n), you have to divide by *all* the > primes less than n. Otherwise you could miss a prime > factor greater than sqrt(n) whose cofactor is not prime. > > (Not relevant to the original discussion, I know, but > my inner mathematician couldn't restrain himself.) That would imply that the cofactor has no prime factors (or else you would have hit them), which would mean it must be prime itself (and therefore you hit the cofactor earlier). > > > -- > Greg > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > From gottagetmac at gmail.com Wed Jul 3 01:06:42 2013 From: gottagetmac at gmail.com (Daniel Robinson) Date: Tue, 2 Jul 2013 19:06:42 -0400 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <51D355A1.1050708@canterbury.ac.nz> References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> <51D355A1.1050708@canterbury.ac.nz> Message-ID: That cofactor would have to have a prime factor less than sqrt(n). On Tuesday, July 2, 2013, Greg Ewing wrote: > Oscar Benjamin wrote: >> def primes(): >> primes_seen = [] >> for n in count(2): >> if all(n % p for p in primes_seen): >> yield n >> primes_seen.append(n) >> >> This algorithm is actually even poorer as it doesn't stop at sqrt(n). >> > > Nor should it! When you're only dividing by primes, you > can't stop at sqrt(n), you have to divide by *all* the > primes less than n. Otherwise you could miss a prime > factor greater than sqrt(n) whose cofactor is not prime. > > (Not relevant to the original discussion, I know, but > my inner mathematician couldn't restrain himself.)
> > -- > Greg > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > From shane at umbrellacode.com Mon Jul 1 17:21:24 2013 From: shane at umbrellacode.com (Shane Green) Date: Mon, 1 Jul 2013 08:21:24 -0700 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> Message-ID: <0C35164B-A9A1-4A78-8474-99166337BE50@umbrellacode.com> Having a bit of a hard time following the status of this as I dropped out for a while, but I'm not really for the semi-colon separator approach. Of all the options already available in list comprehensions, this one actually seems to be one of the most easily read, because it starts with a keyword and ends at the end. Not that syntax highlighting should be taken into account in general, but it's worth noting the syntax-highlighted version really makes this distinction quite clear: One thing I could see doing would be to allow semi-colons anywhere in a list comprehension that's a boundary between statements of the expanded form. Then they behave just like the optional semi-colons you could put at the end of a line. Sorry if that's precisely what's been promoted. On Jul 1, 2013, at 4:57 AM, Oscar Benjamin wrote: > On 30 June 2013 00:51, Nick Coghlan wrote: >> [x for x in iterable; break if x is None] >> [x for x in data if x; break if x is None] >> >> One nice advantage of that notation is that: >> >> 1. The statement after the ";" is exactly the statement that would >> appear in the expanded loop >> 2. It can be combined unambiguously with a filtering clause >> 3. It clearly disallows its use with nested loops in the comprehension > > It has the significant disadvantage that Steven pointed out which is > that it doesn't read very well. The most important aspect of a > comprehension is its comprehensibility. Consider getting the prime > numbers less than 100: > > primes100 = {p for p in primes(); break if p >= 100} > > You need to invert the if condition to understand which primes are in > the resulting set. With for/while it reads properly and the condition > at the right expresses a true property of the elements in the > resulting set: > > primes100 = {p for p in primes() while p < 100} > > At the moment the obvious way to get the prime numbers less than 100 > would be to do something like: > > from math import sqrt > > def isprime(N): > return N > 1 and all(N % n for n in range(2, int(sqrt(N)) + 1)) > > primes100 = [p for p in range(1, 100) if isprime(p)] > > However this is a suboptimal algorithm. At the point when we want to > determine if the number N is prime we have already found all the > primes less than N. We only need to check modulo division against > those but this construction doesn't give us an easy way to do that. > > It's better to have a primes() generator that can keep track of this > information: > > from itertools import count > > def primes(): > primes_seen = [] > for n in count(2): > if all(n % p for p in primes_seen): > yield n > primes_seen.append(n) > > This algorithm is actually even poorer as it doesn't stop at sqrt(n).
> We can fix that with takewhile: > > from itertools import count, takewhile > > def primes(): > primes_seen = [] > for n in count(2): > if all(n % p for p in takewhile(lambda p: p**2 < n, primes_seen)): > yield n > primes_seen.append(n) > > primes100 = {p for p in takewhile(lambda p: p < 100, primes())} > > Using for/while this becomes significantly clearer (in my opinion): > > from itertools import count > > def primes(): > primes_seen = [] > for n in count(2): > if all(n % p for p in primes_seen while p**2 <= n): > yield n > primes_seen.append(n) > > primes100 = {p for p in primes() while p < 100} > > The main objection to for/while seems to be that it doesn't unroll in > the same way as current comprehensions. I think that for/while is just > as useful for an ordinary for loop as it is for a comprehension. In C > you can easily add anything to the termination condition for a loop > e.g.: > > for (i=0; i < n && p[i-1] < 100; i++) > p[i] = next_prime(); > > But in Python having multiple termination conditions (or having any > termination with an infinite iterator) means using a break or > takewhile/lambda. > > for n, p in enumerate(primes()): > if p > 100: > break > print('%sth prime is %s' % (n, p)) > > or perhaps > > for n, p in enumerate(takewhile(lambda p: p < 100, primes())): > print('%sth prime is %s' % (n, p)) > > or even worse > > for n, p in enumerate(takewhile((100).__gt__, primes())): > print('%sth prime is %s' % (n, p)) > > I think that it would be better if this could be spelled as > > for n, p in enumerate(primes()) while p < 100: > print('%sth prime is %s' % (n, p)) > > If that were the case then a for/while comprehension could unroll into > a for/while loop just as with current comprehensions: > > result = [x for y in stuff while z] > > becomes: > > result = [] > for y in stuff while z: > result.append(x) > > > Oscar > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From sergemp at mail.ru Tue Jul 2 20:12:09 2013 From: sergemp at mail.ru (Sergey) Date: Tue, 2 Jul 2013 21:12:09 +0300 Subject: [Python-ideas] Fast sum() for non-numbers Message-ID: <20130702211209.6dbde663@sergey> Hello, python-ideas. Trying to cover everything in one shot, so the message is long, sorry. sum() is a great function. It is the "obvious way" to add things. Unfortunately sometimes it's slower than it could be. The problem is that this code: sum([[1,2,3]]*1000000, []) takes forever to complete. Let's fix that! This problem has been there since the sum() introduction, but it is almost unknown among python-beginners. When people look at sum(seq,start=0) signature they most probably expect it to be like this: def sum(seq, start = 0): for item in seq: start += item return start But it is not, because in cases like: empty = [] result = sum(list_of_lists, empty) such an implementation would modify the content of "empty". This use-case is known, it is checked in test_builtin.py, and explicitly mentioned in comments inside sum() code. So actually sum() looks like this: def sum(seq, start = 0): for item in seq: start = start + item return start it creates a copy of the partial result on every "start + item".
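(The cost being described can be made visible with a small counting subclass; CountingList is an invented name and the figures are only illustrative:)

    copies = 0

    class CountingList(list):
        def __add__(self, other):
            # sum() evaluates start = start + item, copying both operands
            # into a brand new list on every iteration.
            global copies
            copies += len(self) + len(other)
            return CountingList(list(self) + list(other))

    total = sum([CountingList([i]) for i in range(1000)], CountingList())
    assert len(total) == 1000
    # 'copies' is now roughly 1000*1000/2: quadratic in the input size.
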
What I suggest is: instead of making a copy for every item, make just one copy and reuse it as many times as needed. For example: def sum(seq, start = 0): start = start + seq[0] for item in seq[1:]: start += item return start A patch implementing this idea is attached to issue18305. The patch is simple; it just rearranges existing code. It should work for python 2.7, 3.3 and hg-tip. Except for sum() becoming faster, there should be no behavior change. Can this implementation break anything? Advantages: * Performance boost is like 200000% [1], nothing else becomes slower Disadvantages (inspired by Terry J. Reedy): * Other pythons (pypy, jython, older cpython versions) do not have such optimisation, people will move code that depends on the internal optimization to pythons that do not have it. And they will complain about their program 'freezing'. * It discourages people from carefully thinking about whether they actually need a concrete list or merely the iterator for a virtual list. Alternatives to sum() for this use-case: * list comprehension is 270% slower than patched sum [2] * itertools.chain is 50%-100% slower than patched sum [3] Main questions: * Whether to accept the patch * If yes, whether it should go to the new version or to a bugfix release Alternatives to this patch: * Reject the idea as a whole. Intentionally keep the sum() slow. * Accept the idea, but reject this particular implementation, instead use something different, e.g. implement special case optimization or use different approach (I thought about start=copy.copy(start)) In my opinion (looking through the Python changelog) performance patches were accepted for bugfix releases many times before. So as long as this patch does not break anything, it may even go to Python 2.7. I think bad performance is a bug that should be fixed. Rejecting performance fixes just because older versions don't have them would mean no performance improvements to anything, ever. Sum is a great basic function. It keeps the simple code simple. This patch puts it on par with more advanced features like itertools, and allows people to use sum for simple things and itertools for complex memory-efficient algorithms. Questions that may arise if the patch is accepted: * sum() was rejecting strings because of this bug. If the bug gets fixed should another patch allow sum() to accept strings? * maybe in some distant future drop the second parameter (or make it None by default) and allow calling sum for everything, making sum() "the one obvious way" to sum up things? It would be nice if sum "just worked" for everything (e.g. sum() of empty sequence would return None, i.e. if there's nothing to sum then nothing is returned). But I think it needs more work for that, because even with this patch sum() is still ~20 times slower than "".join() [4] That's all. Any suggestions welcome.
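(For anyone who wants to experiment before touching the C code, the copy-once variant mentioned above (start=copy.copy(start)) can be modelled in pure Python; fast_sum is a made-up name and this sketch ignores sum()'s rejection of strings:)

    import copy

    def fast_sum(seq, start=0):
        # Copy 'start' once, then accumulate in place with +=, so the
        # caller's 'start' object is never mutated.
        acc = copy.copy(start)
        for item in seq:
            acc += item
        return acc

    assert fast_sum([[1, 2]] * 3, []) == [1, 2, 1, 2, 1, 2]
    empty = []
    fast_sum([[1]], empty)
    assert empty == []  # the start value is left untouched
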
-- [1] Python 2.7.5 Before patch: $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "sum(x,[])" 10 loops, best of 3: 885 msec per loop After patch: $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "sum(x,[])" 1000 loops, best of 3: 524 usec per loop [2] Python 2.7.5 with patch: $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "[i for l in x for i in l]" 1000 loops, best of 3: 1.94 msec per loop [3] Python 2.7.5 with patch: $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" --setup="from itertools import chain" "list(chain.from_iterable(x))" 1000 loops, best of 3: 821 usec per loop $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" --setup="from itertools import chain" "list(chain(*x))" 1000 loops, best of 3: 1.03 msec per loop [4] Python 2.7.5 with patch and string check removed: $ ./python -mtimeit --setup="x=['a']*10000" "sum(x,'')" 100 loops, best of 3: 3.98 msec per loop $ ./python -mtimeit --setup="x=['a']*10000" "''.join(x)" 10000 loops, best of 3: 170 usec per loop From mclefavor at gmail.com Wed Jul 3 00:37:29 2013 From: mclefavor at gmail.com (Matthew Lefavor) Date: Tue, 2 Jul 2013 18:37:29 -0400 Subject: [Python-ideas] Parenthesized Compound With Statement Message-ID: As you all know, Python supports a compound "with" statement to avoid the necessity of nesting these statements. Unfortunately, I find that using this feature often leads to exceeding the 79-character recommendation set forward by PEP 8. # The following is over 79 characters with open("/long/path/to/file1") as file1, open("/long/path/to/file2") as file2: pass This can be avoided by using the line continuation character, like so: with open("/long/path/to/file1") as file1, \ open("/long/path/to/file2") as file2: pass But PEP-8 prefers using implicit continuation with parentheses over line continuation. PEP 328 states that using the line continuation character is "unpalatable", which was the justification for allowing multi-line imports using parentheses: from package.subpackage import (UsefulClass1, UsefulClass2, ModuleVariable1, ModuleVariable2) Is there a reason we cannot do the same thing with compound with statements? Has this been suggested before? If so, why was it rejected? with (open("/long/path/to/file1") as file1, open("/long/path/to/file2") as file2): pass I would be happy to write the PEP for this and get plugged in to the Python development process if this is an idea worth pursuing. ML From guido at python.org Wed Jul 3 05:04:56 2013 From: guido at python.org (Guido van Rossum) Date: Tue, 2 Jul 2013 20:04:56 -0700 Subject: [Python-ideas] Parenthesized Compound With Statement In-Reply-To: References: Message-ID: It would be easier to update PEP 8 to be less dogmatic about backslashes. Where parentheses work equally well, I prefer them. But for this and similar cases (asserts come to mind) I don't see why you can't use a backslash. I would certainly prefer that over a syntax change. On Tue, Jul 2, 2013 at 3:37 PM, Matthew Lefavor wrote: > As you all know, Python supports a compound "with" statement to avoid the > necessity of nesting these statements. > > Unfortunately, I find that using this feature often leads to exceeding the > 79-character recommendation set forward by PEP 8.
> > # The following is over 79 characters > with open("/long/path/to/file1") as file1, open("/long/path/to/file2") as > file2: > pass > > This can be avoided by using the line continuation character, like so: > > with open("/long/path/to/file1") as file1, \ > open("/long/path/to/file2") as file2: > pass > > But PEP-8 prefers using implicit continuation with parentheses over line > continuation. PEP 328 states that using the line continuation character is > "unpalatable", which was the justification for allowing multi-line imports > using parentheses: > > from package.subpackage import (UsefulClass1, UsefulClass2, > ModuleVariable1, ModuleVariable2) > > Is there a reason we cannot do the same thing with compound with statements? > Has this been suggested before? If so, why was it rejected? > > with (open("/long/path/to/file1") as file1, > open("/long/path/to/file2") as file2): > pass > > I would be happy to write the PEP for this and get plugged in to the Python > development process if this is an idea worth pursuing. > > ML > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (python.org/~guido) From joshua.landau.ws at gmail.com Wed Jul 3 05:05:27 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Wed, 3 Jul 2013 04:05:27 +0100 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: <0C35164B-A9A1-4A78-8474-99166337BE50@umbrellacode.com> References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> <0C35164B-A9A1-4A78-8474-99166337BE50@umbrellacode.com> Message-ID: On 1 July 2013 16:21, Shane Green wrote: > > Having a bit of a hard time following the status of this as I dropped out for a while, but I'm not really for the semi-colon separator approach. Of all the options already available in list comprehensions, this one actually seems to be one of the most easily read, because it starts with a keyword and ends at the end. Not that syntax highlighting should be taken into account in general, but it's worth noting the syntax-highlighted version really makes this distinction quite clear: That's somewhat convincing. I'm still not convinced we need new syntax though.
URL: From grosser.meister.morti at gmx.net Wed Jul 3 05:24:02 2013 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Wed, 03 Jul 2013 05:24:02 +0200 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130702211209.6dbde663@sergey> References: <20130702211209.6dbde663@sergey> Message-ID: <51D39952.1030607@gmx.net> Such a function is very tiny: >>> import operator >>> isum = lambda *args: reduce(operator.iadd,*args) But this might be unexpected: >>> l = [] >>> l2 = isum([[1,2,3]]*1000000, l) l is now changed. In fact l == l2. I guess this could cause more troubles than it's good for if the user expects tha current behaviour. I don't think such an incompatible API change can ever be made. But one could maybe include isum, maybe just as recipe in the documentation or in itertools or somewhere. On 07/02/2013 08:12 PM, Sergey wrote: > Hello, python-ideas. Trying to cover everything in one shot, so > the message is long, sorry. > > sum() is a great function. It is the "obvious way" to add things. > Unfortunately sometimes it's slower than it could be. > > The problem is that code: > sum([[1,2,3]]*1000000, []) > takes forever to complete. Let's fix that! > > This problem was there since the sum() introduction, but it is almost > unknown among python-beginners. > > When people look at sum(seq,start=0) signature they most probably > expect it to be like this: > def sum(seq, start = 0): > for item in seq: > start += item > return start > > But it is not, because in cases like: > empty = [] > result = sum(list_of_lists, empty) > such implementation would modify content of "empty". This use-case is > known, it is checked in test_builtin.py, and explicitly mentioned in > comments inside sum() code. > > So actually sum() looks like this: > def sum(seq, start = 0): > for item in seq: > start = start + item > return start > it creates a copy of the partial result on every "start + item". > > What I suggest is instead of making a copy for every item make just one > copy and reuse it as many times as needed. For example: > def sum(seq, start = 0): > start = start + seq[0] > for item in seq[1:]: > start += item > return start > > Patch implementing this idea attached to issue18305. Patch is simple, it > just rearranges existing code. It should work for python 2.7, 3.3 and > hg-tip. Except sum() becoming faster there should be no behavior change. > Can this implementation break anything? > > Advantages: > * Performance boost is like 200000% [1], nothing else becomes slower > > Disadvantages (inspired by Terry J. Reedy): > * Other pythons (pypy, jython, older cpython versions) do not have > such optimisation, people will move code that depends on the internal > optimization to pythons that do not have it. And they will complain > about their program 'freezing'. > > * It discourages people from carefully thinking about whether they > actually need a concrete list or merely the iterator for a virtual > list. > > Alternatives to sum() for this use-case: > * list comprehension is 270% slower than patched sum [2] > * itertools.chain is 50%-100% slower than patched sum [3] > > Main questions: > * Whether to accept the patch > * If yes, whether it should go to the new version or to a bugfix release > > Alternatives to this patch: > * Reject the idea as a whole. Intentionally keep the sum() slow. > * Accept the idea, but reject this particular implementation, instead use > something different, e.g. 
implement special case optimization or use > different approach (I thought about start=copy.copy(start)) > > In my opinion (looking though python changelog) performance patches were > accepted for bugfix releases many times before. So as long as this patch > does not break anything, it may go even to python 2.7. > > I think bad performance is a bug that should be fixed. Rejecting > performance fixes just because older versions don't have it would mean > no performance improvements to anything, ever. > > Sum is a great basic function. It keeps the simple code simple. This > patch puts it on par with more advanced features like itertools, and > allows people to use sum for simple things and itertools for complex > memory-efficient algorithms. > > Questions that may arise if the patch is accepted: > * sum() was rejecting strings because of this bug. If the bug gets fixed > should another patch allow sum() to accept strings? > * maybe in some distant future drop the second parameter (or make it > None by default) and allow calling sum for everything, making sum() > "the one obvious way" to sum up things? > > It would be nice if sum "just worked" for everything (e.g. sum() of > empty sequence would return None, i.e. if there's nothing to sum then > nothing is returned). But I think it needs more work for that, because > even with this patch sum() is still ~20 times slower than "".join() [4] > > That's all. Any suggestions welcome. > From abarnert at yahoo.com Wed Jul 3 06:16:36 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 2 Jul 2013 21:16:36 -0700 Subject: [Python-ideas] Parenthesized Compound With Statement In-Reply-To: References: Message-ID: First, you can always do this: with open("/long/path/to/file1") as file1, open( "/long/path/to/file2") as file2: Not exactly beautiful, but perfectly legal and PEP-8-compliant. Or, of course: with open("/long/path/to/file1") as file1: with open("/long/path/to/file2") as file2: But the best solution is probably: path1 = "/long/path/to/file1" path2 = "/long/path/to/file2" with open(path1) as file1, open(path2) as file2: Meanwhile, PEP 8 doesn't say never to use backslashes no matter what. While it says parens "should be used in preference to using a backslash for line continuation", it also says, "But most importantly: know when to be inconsistent -- sometimes the style guide just doesn't apply. When in doubt, use your best judgment." In particular, it specifically says that "When applying the rule would make the code less readable, even for someone who is used to reading code that follows the rules", you should break it. So you've got four options in today's language. Is it worth changing the syntax to add a fifth? On Tue, Jul 2, 2013 at 3:37 PM, Matthew Lefavor wrote: >> As you all know, Python supports a compound "with" statement to avoid the >> necessity of nesting these statements. >> >> Unfortunately, I find that using this feature often leads to exceeding the >> 79-character recommendation set forward by PEP 8. >> >> # The following is over 79 characters >> with open("/long/path/to/file1") as file1, open("/long/path/to/file2") as >> file2: >> pass >> >> This can be avoided by using the line continuation character, like so: >> >> with open("/long/path/to/file1") as file1, \ >> open("/long/path/to/file2") as file2: >> pass >> >> But PEP-8 prefers using implicit continuation with parentheses over line >> continuation. 
PEP 328 states that using the line continuation character is >> "unpalatable", which was the justification for allowing multi-line imports >> using parentheses: >> >> from package.subpackage import (UsefulClass1, UsefulClass2, >> ModuleVariable1, ModuleVariable2) >> >> Is there a reason we cannot do the same thing with compound with statements? >> Has this been suggested before? If so, why was it rejected? >> >> with (open("/long/path/to/file1") as file1, >> open("/long/path/to/file2") as file2): >> pass >> >> I would be happy to write the PEP for this and get plugged in to the Python >> development process if this is an idea worth pursuing. >> >> ML >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From abarnert at yahoo.com Wed Jul 3 06:32:11 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 2 Jul 2013 21:32:11 -0700 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130702211209.6dbde663@sergey> References: <20130702211209.6dbde663@sergey> Message-ID: <53BA8317-71E4-4CF4-A59D-7EDEF7E6ED50@yahoo.com> On Jul 2, 2013, at 11:12, Sergey wrote: > Questions that may arise if the patch is accepted: > * sum() was rejecting strings because of this bug. If the bug gets fixed > should another patch allow sum() to accept strings? Does it actually speed up strings? I would have thought it only helps for types that have a mutating __iadd__ (and one that's faster than non mutating __add__, of course). > * maybe in some distant future drop the second parameter (or make it > None by default) and allow calling sum for everything, making sum() > "the one obvious way" to sum up things? sum can't guess the unified type of all of the elements in the iterable. The best it could do is what reduce does in that case: start with the first element, and add from there. That's not always the magical DWIM you seem to be expecting. Most importantly, how could it possibly work for iterables that might be empty? ''.join(lines), or sum(lines, ''), will work when there are no lines; sum(lines) can't possibly know that you expected a string. Meanwhile, if you're going to add optional "start from the first item" functionality, I think you'll also want to make the operator/function overridable. And then you've just re-invented reduce, with a slightly different signature: sum(iterable, start=None, function=operator.iadd): return reduce(function, iterable, start) > > It would be nice if sum "just worked" for everything (e.g. sum() of > empty sequence would return None, i.e. if there's nothing to sum then > nothing is returned). But I think it needs more work for that, because > even with this patch sum() is still ~20 times slower than "".join() [4] You could always special case it when start is a str (or when start is defaulted and the first value is a str). But then what happens if the second value is something that can be added to a str, but not actually a str? Or, for that matter, the start value? 
From abarnert at yahoo.com Wed Jul 3 06:40:23 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 2 Jul 2013 21:40:23 -0700 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <51D39952.1030607@gmx.net> References: <20130702211209.6dbde663@sergey> <51D39952.1030607@gmx.net> Message-ID: <7C90874E-D838-44B6-BAE0-59EAB89730D2@yahoo.com> On Jul 2, 2013, at 20:24, Mathias Panzenb?ck wrote: > Such a function is very tiny: > > >>> import operator > >>> isum = lambda *args: reduce(operator.iadd,*args) > > But this might be unexpected: > > >>> l = [] > >>> l2 = isum([[1,2,3]]*1000000, l) > > l is now changed. In fact l == l2. He explicitly suggested making one copy, before looping. So, this: sum([[1,2,3]]*1000000, l) Has to mean: reduce(operator.iadd, [[1,2,3]]*1000000, copy.copy(l)) So, it's not quite a one-liner, and it doesn't have this problem. > But one could maybe include isum, > maybe just as recipe in the documentation or in itertools or somewhere. This sounds like a good idea. Possibly even both versions, with and without the copy. While we're at it, I've always wished sum and reduce used the same name for their start/initial parameter. If we're going to be changing one (which I suspect we probably aren't, but just in case...), is this a good time to campaign for renaming the param? > On 07/02/2013 08:12 PM, Sergey wrote: >> Hello, python-ideas. Trying to cover everything in one shot, so >> the message is long, sorry. >> >> sum() is a great function. It is the "obvious way" to add things. >> Unfortunately sometimes it's slower than it could be. >> >> The problem is that code: >> sum([[1,2,3]]*1000000, []) >> takes forever to complete. Let's fix that! >> >> This problem was there since the sum() introduction, but it is almost >> unknown among python-beginners. >> >> When people look at sum(seq,start=0) signature they most probably >> expect it to be like this: >> def sum(seq, start = 0): >> for item in seq: >> start += item >> return start >> >> But it is not, because in cases like: >> empty = [] >> result = sum(list_of_lists, empty) >> such implementation would modify content of "empty". This use-case is >> known, it is checked in test_builtin.py, and explicitly mentioned in >> comments inside sum() code. >> >> So actually sum() looks like this: >> def sum(seq, start = 0): >> for item in seq: >> start = start + item >> return start >> it creates a copy of the partial result on every "start + item". >> >> What I suggest is instead of making a copy for every item make just one >> copy and reuse it as many times as needed. For example: >> def sum(seq, start = 0): >> start = start + seq[0] >> for item in seq[1:]: >> start += item >> return start >> >> Patch implementing this idea attached to issue18305. Patch is simple, it >> just rearranges existing code. It should work for python 2.7, 3.3 and >> hg-tip. Except sum() becoming faster there should be no behavior change. >> Can this implementation break anything? >> >> Advantages: >> * Performance boost is like 200000% [1], nothing else becomes slower >> >> Disadvantages (inspired by Terry J. Reedy): >> * Other pythons (pypy, jython, older cpython versions) do not have >> such optimisation, people will move code that depends on the internal >> optimization to pythons that do not have it. And they will complain >> about their program 'freezing'. >> >> * It discourages people from carefully thinking about whether they >> actually need a concrete list or merely the iterator for a virtual >> list. 
>>
>> Alternatives to sum() for this use-case:
>> * list comprehension is 270% slower than patched sum [2]
>> * itertools.chain is 50%-100% slower than patched sum [3]
>>
>> Main questions:
>> * Whether to accept the patch
>> * If yes, whether it should go to the new version or to a bugfix release
>>
>> Alternatives to this patch:
>> * Reject the idea as a whole. Intentionally keep the sum() slow.
>> * Accept the idea, but reject this particular implementation, instead use
>> something different, e.g. implement special case optimization or use
>> different approach (I thought about start=copy.copy(start))
>>
>> In my opinion (looking though python changelog) performance patches were
>> accepted for bugfix releases many times before. So as long as this patch
>> does not break anything, it may go even to python 2.7.
>>
>> I think bad performance is a bug that should be fixed. Rejecting
>> performance fixes just because older versions don't have it would mean
>> no performance improvements to anything, ever.
>>
>> Sum is a great basic function. It keeps the simple code simple. This
>> patch puts it on par with more advanced features like itertools, and
>> allows people to use sum for simple things and itertools for complex
>> memory-efficient algorithms.
>>
>> Questions that may arise if the patch is accepted:
>> * sum() was rejecting strings because of this bug. If the bug gets fixed
>> should another patch allow sum() to accept strings?
>> * maybe in some distant future drop the second parameter (or make it
>> None by default) and allow calling sum for everything, making sum()
>> "the one obvious way" to sum up things?
>>
>> It would be nice if sum "just worked" for everything (e.g. sum() of
>> empty sequence would return None, i.e. if there's nothing to sum then
>> nothing is returned). But I think it needs more work for that, because
>> even with this patch sum() is still ~20 times slower than "".join() [4]
>>
>> That's all. Any suggestions welcome.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

From guido at python.org Wed Jul 3 06:50:29 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 2 Jul 2013 21:50:29 -0700
Subject: [Python-ideas] Parenthesized Compound With Statement
In-Reply-To:
References:
Message-ID:

Even though you are responding to my message, you omit what I wrote, and
only weakly support it. So let me repeat: the backslash is the best
solution and the PEP should be amended to prefer it *in this case*.

On Tuesday, July 2, 2013, Andrew Barnert wrote:

> First, you can always do this:
>
> with open("/long/path/to/file1") as file1, open(
> "/long/path/to/file2") as file2:
>
> Not exactly beautiful, but perfectly legal and PEP-8-compliant.
>
> Or, of course:
>
> with open("/long/path/to/file1") as file1:
> with open("/long/path/to/file2") as file2:
>
> But the best solution is probably:
>
> path1 = "/long/path/to/file1"
> path2 = "/long/path/to/file2"
> with open(path1) as file1, open(path2) as file2:
>
> Meanwhile, PEP 8 doesn't say never to use backslashes no matter what.
> While it says parens "should be used in preference to using a backslash for
> line continuation", it also says, "But most importantly: know when to be
> inconsistent -- sometimes the style guide just doesn't apply. When in
> doubt, use your best judgment."
In particular, it specifically says that
> "When applying the rule would make the code less readable, even for someone
> who is used to reading code that follows the rules", you should break it.
>
> So you've got four options in today's language. Is it worth changing the
> syntax to add a fifth?
>
> On Tue, Jul 2, 2013 at 3:37 PM, Matthew Lefavor
>
> wrote:
> >> As you all know, Python supports a compound "with" statement to avoid
> the
> >> necessity of nesting these statements.
> >>
> >> Unfortunately, I find that using this feature often leads to exceeding
> the
> >> 79-character recommendation set forward by PEP 8.
> >>
> >> # The following is over 79 characters
> >> with open("/long/path/to/file1") as file1, open("/long/path/to/file2")
> as
> >> file2:
> >> pass
> >>
> >> This can be avoided by using the line continuation character, like so:
> >>
> >> with open("/long/path/to/file1") as file1, \
> >> open("/long/path/to/file2") as file2:
> >> pass
> >>
> >> But PEP-8 prefers using implicit continuation with parentheses over line
> >> continuation. PEP 328 states that using the line continuation character
> is
> >> "unpalatable", which was the justification for allowing multi-line
> imports
> >> using parentheses:
> >>
> >> from package.subpackage import (UsefulClass1, UsefulClass2,
> >> ModuleVariable1, ModuleVariable2)
> >>
> >> Is there a reason we cannot do the same thing with compound with
> statements?
> >> Has this been suggested before? If so, why was it rejected?
> >>
> >> with (open("/long/path/to/file1") as file1,
> >> open("/long/path/to/file2") as file2):
> >> pass
> >>
> >> I would be happy to write the PEP for this and get plugged in to the
> Python
> >> development process if this is an idea worth pursuing.
> >>
> >> ML
> >>
> >> _______________________________________________
> >> Python-ideas mailing list
> >> Python-ideas at python.org
> >> http://mail.python.org/mailman/listinfo/python-ideas
> >
> >
> >
> > --
> > --Guido van Rossum (python.org/~guido)
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at python.org
> > http://mail.python.org/mailman/listinfo/python-ideas
>
--
--Guido van Rossum (on iPad)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mclefavor at gmail.com Wed Jul 3 07:29:15 2013
From: mclefavor at gmail.com (Matthew Lefavor)
Date: Wed, 3 Jul 2013 01:29:15 -0400
Subject: [Python-ideas] Parenthesized Compound With Statement
In-Reply-To:
References:
Message-ID:

Works for me; I'll rescind my suggestion.

The following is not intended as an objection, but rather as an
opportunity for me to be educated about Python design decisions: Why were
parenthetical import statements added? Is it simply because import
statements are more common?

On Wed, Jul 3, 2013 at 12:50 AM, Guido van Rossum wrote:

> Even though you are responding to my message, you omit what I wrote, and
> only weakly support it. So let me repeat: the backslash is the best
> solution and the PEP should be amended to prefer it *in this case*.
>
>
> On Tuesday, July 2, 2013, Andrew Barnert wrote:
>
>> First, you can always do this:
>>
>> with open("/long/path/to/file1") as file1, open(
>> "/long/path/to/file2") as file2:
>>
>> Not exactly beautiful, but perfectly legal and PEP-8-compliant.
>> >> Or, of course: >> >> with open("/long/path/to/file1") as file1: >> with open("/long/path/to/file2") as file2: >> >> But the best solution is probably: >> >> path1 = "/long/path/to/file1" >> path2 = "/long/path/to/file2" >> with open(path1) as file1, open(path2) as file2: >> >> Meanwhile, PEP 8 doesn't say never to use backslashes no matter what. >> While it says parens "should be used in preference to using a backslash for >> line continuation", it also says, "But most importantly: know when to be >> inconsistent -- sometimes the style guide just doesn't apply. When in >> doubt, use your best judgment." In particular, it specifically says that >> "When applying the rule would make the code less readable, even for someone >> who is used to reading code that follows the rules", you should break it. >> >> So you've got four options in today's language. Is it worth changing the >> syntax to add a fifth? >> >> On Tue, Jul 2, 2013 at 3:37 PM, Matthew Lefavor >> wrote: >> >> As you all know, Python supports a compound "with" statement to avoid >> the >> >> necessity of nesting these statements. >> >> >> >> Unfortunately, I find that using this feature often leads to exceeding >> the >> >> 79-character recommendation set forward by PEP 8. >> >> >> >> # The following is over 79 characters >> >> with open("/long/path/to/file1") as file1, open("/long/path/to/file2") >> as >> >> file2: >> >> pass >> >> >> >> This can be avoided by using the line continuation character, like so: >> >> >> >> with open("/long/path/to/file1") as file1, \ >> >> open("/long/path/to/file2") as file2: >> >> pass >> >> >> >> But PEP-8 prefers using implicit continuation with parentheses over >> line >> >> continuation. PEP 328 states that using the line continuation >> character is >> >> "unpalatable", which was the justification for allowing multi-line >> imports >> >> using parentheses: >> >> >> >> from package.subpackage import (UsefulClass1, UsefulClass2, >> >> ModuleVariable1, ModuleVariable2) >> >> >> >> Is there a reason we cannot do the same thing with compound with >> statements? >> >> Has this been suggested before? If so, why was it rejected? >> >> >> >> with (open("/long/path/to/file1") as file1, >> >> open("/long/path/to/file2") as file2): >> >> pass >> >> >> >> I would be happy to write the PEP for this and get plugged in to the >> Python >> >> development process if this is an idea worth pursuing. >> >> >> >> ML >> >> >> >> _______________________________________________ >> >> Python-ideas mailing list >> >> Python-ideas at python.org >> >> http://mail.python.org/mailman/listinfo/python-ideas >> > >> > >> > >> > -- >> > --Guido van Rossum (python.org/~guido) >> > _______________________________________________ >> > Python-ideas mailing list >> > Python-ideas at python.org >> > http://mail.python.org/mailman/listinfo/python-ideas >> > > > -- > --Guido van Rossum (on iPad) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Wed Jul 3 07:52:38 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 03 Jul 2013 17:52:38 +1200 Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ: In-Reply-To: References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de> <51CF03B0.8080508@pearwood.info> <51D355A1.1050708@canterbury.ac.nz> Message-ID: <51D3BC26.7040900@canterbury.ac.nz> Daniel Robinson wrote: > That cofactor would have to have a prime factor less than sqrt(n). You're right, of course. 
I feel suitably enfoolished. But there is a slight error in the second implementation: if all(n % p for p in takewhile(lambda p: p**2 < n, primes_seen)): should be if all(n % p for p in takewhile(lambda p: p**2 <= n, primes_seen)): otherwise it thinks that perfect squares are primes. -- Greg From ncoghlan at gmail.com Wed Jul 3 10:19:44 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 3 Jul 2013 18:19:44 +1000 Subject: [Python-ideas] Parenthesized Compound With Statement In-Reply-To: References: Message-ID: On 3 July 2013 15:29, Matthew Lefavor wrote: > Works for me; I'll rescind my suggestion. > > The following is not intended as an objection, but rather as an > opportunity for me to be educated about Python design decisions: Why were > parenthetical import statements added? Is it simply because import > statements are more common? > Different times, and no syntactical ambiguity. with statements and asserts are different, as they already support parens (they just mean something different, and even though that meaning isn't useful, it's still valid syntax). For with statements, long ones can either be broken up with backslashes, or (in 3.3+) modified to use contextlib.ExitStack. Assert statements and escaping the first newline when dedenting an indented multi-line string literal are the other two main use cases where the escaped newline is the preferred (or only) option. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Jul 3 10:14:46 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 3 Jul 2013 18:14:46 +1000 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <7C90874E-D838-44B6-BAE0-59EAB89730D2@yahoo.com> References: <20130702211209.6dbde663@sergey> <51D39952.1030607@gmx.net> <7C90874E-D838-44B6-BAE0-59EAB89730D2@yahoo.com> Message-ID: On 3 July 2013 14:40, Andrew Barnert wrote: > On Jul 2, 2013, at 20:24, Mathias Panzenb?ck < > grosser.meister.morti at gmx.net> wrote: > > > Such a function is very tiny: > > > > >>> import operator > > >>> isum = lambda *args: reduce(operator.iadd,*args) > > > > But this might be unexpected: > > > > >>> l = [] > > >>> l2 = isum([[1,2,3]]*1000000, l) > > > > l is now changed. In fact l == l2. > > He explicitly suggested making one copy, before looping. So, this: > > sum([[1,2,3]]*1000000, l) > > Has to mean: > > reduce(operator.iadd, [[1,2,3]]*1000000, copy.copy(l)) > > So, it's not quite a one-liner, and it doesn't have this problem. > > > But one could maybe include isum, > > maybe just as recipe in the documentation or in itertools or somewhere. > > This sounds like a good idea. Possibly even both versions, with and > without the copy. > > While we're at it, I've always wished sum and reduce used the same name > for their start/initial parameter. If we're going to be changing one (which > I suspect we probably aren't, but just in case...), is this a good time to > campaign for renaming the param? > reduce only just survived being culled completely in the Python 3 transition, so if one was going to change it would be reduce. Our perspective is that if you're considering using reduce, you should remember you're writing Python rather than a pure functional language and switch to a loop instead. (By contrast, map and filter are nice alternatives to a generator expression if you are just applying an existing function) Cheers, Nick. 
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From storchaka at gmail.com Wed Jul 3 11:19:02 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 03 Jul 2013 12:19:02 +0300
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <51D39952.1030607@gmx.net>
References: <20130702211209.6dbde663@sergey> <51D39952.1030607@gmx.net>
Message-ID:

On 03.07.13 06:24, Mathias Panzenböck wrote:
> Such a function is very tiny:
>
> >>> import operator
> >>> isum = lambda *args: reduce(operator.iadd,*args)
>
> But this might be unexpected:
>
> >>> l = []
> >>> l2 = isum([[1,2,3]]*1000000, l)
>
> l is now changed. In fact l == l2. I guess this could cause more
> troubles than
> it's good for if the user expects the current behaviour. I don't think
> such an
> incompatible API change can ever be made. But one could maybe include isum,
> maybe just as recipe in the documentation or in itertools or somewhere.

Sergey's code is smarter. It is equivalent to the following Python code:

def sum(iterable, start=0):
    it = iter(iterable)
    try:
        x = next(it)
    except StopIteration:
        return start
    result = start + x
    for x in it:
        result += x
    return result

From storchaka at gmail.com Wed Jul 3 12:10:59 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 03 Jul 2013 13:10:59 +0300
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <20130702211209.6dbde663@sergey>
References: <20130702211209.6dbde663@sergey>
Message-ID:

On 02.07.13 21:12, Sergey wrote:
> Alternatives to sum() for this use-case:
> * list comprehension is 270% slower than patched sum [2]
> * itertools.chain is 50%-100% slower than patched sum [3]

You forgot the most straightforward alternative:

result = []
for x in iterable:
    result.extend(x)

I'm sure this is the most popular solution for this problem.

> Questions that may arise if the patch is accepted:
> * sum() was rejecting strings because of this bug. If the bug gets fixed
> should another patch allow sum() to accept strings?

No. Strings are immutable.

> * maybe in some distant future drop the second parameter (or make it
> None by default) and allow calling sum for everything, making sum()
> "the one obvious way" to sum up things?

No. sum() should work for empty sequences.

From sergemp at mail.ru Wed Jul 3 13:02:53 2013
From: sergemp at mail.ru (Sergey)
Date: Wed, 3 Jul 2013 14:02:53 +0300
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <53BA8317-71E4-4CF4-A59D-7EDEF7E6ED50@yahoo.com>
References: <20130702211209.6dbde663@sergey> <53BA8317-71E4-4CF4-A59D-7EDEF7E6ED50@yahoo.com>
Message-ID: <20130703140253.61dabcb4@sergey>

On Jul 2, 2013 Andrew Barnert wrote:

I'm trying to make sum() more SIMPLE, not more complex. Currently
sum() "just works" only for numbers. So I was thinking how to make it
"just work" for everything. Check this:
    sum([something])
    sum([something1, something2])
and this:
    sum([], something)
    sum([something2], something1)
The first one looks obvious even to those who do not know Python,
while to understand the second one you have to actually read the
manuals. And even after that it may be unclear why the second
parameter is mandatory for non-numbers. "Readability counts."

But, anyway, that's not the point of THIS bugreport/patch. This patch
is to fix the slowness, which is easy to fix, without introducing any
behaviour changes.

> Does it actually speed up strings?

Unfortunately it does not, sum() is still much slower than join().
That's because join() was optimised for strings: it walks through the
list twice, first to calculate the total size of all the strings and
allocate the entire string in one shot, and a second time to copy all
the strings into the allocated memory.

> sum can't guess the unified type of all of the elements in the
> iterable. The best it could do is what reduce does in that case:
> start with the first element, and add from there. That's not always
> the magical DWIM you seem to be expecting.

Why not? I wouldn't expect anything else from sum(). Well, sum() is so
common and useful, I was just thinking of how to turn it into "the one
obvious way", something that "just works", something like:

def justsum(seq):
    result = None
    try:
        seq = iter(seq)
        result = next(seq)
        result = result + next(seq)
        while True:
            result += next(seq)
    except StopIteration:
        return result

It would work for everything:
>>> justsum(['a', 'bb', 'ccc'])
'abbccc'
>>> justsum([1, 2, 3])
6
>>> justsum([[1,2], [3,4], [5,6]])
[1, 2, 3, 4, 5, 6]
>>> justsum([3, 0.1, 0.04])
3.14

It would still work even for weird cases. For example, what should be
the sum of one element? It should be the element itself, no matter what
element it is, right? For example:
>>> justsum([list])
<class 'list'>
>>> g = (i*i for i in range(10))
>>> justsum([g])
<generator object <genexpr> at 0xa93280>

I know, these are unusual use-cases, but they work as expected.

> Most importantly, how could it possibly work for iterables that
> might be empty? ''.join(lines), or sum(lines, ''), will work when
> there are no lines; sum(lines) can't possibly know that you expected
> a string.

If there're no elements to sum it would return "nothing" i.e. None.
That looks rather obvious to me. However I'm not Dutch, so I can be
wrong. :)

> Meanwhile, if you're going to add optional "start from the first
> item" functionality, I think you'll also want to make the
> operator/function overridable.
>
> And then you've just re-invented reduce, with a slightly different signature:
> sum(iterable, start=None, function=operator.iadd):
> return reduce(function, iterable, start)

Does not work even for:
    sum([1, 2, 3, 4])
not to mention:
    lists = [[1, 2], [3, 4]]
    sum(lists)
where reduce-based code fails, because it modifies the original list.

> You could always special case it when start is a str (or when
> start is defaulted and the first value is a str). But then what
> happens if the second value is something that can be added to a str,
> but not actually a str? Or, for that matter, the start value?

Fallback to general code. This is how sum() works right now. First it
attempts to store the result in an integer. If that fails it tries
float. If that fails too, even if it fails in the middle of the list,
sum() falls back to the general PyNumber_Add(). My patch only adjusts
that general part so that it first tries PyNumber_InPlaceAdd(), and
only if that fails uses PyNumber_Add().

--

From steve at pearwood.info Wed Jul 3 14:43:46 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 03 Jul 2013 22:43:46 +1000
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <20130702211209.6dbde663@sergey>
References: <20130702211209.6dbde663@sergey>
Message-ID: <51D41C82.2040301@pearwood.info>

On 03/07/13 04:12, Sergey wrote:
> Hello, python-ideas. Trying to cover everything in one shot, so
> the message is long, sorry.
>
> sum() is a great function. It is the "obvious way" to add things.
> Unfortunately sometimes it's slower than it could be.
>
> The problem is that code:
> sum([[1,2,3]]*1000000, [])
> takes forever to complete. Let's fix that!

I'm not sure that sum() is the Obvious Way to concatenate lists, and I
don't think that concatenating many lists is a common thing to do.
Traditionally, sum() works only on numbers, and I think we wouldn't be
having this discussion if Python used & for concatenation instead of +.
So I don't care that sum() has quadratic performance on lists (and
tuples), and I must admit that having a simple quadratic algorithm in
the built-ins is sometimes useful for teaching purposes, so I'm -0 on
optimizing this case.

[...]

> When people look at sum(seq,start=0) signature they most probably
> expect it to be like this:
> def sum(seq, start = 0):
> for item in seq:
> start += item
> return start

That's not what I expect, since += risks modifying its argument in place.

[...]

> What I suggest is instead of making a copy for every item make just one
> copy and reuse it as many times as needed. For example:
> def sum(seq, start = 0):
> start = start + seq[0]
> for item in seq[1:]:
> start += item
> return start

[...]

> Can this implementation break anything?

That will still have quadratic performance for tuples, and it will break
if seq is an iterator.

It will also be expensive to make a slice of seq if it is a huge list,
and could even run out of memory to hold both the original and the
slice. I would be annoyed if sum started failing with a MemoryError here:

big_seq = list(range(1000))*1000000 # assume this succeeds
total = sum(big_seq) # could fail

[...]

> Alternatives to sum() for this use-case:
> * list comprehension is 270% slower than patched sum [2]
> * itertools.chain is 50%-100% slower than patched sum [3]

How about the "Obvious Way" to concatenate lists?

new = []
for x in seq:
    new.extend(x)

> Main questions:
> * Whether to accept the patch
> * If yes, whether it should go to the new version or to a bugfix release

-0 on the general idea, -1 on the specific implementation. I'd rather
have sum of lists be slow than risk sum of numbers raise MemoryError.

--
Steven

From jbvsmo at gmail.com Wed Jul 3 14:55:11 2013
From: jbvsmo at gmail.com (João Bernardo)
Date: Wed, 3 Jul 2013 09:55:11 -0300
Subject: [Python-ideas] Parenthesized Compound With Statement
In-Reply-To:
References:
Message-ID:

2013/7/3 Guido van Rossum

> It would be easier to update PEP 8 to be less dogmatic about
> backslashes.
>

+1
I found myself using backslashes a lot more than parentheses for such
things. I would like the change on PEP8.

João Bernardo
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From oscar.j.benjamin at gmail.com Wed Jul 3 16:05:37 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Wed, 3 Jul 2013 15:05:37 +0100
Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ:
In-Reply-To: <51D3BC26.7040900@canterbury.ac.nz>
References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de>
<51CF03B0.8080508@pearwood.info>
<51D355A1.1050708@canterbury.ac.nz>
<51D3BC26.7040900@canterbury.ac.nz>
Message-ID:

On 3 July 2013 06:52, Greg Ewing wrote:
> Daniel Robinson wrote:
>>
>> That cofactor would have to have a prime factor less than sqrt(n).
>
>
>
> You're right, of course. I feel suitably enfoolished.
>

It took me a little while to understand what you were getting at. My
way of thinking about this is that in every pair of factors either both
are equal to sqrt(n) or one is less than sqrt(n) and the other greater.
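As an aside, the whole generator can be written around that test. Here is a
rough, untested sketch, using the corrected <= bound that Greg points out
below:

    from itertools import count, takewhile

    def primes():
        primes_seen = []
        for n in count(2):
            # trial division only by the primes up to sqrt(n), which is
            # enough by the factor-pair argument above
            if all(n % p for p in takewhile(lambda p: p*p <= n, primes_seen)):
                primes_seen.append(n)
                yield n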
> > But there is a slight error in the second implementation: > > > if all(n % p for p in takewhile(lambda p: p**2 < n, primes_seen)): > > should be > > if all(n % p for p in takewhile(lambda p: p**2 <= n, primes_seen)): > > otherwise it thinks that perfect squares are primes. Ah, yes. I had it right in the for/while version so I'm going to blame this typo on takewhile/lambda for obfuscating my code. :) Oscar From barry at python.org Wed Jul 3 17:01:47 2013 From: barry at python.org (Barry Warsaw) Date: Wed, 3 Jul 2013 11:01:47 -0400 Subject: [Python-ideas] Parenthesized Compound With Statement References: Message-ID: <20130703110147.4c6a8be4@anarchist> On Jul 02, 2013, at 06:37 PM, Matthew Lefavor wrote: >As you all know, Python supports a compound "with" statement to avoid the >necessity of nesting these statements. > >Unfortunately, I find that using this feature often leads to exceeding the >79-character recommendation set forward by PEP 8. Yeah, I noticed and brought this up a while ago. I thought I filed a bug but can't find it. It would have been closed Won't Fix anyway. As others have pointed out, you can usually rewrite such long with-statements to conform to the PEP 8 line lengths by e.g. saving the paths in a shorter named local variable), using Python 3.3's ExitStack, which is awesome btw, or just using backslashes. Remember PEP 8 isn't a weapon to be wielded by a stern Auntie Tim Peters. As for relaxing PEP 8, well, I'm not sure what more you'd want it to say. It currently reads: The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation. which seems about right to me. It is preferred to use implied continuation over backslashes, but doesn't say "don't use backslashes". They're not evil, just not preferred, so use them where appropriate and with good judgment. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From guido at python.org Wed Jul 3 17:38:04 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 3 Jul 2013 08:38:04 -0700 Subject: [Python-ideas] Parenthesized Compound With Statement In-Reply-To: <20130703110147.4c6a8be4@anarchist> References: <20130703110147.4c6a8be4@anarchist> Message-ID: Clearly some people have interpreted this dogmatically as "never use backslashes", so I think we should explicitly update the PEP with more moderate language and an example. On Wednesday, July 3, 2013, Barry Warsaw wrote: > On Jul 02, 2013, at 06:37 PM, Matthew Lefavor wrote: > > >As you all know, Python supports a compound "with" statement to avoid the > >necessity of nesting these statements. > > > >Unfortunately, I find that using this feature often leads to exceeding the > >79-character recommendation set forward by PEP 8. > > Yeah, I noticed and brought this up a while ago. I thought I filed a bug > but > can't find it. It would have been closed Won't Fix anyway. > > As others have pointed out, you can usually rewrite such long > with-statements > to conform to the PEP 8 line lengths by e.g. saving the paths in a shorter > named local variable), using Python 3.3's ExitStack, which is awesome btw, > or > just using backslashes. Remember PEP 8 isn't a weapon to be wielded by a > stern Auntie Tim Peters. 
> > As for relaxing PEP 8, well, I'm not sure what more you'd want it to say. > It > currently reads: > > The preferred way of wrapping long lines is by using Python's implied > line continuation inside parentheses, brackets and braces. Long lines > can be broken over multiple lines by wrapping expressions in > parentheses. These should be used in preference to using a backslash > for line continuation. > > which seems about right to me. It is preferred to use implied continuation > over backslashes, but doesn't say "don't use backslashes". They're not > evil, > just not preferred, so use them where appropriate and with good judgment. > > Cheers, > -Barry > -- --Guido van Rossum (on iPad) -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Wed Jul 3 17:48:07 2013 From: barry at python.org (Barry Warsaw) Date: Wed, 3 Jul 2013 11:48:07 -0400 Subject: [Python-ideas] Parenthesized Compound With Statement In-Reply-To: References: <20130703110147.4c6a8be4@anarchist> Message-ID: <20130703114807.636254aa@anarchist> On Jul 03, 2013, at 08:38 AM, Guido van Rossum wrote: >Clearly some people have interpreted this dogmatically as "never use >backslashes", so I think we should explicitly update the PEP with more >moderate language and an example. How about this: diff -r bd8e5c86fb27 pep-0008.txt --- a/pep-0008.txt Wed Jul 03 00:44:58 2013 +0200 +++ b/pep-0008.txt Wed Jul 03 11:46:07 2013 -0400 @@ -158,9 +158,19 @@ line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash -for line continuation. Make sure to indent the continued line -appropriately. The preferred place to break around a binary operator -is *after* the operator, not before it. Some examples:: +for line continuation. + +Backslashes may still be appropriate at times. For example, long, +multiple ``with``-statements cannot use implicit continuation, so +backslashes are acceptable:: + + with open('/path/to/some/file/you/want/to/read') as file_1, \ + open('/path/to/some/file/being/written', 'w') as file_2: + file_2.write(file_1.read()) + +Make sure to indent the continued line appropriately. The preferred +place to break around a binary operator is *after* the operator, not +before it. Some examples:: class Rectangle(Blob): -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From ron3200 at gmail.com Wed Jul 3 18:56:09 2013 From: ron3200 at gmail.com (Ron Adam) Date: Wed, 03 Jul 2013 11:56:09 -0500 Subject: [Python-ideas] Parenthesized Compound With Statement In-Reply-To: <20130703110147.4c6a8be4@anarchist> References: <20130703110147.4c6a8be4@anarchist> Message-ID: On 07/03/2013 10:01 AM, Barry Warsaw wrote: > As for relaxing PEP 8, well, I'm not sure what more you'd want it to say. It > currently reads: > > The preferred way of wrapping long lines is by using Python's implied > line continuation inside parentheses, brackets and braces. Long lines > can be broken over multiple lines by wrapping expressions in > parentheses. These should be used in preference to using a backslash > for line continuation. > > which seems about right to me. It is preferred to use implied continuation > over backslashes, but doesn't say "don't use backslashes". They're not evil, > just not preferred, so use them where appropriate and with good judgment. 
I think this can be improved on. The second part should be a positive statement expressing when backslashes should be used over an implied continuation, rather than a blanket negative statement. Use a backslash if parentheses cause the code to be less clear. My own preference is: Use what obviously reads better. This can happen when continuing expressions which contain tuples, or strings with backslashes. Using what is not in the expression can help make the intention clearer. Use what takes less characters. For an expression that span 2 lines, use a single backslash. For longer expressions spanning 4 or more lines parentheses are preferred. For continued lines inside of function calls, and container literals, indenting the continued lines is enough in most cases. Sometimes adding a backslash within a container can make a continued line between many single line items stand out better. (*) (* There have been "silent" bugs where a comma was omitted in cases where some lines are continued and some aren't.) Cheers, Ron From joshua.landau.ws at gmail.com Wed Jul 3 18:58:04 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Wed, 3 Jul 2013 17:58:04 +0100 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <51D41C82.2040301@pearwood.info> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> Message-ID: On 3 July 2013 13:43, Steven D'Aprano wrote: > On 03/07/13 04:12, Sergey wrote: >> >> Hello, python-ideas. Trying to cover everything in one shot, so >> the message is long, sorry. >> >> sum() is a great function. It is the "obvious way" to add things. >> Unfortunately sometimes it's slower than it could be. >> >> The problem is that code: >> sum([[1,2,3]]*1000000, []) >> takes forever to complete. Let's fix that! > > > I'm not sure that sum() is the Obvious Way to concatenate lists, and I don't > think that concatenating many lists is a common thing to do. Traditionally, > sum() works only on numbers, and I think we wouldn't be having this > discussion if Python used & for concatenation instead of +. So I don't care > that sum() has quadratic performance on lists (and tuples), and I must admit > that having a simple quadratic algorithm in the built-ins is sometimes > useful for teaching purposes, so I'm -0 on optimizing this case. This optimises all circumstances where iadd is faster than add. It makes sense to me. >> What I suggest is instead of making a copy for every item make just one >> copy and reuse it as many times as needed. For example: >> def sum(seq, start = 0): >> start = start + seq[0] >> for item in seq[1:]: >> start += item >> return start > > [...] > >> Can this implementation break anything? > > > That will still have quadratic performance for tuples, and it will break if > seq is an iterator. > > It will also be expensive to make a slice of seq if it is a huge list, and > could even run out of memory to hold both the original and the slice. I > would be annoyed if sum started failing with a MemoryError here: > > big_seq = list(range(1000))*1000000 # assume this succeeds > total = sum(big_seq) # could fail Then you can just make it support iterators: def sum(seq, start = 0): seq = iter(seq) start = start + next(seq) for item in seq: start += item return start >> Alternatives to sum() for this use-case: >> * list comprehension is 270% slower than patched sum [2] >> * itertools.chain is 50%-100% slower than patched sum [3] > > > How about the "Obvious Way" to concatenate lists? 
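Strictly speaking that sketch is untested and still breaks on an empty
iterable, because the first next(seq) raises StopIteration, so a more
careful version catches it:

    def sum(seq, start=0):
        seq = iter(seq)
        try:
            start = start + next(seq)   # one copying add up front
        except StopIteration:
            return start                # empty seq: return start unchanged
        for item in seq:
            start += item               # in-place adds from here on
        return start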
> > new = [] > for x in seq: > new.extend(x) What's wrong with sum, if sum is fast? > -0 on the general idea, -1 on the specific implementation. I'd rather have > sum of lists be slow than risk sum of numbers raise MemoryError. How about with my version? I see no obvious downsides, personally. From guido at python.org Wed Jul 3 19:22:51 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 3 Jul 2013 10:22:51 -0700 Subject: [Python-ideas] Parenthesized Compound With Statement In-Reply-To: <20130703114807.636254aa@anarchist> References: <20130703110147.4c6a8be4@anarchist> <20130703114807.636254aa@anarchist> Message-ID: Looks good, but I would add a mention of assert (without example) as another case where using a backslash may make sense. On Wed, Jul 3, 2013 at 8:48 AM, Barry Warsaw wrote: > On Jul 03, 2013, at 08:38 AM, Guido van Rossum wrote: > >>Clearly some people have interpreted this dogmatically as "never use >>backslashes", so I think we should explicitly update the PEP with more >>moderate language and an example. > > How about this: > > diff -r bd8e5c86fb27 pep-0008.txt > --- a/pep-0008.txt Wed Jul 03 00:44:58 2013 +0200 > +++ b/pep-0008.txt Wed Jul 03 11:46:07 2013 -0400 > @@ -158,9 +158,19 @@ > line continuation inside parentheses, brackets and braces. Long lines > can be broken over multiple lines by wrapping expressions in > parentheses. These should be used in preference to using a backslash > -for line continuation. Make sure to indent the continued line > -appropriately. The preferred place to break around a binary operator > -is *after* the operator, not before it. Some examples:: > +for line continuation. > + > +Backslashes may still be appropriate at times. For example, long, > +multiple ``with``-statements cannot use implicit continuation, so > +backslashes are acceptable:: > + > + with open('/path/to/some/file/you/want/to/read') as file_1, \ > + open('/path/to/some/file/being/written', 'w') as file_2: > + file_2.write(file_1.read()) > + > +Make sure to indent the continued line appropriately. The preferred > +place to break around a binary operator is *after* the operator, not > +before it. Some examples:: > > class Rectangle(Blob): > -- --Guido van Rossum (python.org/~guido) From barry at python.org Wed Jul 3 19:27:01 2013 From: barry at python.org (Barry Warsaw) Date: Wed, 3 Jul 2013 13:27:01 -0400 Subject: [Python-ideas] Parenthesized Compound With Statement In-Reply-To: References: <20130703110147.4c6a8be4@anarchist> <20130703114807.636254aa@anarchist> Message-ID: <20130703132701.41469077@anarchist> On Jul 03, 2013, at 10:22 AM, Guido van Rossum wrote: >Looks good, but I would add a mention of assert (without example) as >another case where using a backslash may make sense. Done. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From guido at python.org Wed Jul 3 19:34:33 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 3 Jul 2013 10:34:33 -0700 Subject: [Python-ideas] Parenthesized Compound With Statement In-Reply-To: References: <20130703110147.4c6a8be4@anarchist> Message-ID: On Wed, Jul 3, 2013 at 9:56 AM, Ron Adam wrote: > > > On 07/03/2013 10:01 AM, Barry Warsaw wrote: >> >> As for relaxing PEP 8, well, I'm not sure what more you'd want it to say. >> It >> currently reads: >> >> The preferred way of wrapping long lines is by using Python's implied >> line continuation inside parentheses, brackets and braces. 
Long >> lines >> can be broken over multiple lines by wrapping expressions in >> parentheses. These should be used in preference to using a backslash >> for line continuation. >> >> which seems about right to me. It is preferred to use implied >> continuation >> over backslashes, but doesn't say "don't use backslashes". They're not >> evil, >> just not preferred, so use them where appropriate and with good judgment. > > > > I think this can be improved on. The second part should be a positive > statement expressing when backslashes should be used over an implied > continuation, rather than a blanket negative statement. > > Use a backslash if parentheses cause the code to be less clear. > > > > > My own preference is: > > Use what obviously reads better. Too subjective. > This can happen when continuing > expressions which contain tuples, or strings with backslashes. Using what > is not in the expression can help make the intention clearer. > > Use what takes less characters. For an expression that span 2 lines, use a > single backslash. For longer expressions spanning 4 or more lines > parentheses are preferred. Wrong. For breaking expressions, it's almost always better to use parentheses, even if you have to add them, because they can always be added. The backslashes remain preferred in cases where the natural break point is such that you cannot add parentheses syntactically. In Python 3 this is mainly assert and with (in Python 2 it included import). What I am trying to avoid here is that people break their lines at an unnatural place just so they can avoid the backslash. An examples of an unnatural break: assert some_long_expression_fitting_on_one_line(), "foo {} bar".format( "argument that easily fits on a line") Here, the natural (highest-level) breakpoint is right after the comma, and the (bad) solution used in this example is to take the expression "foo {} bar".format("argument that easily fits on a line") and break it after the parentheses. The resulting layout is a mess: the first line contains one-and-a-half item, the second line half an item. Note that when breaking expressions or argument lists you won't have to do this: if instead of assert we had a call, you would break at the comma: whatever(some_long_expression_fitting_on_one_line(), "foo {} bar".format("argument that easily fits on a line")) (Leaving aside not the issue of how many spaces to use to indent the second line -- I am typing this in a variable-width font so I don't want to play ASCII-art and align the first " under the 's' of 'some', even though that's what Emacs would do and I usually let it.) > For continued lines inside of function calls, and container literals, > indenting the continued lines is enough in most cases. Again, don't break in the middle of an argument that would fit on a line by itself, even if you end up needing an extra line in total. I strongly prefer foo(expr1, expr2a + expr2b, expr3) over foo(expr1, expr2a + expr2b, expr3) Similar if expr2 has its own parentheses. > Sometimes adding a backslash within a container can make a continued line > between many single line items stand out better. (*) > > > (* There have been "silent" bugs where a comma was omitted in cases where > some lines are continued and some aren't.) I don't understand what you are referring to here, but I don't think we should invoke backslashes to solve the problem of significant trailing commas. 
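For the assert case, that means the backslash at the natural break point is
the fix I'd suggest (illustrative formatting only):

    assert some_long_expression_fitting_on_one_line(), \
        "foo {} bar".format("argument that easily fits on a line")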
-- --Guido van Rossum (python.org/~guido) From mertz at gnosis.cx Wed Jul 3 19:43:31 2013 From: mertz at gnosis.cx (David Mertz) Date: Wed, 3 Jul 2013 10:43:31 -0700 Subject: [Python-ideas] Parenthesized Compound With Statement In-Reply-To: References: <20130703110147.4c6a8be4@anarchist> Message-ID: This looks good... EXCEPT: On Wed, Jul 3, 2013 at 9:56 AM, Ron Adam wrote: > Use what obviously reads better. This can happen when continuing > expressions which contain tuples, or strings with backslashes. Using what > is not in the expression can help make the intention clearer. > > Use what takes less characters. For an expression that span 2 lines, use > a single backslash. For longer expressions spanning 4 or more lines > parentheses are preferred. > FEWER characters! Also "spanS 2 lines" > > For continued lines inside of function calls, and container literals, > indenting the continued lines is enough in most cases. > > Sometimes adding a backslash within a container can make a continued line > between many single line items stand out better. (*) > > (* There have been "silent" bugs where a comma was omitted in cases where > some lines are continued and some aren't.) > > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed Jul 3 19:58:31 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 04 Jul 2013 03:58:31 +1000 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> Message-ID: <51D46647.3080100@pearwood.info> On 04/07/13 02:58, Joshua Landau wrote: > What's wrong with sum, if sum is fast? sum simply cannot *always* be fast. E.g. summing tuples will still be slow even with your suggestion. Using sum() on anything other than numbers is conceptually dubious; yes, sum() is intended for addition, but it's intended for *numeric* addition. It's unfortunate that Python uses + for concatenation, we're stuck with it until Python 4000, but if you're using sum() to add lists, tuples, or other non-numbers, you're living in a state of sin. (A venal sin rather than a mortal sin.) It's allowed, but we shouldn't encourage it, and treating sum() as if it were the One Obvious Way to concatenate data is, in my strong opinion, the wrong thing to do. In my opinion, the One Obvious Way to concatenate a lot of arbitrary data is list.extend, not sum. I can't gather any enthusiasm for optimizing sum to speed up concatenation. I'm at best indifferent towards the specific proposal to speed up sum of lists; I'm hostile to any attempt to encourage people to treat sum() as the preferred way to concatenate large amounts of data, because that will surely lead them into bad habits and will end with them trying to sum() a lot of tuples or linked lists or something and getting O(n**2) performance. 
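If you want to see that failure mode for yourself, here's a rough, untested
timing sketch; tuples have no in-place add, so every += still builds a brand
new tuple:

    import timeit

    for k in (100, 200, 400):
        data = [(1, 2, 3)] * k
        t = timeit.timeit(lambda: sum(data, ()), number=100)
        print(k, t)  # roughly quadruples each time k doubles: O(k**2)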
-- Steven From ron3200 at gmail.com Wed Jul 3 22:55:09 2013 From: ron3200 at gmail.com (Ron Adam) Date: Wed, 03 Jul 2013 15:55:09 -0500 Subject: [Python-ideas] Parenthesized Compound With Statement In-Reply-To: References: <20130703110147.4c6a8be4@anarchist> Message-ID: On 07/03/2013 12:34 PM, Guido van Rossum wrote: > On Wed, Jul 3, 2013 at 9:56 AM, Ron Adam wrote: >> >> >> On 07/03/2013 10:01 AM, Barry Warsaw wrote: >>> >>> As for relaxing PEP 8, well, I'm not sure what more you'd want it to say. >>> It >>> currently reads: >>> >>> The preferred way of wrapping long lines is by using Python's implied >>> line continuation inside parentheses, brackets and braces. Long >>> lines >>> can be broken over multiple lines by wrapping expressions in >>> parentheses. These should be used in preference to using a backslash >>> for line continuation. >>> >>> which seems about right to me. It is preferred to use implied >>> continuation >>> over backslashes, but doesn't say "don't use backslashes". They're not >>> evil, >>> just not preferred, so use them where appropriate and with good judgment. >> >> >> >> I think this can be improved on. The second part should be a positive >> statement expressing when backslashes should be used over an implied >> continuation, rather than a blanket negative statement. >> >> Use a backslash if parentheses cause the code to be less clear. >> >> >> >> >> My own preference is: >> >> Use what obviously reads better. > > Too subjective. And is why I added the clarification just below. >> This can happen when continuing >> expressions which contain tuples, or strings with backslashes. Using what >> is not in the expression can help make the intention clearer. >> Use what takes less characters. For an expression that span 2 lines, use a >> single backslash. For longer expressions spanning 4 or more lines >> parentheses are preferred. > > Wrong. For breaking expressions, it's almost always better to use > parentheses, even if you have to add them, because they can always be > added. Is there any time a '\' can't be added? > The backslashes remain preferred in cases where the natural break > point is such that you cannot add parentheses syntactically. In Python > 3 this is mainly assert and with (in Python 2 it included import). In this case, it's preferred over extending the line over 80 chars. > What I am trying to avoid here is that people break their lines at an > unnatural place just so they can avoid the backslash. An examples of > an unnatural break: The difficult part of communicating concepts like these is knowing when to put in additional supporting details and when to leave them out. I agree with the above and include it in the 'what reads better' category, but maybe it needs to be noted in PEP 8. Fortunately improving documentation can be viewed a process, and we can make adjustments as it's needed. > assert some_long_expression_fitting_on_one_line(), "foo {} bar".format( > "argument that easily fits on a line") > > Here, the natural (highest-level) breakpoint is right after the comma, > and the (bad) solution used in this example is to take the expression > > "foo {} bar".format("argument that easily fits on a line") > > and break it after the parentheses. The resulting layout is a mess: > the first line contains one-and-a-half item, the second line half an > item. 
Note that when breaking expressions or argument lists you won't > have to do this: if instead of assert we had a call, you would break > at the comma: > > whatever(some_long_expression_fitting_on_one_line(), > "foo {} bar".format("argument that easily fits on a line")) > > (Leaving aside not the issue of how many spaces to use to indent the > second line -- I am typing this in a variable-width font so I don't > want to play ASCII-art and align the first " under the 's' of 'some', > even though that's what Emacs would do and I usually let it.) > >> For continued lines inside of function calls, and container literals, >> indenting the continued lines is enough in most cases. > > Again, don't break in the middle of an argument that would fit on a > line by itself, even if you end up needing an extra line in total. I > strongly prefer > > foo(expr1, > expr2a + expr2b, > expr3) > > over > > foo(expr1, expr2a + > expr2b, expr3) > > Similar if expr2 has its own parentheses. I agree. >> Sometimes adding a backslash within a container can make a continued line >> between many single line items stand out better. (*) >> >> >> (* There have been "silent" bugs where a comma was omitted in cases where >> some lines are continued and some aren't.) > > I don't understand what you are referring to here, but I don't think > we should invoke backslashes to solve the problem of significant > trailing commas. I'll see if I can find some real examples of this. It's a case where there is a long list, and every once in a while there is a long line that is implicitly continued. So an occasional missing comma is what is supposed to be there. Which makes seeing an actual missing comma is harder to do. When there is a long list with only a few continuations, I don't see how a few backslashes would be harmful, they would explicitly show those line are meant to be continued and are not missing a comma. I don't think this comes up that often, so it probably doesn't need any special treatment. Cheers, Ron From guido at python.org Wed Jul 3 23:39:37 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 3 Jul 2013 14:39:37 -0700 Subject: [Python-ideas] Parenthesized Compound With Statement In-Reply-To: References: <20130703110147.4c6a8be4@anarchist> Message-ID: On Wed, Jul 3, 2013 at 1:55 PM, Ron Adam wrote: > On 07/03/2013 12:34 PM, Guido van Rossum wrote: > >> On Wed, Jul 3, 2013 at 9:56 AM, Ron Adam wrote: >> > > Sometimes adding a backslash within a container can make a continued line >>> between many single line items stand out better. (*) >>> >>> (* There have been "silent" bugs where a comma was omitted in cases where >>> some lines are continued and some aren't.) >>> >> >> I don't understand what you are referring to here, but I don't think >> we should invoke backslashes to solve the problem of significant >> trailing commas. >> > > I'll see if I can find some real examples of this. > > It's a case where there is a long list, and every once in a while there is > a long line that is implicitly continued. So an occasional missing comma > is what is supposed to be there. Which makes seeing an actual missing > comma is harder to do. > > When there is a long list with only a few continuations, I don't see how a > few backslashes would be harmful, they would explicitly show those line are > meant to be continued and are not missing a comma. > > I don't think this comes up that often, so it probably doesn't need any > special treatment. 
Maybe you're talking about this case:

instructions = [
    'foo',
    'bar',
    'very long line' # comma intentionally missing
    ' which is continued here',
    'not so long line' # comma accidentally missing
    'accidentally continued',
    'etc.',
    ]

I brought this up a few weeks or months ago, hoping to convince people
that implicit string concatenation is evil enough to ban. I didn't get
very far.

But allowing backslashes here won't accomplish anything, because they are
redundant. (In fact I get really mad when people use backslashes for
continuation inside parens/brackets/braces. :-)

--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From joshua.landau.ws at gmail.com Thu Jul 4 00:02:00 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Wed, 3 Jul 2013 23:02:00 +0100
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <51D46647.3080100@pearwood.info>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
<51D46647.3080100@pearwood.info>
Message-ID:

On 3 July 2013 18:58, Steven D'Aprano wrote:
> On 04/07/13 02:58, Joshua Landau wrote:
>
>> What's wrong with sum, if sum is fast?
>
> sum simply cannot *always* be fast. E.g. summing tuples will still be slow
> even with your suggestion.
>
> Using sum() on anything other than numbers is conceptually dubious; yes,
> sum() is intended for addition, but it's intended for *numeric* addition.

Sum: A quantity obtained by addition or aggregation.
[http://en.wiktionary.org/wiki/sum]

I don't see why it should be constrained.

> It's unfortunate that Python uses + for concatenation,

Is it? I'm very happy with it myself.

> we're stuck with it
> until Python 4000, but if you're using sum() to add lists, tuples, or other
> non-numbers, you're living in a state of sin.

Why? There are a lot of things it makes sense to take the sum of - a
lot of which have constant-ish performance on addition.

> (A venal sin rather than a
> mortal sin.) It's allowed, but we shouldn't encourage it, and treating sum()
> as if it were the One Obvious Way to concatenate data is, in my strong
> opinion, the wrong thing to do.

No, no. A fast sum() is not the one obvious way to concatenate data,
much as sum() is not the one obvious way to add data. It just means
that if conceptually you're looking for the sum, you'd be able to do it
without shooting yourself in the foot.

I'd also point out there are a *lot* of times when iadd is faster than
add, not just concatenation.

> In my opinion, the One Obvious Way to concatenate a lot of arbitrary data is
> list.extend, not sum.

Is it? Maybe for lists. Often itertools.chain is better. It really
depends on your data. I tend to use itertools.chain a lot more than
list.extend, because I sum iterators more than I sum lists. Maybe I'm
just weird.

> I can't gather any enthusiasm for optimizing sum to speed up concatenation.
> I'm at best indifferent towards the specific proposal to speed up sum of
> lists; I'm hostile to any attempt to encourage people to treat sum() as the
> preferred way to concatenate large amounts of data, because that will surely
> lead them into bad habits and will end with them trying to sum() a lot of
> tuples or linked lists or something and getting O(n**2) performance.

Maybe the difference is that I look at this as an optimisation to every
use of sum where iadd is faster than add, not just list concatenation.
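A quick, untested way to see that gap for plain lists (my own throwaway
benchmark, not anything from the patch):

    import timeit

    setup = "data = [[1, 2, 3]] * 10000"
    slow = "acc = []\nfor x in data: acc = acc + x"  # copies the accumulator every step
    fast = "acc = []\nfor x in data: acc += x"       # extends in place
    print(timeit.timeit(slow, setup, number=10))
    print(timeit.timeit(fast, setup, number=10))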
If you're summing tuples, either you know what to do (sum into a list
and then convert back) or you're going to do it wrong *anyway*. I can't
remember the last time I had to sum tuples.

Also, if you're summing linked lists, why would it be O(n**2)
performance? Surely it's still O(n), given the new implementation?

From abarnert at yahoo.com Thu Jul 4 00:15:50 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 3 Jul 2013 15:15:50 -0700
Subject: [Python-ideas] Parenthesized Compound With Statement
In-Reply-To:
References: <20130703110147.4c6a8be4@anarchist>
Message-ID: <7F83BD7B-A38E-4C68-A34B-34C29D6C704A@yahoo.com>

On Jul 3, 2013, at 13:55, Ron Adam wrote:
>> Wrong. For breaking expressions, it's almost always better to use
>> parentheses, even if you have to add them, because they can always be
>> added.
>
> Is there any time a '\' can't be added?

Well, if you want inline comments on each line. But I don't think this
is a serious issue, because I can't think of a case where it's likely to
happen that implicit continuation in parens doesn't already take care
of. (For example, calling a builtin with 8 non-keyword params, you
probably want a comment next to each argument, but the arguments are
already in parens.)

From abarnert at yahoo.com Thu Jul 4 00:22:44 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 3 Jul 2013 15:22:44 -0700
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
<51D46647.3080100@pearwood.info>
Message-ID: <97689389-7EA2-4C97-AC50-2A11322F285E@yahoo.com>

On Jul 3, 2013, at 15:02, Joshua Landau wrote:
> Also, if you're summing linked lists, why would it be O(n**2)
> performance? Surely it's still O(n), given the new implementation?

Not for single-linked lists, where extend is O(n).

(Of course maybe this is a good thing--it's a perfect opportunity for
someone to show whoever wrote that code how to actually write his
algorithm in python rather than trying to use python as lisp without
even knowing lisp...)

From shane at umbrellacode.com Thu Jul 4 00:48:10 2013
From: shane at umbrellacode.com (Shane Green)
Date: Wed, 3 Jul 2013 15:48:10 -0700
Subject: [Python-ideas] Is this PEP-able? for X in ListY while conditionZ:
In-Reply-To:
References: <001c01ce73d8$4884dc80$d98e9580$@biologie.uni-freiburg.de>
<51CF03B0.8080508@pearwood.info>
<0C35164B-A9A1-4A78-8474-99166337BE50@umbrellacode.com>
Message-ID:

Thank you, you are exactly right about what I meant, and yes, there is
the pathological extreme -- like using a semicolon to terminate every
line in a Python program. The approach exhibits a Pythonnistic
consistency, IMHO. The break terminator isn't a special case, semicolons
are optional just like they are for terminating statements, and the
semicolons correspond functionally to where they would be had you used
them to flatten the expansion onto a single line (without conversion
into a comprehension).

Lastly, while I think the terminating break will typically be one of the
easier components of a comprehension to parse, I have come across some
list comprehensions that were challenging to parse visually. If my
preferred approach to make them more parsable were semicolons, then
applying them at several spots in the comprehension might be helpful to
achieving that goal. My personal preference is likely to be to break
apart the comprehension across lines or, if still shorter than 80 cols,
use an extra space or two between sub-statements.
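For instance, given some iterable named data, both of these are already
valid today -- extra spacing versus breaking across lines:

    result = [item  for sublist in data  if sublist  for item in sublist]

    result = [item
              for sublist in data if sublist
              for item in sublist]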
On Jul 2, 2013, at 8:05 PM, Joshua Landau wrote:

> On 1 July 2013 16:21, Shane Green wrote:
>>
>> Having a bit of a hard time following the status of this as I dropped out
>> for a while, but I'm not really for the semi-colon separator approach. Of
>> all the options already available in list comprehensions, this one actually
>> seems to be one of the most easily parsed because it starts with a keyword
>> and ends at the end. Not that syntax highlighting should be taken into
>> account in general, it's worth noting the syntax highlighted version really
>> makes this distinction quite clear:
>
> That's somewhat convincing. I'm still not convinced we need new syntax though.
>
>> One thing I could see doing would be to allow semi-colons anywhere in a
>> list comprehension that's a boundary between statements of the expanded form.
>>
>> Then they behave just like the optional semi-colons you could put at the
>> end of a line.
>>
>> Sorry if that's precisely what's been promoted.
>
> So (I'm just adding examples to your idea):
>
> [item for sublist in data for item in sublist; if item]
>
> can become
>
> [item; for sublist in data; for item in sublist; if item]
>
> In a pathological case this is a transform from:
>
> [item for sublist in data if sublist for item in sublist if item%3 if item-1]
>
> to
>
> [item for sublist in data; if sublist; for item in sublist; if item%3; if item-1]
>
> But you can always do:
>
> [item
>  for sublist in data if sublist
>  for item in sublist if item%3
>  if item-1]
>
> That said, it's not like people *do* those sorts of pathological expressions, is it?

From paddy3118 at gmail.com  Wed Jul  3 22:50:35 2013
From: paddy3118 at gmail.com (Paddy3118)
Date: Wed, 3 Jul 2013 13:50:35 -0700 (PDT)
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
Message-ID: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>

I found myself repeating something that I know I have used before, several times: I get two sets of results, maybe sets of the passing tests when a design has changed, and I need to work out what has changed, so work out:

1. What passed first time round
2. What passed both times.
3. What passed only the second time round.

I usually use something like the set equations in the title to do this but I recognise that this requires both sets to be traversed at least three times which seems wasteful.

I wondered if there was an algorithm to partition the two sets of data into three as above, but cutting down on the number of set traversals?

I also wondered that if such an algorithm existed, would it be useful enough to be worth incorporating into the Python library?

Maybe defined as:

exclusively1, common, exclusively2 = set1.partition(set2)
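(For concreteness, Paddy's subject line written out as a helper - a throwaway sketch; the name partition3 is invented:)

    def partition3(set1, set2):
        # three traversals: one per set operation, as described above
        return set1 - set2, set1 & set2, set2 - set1

    passed_first, passed_both, passed_second = partition3({1, 2, 3}, {2, 3, 4})
    # ({1}, {2, 3}, {4})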
From ron3200 at gmail.com  Thu Jul  4 01:57:46 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Wed, 03 Jul 2013 18:57:46 -0500
Subject: [Python-ideas] Parenthesized Compound With Statement
In-Reply-To: 
References: <20130703110147.4c6a8be4@anarchist>
Message-ID: 

On 07/03/2013 04:39 PM, Guido van Rossum wrote:
>
> Maybe you're talking about this case:
>
> instructions = [
>     'foo',
>     'bar',
>     'very long line'          # comma intentionally missing
>     ' which is continued here',
>     'not so long line'        # comma accidentally missing
>     'accidentally continued',
>     'etc.',
> ]
>
> I brought this up a few weeks or months ago, hoping to convince people that
> implicit string concatenation is evil enough to ban. I didn't get very far.

I'm +1 for that change BTW. It falls under the management category of how difficult it can be to get people to want to change how they think.

> But allowing backslashes here won't accomplish anything, because they are
> redundant. (In fact I get really mad when people use backslashes for
> continuation inside parens/brackets/braces. :-)

But you did apparently feel the need to put verbose comments there to clarify the example. ;)

Yes, the backslashes are redundant as far as the compiler goes, but they serve as hints to the human reader (in the same way your comments did), so I don't think they are completely redundant. New users would probably find the following easier to read because they don't yet think in terms of what the CPU is actually doing or not doing. That probably takes a few years of programming to reach that point.

instructions = [
    'foo',
    'bar',
    'very long line' \
    ' which is continued here',
    'not so long line' \
    ' accidentally continued',
    'etc.',
]

This doesn't bother me: each line has a meaningful ending to the reader, and there is no need for comments to make it clearer. And the number of backslashes is not excessive. I'm neutral on this, no need to recommend or discourage it. It helps some people, while others find it unnecessary. It's a human thing; the compiler doesn't care.

But this ...

instructions = (
    'very, ' \
    'long, ' \
    'line with commas in, ' \
    'it to, ' \
    'separate individual, ' \
    'items, ' \
    'etc, ' \
)

Now, this gets under my skin! This is definitely redundant.

The missing comma case I saw was in a list several screen pages long with only a few lines continued, and one missing comma. I just can't recall which file it is. Meant to put it in the tracker, and still will once I find it again. I think I saw it in one of the encoding files.

Cheers, Ron

From joshua.landau.ws at gmail.com  Thu Jul  4 02:00:37 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Thu, 4 Jul 2013 01:00:37 +0100
Subject: [Python-ideas] Fwd: exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: 
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
Message-ID: 

---------- Forwarded message (apologies, the CC says it all) ----------
From: Joshua Landau
Date: 4 July 2013 00:57
Subject: Re: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
To: Paddy3118
Cc: python-ideas at googlegroups.com

On 3 July 2013 21:50, Paddy3118 wrote:
> I found myself repeating something that I know I have used before, several
> times: I get two sets of results, may be sets of the passing tests when a
> design has changed, and I need to work out what has changed so work out
>
> 1. What passed first time round
> 2. What passed both times.
> 3. What passed only the second time round.
> > I usually use something like the set equations in the title to do this but I > recognise that this requires both sets to be traversed at least three times > which seems wasteful. As far as I understand, this requires only 3 traversals in total. > I wondered if their was am algorithm to partition the two sets of data into > three as above, but cutting down on the number of set traversals? You could cut it down to two, AFAIK. This seems like a minor gain. > I also wondered that if such an algorithm existed, would it be useful enough > to be worth incorporating into the Python library? > > Maybe defined as: > > exclusively1, common, exclusively2 = set1.partition(set2) Something more useful, as it's just as good in this case could be: set1.partition(set2) === set1 - set2, set1 & set 2 similarly to how we have divmod. From python at mrabarnett.plus.com Thu Jul 4 02:15:32 2013 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 04 Jul 2013 01:15:32 +0100 Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1 In-Reply-To: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> Message-ID: <51D4BEA4.1090509@mrabarnett.plus.com> On 03/07/2013 21:50, Paddy3118 wrote: > I found myself repeating something that I know I have used before, > several times: I get two sets of results, may be sets of the passing > tests when a design has changed, and I need to work out what has changed > so work out > > 1. What passed first time round > 2. What passed both times. > 3. What passed only the second time round. > > I usually use something like the set equations in the title to do this > but I recognise that this requires both sets to be traversed at least > three times which seems wasteful. > > I wondered if their was am algorithm to partition the two sets of data > into three as above, but cutting down on the number of set traversals? > > I also wondered that if such an algorithm existed, would it be useful > enough to be worth incorporating into the Python library? > > Maybe defined as: > > exclusively1, common, exclusively2 = set1.partition(set2) > How common is this need? I suspect that it's not that common. As to "seems wasteful", is it a performance bottleneck? I also suspect not (you even say "seems"! :-)). From oscar.j.benjamin at gmail.com Thu Jul 4 02:16:07 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 4 Jul 2013 01:16:07 +0100 Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1 In-Reply-To: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> Message-ID: On 3 July 2013 21:50, Paddy3118 wrote: > I found myself repeating something that I know I have used before, several > times: I get two sets of results, may be sets of the passing tests when a > design has changed, and I need to work out what has changed so work out > > 1. What passed first time round > 2. What passed both times. > 3. What passed only the second time round. > > I usually use something like the set equations in the title to do this but I > recognise that this requires both sets to be traversed at least three times > which seems wasteful. > > I wondered if their was am algorithm to partition the two sets of data into > three as above, but cutting down on the number of set traversals? 
You can do it in one traversal of each set:

def partition(setx, sety):
    xonly, xandy, yonly = set(), set(), set()
    for set1, set2, setn in [(setx, sety, xonly), (sety, setx, yonly)]:
        for val in set1:
            if val in set2:
                xandy.add(val)
            else:
                setn.add(val)
    return xonly, xandy, yonly

Oscar

From apalala at gmail.com  Thu Jul  4 03:41:21 2013
From: apalala at gmail.com (Juancarlo Añez)
Date: Wed, 3 Jul 2013 21:11:21 -0430
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: 
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
Message-ID: 

On Wed, Jul 3, 2013 at 7:46 PM, Oscar Benjamin wrote:

> def partition(setx, sety):
>     xonly, xandy, yonly = set(), set(), set()
>     for set1, set2, setn in [(setx, sety, xonly), (sety, setx, yonly)]:
>         for val in set1:
>             if val in set2:
>                 xandy.add(val)
>             else:
>                 setn.add(val)
>     return xonly, xandy, yonly

I don't understand why that can be more efficient than using the built-in operations:

def partition(setx, sety):
    common = setx & sety
    return setx - common, common, sety - common

I assume that the built-in operations traverse over the set with the smallest size, and preallocate the result to that size.

-- 
Juancarlo Añez

From steve at pearwood.info  Thu Jul  4 04:10:28 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 04 Jul 2013 12:10:28 +1000
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
Message-ID: <51D4D994.7020000@pearwood.info>

Hi Paddy, long time since I've read something from you, welcome back.

On 04/07/13 06:50, Paddy3118 wrote:
> I found myself repeating something that I know I have used before, several
> times: I get two sets of results, may be sets of the passing tests when a
> design has changed, and I need to work out what has changed so work out
>
> 1. What passed first time round
> 2. What passed both times.
> 3. What passed only the second time round.
>
> I usually use something like the set equations in the title to do this but
> I recognise that this requires both sets to be traversed at least three
> times which seems wasteful.
>
> I wondered if their was am algorithm to partition the two sets of data into
> three as above, but cutting down on the number of set traversals?

You could go from:

exclusively1 = set1 - set2
common = set1 & set2
exclusively2 = set2 - set1

(which I expect ends up traversing each set twice) to this:

common, exclusively1, exclusively2 = set(), set(), set()
for item in set1:
    if item in set2: common.add(item)
    else: exclusively1.add(item)
for item in set2:
    if item in set1: common.add(item)
    else: exclusively2.add(item)

which only does two set traversals. But I would expect that using the set operators would be faster. (Iterating over the sets 3 or 4 times in C is likely to be faster than iterating over them 2 times in pure Python.)

> I also wondered that if such an algorithm existed, would it be useful
> enough to be worth incorporating into the Python library?
>
> Maybe defined as:
>
> exclusively1, common, exclusively2 = set1.partition(set2)

[bikeshed]
I would expect common, only1, only2 in that order.

Does any other language with sets offer this as a set primitive? I too have needed it often enough that I'd be +1 on adding it.

-- 
Steven
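(A quick check that the two shapes of answer in this thread agree with each other - a throwaway sketch; partition_ops is the set-operator version, partition_loop is a loop version along the lines above, and the names and random test data are invented:)

    import random

    def partition_ops(s1, s2):
        return s1 - s2, s1 & s2, s2 - s1

    def partition_loop(s1, s2):
        only1, both, only2 = set(), set(), set()
        for x in s1:
            (both if x in s2 else only1).add(x)
        for y in s2:
            if y not in s1:
                only2.add(y)
        return only1, both, only2

    for _ in range(100):
        a = {random.randrange(50) for _ in range(30)}
        b = {random.randrange(50) for _ in range(30)}
        assert partition_ops(a, b) == partition_loop(a, b)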
From steve at pearwood.info  Thu Jul  4 04:33:47 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 04 Jul 2013 12:33:47 +1000
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: 
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
Message-ID: <51D4DF0B.4060604@pearwood.info>

On 04/07/13 11:41, Juancarlo Añez wrote:
> On Wed, Jul 3, 2013 at 7:46 PM, Oscar Benjamin wrote:
>
>> def partition(setx, sety):
>>     xonly, xandy, yonly = set(), set(), set()
>>     for set1, set2, setn in [(setx, sety, xonly), (sety, setx, yonly)]:
>>         for val in set1:
>>             if val in set2:
>>                 xandy.add(val)
>>             else:
>>                 setn.add(val)
>>     return xonly, xandy, yonly
>
> I don't understand why that can be more efficient than using the built-in
> operations:
>
> def partition(setx, sety):
>     common = setx & sety
>     return setx - common, common, sety - common
>
> I assume that the built-in operations traverse over the set with the
> smallest size, and preallocate the result to that size.

I don't think you can preallocate sets (or dicts) that way, since the offset of each item can depend on what items were already added in which order. So you can't know how many gaps to leave between items ahead of time, or where to put them.

The obvious algorithm for set1 - set2 would be:

def diff(set1, set2):
    result = set()
    for item in set1:
        if item not in set2:
            result.add(item)
    return result

Are you suggesting this instead?

def diff(set1, set2):
    if len(set1) <= len(set2):
        result = set()
        for item in set1:
            if item not in set2:
                result.add(item)
    else:
        result = set1.copy()
        for item in set2:
            if item in set1:
                result.remove(item)
    return result

Either way, you still have to traverse set1 in full, either explicitly, or when you copy it. The overhead of copying the set when you might end up removing all the items suggests to me that there is no benefit in iterating over the smaller set. Likewise, set2 - set1 has to traverse set2 in full, regardless of which is smaller. set1 & set2 has to check every element of both sets.

So that's four traversals in total, each set getting traversed twice. Oscar's version above only traverses each set once, making two in total. However, being in pure Python, it's probably an order of magnitude slower than calling the C set methods. That's my guess, I leave it to someone else to actually time the code and see :-)

-- 
Steven

From steve at pearwood.info  Thu Jul  4 05:35:15 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 04 Jul 2013 13:35:15 +1000
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: 
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info>
Message-ID: <51D4ED73.7010204@pearwood.info>

On 04/07/13 08:02, Joshua Landau wrote:
> On 3 July 2013 18:58, Steven D'Aprano wrote:
>> On 04/07/13 02:58, Joshua Landau wrote:
>>
>>> What's wrong with sum, if sum is fast?
>>
>> sum simply cannot *always* be fast. E.g. summing tuples will still be slow
>> even with your suggestion.
>>
>> Using sum() on anything other than numbers is conceptually dubious; yes,
>> sum() is intended for addition, but it's intended for *numeric* addition.
>
> Sum: A quantity obtained by addition or aggregation.
> [http://en.wiktionary.org/wiki/sum]
>
> I don't see why it should be constrained.

I didn't say it should be constrained.
But there is a big difference between "don't prevent people calling sum() on lists, if they insist" and "encourage people to use sum() on lists, in preference to list.extend". >> It's unfortunate that Python uses + for concatenation, > > Is it? I'm very happy with it myself. Yes. If Python used & for concatenation, we wouldn't have to worry about sum(lists or strings or tuples) being quadratic, because people wouldn't call sum on lists, strings or tuples. >> we're stuck with it >> until Python 4000, but if you're using sum() to add lists, tuples, or other >> non-numbers, you're living in a state of sin. > > Why? There are a lot of things it makes sense to take the sum of - a > lot of which have constant-ish performance on addition. A "lot" of things? There are numbers (int, float, Decimal, Fraction, complex), and numpy arrays of numbers. And you probably shouldn't call sum on floats if you care about precision. (Use math.fsum instead.) Strings? Quadratic. Tuples? Quadratic. Lists? Currently quadratic, could be optimized, if we care. I can't think of anything else that supports + and has near-constant time repeated addition. What are you thinking of? >> (A venal sin rather than a >> mortal sin.) It's allowed, but we shouldn't encourage it, and treating sum() >> as if it were the One Obvious Way to concatenate data is, in my strong >> opinion, the wrong thing to do. > > No, no. A fast sum() is not the one obvious way to concatenate data, The Original Poster seems to think it is. > much as sum() is not the one obvious way to add data. It just means > that if conceptually you're looking for the sum, you'd be able to do > it without shooting yourself in the foot. > > I'd also point out there are a *lot* of times when iadd is faster than > add, not just contancation. Examples please. >> In my opinion, the One Obvious Way to concatenate a lot of arbitrary data is >> list.extend, not sum. > > Is it? Maybe for lists. Often itertools.chain is better. It really > depends on your data. I tend to use itertools.chain a lot more than > list.extend, because I sum iterators more than I sum lists. Maybe I'm > just weird. I didn't say *the Best Way*, I said *the One Obvious Way*. The obvious way to have a lot of data of some arbitrary type is a list; the obvious way to concatenate a whole lot of lists into one list is using the extend method. If you have some special type or some special need, you may be able to do better. > I can't remember the last time I had to sum tuples. And I can't remember the last time I had to sum lists. Optimizing this case will solve exactly none of my problems. > Also, if you're summing linked lists, why would it be O(n**2) > performance? Surely it's still O(n), given the new implementation? Actually, more like O(n*m). You have n linked lists each with an average of m nodes, you have to copy each node, so that's O(n*m). Sorry, simplifying n*m to n**2 is sloppy. -- Steven From vito.detullio at gmail.com Thu Jul 4 06:32:55 2013 From: vito.detullio at gmail.com (Vito De Tullio) Date: Thu, 04 Jul 2013 06:32:55 +0200 Subject: [Python-ideas] Fast sum() for non-numbers References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <51D4ED73.7010204@pearwood.info> Message-ID: Steven D'Aprano wrote: >>> It's unfortunate that Python uses + for concatenation, >> >> Is it? I'm very happy with it myself. > > Yes. 
> If Python used & for concatenation, we wouldn't have to worry about
> sum(lists or strings or tuples) being quadratic, because people wouldn't
> call sum on lists, strings or tuples.

yeah, they would try with all() :D

-- 
ZeD

From paddy3118 at gmail.com  Thu Jul  4 07:22:36 2013
From: paddy3118 at gmail.com (Paddy3118)
Date: Wed, 3 Jul 2013 22:22:36 -0700 (PDT)
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: <51D4D994.7020000@pearwood.info>
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> <51D4D994.7020000@pearwood.info>
Message-ID: <56bef5c4-a308-4a1f-873d-b40ea73e0ca8@googlegroups.com>

(Thanks Steven for the welcome back. I've been advocating Python through the Rosetta Code site).

I couldn't have thought of Oscar's and your solution, but now that we have a Python algorithm, I understand that it could only be really tested if there were an implementation in C to test against the incumbent method.

To give a bit more info on how much it is used: I have even incorporated the partition-type functionality in Python in throw-away bash shell scripts of the form:

munge.sh data1 | sort > results1.txt
munge.sh data2 | sort > results2.txt
python -c '
set1 = set(file("results1.txt"))
set2 = set(file("results2.txt"))
print([len(p) for p in (set1 - set2, set1 & set2, set2 - set1)])
'

I have used the above several times when my boss touches base with me and asks the informal "hi, and how are you getting on"? The above isn't my only use. I have used the construct enough times for me to think of it as a "design pattern".

Before I posted, I looked up set partition on Wikipedia. It has a different meaning for a partition of a set, but doesn't apply any meaning to the partition of two sets, leaving the dyadic meaning of partition vacant :-)

- Paddy.

On Thursday, 4 July 2013 03:10:28 UTC+1, Steven D'Aprano wrote:
>
> Hi Paddy, long time since I've read something from you, welcome back.
>
> On 04/07/13 06:50, Paddy3118 wrote:
> > I found myself repeating something that I know I have used before, several
> > times: I get two sets of results, may be sets of the passing tests when a
> > design has changed, and I need to work out what has changed so work out
> >
> > 1. What passed first time round
> > 2. What passed both times.
> > 3. What passed only the second time round.
> >
> > I usually use something like the set equations in the title to do this but
> > I recognise that this requires both sets to be traversed at least three
> > times which seems wasteful.
> >
> > I wondered if their was am algorithm to partition the two sets of data into
> > three as above, but cutting down on the number of set traversals?
>
> You could go from:
>
> exclusively1 = set1 - set2
> common = set1 & set2
> exclusively2 = set2 - set1
>
> (which I expect ends up traversing each set twice) to this:
>
> common, exclusively1, exclusively2 = set(), set(), set()
> for item in set1:
>     if item in set2: common.add(item)
>     else: exclusively1.add(item)
> for item in set2:
>     if item in set1: common.add(item)
>     else: exclusively2.add(item)
>
> which only does two set traversals. But I would expect that using the set
> operators would be faster. (Iterating over the sets 3 or 4 times in C is
> likely to be faster than iterating over them 2 times in pure Python.)
>
> > I also wondered that if such an algorithm existed, would it be useful
> > > > Maybe defined as: > > > > exclusively1, common, exclusively2 = set1.partition(set2) > > [bikeshed] > I would expect common, only1, only2 in that order. > > > Does any other language with sets offer this as a set primitive? I too > have needed it often enough that I'd be +1 on adding it. > > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python... at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua.landau.ws at gmail.com Thu Jul 4 07:27:04 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Thu, 4 Jul 2013 06:27:04 +0100 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <51D4ED73.7010204@pearwood.info> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <51D4ED73.7010204@pearwood.info> Message-ID: On 4 July 2013 04:35, Steven D'Aprano wrote: > On 04/07/13 08:02, Joshua Landau wrote: >> On 3 July 2013 18:58, Steven D'Aprano wrote: >>> On 04/07/13 02:58, Joshua Landau wrote: >>>> >> Why? There are a lot of things it makes sense to take the sum of - a >> lot of which have constant-ish performance on addition. > > A "lot" of things? ... > I can't think of anything else that supports + and has near-constant time > repeated addition. What are you thinking of? ... >> I'd also point out there are a *lot* of times when iadd is faster than >> add, not just contancation. > > Examples please. I'm going to retreat from this point as my argument is weaker than I thought it was. Mostly it comes from an internal delusion that I overload operators more than I really do -- when I do it's normally not for a final product. Since most of the following argument is based around this, I put this at the top. To be clear, I don't feel I'm wrong -- but I feel I'm less right than you. >> I don't see why it should be constrained. > > I didn't say is should be constrained. But there is a big difference between > "don't prevent people calling sum() on lists, if they insist" and "encourage > people to use sum() on lists, in preference to list.extend". I meant constrained in a more general term. Regardless of the OP's intent, I don't think we need to encourage sum() on lists if the patch goes through. I bet you a lot of people still do it though. >>> It's unfortunate that Python uses + for concatenation, >> >> Is it? I'm very happy with it myself. > > Yes. If Python used & for concatenation, we wouldn't have to worry about > sum(lists or strings or tuples) being quadratic, because people wouldn't > call sum on lists, strings or tuples. Optimising for the people who don't know what they're doing seems kind of backwards. I like "+" to add lists. I also think "data & [elem]" looks stupid. That's possibly just me, but it's true for me nonetheless. >> A fast sum() is not the one obvious way to concatenate data, > > The Original Poster seems to think it is. I think he's wrong. >>> In my opinion, the One Obvious Way to concatenate a lot of arbitrary data >>> is list.extend, not sum. >> >> Is it? Maybe for lists. Often itertools.chain is better. It really >> depends on your data. I tend to use itertools.chain a lot more than >> list.extend, because I sum iterators more than I sum lists. Maybe I'm >> just weird. > > I didn't say *the Best Way*, I said *the One Obvious Way*. 
The obvious way > to have a lot of data of some arbitrary type is a list; the obvious way to > concatenate a whole lot of lists into one list is using the extend method. > If you have some special type or some special need, you may be able to do > better. I know what you said, and I raised a disagreement. The OOW to have a lot of data of some arbitrary type is context-dependent. Sometimes it's an iterator. Sometimes it's a list. Sometimes it's a set. Sometimes it's deque (OK, pushing it). From paddy3118 at gmail.com Thu Jul 4 07:32:20 2013 From: paddy3118 at gmail.com (Paddy3118) Date: Wed, 3 Jul 2013 22:32:20 -0700 (PDT) Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1 In-Reply-To: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> Message-ID: P.S. A big thank you to the Python developers. You're appreciated! -------------- next part -------------- An HTML attachment was scrubbed... URL: From paddy3118 at gmail.com Thu Jul 4 07:41:37 2013 From: paddy3118 at gmail.com (Paddy3118) Date: Wed, 3 Jul 2013 22:41:37 -0700 (PDT) Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1 In-Reply-To: References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> Message-ID: <55316229-1e99-43b5-8b14-9865b77c2124@googlegroups.com> Thanks Oscar. That is quite elegant. -------------- next part -------------- An HTML attachment was scrubbed... URL: From __peter__ at web.de Thu Jul 4 09:14:00 2013 From: __peter__ at web.de (Peter Otten) Date: Thu, 04 Jul 2013 09:14 +0200 Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1 References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> Message-ID: Oscar Benjamin wrote: > On 3 July 2013 21:50, Paddy3118 > wrote: >> I found myself repeating something that I know I have used before, >> several times: I get two sets of results, may be sets of the passing >> tests when a design has changed, and I need to work out what has changed >> so work out >> >> 1. What passed first time round >> 2. What passed both times. >> 3. What passed only the second time round. >> >> I usually use something like the set equations in the title to do this >> but I recognise that this requires both sets to be traversed at least >> three times which seems wasteful. >> >> I wondered if their was am algorithm to partition the two sets of data >> into three as above, but cutting down on the number of set traversals? > > You can do it in one traversal of each set: > > def partition(setx, sety): > xonly, xandy, yonly = set(), set(), set() > for set1, set2, setn in [(setx, sety, xonly), (sety, setx, yonly)]: > for val in set1: > if val in set2: > xandy.add(val) > else: > setn.add(val) > return xonly, xandy, yonly The price you pay for the symmetry is that for the second iteration of the outer loop xandy.add(val) is a noop. 
When you give up that symmetry you can replace one inner loop with a set operation:

def partition(old, new):
    removed = old - new
    common = set()
    added = set()
    for item in new:
        if item in old:
            common.add(item)
        else:
            added.add(item)
    return removed, common, added

From haoyi.sg at gmail.com  Thu Jul  4 11:50:08 2013
From: haoyi.sg at gmail.com (Haoyi Li)
Date: Thu, 4 Jul 2013 17:50:08 +0800
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: 
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <51D4ED73.7010204@pearwood.info>
Message-ID: 

Not just you! I like using + for lists too! I don't see at all how '&' is any better than '+' for concatenating strings.

I don't have any perf numbers to back me up, but I don't agree with the "let's leave something with poor performance in the builtins so we can encourage people not to use them like that" point. I mean, if people really shouldn't use it like that, it should throw an error. If they should use it, it should work as best it can.

Having quietly, unnecessarily-O(n^2)-performing builtins just "for teaching purposes" seems absolute madness. There is a place for intentionally-crippled teaching-code and it's not in the standard library's builtins.

Of course, without any perf numbers, I'm just speaking in hypotheticals.

On Thu, Jul 4, 2013 at 1:27 PM, Joshua Landau wrote:

> On 4 July 2013 04:35, Steven D'Aprano wrote:
> > On 04/07/13 08:02, Joshua Landau wrote:
> >> On 3 July 2013 18:58, Steven D'Aprano wrote:
> >>> On 04/07/13 02:58, Joshua Landau wrote:
> >>>>
>
> >> Why? There are a lot of things it makes sense to take the sum of - a
> >> lot of which have constant-ish performance on addition.
> >
> > A "lot" of things?
> ...
> > I can't think of anything else that supports + and has near-constant time
> > repeated addition. What are you thinking of?
> ...
> >> I'd also point out there are a *lot* of times when iadd is faster than
> >> add, not just contancation.
> >
> > Examples please.
>
> I'm going to retreat from this point as my argument is weaker than I
> thought it was. Mostly it comes from an internal delusion that I
> overload operators more than I really do -- when I do it's normally
> not for a final product. Since most of the following argument is based
> around this, I put this at the top.
>
> To be clear, I don't feel I'm wrong -- but I feel I'm less right than you.
>
> >> I don't see why it should be constrained.
> >
> > I didn't say is should be constrained. But there is a big difference
> > between "don't prevent people calling sum() on lists, if they insist"
> > and "encourage people to use sum() on lists, in preference to
> > list.extend".
>
> I meant constrained in a more general term. Regardless of the OP's
> intent, I don't think we need to encourage sum() on lists if the patch
> goes through. I bet you a lot of people still do it though.
>
> >>> It's unfortunate that Python uses + for concatenation,
> >>
> >> Is it? I'm very happy with it myself.
> >
> > Yes. If Python used & for concatenation, we wouldn't have to worry about
> > sum(lists or strings or tuples) being quadratic, because people wouldn't
> > call sum on lists, strings or tuples.
>
> Optimising for the people who don't know what they're doing seems kind
> of backwards. I like "+" to add lists.
>
> I also think "data & [elem]" looks stupid. That's possibly just me,
> but it's true for me nonetheless.
> > >> A fast sum() is not the one obvious way to concatenate data, > > > > The Original Poster seems to think it is. > > I think he's wrong. > > >>> In my opinion, the One Obvious Way to concatenate a lot of arbitrary > data > >>> is list.extend, not sum. > >> > >> Is it? Maybe for lists. Often itertools.chain is better. It really > >> depends on your data. I tend to use itertools.chain a lot more than > >> list.extend, because I sum iterators more than I sum lists. Maybe I'm > >> just weird. > > > > I didn't say *the Best Way*, I said *the One Obvious Way*. The obvious > way > > to have a lot of data of some arbitrary type is a list; the obvious way > to > > concatenate a whole lot of lists into one list is using the extend > method. > > If you have some special type or some special need, you may be able to do > > better. > > I know what you said, and I raised a disagreement. The OOW to have a > lot of data of some arbitrary type is context-dependent. Sometimes > it's an iterator. Sometimes it's a list. Sometimes it's a set. > Sometimes it's deque (OK, pushing it). > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sergemp at mail.ru Thu Jul 4 11:54:19 2013 From: sergemp at mail.ru (Sergey) Date: Thu, 4 Jul 2013 12:54:19 +0300 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <51D46647.3080100@pearwood.info> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> Message-ID: <20130704125419.6230332d@sergey> On Jul 04, 2013 Steven D'Aprano wrote: This message is long, so here's its short summary: * Unfortunately list.extend does not look like the obvious way, and its slower than alternatives. * Original patch reduces memory, not increases it, there can be no MemoryError because of it. * sum() can *always* be fast! (patch and tests) * linked list is O(n) where n is number of lists to add * using __add__/__iadd__ for containers is a useful feature > How about the "Obvious Way" to concatenate lists? > new = [] > for x in seq: > new.extend(x) 200% slower than patched sum, 50-100% slower than both itertools and 25% faster than list comprehension. [1] It's basically not even mentioned among most popular answers to list flattening: http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python Slower, less known, it's much more characters to type, hey, it's not even one-liner! :) What makes you think this is the "Obvious Way"? > -0 on the general idea, -1 on the specific implementation. I'd rather have > sum of lists be slow than risk sum of numbers raise MemoryError. You must be misunderstanding something. Or maybe I've explained it poorly. Numbers have different code path in sum() that my patch does not touch. But even if it did, my patch never makes a copy of original list, it may only reduce amount of memory used, not increase it. There was an alternative idea (that I never implemented), suggesting to make a copy of `start` variable, but not the list. > sum simply cannot *always* be fast. E.g. summing tuples will still > be slow even with your suggestion. Yes, it can! That's the point of the original idea! The original patch [2] optimizes lists, because it was easy to do. 
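(One data point for the "it should throw an error" position above: CPython's sum() already refuses the string case outright. This is long-standing builtin behaviour, shown here as a REPL sketch:)

    >>> sum(['a', 'b', 'c'], '')
    Traceback (most recent call last):
      ...
    TypeError: sum() can't sum strings [use ''.join(seq) instead]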
But nothing stops you from optimizing other (two?) types. For example this patch [3] optimizes lists, tuples and strings. Basically it works by storing temporary result in a list, and then converts it to tuple or string in one shot if needed. Using this patch you get the O(n) sum for lists, tuples and strings [4]. And combining it with original patch you get the fastest sum() possible. Even being O(n) for strings, it's slower than ''.join(), but it is constantly slower now. I can't beat ''.join() because of function call overhead. Internally join() converts the sequence into tuple, thus saving a lot of calls, but using a lot of additional memory, that's why: ''.join('' for _ in xrange(100000000)) eats ~1GB of RAM before giving you an empty string. > I can't gather any enthusiasm for optimizing sum to speed up > concatenation. No problem, just tell what approach you think is the best and I'll try implementing it. > I'm hostile to any attempt to encourage people to treat sum() as > the preferred way to concatenate large amounts of data, because > that will surely lead them into bad habits and will end with them > trying to sum() a lot of tuples or linked lists or something and > getting O(n**2) performance. Why not just make sum O(n) for everything? I've already done that for lists, tuples and strings. As for linked lists, the point of linked list is to insert items fast. So any decent implementation of it should store a pointer to its head and tail, should implement a O(1) __iadd__ using tail pointer, and thus falls under my first patch. There's not much sence in [single] linked list if it has no __iadd__. > Yes. If Python used & for concatenation, we wouldn't have to worry > about sum(lists or strings or tuples) being quadratic, because people > wouldn't call sum on lists, strings or tuples. Heh. If Python had no sum() we wouldn't have to worry about people using it. If Python had no lists we wouldn't have to worry about people concatenating them. If there was no Python we wouldn't have to worry at all. But the world would be poor without all these great things... Seriously, I miss add for set(). I needed it when I had a dictionary like {x:set(...), ...} and needed a set of all the values from it. I wanted to use sum(dict.values()), that would be easy and obvious, but I couldn't, because set() does not support __add__. So I had to write a few lines of loop instead of a few characters. 
:( -- [1] Python 2.7.5 with patch, testing list.extend(): $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "sum(x,[])" 1000 loops, best of 3: 531 usec per loop $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "[i for l in x for i in l]" 1000 loops, best of 3: 1.94 msec per loop $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" --setup="from itertools import chain" "list(chain.from_iterable(x))" 1000 loops, best of 3: 820 usec per loop $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" --setup="from itertools import chain" "list(chain(*x))" 1000 loops, best of 3: 1.03 msec per loop $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "l = []" "for i in x: l.extend(i)" 1000 loops, best of 3: 1.53 msec per loop [2] http://bugs.python.org/file30705/fastsum.patch [3] http://bugs.python.org/file30769/fastsum-special.patch http://bugs.python.org/issue18305#msg192281 [4] Python 2.7.5, testing new fastsum-special.patch: === Lists === No patch: $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "sum(x,[])" 10 loops, best of 3: 885 msec per loop fastsum.patch: $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "sum(x,[])" 1000 loops, best of 3: 524 usec per loop fastsum-special.patch: $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "sum(x,[])" 1000 loops, best of 3: 298 usec per loop Result: 3000 times faster. === Tuples === No patch: $ ./python -mtimeit --setup="x=[(1,2,3)]*10000" "sum(x,())" 10 loops, best of 3: 585 msec per loop fastsum.patch: $ ./python -mtimeit --setup="x=[(1,2,3)]*10000" "sum(x,())" 10 loops, best of 3: 585 msec per loop fastsum-special.patch: $ ./python -mtimeit --setup="x=[(1,2,3)]*10000" "sum(x,())" 1000 loops, best of 3: 536 usec per loop Result: 1000 times faster. === Strings === No patch (just string check removed): $ ./python -mtimeit --setup="x=['abc']*100000" "sum(x,'')" 10 loops, best of 3: 1.52 sec per loop fastsum.patch (+ string check removed): $ ./python -mtimeit --setup="x=['abc']*100000" "sum(x,'')" 10 loops, best of 3: 1.52 sec per loop fastsum-special.patch $ ./python -mtimeit --setup="x=['abc']*100000" "sum(x,'')" 10 loops, best of 3: 27.8 msec per loop join: $ ./python -mtimeit --setup="x=['abc']*100000" "''.join(x)" 1000 loops, best of 3: 1.66 msec per loop Result: 50 times faster, but still constantly slower than join. From haoyi.sg at gmail.com Thu Jul 4 12:01:45 2013 From: haoyi.sg at gmail.com (Haoyi Li) Date: Thu, 4 Jul 2013 18:01:45 +0800 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130704125419.6230332d@sergey> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> Message-ID: Random thought: this kind of "different implementations for different types" seems exactly what PEP443 (http://www.python.org/dev/peps/pep-0443/) is about; it would save you having the nasty big chunk of if-elses within sum() itself, and would let other people incrementally implement special sums for their own special data types without having to muck with the std lib code. On Thu, Jul 4, 2013 at 5:54 PM, Sergey wrote: > On Jul 04, 2013 Steven D'Aprano wrote: > > This message is long, so here's its short summary: > * Unfortunately list.extend does not look like the obvious way, and > its slower than alternatives. > * Original patch reduces memory, not increases it, there can be no > MemoryError because of it. > * sum() can *always* be fast! 
(patch and tests) > * linked list is O(n) where n is number of lists to add > * using __add__/__iadd__ for containers is a useful feature > > > How about the "Obvious Way" to concatenate lists? > > new = [] > > for x in seq: > > new.extend(x) > > 200% slower than patched sum, 50-100% slower than both itertools and > 25% faster than list comprehension. [1] It's basically not even mentioned > among most popular answers to list flattening: > > http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python > > http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python > Slower, less known, it's much more characters to type, hey, it's not > even one-liner! :) What makes you think this is the "Obvious Way"? > > > -0 on the general idea, -1 on the specific implementation. I'd rather > have > > sum of lists be slow than risk sum of numbers raise MemoryError. > > You must be misunderstanding something. Or maybe I've explained it > poorly. Numbers have different code path in sum() that my patch does > not touch. But even if it did, my patch never makes a copy of original > list, it may only reduce amount of memory used, not increase it. > There was an alternative idea (that I never implemented), suggesting > to make a copy of `start` variable, but not the list. > > > sum simply cannot *always* be fast. E.g. summing tuples will still > > be slow even with your suggestion. > > Yes, it can! That's the point of the original idea! > > The original patch [2] optimizes lists, because it was easy to do. > But nothing stops you from optimizing other (two?) types. For example > this patch [3] optimizes lists, tuples and strings. > > Basically it works by storing temporary result in a list, and then > converts it to tuple or string in one shot if needed. > > Using this patch you get the O(n) sum for lists, tuples and strings > [4]. And combining it with original patch you get the fastest sum() > possible. > > Even being O(n) for strings, it's slower than ''.join(), but it is > constantly slower now. I can't beat ''.join() because of function > call overhead. Internally join() converts the sequence into tuple, > thus saving a lot of calls, but using a lot of additional memory, > that's why: > ''.join('' for _ in xrange(100000000)) > eats ~1GB of RAM before giving you an empty string. > > > I can't gather any enthusiasm for optimizing sum to speed up > > concatenation. > > No problem, just tell what approach you think is the best and I'll > try implementing it. > > > I'm hostile to any attempt to encourage people to treat sum() as > > the preferred way to concatenate large amounts of data, because > > that will surely lead them into bad habits and will end with them > > trying to sum() a lot of tuples or linked lists or something and > > getting O(n**2) performance. > > Why not just make sum O(n) for everything? I've already done that for > lists, tuples and strings. As for linked lists, the point of linked > list is to insert items fast. So any decent implementation of it > should store a pointer to its head and tail, should implement a O(1) > __iadd__ using tail pointer, and thus falls under my first patch. > There's not much sence in [single] linked list if it has no __iadd__. > > > Yes. If Python used & for concatenation, we wouldn't have to worry > > about sum(lists or strings or tuples) being quadratic, because people > > wouldn't call sum on lists, strings or tuples. > > Heh. 
If Python had no sum() we wouldn't have to worry about people > using it. If Python had no lists we wouldn't have to worry about > people concatenating them. If there was no Python we wouldn't have > to worry at all. But the world would be poor without all these great > things... > > Seriously, I miss add for set(). I needed it when I had a dictionary > like {x:set(...), ...} and needed a set of all the values from it. > I wanted to use sum(dict.values()), that would be easy and obvious, > but I couldn't, because set() does not support __add__. So I had > to write a few lines of loop instead of a few characters. :( > > -- > > [1] Python 2.7.5 with patch, testing list.extend(): > $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "sum(x,[])" > 1000 loops, best of 3: 531 usec per loop > > $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "[i for l in x for i in l]" > 1000 loops, best of 3: 1.94 msec per loop > > $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" --setup="from itertools > import chain" "list(chain.from_iterable(x))" > 1000 loops, best of 3: 820 usec per loop > > $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" --setup="from itertools > import chain" "list(chain(*x))" > 1000 loops, best of 3: 1.03 msec per loop > > $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "l = []" "for i in x: > l.extend(i)" > 1000 loops, best of 3: 1.53 msec per loop > > [2] http://bugs.python.org/file30705/fastsum.patch > > [3] http://bugs.python.org/file30769/fastsum-special.patch > http://bugs.python.org/issue18305#msg192281 > > [4] Python 2.7.5, testing new fastsum-special.patch: > === Lists === > No patch: > $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "sum(x,[])" > 10 loops, best of 3: 885 msec per loop > fastsum.patch: > $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "sum(x,[])" > 1000 loops, best of 3: 524 usec per loop > fastsum-special.patch: > $ ./python -mtimeit --setup="x=[[1,2,3]]*10000" "sum(x,[])" > 1000 loops, best of 3: 298 usec per loop > Result: 3000 times faster. > > === Tuples === > No patch: > $ ./python -mtimeit --setup="x=[(1,2,3)]*10000" "sum(x,())" > 10 loops, best of 3: 585 msec per loop > fastsum.patch: > $ ./python -mtimeit --setup="x=[(1,2,3)]*10000" "sum(x,())" > 10 loops, best of 3: 585 msec per loop > fastsum-special.patch: > $ ./python -mtimeit --setup="x=[(1,2,3)]*10000" "sum(x,())" > 1000 loops, best of 3: 536 usec per loop > Result: 1000 times faster. > > === Strings === > No patch (just string check removed): > $ ./python -mtimeit --setup="x=['abc']*100000" "sum(x,'')" > 10 loops, best of 3: 1.52 sec per loop > fastsum.patch (+ string check removed): > $ ./python -mtimeit --setup="x=['abc']*100000" "sum(x,'')" > 10 loops, best of 3: 1.52 sec per loop > fastsum-special.patch > $ ./python -mtimeit --setup="x=['abc']*100000" "sum(x,'')" > 10 loops, best of 3: 27.8 msec per loop > join: > $ ./python -mtimeit --setup="x=['abc']*100000" "''.join(x)" > 1000 loops, best of 3: 1.66 msec per loop > Result: 50 times faster, but still constantly slower than join. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... 
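(A sketch of what Haoyi's PEP 443 idea could look like, dispatching on the type of the start value. Everything here is invented for illustration - sum2 and its registrations are not a real proposal - and functools.singledispatch only landed in the stdlib in Python 3.4. Note the argument order is flipped from builtin sum() so that the start value comes first, since singledispatch dispatches on the first argument.)

    import operator
    from functools import reduce, singledispatch

    @singledispatch
    def sum2(start, iterable):
        # generic fallback: repeated +, like today's sum()
        return reduce(operator.add, iterable, start)

    @sum2.register(list)
    def _(start, iterable):
        out = list(start)
        for item in iterable:
            out.extend(item)        # in place: O(total length)
        return out

    @sum2.register(str)
    def _(start, iterable):
        return start + ''.join(iterable)

    print(sum2([], [[1, 2], [3]]))   # [1, 2, 3]
    print(sum2(0, [1, 2, 3]))        # 6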
From sergemp at mail.ru  Thu Jul  4 14:33:35 2013
From: sergemp at mail.ru (Sergey)
Date: Thu, 4 Jul 2013 15:33:35 +0300
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: 
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
Message-ID: <20130704153335.3ae70c9e@sergey>

On Jul 4, 2013 Oscar Benjamin wrote:

>> I usually use something like the set equations in the title to do this but I
>> recognise that this requires both sets to be traversed at least three times
>> which seems wasteful.
>>
>> I wondered if their was am algorithm to partition the two sets of data into
>> three as above, but cutting down on the number of set traversals?
>
> You can do it in one traversal of each set:
>
> def partition(setx, sety):
>     xonly, xandy, yonly = set(), set(), set()
>     for set1, set2, setn in [(setx, sety, xonly), (sety, setx, yonly)]:
>         for val in set1:
>             if val in set2:
>                 xandy.add(val)
>             else:
>                 setn.add(val)
>     return xonly, xandy, yonly

JFYI, despite using two passes this long partition() is twice as slow as the simple:

def partition(set1, set2):
    common = set1 & set2
    return set1 - common, common, set2 - common

That's because both functions are O(n), but the short one is just a few native calls, while the long one uses lots of small calls, and thus the overhead of so many function calls *significantly* exceeds the time of one additional pass.

Simple is better than complex. Readability counts. :)

--

$ ./partition-test.py
2-lines partition: 0.5874 seconds
8-lines partition: 1.4138 seconds
9-lines partition: 0.8485 seconds

#partition-test.py
from time import time

def partition1(set1, set2):
    common = set1 & set2
    return set1 - common, common, set2 - common

def partition2(setx, sety):
    xonly, xandy, yonly = set(), set(), set()
    for set1, set2, setn in [(setx, sety, xonly), (sety, setx, yonly)]:
        for val in set1:
            if val in set2:
                xandy.add(val)
            else:
                setn.add(val)
    return xonly, xandy, yonly

def partition3(old, new):
    removed = old - new
    common = set()
    added = set()
    for item in new:
        if item in old:
            common.add(item)
        else:
            added.add(item)
    return removed, common, added

if __name__ == "__main__":
    set1 = set(range(0, 3000000))
    set2 = set(range(2000000, 4000000))
    t = time()
    partition1(set1, set2)
    print "2-lines partition:", time()-t, "seconds"
    t = time()
    partition2(set1, set2)
    print "8-lines partition:", time()-t, "seconds"
    t = time()
    partition3(set1, set2)
    print "9-lines partition:", time()-t, "seconds"

From oscar.j.benjamin at gmail.com  Thu Jul  4 16:15:28 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Thu, 4 Jul 2013 15:15:28 +0100
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: <20130704153335.3ae70c9e@sergey>
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> <20130704153335.3ae70c9e@sergey>
Message-ID: 

On 4 July 2013 13:33, Sergey wrote:
> On Jul 4, 2013 Oscar Benjamin wrote:

You've missed the attribution line for the OP here so I'll add it back:

>> Paddy3118 wrote:
>>> I usually use something like the set equations in the title to do this but I
>>> recognise that this requires both sets to be traversed at least three times
>>> which seems wasteful.

Note the question I'm responding to below.

>>> I wondered if their was am algorithm to partition the two sets of data into
>>> three as above, but cutting down on the number of set traversals?
>>
>> You can do it in one traversal of each set:
>>
>> def partition(setx, sety):
>>     xonly, xandy, yonly = set(), set(), set()
>>     for set1, set2, setn in [(setx, sety, xonly), (sety, setx, yonly)]:
>>         for val in set1:
>>             if val in set2:
>>                 xandy.add(val)
>>             else:
>>                 setn.add(val)
>>     return xonly, xandy, yonly
>
> JFYI, despite using two passes this long partition() is twice as slow as
> the simple:
>
> def partition(set1, set2):
>     common = set1 & set2
>     return set1 - common, common, set2 - common
>
> That's because both functions are O(n), but the short one is just a few
> native calls, while the long one uses lots of small calls, and thus the
> overhead of so many function calls *significantly* exceeds the time of
> one additional pass.
>
> Simple is better than complex. Readability counts. :)

Since two or three people seem to have misunderstood what I wrote I'll clarify that the intention was to demonstrate the existence of a single-pass algorithm. I was not trying to show the fastest CPython code for this problem.

As Peter Otten has pointed out the algorithm is a little redundant so I'll improve it:

def partition(setx, sety):
    xonly, xandy, yonly = set(), set(), set()
    for x in setx:
        if x in sety:
            xandy.add(x)
        else:
            xonly.add(x)
    for y in sety:
        if y not in setx:
            yonly.add(y)
    return xonly, xandy, yonly

If you're worried about optimal branch flow for large sets then you might consider swapping so that the first loop is over the smaller of the two sets (assuming that the for/if branch is slightly faster than the for/else branch). However the swapping difference is not really an algorithmic difference in the sense that it would be for computing only the intersection.

Oscar

From apalala at gmail.com  Thu Jul  4 16:32:08 2013
From: apalala at gmail.com (Juancarlo Añez)
Date: Thu, 4 Jul 2013 10:02:08 -0430
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: 
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey>
Message-ID: 

On Thu, Jul 4, 2013 at 5:31 AM, Haoyi Li wrote:

> Random thought: this kind of "different implementations for different
> types" seems exactly what PEP443 (http://www.python.org/dev/peps/pep-0443/)
> is about; it would save you having the nasty big chunk of if-elses within
> sum() itself, and would let other people incrementally implement special
> sums for their own special data types without having to muck with the std
> lib code.

Exactly. Though 443 allows dispatching only on simple type names. Python doesn't have the concept of a list-of-int as distinct from a list-of-str.

-- 
Juancarlo Añez

From oscar.j.benjamin at gmail.com  Thu Jul  4 16:33:04 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Thu, 4 Jul 2013 15:33:04 +0100
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: <51D4D994.7020000@pearwood.info>
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> <51D4D994.7020000@pearwood.info>
Message-ID: 

On 4 July 2013 03:10, Steven D'Aprano wrote:
>
>> I also wondered that if such an algorithm existed, would it be useful
>> enough to be worth incorporating into the Python library?
>>
>> Maybe defined as:
>>
>> exclusively1, common, exclusively2 = set1.partition(set2)
>
> [bikeshed]
> I would expect common, only1, only2 in that order.

I like the other order.
It reads like a horizontal Venn diagram, e.g.:
http://upload.wikimedia.org/wikipedia/commons/9/99/Venn0001.svg

Although it probably wouldn't be the best order if the function were somehow generalised to more than two sets (not that I can think of a useful and intelligible generalisation).

> Does any other language with sets offer this as a set primitive?

I'd be interested to know what they call it if they do. "Partition" seems wrong to me. If a set had a partition method I would expect it to give me a partition of that set, i.e. a set of disjoint subsets covering the original set. This is related to the partition of the set in the sense that if

xonly, xandy, yonly = set.partition(setx, sety)

then {xonly, xandy} is a partition of setx and {yonly, xandy} is a partition of sety and {xonly, xandy, yonly} is a partition of the union setx | sety.

I think, though, that I would expect a set.partition method to work something like

true_set, false_set = setx.partition(lambda x: predicate(x))

with the obvious semantics. Then you could get a single-pass algorithm for the original problem with

xandy, xonly = setx.partition(sety.__contains__)
yonly = sety - xandy

Oscar

From paddy3118 at gmail.com  Thu Jul  4 16:33:34 2013
From: paddy3118 at gmail.com (Paddy3118)
Date: Thu, 4 Jul 2013 07:33:34 -0700 (PDT)
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: <20130704153335.3ae70c9e@sergey>
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> <20130704153335.3ae70c9e@sergey>
Message-ID: 

Hi Sergey, I thought that set partitioning might be a worthy candidate for implementation in C if an algorithm that cut down on the set traversals could be found. Yep, I didn't think such an algorithm in Python would likely be any quicker.

I guess there are two parts to my question:

1. Does an algorithm using fewer passes exist? Yes, thanks.
2. Would a C implementation be a worthwhile addition to Python? - Ongoing...

- Paddy.

On Thursday, 4 July 2013 13:33:35 UTC+1, Sergey wrote:
>
> On Jul 4, 2013 Oscar Benjamin wrote:
>
> >> I usually use something like the set equations in the title to do this but I
> >> recognise that this requires both sets to be traversed at least three times
> >> which seems wasteful.
> >>
> >> I wondered if their was am algorithm to partition the two sets of data into
> >> three as above, but cutting down on the number of set traversals?
> >
> > You can do it in one traversal of each set:
> >
> > def partition(setx, sety):
> >     xonly, xandy, yonly = set(), set(), set()
> >     for set1, set2, setn in [(setx, sety, xonly), (sety, setx, yonly)]:
> >         for val in set1:
> >             if val in set2:
> >                 xandy.add(val)
> >             else:
> >                 setn.add(val)
> >     return xonly, xandy, yonly
>
> JFYI, despite using two passes this long partition() is twice as slow as
> the simple:
>
> def partition(set1, set2):
>     common = set1 & set2
>     return set1 - common, common, set2 - common
>
> That's because both functions are O(n), but the short one is just a few
> native calls, while the long one uses lots of small calls, and thus the
> overhead of so many function calls *significantly* exceeds the time of
> one additional pass.
>
> Simple is better than complex. Readability counts. :)
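(Oscar's predicate-based partition from two messages up can be written as a plain function today - a minimal sketch; the names partition_by, setx and sety are invented:)

    def partition_by(s, predicate):
        # split s into the elements that satisfy predicate and those that don't
        true_set, false_set = set(), set()
        for x in s:
            (true_set if predicate(x) else false_set).add(x)
        return true_set, false_set

    setx = {1, 2, 3, 4}
    sety = {3, 4, 5}
    xandy, xonly = partition_by(setx, sety.__contains__)
    yonly = sety - xandy
    # xonly == {1, 2}, xandy == {3, 4}, yonly == {5}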
From apalala at gmail.com  Thu Jul 4 16:48:11 2013
From: apalala at gmail.com (=?UTF-8?Q?Juancarlo_A=C3=B1ez?=)
Date: Thu, 4 Jul 2013 10:18:11 -0430
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2,
	set1 & set2, set2 - set1
In-Reply-To:
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
	<20130704153335.3ae70c9e@sergey>
Message-ID:

On Thu, Jul 4, 2013 at 9:45 AM, Oscar Benjamin wrote:

> If you're worried about optimal branch flow for large sets then you
> might consider swapping so that the first loop is over the smaller of
> the two sets (assuming that the for/if branch is slightly faster than
> the for/else branch). However the swapping difference is not really an
> algorithmic difference in the sense that it would be for computing
> only the intersection.

One of my points was (I haven't looked) that the native set operations
probably do the swapping, and preallocate the result sets. Preallocation
is an important performance booster, and may be part of an algorithm.

In the double-loop implementation, one could try copying the input sets
into the result sets, and then use remove() instead of add(). The
swapping based on length is also easy to do.

I guess that the point is that, in algorithm analysis, when the O() is
good, like O(N), one should take a look at the constant parts of the
actual O(N*C+M*D) (for example), as they make all the difference. These
kinds of considerations went into Python's choice of implementation for
sort(), for example.

--
Juancarlo *Añez*

From abarnert at yahoo.com  Fri Jul 5 00:25:12 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 4 Jul 2013 15:25:12 -0700 (PDT)
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <20130704125419.6230332d@sergey>
References: <20130702211209.6dbde663@sergey>
	<51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info>
	<20130704125419.6230332d@sergey>
Message-ID: <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com>

From: Sergey
Sent: Thursday, July 4, 2013 2:54 AM

> On Jul 04, 2013 Steven D'Aprano wrote:

[to avoid confusion: this is Sergey's summary, not Steven's]

> This message is long, so here's its short summary:

Right on some points, but you're making unwarranted assumptions for
these two:

> * sum() can *always* be fast! (patch and tests)
> * linked list is O(n) where n is number of lists to add

>> sum simply cannot *always* be fast. E.g. summing tuples will still
>> be slow even with your suggestion.
>
> Yes, it can! That's the point of the original idea!
>
> The original patch [2] optimizes lists, because it was easy to do.
> But nothing stops you from optimizing other (two?) types.

It's not two other types, it's every type anyone ever has built or will
build that doesn't have a fast __iadd__. If I create a new type, I can
very easily make it addable, iterable, displayable, pickleable, etc.,
with simple Python code. But I can't make it summable without patching
the builtin C function and recompiling the interpreter. So, as Steven
said, sum cannot always be fast.

> As for linked lists, the point of a linked list is to insert items
> fast. So any decent implementation of it should store a pointer to
> its head and tail, should implement an O(1) __iadd__ using the tail
> pointer, and thus falls under my first patch. There's not much sense
> in a [single] linked list if it has no __iadd__.

Have you never heard of Lisp or any of its successors?
A cons is a single-linked list with no tail pointer and an O(N)
__iadd__. The node type is the list type; it's just a head value and a
next pointer to another list. This data structure supports various
kinds of O(1) operations (including mutating operations) that other
types do not. For a trivial example:

    >>> a = linkedlist([0, 1, 2])
    >>> b = a.next
    >>> b.next = linkedlist([10, 20])
    >>> a
    linkedlist([0, 1, 10, 20])

There is no way to make this work if a holds a tail pointer. The
trivial design would leave a pointing at the 2, which isn't even a
member of a. Making a.next be a copying operation gives the wrong
answer (and, more generally, breaks every mutating operation on
cons-lists) and makes almost everything O(N) instead of O(1). Storing
head and tail pointers at each node makes every mutating operation at
least O(N) instead of O(1), and also means that two different lists
can't share the same tail, which breaks most mutating tree operations.
Storing a stack of lists that refer to each node makes everything
O(N^2). Integrating that stack into the list means you have a
double-linked list instead of a single-linked list. Not having a tail
pointer is inherent in the cons-list data structure, and all the
algorithms built for that type depend on it.

>> Yes. If Python used & for concatenation, we wouldn't have to worry
>> about sum(lists or strings or tuples) being quadratic, because people
>> wouldn't call sum on lists, strings or tuples.
>
> Heh. If Python had no sum() we wouldn't have to worry about people
> using it.

Well, there's always the Java-ish solution of adding a Summable ABC and
having it only work on matching values (or just using numbers.Number).
But I don't think it's a problem that there are types that are addable
and aren't (fast-)summable, or that this isn't expressed directly in
the language. Python doesn't avoid providing tools that are useful in
their obvious uses, even if they can be harmful in silly uses. In a
case like this, you don't ban the screwdriver, or pile weight onto it
so it's also a good hammer; you just try to make the hammer easier to
spot so people don't pick up the screwdriver.

If people really do need to concatenate a bunch of tuples this often,
maybe the answer is to move chain to builtins. If it's not important
enough to move chain to builtins, it's probably not important enough to
do anything else.

> If Python had no lists we wouldn't have to worry about people
> concatenating them. If there was no Python we wouldn't have to worry
> at all. But the world would be poor without all these great things...
>
> Seriously, I miss add for set(). I needed it when I had a dictionary
> like {x:set(...), ...} and needed a set of all the values from it.
> I wanted to use sum(dict.values()), that would be easy and obvious,
> but I couldn't, because set() does not support __add__. So I had
> to write a few lines of loop instead of a few characters. :(

So what you really want is to be able to combine things using any
arbitrary operation, instead of just addition? That's the point I was
making in my earlier email: What you want is reduce. It's just like
sum, but you specify the operation, and you can start with
next(iterator) instead of having to specify a start value.

Why is sum a builtin, while reduce isn't? Sum is usable in a pretty
restricted set of cases, and in most of those cases, it's the obvious
way to do it. Reduce is usable in a much broader set of cases, but in
most of those cases, there's a better way to do it.
Broadening the usability of sum to make it more like reduce is not a
positive change.

From joshua.landau.ws at gmail.com  Fri Jul 5 00:54:18 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Thu, 4 Jul 2013 23:54:18 +0100
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey>
	<51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info>
	<20130704125419.6230332d@sergey>
Message-ID:

On 4 July 2013 11:01, Haoyi Li wrote:
> Random thought: this kind of "different implementations for different types"
> seems exactly what PEP443 (http://www.python.org/dev/peps/pep-0443/) is
> about; it would save you having the nasty big chunk of if-elses within sum()
> itself, and would let other people incrementally implement special sums for
> their own special data types without having to muck with the std lib code.

One thing I didn't understand from the original post was that this was
a "special case". I don't think I'll support any special-casing without
appropriate support from Python, such as with PEP443, first.

From steve at pearwood.info  Fri Jul 5 06:06:23 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 05 Jul 2013 14:06:23 +1000
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <20130704125419.6230332d@sergey>
References: <20130702211209.6dbde663@sergey>
	<51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info>
	<20130704125419.6230332d@sergey>
Message-ID: <51D6463F.3050800@pearwood.info>

On 04/07/13 19:54, Sergey wrote:
> On Jul 04, 2013 Steven D'Aprano wrote:
>
> This message is long, so here's its short summary:
> * Unfortunately list.extend does not look like the obvious way, and
>   it's slower than alternatives.
[snip]

I'm not going to get into a long and tedious point-by-point argument
about this, but I will just re-iterate: I don't accept that using sum
on non-numbers ought to be encouraged, although I don't think it should
be prohibited either. So I'm at best neutral on speeding up sum for
lists. However, some things do need to be said:

1) Alex Martelli is, or at least was, against using sum on non-numbers:

[quote]
I was responsible for designing sum and doing its first implementation
in the Python runtime, and I still wish I had found a way to
effectively restrict it to summing numbers (what it's really good at)
and block the "attractive nuisance" it offers to people who want to
"sum" lists;-). - Alex Martelli Jun 4 '09 at 21:07
[end quote]

This quote is from the Stackoverflow question you linked to earlier:

http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python

2) You should not be benchmarking against 2.7, but 3.3. On my machine,
using 3.3, list.extend is just as fast as using itertools, and both are
faster than a list comprehension. (I have not tested your patched
version of sum.)
Augmented assignment is faster, but faster still is an optimized
version using extend that avoids a name look-up.

The tests I ran:

python3.3 -mtimeit --setup="x=[[1,2,3]]*100000" "[i for l in x for i in l]"

python3.3 -mtimeit --setup="x=[[1,2,3]]*100000" \
    --setup="from itertools import chain" \
    "list(chain.from_iterable(x))"

python3.3 -mtimeit --setup="x=[[1,2,3]]*100000" \
    --setup="from itertools import chain" \
    "list(chain(*x))"

python3.3 -mtimeit --setup="x=[[1,2,3]]*100000" --setup="l = []" \
    "for i in x: l += i"

python3.3 -mtimeit --setup="x=[[1,2,3]]*100000" --setup="l = []" \
    "for i in x: l.extend(i)"

python3.3 -mtimeit --setup="x=[[1,2,3]]*100000" --setup="l = []" \
    --setup="extend=l.extend" \
    "for i in x: extend(i)"

with results, from slowest to fastest:

list comp:             32.8 msec per loop
chain:                 21   msec per loop
extend:                20.4 msec per loop
chain_from_iterable:   19.8 msec per loop
augmented assignment:  12.8 msec per loop
optimized extend:      11.7 msec per loop

On your machine, results may differ.

--
Steven

From abarnert at yahoo.com  Fri Jul 5 07:18:36 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 4 Jul 2013 22:18:36 -0700 (PDT)
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <51D6463F.3050800@pearwood.info>
References: <20130702211209.6dbde663@sergey>
	<51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info>
	<20130704125419.6230332d@sergey> <51D6463F.3050800@pearwood.info>
Message-ID: <1373001516.10436.YahooMailNeo@web184706.mail.ne1.yahoo.com>

From: Steven D'Aprano
Sent: Thursday, July 4, 2013 9:06 PM

> 1) Alex Martelli is, or at least was, against using sum on non-numbers:
>
> [quote]
> I was responsible for designing sum and doing its first implementation
> in the Python runtime, and I still wish I had found a way to
> effectively restrict it to summing numbers (what it's really good at)
> and block the "attractive nuisance" it offers to people who want to
> "sum" lists;-). - Alex Martelli Jun 4 '09 at 21:07
> [end quote]

I don't know if I agree with Alex Martelli that this would be a good
idea, but if it is, there's a really easy way to do it. Just check the
start parameter (if a non-default value is passed). After all, if the
start value is numeric or not a sequence or whatever the rule is,
presumably anything that can be added to it is also numeric enough to
be summable. (If you really want to sneak in a start value that passes
as numeric but knows how to add itself to a tuple and return a tuple,
go for it.)

We already have three checks for the builtin string types
(Python/bltinmodule.c:1950); adding a PySequence_Check(start) or
PyNumber_Add(zero, start) or the C equivalent of isinstance(start,
numbers.Number) or whatever (the details are of course bikesheddable)
wouldn't hurt performance. (In fact, because the three string checks
only need to be done if the is-numeric or is-not-sequence or whatever
check fails, it might even speed things up - but either way, not enough
to matter.)

From greg.ewing at canterbury.ac.nz  Fri Jul 5 08:19:24 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 05 Jul 2013 18:19:24 +1200
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2,
	set1 & set2, set2 - set1
In-Reply-To:
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
	<51D4D994.7020000@pearwood.info>
Message-ID: <51D6656C.1090506@canterbury.ac.nz>

Oscar Benjamin wrote:
> On 4 July 2013 03:10, Steven D'Aprano wrote:
>
>> Does any other language with sets offer this as a set primitive?
>
> I'd be interested to know what they call it if they do.

I think it should be called vennerate().

--
Greg

From sergemp at mail.ru  Fri Jul 5 08:43:41 2013
From: sergemp at mail.ru (Sergey)
Date: Fri, 5 Jul 2013 09:43:41 +0300
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com>
References: <20130702211209.6dbde663@sergey>
	<51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info>
	<20130704125419.6230332d@sergey>
	<1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com>
Message-ID: <20130705094341.7c1c84de@sergey>

On Jul 4, 2013 Andrew Barnert wrote:

> Right on some points, but you're making unwarranted assumptions
> for these two:
>> * sum() can *always* be fast! (patch and tests)
>> * linked list is O(n) where n is number of lists to add

> It's not two other types, it's every type anyone ever has built or
> will build that doesn't have a fast __iadd__. If I create a new
> type, I can very easily make it addable, iterable, displayable,
> pickleable, etc., with simple Python code. But I can't make it
> summable without patching the builtin C function and recompiling
> the interpreter.

Yes, you can! You just need to implement a fast __iadd__ or __add__
for your type to make it O(n) summable. And you can always do that,
can't you?

If that is not obvious we can make it explicit in sum() description.
E.g.:
"""
function:: sum(iterable[, start])

  Sums *start* and the items of an *iterable* from left to right and
  returns the total. *start* defaults to ``0``.

  To take advantage of a faster __iadd__ implementation, sum() uses it
  if possible. sum() has linear time complexity for built-in types
  and all types providing constant-time `__iadd__` or `__add__`.

  For some use cases, there are good alternatives to sum().
  [...]
"""
Of course that's assuming that patches making sum O(n) are accepted.

> A cons is a single-linked list with no tail pointer and an O(N)
> __iadd__. The node type is the list type; it's just a head value and
> a next pointer to another list. This data structure supports various
> kinds of O(1) operations (including mutating operations) that other
> types do not. For a trivial example:
>     >>> a = linkedlist([0, 1, 2])
>     >>> b = a.next
>     >>> b.next = linkedlist([10, 20])
>     >>> a
>     linkedlist([0, 1, 10, 20])

Thank you for the detailed explanation, now I understand what you mean.

Personally I don't think such an implementation would be very useful.
(It's certainly unusual for Python's "batteries included" philosophy.)
What I would call a linked list is a separate type where your nodes
are just its internal representation. Something like (second link from
google "python linked list"):
http://alextrle.blogspot.com/2011/05/write-linked-list-in-python.html
(I would also add 'length' to 'head' and 'tail' to have O(1) len).

> There is no way to make this work if a holds a tail pointer.

There is. You just need to separate the nodes from the list object,
and turn nodes into internal details, invisible outside.

So that your sample would turn into:

    >>> a = linkedlist([0, 1, 2])
    >>> a.replacefrom(2, linkedlist([10, 20]))
    >>> a
    linkedlist([0, 1, 10, 20])

Or maybe even:

    >>> a = linkedlist([0, 1, 2])
    >>> b = a[1:]
    >>> b[1:] = linkedlist([10, 20])
    >>> a
    linkedlist([0, 1, 2])
    >>> b
    linkedlist([1, 10, 20])

I don't like the idea that `a` implicitly changes when I change `b`.
And yes, b=a[1:] would be O(N) since it would have to copy all the
elements to make sure that a change to one list won't affect another
one.
But that's my vision.

> Making a.next be a copying operation gives the wrong answer (and,
> more generally, breaks every mutating operation on cons-lists) and
> makes almost everything O(N) instead of O(1).

Yes, you're right, this turns copy into O(N). But you can still keep
other operations fast. For example, to implement `a += b` you only
need to walk over the items of `b`. So a simple loop (and my patched
sum):

    for x in manylists:
        a += x

would still be O(N) where N is the total number of elements. Again,
that's how *I* would implement that.

But you CAN have a fast __iadd__ even for your simple a.next case!
You only need to initialize `tail` before calling sum() and update it
inside __iadd__; this way you get O(N) sum() where N is the total
number of elements in all lists. Such an __iadd__ implementation would
be destructive, since it would modify all other lists pointing to
those elements, but I guess this is how you wanted it to be.

> If people really do need to concatenate a bunch of tuples this
> often, maybe the answer is to move chain to builtins.

Yes, I understand that we can restrict sum(), for example it can
throw an error for non-numbers as it currently does for strings.
What I can't understand is why we should restrict it if instead
we can make it fast?

>> Seriously, I miss add for set(). I needed it when I had a dictionary
>> like {x:set(...), ...} and needed a set of all the values from it.
>> I wanted to use sum(dict.values()), that would be easy and obvious,
>> but I couldn't, because set() does not support __add__. So I had
>> to write a few lines of loop instead of a few characters. :(
>
> So what you really want is to be able to combine things using any
> arbitrary operation, instead of just addition?

No! I only want to make Python code simple. Simple is better than
complex, right? And sum() makes code simple. But sometimes it's too
slow, and cannot be used. So I wanted to make sum() faster, to make
even more code simple. That's all. It's just that someone said "+"
should not be used in the first place, so I answered that a missing
"+" makes code complex, and provided an example about sets.

From paddy3118 at gmail.com  Fri Jul 5 09:09:17 2013
From: paddy3118 at gmail.com (Paddy3118)
Date: Fri, 5 Jul 2013 00:09:17 -0700 (PDT)
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2,
	set1 & set2, set2 - set1
In-Reply-To: <51D6656C.1090506@canterbury.ac.nz>
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
	<51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz>
Message-ID: <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com>

On Friday, 5 July 2013 07:19:24 UTC+1, Greg Ewing wrote:
>
> Oscar Benjamin wrote:
> > On 4 July 2013 03:10, Steven D'Aprano wrote:
> >
> >> Does any other language with sets offer this as a set primitive?
> >
> > I'd be interested to know what they call it if they do.
>
> I think it should be called vennerate().

Vennerate? Cute; but we have str1.partition(str2), and
set1.partition(set2) would be a mathematical partition of the union of
both sets, returning a tuple of three elements - just like
str.partition - which makes me vote for the name "partition".
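A footnote to Sergey's set example above: even without a set __add__,
the union of all the values can be a one-liner. A sketch (the dict
contents are made up for illustration):

    from functools import reduce

    d = {'x': {1, 2}, 'y': {2, 3}, 'z': {4}}

    all_values = set().union(*d.values())               # built-in n-ary union
    all_values2 = reduce(set.union, d.values(), set())  # the reduce spelling

    assert all_values == all_values2 == {1, 2, 3, 4}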
From sergemp at mail.ru  Fri Jul 5 09:41:03 2013
From: sergemp at mail.ru (Sergey)
Date: Fri, 5 Jul 2013 10:41:03 +0300
Subject: [Python-ideas] All ideas together: Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey>
	<51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info>
	<20130704125419.6230332d@sergey>
Message-ID: <20130705104103.2c9173d2@sergey>

On Jul 4, 2013 Joshua Landau wrote:

> One thing I didn't understand from the original post was that this
> was a "special case".

That's because it wasn't. There were too many different ideas during
this thread, so I'll try to collect them all together. So...

Sum() is a great function. Why? Because it makes the code simple!
You can always live without sum(): you can use itertools, reduce,
lambdas, etc. But using sum() always makes the code easier to read.
Simple code is the goal! A faster sum() is just a tool to reach it!

Sum() was designed to support different types from the beginning.
For example, Tim Peters suggested to use it for a list of
datetime.timedelta [1]. Unfortunately for some common types sum() is
too slow. This encourages people to fall back to other solutions, and
write code that is hard to read and maintain. That's why I suggest
making sum a little faster. Still not the fastest thing ever, but fast
enough to have a choice between "a little faster" and "easier to read".

A faster sum() at least for something (e.g. lists) is already a good
thing, but even the great "fast sum() for everything" goal is easy to
reach. I already wrote some patches. So now there are 4
suggestions/patches pending:

1. Fast sum() for lists and tuples. [2]
   This patch adds a special-case optimisation. I wasn't going to do
   it that way, but sum() already has 2 special cases for ints and
   floats (yes, sum() already has special cases, and has had them for
   the last 6 years), so one more special case to reach the "fast for
   everything" goal is not that bad. Practicality beats purity. This
   patch introduces no behavior changes, and could even go into a
   python2 bugfix release.

2. Fast sum() for strings. [3]
   This patch is a small enhancement of #1 that makes sum() O(N) for
   strings. Obviously it introduces a behavior change and I do NOT
   suggest it for inclusion. I believe the ''.join() alternative is
   simple enough and doesn't have to be replaced with sum() for normal
   use cases. It just shows that sum() can be O(N) for strings.

3. Fast sum() for everything else. [4]
   This is the original patch that I suggested. It rearranges existing
   code so that sum() uses __iadd__ if available, making sum() fast
   for all built-in and custom types with a fast __iadd__. Even
   without #1 this patch introduces no behavior changes, and could
   even go into a python2 bugfix release.

4. Explicit documentation of sum() complexity.
   If patches #1 or #3 are accepted it may be a good idea to
   explicitly state what sum() needs in order to be O(N) for user
   types. E.g.:
   """
   function:: sum(iterable[, start])

     Sums *start* and the items of an *iterable* from left to right
     and returns the total. *start* defaults to ``0``.

     To take advantage of a faster __iadd__ implementation, sum() uses
     it if possible. sum() has linear time complexity for built-in
     types and all types providing constant-time `__iadd__` or
     `__add__`.

     For some use cases, there are good alternatives to sum(). The
     preferred, fast way to concatenate a sequence of strings is by
     [...]
   """

Patches #1 and #2 were written as an answer to "you can't make sum()
fast for everything, e.g. strings and tuples". Yes, I can. :)
Together these patches achieve the "fast sum() for everything" goal
and make sure that people know what is required to keep sum() fast.

That's all the suggestions I have for now. Well, there's one more,
but let's first decide about these. :)

--
[1] http://mail.python.org/pipermail/python-dev/2003-April/034837.html
[2] http://bugs.python.org/file30769/fastsum-special.patch without the
    string part
[3] the string part of http://bugs.python.org/file30769/fastsum-special.patch
[4] http://bugs.python.org/file30705/fastsum.patch

From tjreedy at udel.edu  Fri Jul 5 09:55:55 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 05 Jul 2013 03:55:55 -0400
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <20130705094341.7c1c84de@sergey>
References: <20130702211209.6dbde663@sergey>
	<51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info>
	<20130704125419.6230332d@sergey>
	<1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com>
	<20130705094341.7c1c84de@sergey>
Message-ID:

On 7/5/2013 2:43 AM, Sergey wrote:

> I don't like the idea that `a` implicitly changes when I change `b`.

When a and b are the same thing, or one is part of the other, that must
happen. This is the nature of mutable structures (when all or part of a
structure can be named, which should always be possible). Here is
Andrew Barnert's example translated into Python.

a = [1, [2, [3, None]]]
b = a[1]
b[1] = [10, [20, None]]
print(a)
>>> [1, [2, [10, [20, None]]]]

The difference between that being a Lisp list and a Python list is that
Lisp sees len(a) == 4, while Python would say 2; that is because Lisp
treats such nested structures as if they were flat, because the nesting
is, in a sense, an internal detail.

Here is a possible corresponding __iadd__.

class Lisp(list):
    # The second member of each Lisp is either a Lisp or None.
    def __iadd__(self, other):
        if not isinstance(other, Lisp):
            raise TypeError
        cursor = self
        while cursor[1] is not None:  # walk to the last cell: O(N)
            cursor = cursor[1]
        cursor[1] = other
        return self  # __iadd__ must return the new value of self

--
Terry Jan Reedy

From bruce at leapyear.org  Fri Jul 5 09:50:46 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Fri, 5 Jul 2013 00:50:46 -0700
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <20130705094341.7c1c84de@sergey>
References: <20130702211209.6dbde663@sergey>
	<51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info>
	<20130704125419.6230332d@sergey>
	<1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com>
	<20130705094341.7c1c84de@sergey>
Message-ID:

On Jul 4, 2013 11:45 PM, "Sergey" wrote:
>
> Or maybe even:
>     >>> a = linkedlist([0, 1, 2])
>     >>> b = a[1:]
>     >>> b[1:] = linkedlist([10, 20])
>     >>> a
>     linkedlist([0, 1, 2])
>     >>> b
>     linkedlist([1, 10, 20])
>
> I don't like the idea that `a` implicitly changes when I change `b`.
> And yes, b=a[1:] would be O(N) since it would have to copy all the
> elements to make sure that a change to one list won't affect another
> one.

If you don't like that, don't use a linked list. That's like
complaining about

    >>> a = [1, 2, 3]
    >>> b = a
    >>> a[1] = 4
    >>> b
    [1, 4, 3]

A linked list is not just another implementation of 'list'.

--- Bruce
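The structure sharing that Terry and Bruce describe can also be written
out as a self-contained toy class. This is a sketch only (not code from
any message) that reproduces Andrew's linkedlist example:

    class Cons:
        # A toy cons cell: a head value plus a pointer to the rest.
        def __init__(self, head, next=None):
            self.head = head
            self.next = next

        @classmethod
        def fromlist(cls, items):
            node = None
            for x in reversed(items):
                node = cls(x, node)
            return node

        def __iter__(self):
            node = self
            while node is not None:
                yield node.head
                node = node.next

    a = Cons.fromlist([0, 1, 2])
    b = a.next                         # O(1); b shares structure with a
    b.next = Cons.fromlist([10, 20])
    print(list(a))                     # [0, 1, 10, 20] -- visible through a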
From p.f.moore at gmail.com  Fri Jul 5 09:59:55 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 5 Jul 2013 08:59:55 +0100
Subject: [Python-ideas] All ideas together: Fast sum() for non-numbers
In-Reply-To: <20130705104103.2c9173d2@sergey>
References: <20130702211209.6dbde663@sergey>
	<51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info>
	<20130704125419.6230332d@sergey> <20130705104103.2c9173d2@sergey>
Message-ID:

On 5 July 2013 08:41, Sergey wrote:

> 2. Fast sum() for strings. [3]
>    This patch is a small enhancement of #1 that makes sum() O(N) for
>    strings. Obviously it introduces a behavior change and I do NOT
>    suggest it for inclusion. I believe the ''.join() alternative is
>    simple enough and doesn't have to be replaced with sum() for normal
>    use cases. It just shows that sum() can be O(N) for strings.

Why does this "obviously" introduce a behaviour change? If it is just a
performance improvement, that's not a behaviour change. Can you clarify
what behaviour will change? Also, why does there need to be a behaviour
change for strings and not for any other type (you make no mention of
behaviour changes anywhere else in this message that I can see)?

It's hard to see why there would be any argument over *pure* performance
improvements. But behaviour changes, especially ones which are "needed"
to get performance benefits, are a different matter.

Apologies if this has already been discussed, but this thread has
become quite complex, and I haven't been following the details (and
thanks for the summary, BTW!)

Paul

From steve at pearwood.info  Fri Jul 5 10:36:20 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 05 Jul 2013 18:36:20 +1000
Subject: [Python-ideas] All ideas together: Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey>
	<51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info>
	<20130704125419.6230332d@sergey> <20130705104103.2c9173d2@sergey>
Message-ID: <51D68584.30100@pearwood.info>

On 05/07/13 17:59, Paul Moore wrote:
> On 5 July 2013 08:41, Sergey wrote:
>
>> 2. Fast sum() for strings. [3]
>>    This patch is a small enhancement of #1 that makes sum() O(N) for
>>    strings. Obviously it introduces a behavior change and I do NOT
>>    suggest it for inclusion. I believe the ''.join() alternative is
>>    simple enough and doesn't have to be replaced with sum() for
>>    normal use cases. It just shows that sum() can be O(N) for strings.
>
> Why does this "obviously" introduce a behaviour change? If it is just a
> performance improvement, that's not a behaviour change.

Have you tried summing strings?

py> sum(['a', 'b'], '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]

> It's hard to see why there would be any argument over *pure*
> performance improvements. But behaviour changes, especially ones which
> are "needed" to get performance benefits, are a different matter.

Even pure performance improvements don't come for free. Adding all
these extra special cases to sum increases the complexity of the code.
More complex code, more documentation, more tests means more code to
contain bugs, more documentation to be wrong, more tests which can
fail. Simplicity is a virtue in and of itself. What benefit does this
extra complexity give you?

If you *only* look at the advantages, then *every* optimization looks
like an obvious win.
And if you *only* look at the risks, then every new piece of code looks
like a problem to be avoided. The trick is to balance the two, and that
relies on some sense of how often you will receive the benefit versus
how often you pay the cost.

Sergey's suggestion also has a more subtle behavioural change. In order
to guarantee that sum is fast, i.e. O(N) instead of O(N**2), the
semantics of sum have to change from "support anything which supports
__add__" to "support anything which supports fast __iadd__".

I don't believe that optimizing sum for non-numbers gives enough
benefit to make up for the increased complexity. Who, apart from
Sergey, uses sum() on large numbers of lists or tuples? Both have had
quadratic behaviour for a decade, and I've never known anyone who
noticed or cared. So it seems to me that optimizing these special cases
will have very little benefit.

I've already made it clear that I'm -0 on this: on balance, I believe
the disadvantage of more complex code just slightly outweighs the
benefit. But Sergey doesn't have to convince *me*, he just has to
convince those who will accept or reject the patch.

--
Steven

From p.f.moore at gmail.com  Fri Jul 5 11:20:47 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 5 Jul 2013 10:20:47 +0100
Subject: [Python-ideas] All ideas together: Fast sum() for non-numbers
In-Reply-To: <51D68584.30100@pearwood.info>
References: <20130702211209.6dbde663@sergey>
	<51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info>
	<20130704125419.6230332d@sergey> <20130705104103.2c9173d2@sergey>
	<51D68584.30100@pearwood.info>
Message-ID:

On 5 July 2013 09:36, Steven D'Aprano wrote:
> On 05/07/13 17:59, Paul Moore wrote:
>> On 5 July 2013 08:41, Sergey wrote:
>>
>>> 2. Fast sum() for strings. [3]
>>>    This patch is a small enhancement of #1 that makes sum() O(N) for
>>>    strings. Obviously it introduces a behavior change and I do NOT
>>>    suggest it for inclusion. I believe the ''.join() alternative is
>>>    simple enough and doesn't have to be replaced with sum() for
>>>    normal use cases. It just shows that sum() can be O(N) for
>>>    strings.
>>
>> Why does this "obviously" introduce a behaviour change? If it is just
>> a performance improvement, that's not a behaviour change.
>
> Have you tried summing strings?
>
> py> sum(['a', 'b'], '')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: sum() can't sum strings [use ''.join(seq) instead]

Ah, yes. I'd forgotten there was an explicit rejection of strings,
rather than just the obvious performance trap.

>> It's hard to see why there would be any argument over *pure*
>> performance improvements. But behaviour changes, especially ones
>> which are "needed" to get performance benefits, are a different
>> matter.
>
> Even pure performance improvements don't come for free. Adding all
> these extra special cases to sum increases the complexity of the code.
> More complex code, more documentation, more tests means more code to
> contain bugs, more documentation to be wrong, more tests which can
> fail. Simplicity is a virtue in and of itself. What benefit does this
> extra complexity give you?

Good point - that was to an extent what I was trying to express by
emphasizing *pure* - that very few things in practice are unqualified
improvements. I didn't put it well, though.

The history of all this is somewhere in the archives. I recall the
original discussion (although not the details).
The explicit rejection of str is clearly deliberate, and implemented by
people who certainly knew how to special-case str to use ''.join(), so
has Sergey researched the original discussion and explained why his
proposal invalidates the conclusions reached at the time? (Again, I
apologise for jumping in late - if this is all old news, I'll shut up
:-))

> If you *only* look at the advantages, then *every* optimization looks
> like an obvious win. And if you *only* look at the risks, then every
> new piece of code looks like a problem to be avoided. The trick is to
> balance the two, and that relies on some sense of how often you will
> receive the benefit versus how often you pay the cost.
>
> Sergey's suggestion also has a more subtle behavioural change. In
> order to guarantee that sum is fast, i.e. O(N) instead of O(N**2), the
> semantics of sum have to change from "support anything which supports
> __add__" to "support anything which supports fast __iadd__".

Whoa. That's a non-trivial change. I thought the proposal was
"guarantee O(N) for types with a constant-time __iadd__ and don't
impact performance for other types". But that of course begs the
question of what happens in pathological cases where __iadd__ is (say)
O(N^3). There's no way to programmatically detect whether __iadd__ or
__add__ gives the best performance, so what I had thought of course
cannot be the case :-(

> I don't believe that optimizing sum for non-numbers gives enough
> benefit to make up for the increased complexity. Who, apart from
> Sergey, uses sum() on large numbers of lists or tuples? Both have had
> quadratic behaviour for a decade, and I've never known anyone who
> noticed or cared. So it seems to me that optimizing these special
> cases will have very little benefit.
>
> I've already made it clear that I'm -0 on this: on balance, I believe
> the disadvantage of more complex code just slightly outweighs the
> benefit. But Sergey doesn't have to convince *me*, he just has to
> convince those who will accept or reject the patch.

Your points are good ones. Personally, I have no need for this change -
like you, I never use sum() on anything other than numbers.

Given the subtle implications of the change, and the fact that it
explicitly reverses a deliberate decision by the core devs, I think
this proposal would require a PEP to have any chance of being accepted.

Paul.

From haoyi.sg at gmail.com  Fri Jul 5 11:28:35 2013
From: haoyi.sg at gmail.com (Haoyi Li)
Date: Fri, 5 Jul 2013 17:28:35 +0800
Subject: [Python-ideas] All ideas together: Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey>
	<51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info>
	<20130704125419.6230332d@sergey> <20130705104103.2c9173d2@sergey>
	<51D68584.30100@pearwood.info>
Message-ID:

Is there a better way to flatten lists than [item for sublist in l for
item in sublist]? I naturally don't use sum() for much, but if I could,
sum(my_list) looks much better than [item for sublist in my_list for
item in sublist]. Looking at
http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python,
all the alternatives seem equally obtuse and verbose. I have a lot of
list flattenings I would love to use sum(my_list) for if I had the
chance.

On Fri, Jul 5, 2013 at 5:20 PM, Paul Moore wrote:

> On 5 July 2013 09:36, Steven D'Aprano wrote:
>> On 05/07/13 17:59, Paul Moore wrote:
>>> On 5 July 2013 08:41, Sergey wrote:
>>>
>>>> 2. Fast sum() for strings. [3]
>>>>    This patch is a small enhancement of #1 that makes sum() O(N)
>>>>    for strings. Obviously it introduces a behavior change and I do
>>>>    NOT suggest it for inclusion. I believe the ''.join()
>>>>    alternative is simple enough and doesn't have to be replaced
>>>>    with sum() for normal use cases. It just shows that sum() can be
>>>>    O(N) for strings.
>>>
>>> Why does this "obviously" introduce a behaviour change? If it is
>>> just a performance improvement, that's not a behaviour change.
>>
>> Have you tried summing strings?
>>
>> py> sum(['a', 'b'], '')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: sum() can't sum strings [use ''.join(seq) instead]
>
> Ah, yes. I'd forgotten there was an explicit rejection of strings,
> rather than just the obvious performance trap.
>
>>> It's hard to see why there would be any argument over *pure*
>>> performance improvements. But behaviour changes, especially ones
>>> which are "needed" to get performance benefits, are a different
>>> matter.
>>
>> Even pure performance improvements don't come for free. Adding all
>> these extra special cases to sum increases the complexity of the
>> code. More complex code, more documentation, more tests means more
>> code to contain bugs, more documentation to be wrong, more tests
>> which can fail. Simplicity is a virtue in and of itself. What benefit
>> does this extra complexity give you?
>
> Good point - that was to an extent what I was trying to express by
> emphasizing *pure* - that very few things in practice are unqualified
> improvements. I didn't put it well, though.
>
> The history of all this is somewhere in the archives. I recall the
> original discussion (although not the details). The explicit rejection
> of str is clearly deliberate, and implemented by people who certainly
> knew how to special-case str to use ''.join(), so has Sergey
> researched the original discussion and explained why his proposal
> invalidates the conclusions reached at the time? (Again, I apologise
> for jumping in late - if this is all old news, I'll shut up :-))
>
>> If you *only* look at the advantages, then *every* optimization looks
>> like an obvious win. And if you *only* look at the risks, then every
>> new piece of code looks like a problem to be avoided. The trick is to
>> balance the two, and that relies on some sense of how often you will
>> receive the benefit versus how often you pay the cost.
>>
>> Sergey's suggestion also has a more subtle behavioural change. In
>> order to guarantee that sum is fast, i.e. O(N) instead of O(N**2),
>> the semantics of sum have to change from "support anything which
>> supports __add__" to "support anything which supports fast __iadd__".
>
> Whoa. That's a non-trivial change. I thought the proposal was
> "guarantee O(N) for types with a constant-time __iadd__ and don't
> impact performance for other types". But that of course begs the
> question of what happens in pathological cases where __iadd__ is (say)
> O(N^3). There's no way to programmatically detect whether __iadd__ or
> __add__ gives the best performance, so what I had thought of course
> cannot be the case :-(
>
>> I don't believe that optimizing sum for non-numbers gives enough
>> benefit to make up for the increased complexity.
>> Who, apart from Sergey, uses sum() on large numbers of lists or
>> tuples? Both have had quadratic behaviour for a decade, and I've
>> never known anyone who noticed or cared. So it seems to me that
>> optimizing these special cases will have very little benefit.
>>
>> I've already made it clear that I'm -0 on this: on balance, I believe
>> the disadvantage of more complex code just slightly outweighs the
>> benefit. But Sergey doesn't have to convince *me*, he just has to
>> convince those who will accept or reject the patch.
>
> Your points are good ones. Personally, I have no need for this change
> - like you, I never use sum() on anything other than numbers.
>
> Given the subtle implications of the change, and the fact that it
> explicitly reverses a deliberate decision by the core devs, I think
> this proposal would require a PEP to have any chance of being
> accepted.
>
> Paul.

From oscar.j.benjamin at gmail.com  Fri Jul 5 11:48:49 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Fri, 5 Jul 2013 10:48:49 +0100
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2,
	set1 & set2, set2 - set1
In-Reply-To: <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com>
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
	<51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz>
	<9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com>
Message-ID:

On 5 July 2013 08:09, Paddy3118 wrote:
> On Friday, 5 July 2013 07:19:24 UTC+1, Greg Ewing wrote:
>> Oscar Benjamin wrote:
>> > On 4 July 2013 03:10, Steven D'Aprano wrote:
>> >
>> >> Does any other language with sets offer this as a set primitive?
>> >
>> > I'd be interested to know what they call it if they do.
>>
>> I think it should be called vennerate().
>
> Vennerate? Cute; but we have str1.partition(str2), and
> set1.partition(set2) would be a mathematical partition of the union of
> both sets, returning a tuple of three elements - just like
> str.partition - which makes me vote for the name "partition".

I didn't know about the str.partition method but having looked at it
now what it does is closer to what I think of as a partition of a set
i.e. after

    head, sep, tail = str1.partition(str2)

we have that

    str1 == head + sep + tail

By analogy I would expect that after

    set2, set3, set4 = set1.partition(...)

we would have

    set1 == set2 | set3 | set4

because that is (one property of) a "partition" of set1. However this
is not the meaning you intend.


Oscar

From ronaldoussoren at mac.com  Fri Jul 5 11:54:59 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Fri, 5 Jul 2013 11:54:59 +0200
Subject: [Python-ideas] All ideas together: Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey>
	<51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info>
	<20130704125419.6230332d@sergey> <20130705104103.2c9173d2@sergey>
	<51D68584.30100@pearwood.info>
Message-ID:

On 5 Jul, 2013, at 11:28, Haoyi Li wrote:

> Is there a better way to flatten lists than [item for sublist in l for
> item in sublist]? I naturally don't use sum() for much, but if I
> could, sum(my_list) looks much better than [item for sublist in
> my_list for item in sublist].
> Looking at
> http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python,
> all the alternatives seem equally obtuse and verbose. I have a lot of
> list flattenings I would love to use sum(my_list) for if I had the
> chance.

There's also list(itertools.chain.from_iterable(somelist)). I haven't
benchmarked that; I don't need to flatten lists often enough to worry
about the performance.

Ronald

From oscar.j.benjamin at gmail.com  Fri Jul 5 11:56:10 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Fri, 5 Jul 2013 10:56:10 +0100
Subject: [Python-ideas] All ideas together: Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey>
	<51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info>
	<20130704125419.6230332d@sergey> <20130705104103.2c9173d2@sergey>
	<51D68584.30100@pearwood.info>
Message-ID:

On 5 July 2013 10:28, Haoyi Li wrote:
> Is there a better way to flatten lists than [item for sublist in l for
> item in sublist]? I naturally don't use sum() for much, but if I
> could, sum(my_list) looks much better than [item for sublist in
> my_list for item in sublist]. Looking at
> http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python,
> all the alternatives seem equally obtuse and verbose. I have a lot of
> list flattenings I would love to use sum(my_list) for if I had the
> chance.

from itertools import chain

def listjoin(iterable_of_iterables):
    return list(chain.from_iterable(iterable_of_iterables))


Oscar
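For completeness, the flattening spellings discussed in this subthread
side by side, as a sketch with made-up data:

    from itertools import chain

    list_of_lists = [[1, 2], [3], [4, 5]]

    flat1 = [item for sublist in list_of_lists for item in sublist]
    flat2 = list(chain.from_iterable(list_of_lists))
    flat3 = list(chain(*list_of_lists))   # unpacks eagerly, so less general

    assert flat1 == flat2 == flat3 == [1, 2, 3, 4, 5]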
Ronald From abarnert at yahoo.com Fri Jul 5 12:34:37 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 5 Jul 2013 03:34:37 -0700 (PDT) Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130705094341.7c1c84de@sergey> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> Message-ID: <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> From: Sergey Sent: Thursday, July 4, 2013 11:43 PM > On Jul 4, 2013 Andrew Barnert wrote: > >> Right on some points, but you're making unwarranted assumptions >> for these two: >>> * sum() can *always* be fast! (patch and tests) >>> * linked list is O(n) where n is number of lists to add > >> It's not two other types, it's every type anyone ever has built or >> will build that doesn't have a fast __iadd__.?If I create a new >> type,?I can very easily make it?addable, iterable, displayable, >> pickleable, etc., with simple Python code. But I can't make it >> summable without?patching the builtin C function and recompiling >> the interpreter. > > Yes, you can! You just need to implement a fast __iadd__ or __add__ > for your type to make it O(n) summable. Well, in Python 3.3, or even 2.3, you just need to implement a fast __add__. So, if that's an acceptable answer, then we don't need to do anything. The reason it's not acceptable to you is that sometimes, you can't implement a fast __add__. But sometimes, you can't implement a fast __iadd__, either. > And you can always do that,?can't you? No. You can't implement a fast __iadd__ for tuple. In fact, there is no single operation you could implement for tuple.?To make it fast-summable, you had to modify sum so that it converts to list, extends, and converts back to tuple at the end. If that's going to be extendable to other types, you need to give them control over some external context?e.g., a way?to specify how to set up the start of a sum operation before the first __iadd__ and how to conclude it after the last __iadd__. > If that is not obvious we can make it explicit in sum() description. > E.g.: [?] > ? To make advantage of faster __iadd__ implementation sum() uses it > ? if possible. sum() has linear time complexity for built-in types > ? and all types providing constant-time `__iadd__` or `__add__`. Note that integer addition isn't actually constant, it's linear on the word size of the integers. Which means sum isn't linear for integers. So, this is misleading right off the bat.?Currently, because we don't try to specify anything except to imply that sum is an appropriately fast way to sum numbers, there's no such problem. >> A cons is a single-linked list with no tail pointer and an O(N) >> __iadd__. The node type is the list type; it's just a head value and >> a next pointer to another list.? > Personally I don't think such implementation would be very useful. Given that nearly every program every written in Lisp and its successors makes heavy use of such an implementation, I submit that your intuition may be wrong. > (It's certainly unusual for python's ?batteries included? philosophy). What does "batteries included" have to do with excluding data types? > What I would call a linked-list is a separate type where your nodes > are just its internal representation. 
If you make the lists and nodes separate types, and the nodes private,
you have to create yet a third type, like the list::iterator type in
C++, which either holds a reference to a node plus a reference to a
list, or can be passed to a list in the same way indices are passed to
an array. Without that, you can't do anything useful with lists at
all, because any operation is O(N). Just as arrays use indices and
hash tables use keys, linked lists use node references.

>> There is no way to make this work if a holds a tail pointer.
>
> There is. You just need to separate the nodes from the list object,
> and turn nodes into internal details, invisible outside.
>
> So that your sample would turn into:
>     >>> a = linkedlist([0, 1, 2])
>     >>> a.replacefrom(2, linkedlist([10, 20]))
>     >>> a
>     linkedlist([0, 1, 10, 20])
>
> Or maybe even:
>     >>> a = linkedlist([0, 1, 2])
>     >>> b = a[1:]
>     >>> b[1:] = linkedlist([10, 20])
>     >>> a
>     linkedlist([0, 1, 2])
>     >>> b
>     linkedlist([1, 10, 20])
>
> I don't like the idea that `a` implicitly changes when I change `b`.

Then you don't like linked lists. Saying "There is a way to make it
work, just make it do something different, which is O(N) instead of
O(1) and won't work with the algorithms you want to use" is not an
answer.

You don't have to use linked lists - there's a reason they're not
built in to Python, after all. But if you want to talk about linked
lists, you have to talk about linked lists, not some similar data type
you've invented that has most of the disadvantages of linked lists and
none of the advantages, and that no one will ever use.

> And yes, b=a[1:] would be O(N) since it would have to copy all the
> elements to make sure that a change to one list won't affect another
> one. But that's my vision.

Even b=a[n] is O(N) on linked lists, even without copying. Which is
exactly why linked lists provide head and next instead of indexing and
slicing. Trying to use linked lists as if they were arrays is as silly
as the other way around.

>> Making a.next be a copying operation gives the wrong answer (and,
>> more generally, breaks every mutating operation on cons-lists) and
>> makes almost everything O(N) instead of O(1).
>
> Yes, you're right, this turns copy into O(N). But you can still
> keep other operations fast.

No, you can make one operation fast, at the expense of making every
other operation slow. That's not a good tradeoff. It's like giving a
tree O(1) searches by attaching a hash table that maps keys to node
pointers, and then claiming that this is an improvement, and who cares
that inserts are now O(N) instead of O(log N).

And again, you've also ignored the fact that, performance aside, it
_breaks every algorithm people use mutable linked lists for_.

> But you CAN have a fast __iadd__ even for your simple a.next case!
> You only need to initialize `tail` before calling sum() and update
> it inside __iadd__

Initialize tail where exactly? In a global variable?

>> If people really do need to concatenate a bunch of tuples this
>> often, maybe the answer is to move chain to builtins.
>
> Yes, I understand that we can restrict sum(), for example it can
> throw an error for non-numbers as it currently does for strings.
> What I can't understand is why we should restrict it if instead
> we can make it fast?

How is that even a response to my point? I suggest moving chain to
builtins, and you ask why we should add restrictions to sum?
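One way to answer the "where does `tail` live?" question is to keep
head, tail and length on a wrapper object, as in this minimal sketch
(illustrative names only, not from any posted patch). It gives the
O(len(other)) __iadd__ Sergey wants, at the cost of the tail sharing
Andrew describes:

    class _Node:
        __slots__ = ('value', 'next')
        def __init__(self, value, next=None):
            self.value, self.next = value, next

    class LinkedList:
        def __init__(self, items=()):
            self.head = self.tail = None
            self.length = 0
            for x in items:
                self.append(x)

        def append(self, value):    # O(1) thanks to the tail pointer
            node = _Node(value)
            if self.tail is None:
                self.head = self.tail = node
            else:
                self.tail.next = node
                self.tail = node
            self.length += 1

        def __iadd__(self, other):  # O(len(other)), independent of len(self)
            for x in other:
                self.append(x)
            return self             # a += b keeps a bound to the same object

        def __iter__(self):
            node = self.head
            while node is not None:
                yield node.value
                node = node.next

        def __len__(self):
            return self.length

    a = LinkedList([1, 2])
    a += LinkedList([3, 4])
    print(list(a), len(a))          # [1, 2, 3, 4] 4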
From p.f.moore at gmail.com  Fri Jul 5 12:50:06 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 5 Jul 2013 11:50:06 +0100
Subject: [Python-ideas] All ideas together: Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey>
	<51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info>
	<20130704125419.6230332d@sergey> <20130705104103.2c9173d2@sergey>
	<51D68584.30100@pearwood.info>
Message-ID:

On 5 July 2013 10:28, Haoyi Li wrote:

> Is there a better way to flatten lists than [item for sublist in l for
> item in sublist]? I naturally don't use sum() for much, but if I
> could, sum(my_list) looks much better than [item for sublist in
> my_list for item in sublist]. Looking at
> http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python,
> all the alternatives seem equally obtuse and verbose. I have a lot of
> list flattenings I would love to use sum(my_list) for if I had the
> chance.

Only partially facetious, but:

    newlist = flatten(list_of_lists)

In other words, if you don't like the nested comprehension, hide it in
a personal function. No need to force the functionality into an
existing builtin.

Not-every-one-liner-needs-to-be-a-builtin-ly y'rs,
Paul

From oscar.j.benjamin at gmail.com  Fri Jul 5 12:50:18 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Fri, 5 Jul 2013 11:50:18 +0100
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com>
References: <20130702211209.6dbde663@sergey>
	<51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info>
	<20130704125419.6230332d@sergey>
	<1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com>
Message-ID:

On 4 July 2013 23:25, Andrew Barnert wrote:
> If people really do need to concatenate a bunch of tuples this often,
> maybe the answer is to move chain to builtins. If it's not important
> enough to move chain to builtins, it's probably not important enough
> to do anything else.

Can I suggest that if chain is ever moved to builtins it should
actually be chain.from_iterable that gets moved and renamed to chain
(or concat or something). It has always seemed to me that
chain.from_iterable is the more appropriate function and I assume that
it only has second-class status because it was initially overlooked.


Oscar

From stefan_ml at behnel.de  Fri Jul 5 12:50:54 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 05 Jul 2013 12:50:54 +0200
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <20130702211209.6dbde663@sergey>
References: <20130702211209.6dbde663@sergey>
Message-ID:

Sergey, 02.07.2013 20:12:
> sum() is a great function. It is the "obvious way" to add things.
> Unfortunately sometimes it's slower than it could be.
>
> The problem is that code:
>     sum([[1,2,3]]*1000000, [])
> takes forever to complete. Let's fix that!

No, please. Using sum() on lists is no more than a hack that seems to
be a cool idea but isn't. Seriously - what's the sum of lists?
Intuitively, it makes no sense at all to say sum(lists). What you want
is the concatenation, not the sum.

Please don't stuff something as simple as sum() with a nonsensical
misuse case.
Stefan

From ron3200 at gmail.com  Fri Jul  5 18:22:25 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Fri, 05 Jul 2013 11:22:25 -0500
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <51D41C82.2040301@pearwood.info>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
Message-ID:

On 07/03/2013 07:43 AM, Steven D'Aprano wrote:
>
> I'm not sure that sum() is the Obvious Way to concatenate lists, and I
> don't think that concatenating many lists is a common thing to do.
> Traditionally, sum() works only on numbers, and I think we wouldn't be
> having this discussion if Python used & for concatenation instead of +. So
> I don't care that sum() has quadratic performance on lists (and tuples),
> and I must admit that having a simple quadratic algorithm in the built-ins
> is sometimes useful for teaching purposes, so I'm -0 on optimizing this case.

I agree, and wish sequences used a __join__ method instead of __add__.

The '&' is already used for Bitwise And. How about '++' instead?

    'hello ' ++ 'world' == "hello world"

    [1, 2, 3] ++ [4, 5, 6] == [1, 2, 3, 4, 5, 6]

While this doesn't seem like a big change, I think it would simplify code
in many places that is more complicated than it really needs to be.

Cheers,
Ron

From storchaka at gmail.com  Fri Jul  5 18:27:36 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 05 Jul 2013 19:27:36 +0300
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
Message-ID:

05.07.13 19:22, Ron Adam wrote:
> The '&' is already used for Bitwise And. How about '++' instead?
>
>     'hello ' ++ 'world' == "hello world"
>
>     [1, 2, 3] ++ [4, 5, 6] == [1, 2, 3, 4, 5, 6]
>
> While this doesn't seem like a big change, I think it would simplify
> code in many places that is more complicated than it really needs to be.

The '++' "operator" is already defined for numbers.

>>> 123 ++ 456
579

From ron3200 at gmail.com  Fri Jul  5 18:35:03 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Fri, 05 Jul 2013 11:35:03 -0500
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
Message-ID:

On 07/05/2013 11:22 AM, Ron Adam wrote:
>
> The '&' is already used for Bitwise And. How about '++' instead?
>
>     'hello ' ++ 'world' == "hello world"
>
>     [1, 2, 3] ++ [4, 5, 6] == [1, 2, 3, 4, 5, 6]
>
> While this doesn't seem like a big change, I think it would simplify code
> in many places that is more complicated than it really needs to be.

Hmm... Nope... '++' is valid for numbers.

Ron

From steve at pearwood.info  Fri Jul  5 18:38:40 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 06 Jul 2013 02:38:40 +1000
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
Message-ID: <51D6F690.2000201@pearwood.info>

On 06/07/13 02:22, Ron Adam wrote:
>
> On 07/03/2013 07:43 AM, Steven D'Aprano wrote:
>>
>> I'm not sure that sum() is the Obvious Way to concatenate lists, and I
>> don't think that concatenating many lists is a common thing to do.
>> Traditionally, sum() works only on numbers, and I think we wouldn't be
>> having this discussion if Python used & for concatenation instead of +.
>> So I don't care that sum() has quadratic performance on lists (and
>> tuples), and I must admit that having a simple quadratic algorithm in
>> the built-ins is sometimes useful for teaching purposes, so I'm -0 on
>> optimizing this case.
>
> I agree, and wish sequences used a __join__ method instead of __add__.
>
> The '&' is already used for Bitwise And. How about '++' instead?

Nope, because it is ambiguous. Given x++y, is that a ++ binary operator, or
a binary + operator followed by a unary + operator?

Besides, for backwards compatibility, changing the operator from + cannot
happen now until Python 4000, if ever.

>     'hello ' ++ 'world' == "hello world"
>
>     [1, 2, 3] ++ [4, 5, 6] == [1, 2, 3, 4, 5, 6]
>
> While this doesn't seem like a big change, I think it would simplify code
> in many places that is more complicated than it really needs to be.

I don't think that merely changing the operator from one token + to a very
slightly different token ++ "simplifies code" in any way. I suggested that
& was a better token for concatenation because it avoids
associating/conflating concatenation with addition, but there's no
difference in code complexity whether you write x+y, x&y or x++y.

-- 
Steven

From ron3200 at gmail.com  Fri Jul  5 18:54:23 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Fri, 05 Jul 2013 11:54:23 -0500
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <51D6F690.2000201@pearwood.info>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
 <51D6F690.2000201@pearwood.info>
Message-ID:

On 07/05/2013 11:38 AM, Steven D'Aprano wrote:
> On 06/07/13 02:22, Ron Adam wrote:
>>
>> On 07/03/2013 07:43 AM, Steven D'Aprano wrote:
>>>
>>> I'm not sure that sum() is the Obvious Way to concatenate lists, and I
>>> don't think that concatenating many lists is a common thing to do.
>>> Traditionally, sum() works only on numbers, and I think we wouldn't be
>>> having this discussion if Python used & for concatenation instead of +.
>>> So I don't care that sum() has quadratic performance on lists (and
>>> tuples), and I must admit that having a simple quadratic algorithm in
>>> the built-ins is sometimes useful for teaching purposes, so I'm -0 on
>>> optimizing this case.
>>
>> I agree, and wish sequences used a __join__ method instead of __add__.
>>
>> The '&' is already used for Bitwise And. How about '++' instead?
>
> Nope, because it is ambiguous. Given x++y, is that a ++ binary operator, or
> a binary + operator followed by a unary + operator?

Yes, I realised that soon after I posted.

> Besides, for backwards compatibility, changing the operator from + cannot
> happen now until Python 4000, if ever.

Right, but if a good solution comes before then, it can be used when the
time comes.

>>     'hello ' ++ 'world' == "hello world"
>>
>>     [1, 2, 3] ++ [4, 5, 6] == [1, 2, 3, 4, 5, 6]
>>
>> While this doesn't seem like a big change, I think it would simplify code
>> in many places that is more complicated than it really needs to be.
>
> I don't think that merely changing the operator from one token + to a very
> slightly different token ++ "simplifies code" in any way. I suggested
> that & was a better token for concatenation because it avoids
> associating/conflating concatenation with addition, but there's no
> difference in code complexity whether you write x+y, x&y or x++y.

It's not the expressions that are improved, but the code surrounding them.
Like I said, it may not seem like a big change, but I think it would be an
important change.
Cheers,
Ron

From tjreedy at udel.edu  Fri Jul  5 22:25:46 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 05 Jul 2013 16:25:46 -0400
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
Message-ID:

On 7/5/2013 12:22 PM, Ron Adam wrote:
>
> On 07/03/2013 07:43 AM, Steven D'Aprano wrote:
>>
>> I'm not sure that sum() is the Obvious Way to concatenate lists, and I
>> don't think that concatenating many lists is a common thing to do.

Agree here.

>> Traditionally, sum() works only on numbers, and I think we wouldn't be
>> having this discussion if Python used & for concatenation instead of +.

Since addition of numbers represented in base-1 or tally notation is the
same as sequence concatenation, I think '+' is the right choice. In my
experience of American English, normal people 'add' lists (though perhaps
as sets, with duplicate removal), not 'concatenate' them. And while they
may 'add' together multiple lists, they would not likely 'sum' them unless
they are lists of numbers.

When Alex said that it was not possible to determine if the start value is
a number, he was talking in the context of old style classes where the
type of every user class was 'Class' and the type of every user instance
was 'Instance' (or something like that). In Python 3, with ABCs,
isinstance(start, Number) would solve the problem as long as the
requirement were documented.

-- 
Terry Jan Reedy

From joshua.landau.ws at gmail.com  Fri Jul  5 23:31:14 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Fri, 5 Jul 2013 22:31:14 +0100
Subject: [Python-ideas] All ideas together: Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
 <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey>
 <20130705104103.2c9173d2@sergey> <51D68584.30100@pearwood.info>
Message-ID:

On 5 July 2013 11:02, Ronald Oussoren wrote:
> I'm only +0 on using __iadd__ in sum because it is unclear how useful this
> would be in real-world code, but does have a simple enough implementation
> that is still easily explained. I'd be -1 on adding custom hooks to improve
> the performance of specific types (such as adding tricks for speeding up
> joining a sequence of tuples or strings) as that makes the implementation
> harder and increases the conceptual complexity of the function as well.

This is a perfect summary of where I am on this proposal, but for slightly
different reasons.

If you special-case, either "sum(tuple_subclass)" is going to be
asymptotically slower (and hence code that duck-types should not rely on
it) or broken. Thus, special-casing is out of the window for me.

Using __iadd__, btw, cannot be a "bugfix", AFAIK, because it changes
behavior (subtly). But code that depends on that distinction is broken, so
I'm OK with it going forward.

I appreciate the speed-up, but as Steven convinced me it's not that big a
deal. Hence +0.
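For anyone skimming, a rough sketch of the __iadd__-based summing being
weighed in this subthread; fast_sum is a hypothetical stand-in for the
proposed change, not the actual patch:

    def fast_sum(iterable, start=0):
        # Accumulate with +=, so types whose __iadd__ works in place
        # (e.g. list) avoid the quadratic re-copying of repeated
        # total = total + item.
        it = iter(iterable)
        try:
            total = start + next(it)  # one copy, so the caller's start is untouched
        except StopIteration:
            return start
        for item in it:
            total += item  # may mutate total in place: the subtle behaviour change
        return total

With this, fast_sum([[1, 2, 3]] * 1000000, []) runs in linear time, at the
cost of the __iadd__/__add__ distinction discussed above.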
From joshua.landau.ws at gmail.com  Fri Jul  5 23:38:22 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Fri, 5 Jul 2013 22:38:22 +0100
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2,
 set1 & set2, set2 - set1
In-Reply-To:
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
 <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz>
 <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com>
Message-ID:

On 5 July 2013 10:48, Oscar Benjamin wrote:
> I didn't know about the str.partition method but having looked at it
> now what it does is closer to what I think of as a partition of a set
> i.e. after
>
>     head, sep, tail = str1.partition(str2)
>
> we have that
>
>     str1 == head + sep + tail
>
> By analogy I would expect that after
>
>     set2, set3, set4 = set1.partition(...)
>
> we would have
>
>     set1 == set2 | set3 | set4
>
> because that is (one property of) a "partition" of set1. However this
> is not the meaning you intend.

Hence why it should be defined:

    set1.partition(set2) === set1 - set2, set1 & set2

That lets you have the "single pass" of each set as before, but is a
"smaller" operation. The full partition would just put "set2 - set1" on
the end:

    ven_a, ven_shared = a.partition(b)
    ven_b = b - ven_shared

And "ven_a | ven_shared == a" as you said.

From dholth at gmail.com  Sat Jul  6 01:01:13 2013
From: dholth at gmail.com (Daniel Holth)
Date: Fri, 5 Jul 2013 19:01:13 -0400
Subject: [Python-ideas] format specifier for "not bytes"
In-Reply-To: <503E9822.10709@gmx.net>
References: <5037C1FE.9020509@mrabarnett.plus.com>
 <20120824204043.3c4c4524@pitrou.net> <5037D0F3.70108@pearwood.info>
 <20120827103438.093aa3f4@resist.wooz.org> <503E9822.10709@gmx.net>
Message-ID:

On Wed, Aug 29, 2012 at 6:30 PM, Mathias Panzenböck wrote:
> On 08/27/2012 04:34 PM, Barry Warsaw wrote:
>> On Aug 25, 2012, at 09:16 AM, Nick Coghlan wrote:
>>
>>> A couple of people at PyCon Au mentioned running into this kind of issue
>>> with Python 3. It relates to the fact that:
>>> 1. String formatting is *coercive* by default
>>> 2. Absolutely everything, including bytes objects can be coerced to a
>>> string, due to the repr() fallback
>>>
>>> So it's relatively easy to miss a decode or encode operation, and end up
>>> interpolating an unwanted "b" prefix and some quotes.
>>>
>>> For existing versions, I think the easiest answer is to craft a regex
>>> that matches bytes object repr's and advise people to check that it
>>> *doesn't* match their formatted strings in their unit tests.
>>>
>>> For 3.4+ a non-coercive string interpolation format code may be
>>> desirable.
>>
>> Or maybe just one that calls __str__ without a __repr__ fallback?
>
>>>> b'a'.__str__()
> "b'a'"
>
> __str__ still returns the bytes literal representation.

This is now a patch at http://bugs.python.org/issue18373. The user can
call sys.getbyteswarning() and sys.setbyteswarning(integer) to control
whether str(bytes) warns in the current thread, but you also have to
adjust the warnings module for it to be useful.
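For reference, a pure-Python sketch of the two-way partition Joshua
describes above, done in a single pass over set1 (a free function standing
in for the proposed set method):

    def partition(set1, set2):
        """Return (set1 - set2, set1 & set2) in one pass over set1."""
        exclusive, common = set(), set()
        for item in set1:
            (common if item in set2 else exclusive).add(item)
        return exclusive, common

    ven_a, ven_shared = partition({0, 1, 2}, {1, 2, 3})
    # ven_a == {0}, ven_shared == {1, 2}, and ven_a | ven_shared gives
    # back the original first set, as claimed above.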
From haoyi.sg at gmail.com  Sat Jul  6 02:56:11 2013
From: haoyi.sg at gmail.com (Haoyi Li)
Date: Sat, 6 Jul 2013 08:56:11 +0800
Subject: [Python-ideas] Macro-Powered Enums
Message-ID:

Hey All,

I know this ship has already sailed with PEP 435, but I've been working on
a prototype implementation of enums using MacroPy macros.

The goal was to have enums whose methods provide most of the nice
capabilities of int enums (auto-generated indexes, fast comparison,
incrementing, index-arithmetic, find-by-index) and string enums (nice
__repr__, find-by-name) but has the ability to smoothly scale to
full-fledged objects with methods *and* fields, all this in an extremely
concise way.

The gimmick here is that it uses macros to provide a concise syntax to
allow you to construct each enum instance with whatever constructor
parameters you wish, java-enum-style (here). This allows a degree of
enums-as-objects which I haven't seen in any other library (most don't
allow custom fields).

Probably not standard-library worthy, but I thought it was pretty cool.

-Haoyi

From stephen at xemacs.org  Sat Jul  6 03:53:45 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 06 Jul 2013 10:53:45 +0900
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
Message-ID: <87zju0v4gm.fsf@uwakimon.sk.tsukuba.ac.jp>

Terry Reedy writes:

 > In my experience of American English, normal people 'add' lists

Nitpick: That somehow doesn't sit right with me. They "put lists
together" or "include one in another" (assign to slice, not list as
element of superior list). Adding refers to elements: "add item to
list", although it's often applied iteratively (being synonymous with
"include in").

I don't think it's appropriate to refer to any specific English usage
here. Most native speakers, and especially programmers, have
flexibility in the direction of allowing the generalization of "+" to
strings to imply that "sum" also generalizes to strings, as the
iterated and associative application of "+". They're even more
flexible than that: they intuitively understand that although "+" (and
perhaps "sum" even more so) usually imply "commutative" (as in the
convention in abstract algebra), when applied to strings and other
sequences this doesn't make sense (as of course it doesn't make sense
for infinite sequences in analysis).

I think that as far as consistency with intuition goes, it doesn't
matter which we choose: we really need to ask the ultimate authority
on "Pythonicity". My own preference is based on "consenting adults":
if "sum(list_of_strings)" bothers you, don't use it. But I don't
understand the ramifications of thoroughly implementing that in a
community where intuition splits as deeply as it evidently does here.

Steve

From ethan at stoneleaf.us  Sat Jul  6 04:23:44 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 05 Jul 2013 19:23:44 -0700
Subject: [Python-ideas] Macro-Powered Enums
In-Reply-To:
References:
Message-ID: <51D77FB0.3000705@stoneleaf.us>

On 07/05/2013 05:56 PM, Haoyi Li wrote:
>
> I know this ship has already sailed with PEP 435, but I've been working
> on a prototype implementation of enums using MacroPy macros.
> The goal was to have enums whose methods provide most of the nice
> capabilities of int enums (auto-generated indexes, fast comparison,
> incrementing, index-arithmetic, find-by-index) and string enums (nice
> __repr__, find-by-name) but has the ability to smoothly scale to
> full-fledged objects with methods *and* fields, all this in an extremely
> concise way.

The stdlib Enum and IntEnum offer this already, although not quite as
concisely (I haven't looked at your macro code to see how it handles the
missing name lookup, but in standard Python that "feature" invites way too
many bugs).

> The gimmick here is that it uses macros to provide a concise syntax to
> allow you to construct each enum instance with whatever constructor
> parameters you wish, java-enum-style (here). This allows a degree of
> enums-as-objects which I haven't seen in any other library (most don't
> allow custom fields).

Check out the stdlib Enum -- I think you'll be pleasantly surprised!

> Probably not standard-library worthy, but I thought it was pretty cool.

It is indeed cool. For comparison, here is your complex Direction Enum in
stdlib syntax:

-- 8< ----------------------------------------------------------------------
from enum import Enum

class Direction(Enum):
    """stdlib version of MacroPy complex Directions Enum"""

    __order__ = 'North East South West'    # for 2.x

    North = ("Vertical", ["Northrend"])
    East = ("Horizontal", ["Azeroth", "Khaz Modan", "Lordaeron"])
    South = ("Vertical", ["Pandaria"])
    West = ("Horizontal", ["Kalimdor"])

    def __new__(cls, *args):
        "__new__ can be omitted if you don't want int-like behavior"
        value = len(cls.__members__) + 1
        obj = object.__new__(cls)
        obj._value = value
        return obj

    def __init__(self, alignment, continent):
        self.alignment = alignment
        self.continent = continent

    def __int__(self):
        "__int__ can be omitted if you don't want int-like behavior"
        return self.value

    def __index__(self):
        return self.value

    @property
    def opposite(self):
        # values run 1..4, so wrap with % 4 + 1 rather than a bare % 4
        # (which would ask for the nonexistent Direction(0))
        return Direction((self.value + 1) % 4 + 1)

    def padded_name(self, n):
        return ("<" * n) + self.name + (">" * n)

# members
print(Direction.North.alignment)            # Vertical
print(Direction.East.continent)             # ["Azeroth", "Khaz Modan", "Lordaeron"]

# properties
print(Direction.North.opposite)             # Direction.South

# methods
print(Direction.South.padded_name(2))       # <<South>>

# int-like
print(int(Direction.East))                  # 2
print(('zero', 'one', 'two', 'three', 'four')[Direction.South])  # 'three'
-- 8< ----------------------------------------------------------------------

Over half the code goes away if you don't need the int-like behavior. Oh,
and the `__order__` is only used in the PyPI version (enum34) so that 2.x
enums can still have a "definition" order.

-- 
~Ethan~

From joshua.landau.ws at gmail.com  Sat Jul  6 06:30:16 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Sat, 6 Jul 2013 05:30:16 +0100
Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking
 generalizations"
Message-ID:

The PEP is attached. I'm not sure if I've covered the basics, but it's a
try.

If anyone knows how to get the patch (from the bug report) working, or
where to find http://code.python.org/python/users/twouters/starunpack
after code.python.org was deleted in favour of hg.python.org (which seems
not to have it), it'd be nice to know.
-------------- next part --------------
PEP: XXX
Title: Additional Unpacking Generalizations
Version: $Revision$
Last-Modified: $Date$
Author: Joshua Landau
Discussions-To: python-ideas at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 29-Jun-2013
Python-Version: 3.4
Post-History: #TODO


Abstract
========

This PEP proposes extended usages of the ``*`` iterable unpacking
operator to allow unpacking in more positions, and an arbitrary number of
times, and in several additional circumstances.

Specifically:

Arbitrarily positioned unpacking operators::

    >>> print(*[1], *[2], 3)
    1 2 3
    >>> dict(**{'x': 1}, y=3, **{'z': 2})
    {'x': 1, 'y': 3, 'z': 2}
    >>> def f(*args, last): pass

Keywords arguments must still follow positional arguments but now must
also follow ``*``-unpackings. The function of a lone ``*`` is unchanged.

Unpacking is proposed to be allowed inside tuples, lists, sets,
dictionaries and comprehensions::

    >>> *range(4), 4
    (0, 1, 2, 3, 4)
    >>> [*range(4), 4]
    [0, 1, 2, 3, 4]
    >>> {*range(4), 4}
    {0, 1, 2, 3, 4}
    >>> {'x': 1, **{'y': 2}}
    {'x': 1, 'y': 2}

    >>> ranges = [range(i) for i in range(5)]
    >>> [*item for item in ranges]
    [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]


Rationale
=========

Current usage of the ``*`` iterable unpacking operator features
unnecessary restrictions that can harm readability. There is also an
asymmetry between what is allowed in assignment unpacking and in function
definitions. This proposal hopes to alleviate a large proportion of this
imbalance.

Unpacking multiple times has an obvious rationale. When you want to
unpack several iterables into a function call, or follow an unpack with
more positional arguments, the most natural way would be to write::

    function(**kw_arguments, **more_arguments)

    function(*arguments, argument)

Simple examples where this is useful are ``print`` and ``str.format``.
Instead, you could be forced to write::

    kwargs = dict(kw_arguments)
    kwargs.update(more_arguments)
    function(**kwargs)

    args = list(arguments)
    args.append(arg)
    function(*args)

or, if you know to do so::

    from collections import ChainMap
    function(**ChainMap(more_arguments, arguments))

    from itertools import chain
    function(*chain(args, [arg]))

which add unnecessary line-noise and, with the first methods, cause
duplication of work.

Function definitions are also now more symmetrical with assignment;
whereas previously just::

    first, *others, last = iterable

was valid, now so too is::

    def f(first, *others, last): ...
    f(*iterable)

As PEP 3132 has been finalized, the benefits of this approach should
already be clear. In particular, this should improve the signatures of
functions that utilize this feature by moving the unpacking from the body
into the definition.

There are two primary rationales for unpacking inside of containers.
Firstly, there is a symmetry with assignment, where ``fst, *other, lst =
elems`` and ``elems = fst, *other, lst`` are approximate inverses,
ignoring the specifics of types. This, in effect, simplifies the
language by removing special cases.

Secondly, it vastly simplifies types of "addition" such as combining
dictionaries, and does so in an unambiguous and well-defined way::

    combination = {**first_dictionary, "x": 1, "y": 2}

instead of::

    combination = first_dictionary.copy()
    combination.update({"x": 1, "y": 2})

which is especially important in contexts where expressions are
preferred.
This is also useful as a more readable way of summing many lists, such as
``my_list + list(my_tuple) + list(my_range)`` which is now equivalent to
just ``[*my_list, *my_tuple, *my_range]``.

A further extension to comprehensions is a logical and necessary
extension. Its usage will primarily be a neat replacement for ``[i for j
in 2D_list for i in j]``, as the more readable ``[*l for l in 2D_list]``.
Other uses are possible, but expected to occur rarely.


Specification
=============

Function calls may accept an unbounded number of ``*`` and ``**``
unpackings, which are allowed anywhere that positional and keyword
arguments are allowed respectively. In approximate pseudo-notation::

    function(
        argument or *args, argument or *args, ...,
        kwargument or **kwargs, kwargument or **kwargs, ...
    )

As the function ``lambda *args, last: ...`` now does not require ``last``
to be a keyword only argument, ``lambda *args, *, last: ...`` will be
valid. No other changes are made to function definition.

Tuples, lists, sets and dictionaries will allow unpacking. This will act
as if the elements from unpacked items were inserted in order at the site
of unpacking, much as happens in unpacking in a function call.
Dictionaries require ``**`` unpacking; all the others require ``*``
unpacking. A dictionary's keys remain in a right-to-left priority order,
so ``{**{'a': 1}, 'a': 2, **{'a': 3}}`` evaluates to ``{'a': 3}``.

Comprehensions, by simple extension, will support unpacking. As before,
dictionaries require ``**`` unpacking, all the others require ``*``
unpacking and key priorities are unchanged.

Examples include::

    {*[1, 2, 3], 4, 5}

    (*e for e in [[1], [3, 4, 5], [2]])

    {**dictionary for dictionary in (globals(), locals())}

    {**locals(), "override": None}


Disadvantages
=============

Parts of this change are not backwards-compatible.

- ``function(kwarg="foo", *args)`` is no longer valid syntax;
  ``function(*args, kwarg="foo")`` is required instead

- ``lambda *args, last: ...`` no longer requires ``last`` to be a keyword
  only argument

Additionally, whilst ``*elements, = iterable`` causes ``elements`` to be
a list, ``elements = *iterable,`` causes ``elements`` to be a tuple. The
reason for this may not be obvious at first glance, and may confuse
people unfamiliar with the construct.

.. I don't feel I have the standing to make a judgment on these cases.

Needless to say, the first of these is a more significant hurdle and will
affect more working code.


Implementation
==============

An implementation for an old version of Python 3 is found at Issue 2292
on the bug tracker [1]_. It has yet to be updated to the most recent
Python version.


References
==========

.. [1] Issue 2292, "Missing `*`-unpacking generalizations",
   Thomas Wouters (http://bugs.python.org/issue2292)

.. [2] Discussion on Python-ideas list, "list / array comprehensions
   extension", Alexander Heger
   (http://mail.python.org/pipermail/python-ideas/2011-December/013097.html)


Copyright
=========

This document has been placed in the public domain.
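As a reading aid, here is how the PEP's headline examples can be written
in current (pre-3.4) Python; this is only a sketch of the intended
semantics, not part of the proposal:

    from itertools import chain

    # print(*[1], *[2], 3)  ->  print(1, 2, 3)
    args = list(chain([1], [2], [3]))        # [1, 2, 3]

    # {**{'x': 1}, 'y': 3, **{'z': 2}}  ->  {'x': 1, 'y': 3, 'z': 2}
    merged = dict({'x': 1}, y=3)
    merged.update({'z': 2})                  # later keys win, as in the PEP

    # [*item for item in ranges]  ->  flattening
    ranges = [range(i) for i in range(5)]
    flat = list(chain.from_iterable(ranges))
    # [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]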
From paddy3118 at gmail.com  Sat Jul  6 06:56:00 2013
From: paddy3118 at gmail.com (Paddy3118)
Date: Fri, 5 Jul 2013 21:56:00 -0700 (PDT)
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2,
 set1 & set2, set2 - set1
In-Reply-To:
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
 <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz>
 <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com>
Message-ID: <42697435-0d9b-4edb-a4a3-777699cbe8b6@googlegroups.com>

On Friday, 5 July 2013 22:38:22 UTC+1, Joshua Landau wrote:
>
> ...
>
> Hence why it should be defined:
>
>     set1.partition(set2) === set1 - set2, set1 & set2
>
> That lets you have the "single pass" of each set as before, but is a
> "smaller" operation. The full partition would just put "set2 - set1"
> on the end:
>
>     ven_a, ven_shared = a.partition(b)
>     ven_b = b - ven_shared
>
> And "ven_a | ven_shared == a" as you said.

Unfortunately I tend to need all three of exclusively1, common,
exclusively2. Assuming fast C implementations, your proposal would lead to
less opportunity for optimization and the need to create one of the needed
results in Python. The need for all three and the potentially better
optimization afforded by computing three at once in C should outweigh any
considerations of the name of the method.

From joshua.landau.ws at gmail.com  Sat Jul  6 07:54:18 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Sat, 6 Jul 2013 06:54:18 +0100
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2,
 set1 & set2, set2 - set1
In-Reply-To: <42697435-0d9b-4edb-a4a3-777699cbe8b6@googlegroups.com>
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
 <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz>
 <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com>
 <42697435-0d9b-4edb-a4a3-777699cbe8b6@googlegroups.com>
Message-ID:

On 6 July 2013 05:56, Paddy3118 wrote:
> On Friday, 5 July 2013 22:38:22 UTC+1, Joshua Landau wrote:
>>
>> Hence why it should be defined:
>>
>>     set1.partition(set2) === set1 - set2, set1 & set2
>>
>> That lets you have the "single pass" of each set as before, but is a
>> "smaller" operation. The full partition would just put "set2 - set1"
>> on the end:
>>
>>     ven_a, ven_shared = a.partition(b)
>>     ven_b = b - ven_shared
>>
>> And "ven_a | ven_shared == a" as you said.
>
> Unfortunately I tend to need all three of exclusively1, common,
> exclusively2. Assuming fast C implementations, your proposal would lead
> to less opportunity for optimization and the need to create one of the
> needed results in Python. The need for all three and the potentially
> better optimization afforded by computing three at once in C should
> outweigh any considerations of the name of the method.

I don't believe it would be any (measurably) slower, given the single-pass
optimisations shown in other posts. I could be wrong, but it's not like
Python's known for its micro-optimisations (except for dicts...).

I also think that my way is better *because* of efficiency concerns -- if
it's not measurably slower for the 3-way split, surely it's more useful to
allow people to have a fast 2-way split. It's also a simpler, less
redundant implementation.

I could give you some uses of my partition over 3-way partitioning:

flags = set(...)
flags_passed_to_next_in_chain, flags_kept = flags.partition(flags_that_I_want)

The rest are all variations of this.

From steve at pearwood.info  Sat Jul  6 08:32:47 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 06 Jul 2013 16:32:47 +1000
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
Message-ID: <51D7BA0F.1070506@pearwood.info>

On 06/07/13 06:25, Terry Reedy wrote:

> When Alex said that it was not possible to determine if the start value
> is a number, he was talking in the context of old style classes where
> the type of every user class was 'Class' and the type of every user
> instance was 'Instance' (or something like that). In Python 3, with
> ABCs, isinstance(start, Number) would solve the problem as long as the
> requirement were documented.

For the record, it has always been possible to check if something is a
number:

    try:
        x + 0
    except TypeError:
        print "x is not a number"

-- 
Steven

From abarnert at yahoo.com  Sat Jul  6 09:07:11 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 6 Jul 2013 00:07:11 -0700
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <51D7BA0F.1070506@pearwood.info>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
 <51D7BA0F.1070506@pearwood.info>
Message-ID:

On Jul 5, 2013, at 23:32, Steven D'Aprano wrote:

> On 06/07/13 06:25, Terry Reedy wrote:
>
>> When Alex said that it was not possible to determine if the start value
>> is a number, he was talking in the context of old style classes where
>> the type of every user class was 'Class' and the type of every user
>> instance was 'Instance' (or something like that). In Python 3, with
>> ABCs, isinstance(start, Number) would solve the problem as long as the
>> requirement were documented.
>
> For the record, it has always been possible to check if something is a
> number:
>
>     try:
>         x + 0
>     except TypeError:
>         print "x is not a number"

This isn't a very good rule for "is a number". You can add 0 to numpy
arrays, for example, and they're not numbers.

But I think it is actually a good rule for "is summable". If you've got
something that's not a number, but 0+x makes sense, summing probably also
makes sense. Conversely, if you create some type that is numeric, but
isn't addable to 0, you wouldn't be surprised if you couldn't sum it.

From ron3200 at gmail.com  Sat Jul  6 11:02:37 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 06 Jul 2013 04:02:37 -0500
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <87zju0v4gm.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
 <87zju0v4gm.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On 07/05/2013 08:53 PM, Stephen J. Turnbull wrote:
> Terry Reedy writes:
>
>  > In my experience of American English, normal people 'add' lists
>
> Nitpick: That somehow doesn't sit right with me. They "put lists
> together" or "include one in another" (assign to slice, not list as
> element of superior list). Adding refers to elements: "add item to
> list", although it's often applied iteratively (being synonymous with
> "include in").

We combine lists, and add ingredients. Or do we combine ingredients and
add lists? ;-)

> I don't think it's appropriate to refer to any specific English usage
> here.

It's one of those things that if we can find a relationship that works,
great, but it just won't always work out that well.
> Most native speakers, and especially programmers, have
> flexibility in the direction of allowing the generalization of "+" to
> strings to imply that "sum" also generalizes to strings, as the
> iterated and associative application of "+". They're even more
> flexible than that: they intuitively understand that although "+" (and
> perhaps "sum" even more so) usually imply "commutative" (as in the
> convention in abstract algebra), when applied to strings and other
> sequences this doesn't make sense (as of course it doesn't make sense
> for infinite sequences in analysis).
>
> I think that as far as consistency with intuition goes, it doesn't
> matter which we choose: we really need to ask the ultimate authority
> on "Pythonicity". My own preference is based on "consenting adults":
> if "sum(list_of_strings)" bothers you, don't use it. But I don't
> understand the ramifications of thoroughly implementing that in a
> community where intuition splits as deeply as it evidently does here.

It helps to view these things from two sides. On the human side, we want
something that is somewhat familiar, but on the CPU side, we need to be
fairly specific.

Humans tend to overgeneralise and oversimplify, but we are also able to
infer more specific information by including lots of extraneous
information. Python is very limited in that respect.

On one hand it sometimes is nice to be able to write very general routines
that don't care what the input is, or will work with a wide range of
input. But if the routine is too generalized, it can do the wrong thing at
times. Like iterating the letters of a string in a list. So having the
language organised in meaningful ways can help.

The question is, is there something we can change that will help with
things like this? The behaviour I would like is (only some examples):

    add and subtract scalar objects (and other scalar operations)
    combine and separate immutable collections (strings, tuples)
    join and split mutable collections (lists, dictionaries, sets)

We already get some nice error messages when we try to do things that
don't make sense, but sometimes we get a wrong result instead.

Cheers,
Ron

From steve at pearwood.info  Sat Jul  6 15:38:54 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 06 Jul 2013 23:38:54 +1000
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2,
 set1 & set2, set2 - set1
In-Reply-To:
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
 <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz>
 <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com>
Message-ID: <51D81DEE.9030505@pearwood.info>

On 05/07/13 19:48, Oscar Benjamin wrote:
> On 5 July 2013 08:09, Paddy3118 wrote:
>> On Friday, 5 July 2013 07:19:24 UTC+1, Greg Ewing wrote:
>>> Oscar Benjamin wrote:
>>>> On 4 July 2013 03:10, Steven D'Aprano wrote:
>>>>>
>>>>> Does any other language with sets offer this as a set primitive?
>>>>
>>>> I'd be interested to know what they call it if they do.
>>>
>>> I think it should be called vennerate().

Groan. That's a terrible pun :-)

>> Vennerate?
>> Cute; but we have str1.partition(str2), and set1.partition(set2) would
>> be a mathematical partition of the union of both sets returning a tuple
>> of three elements - just like str.partition, which makes me vote for
>> the name "partition".
>
> I didn't know about the str.partition method but having looked at it
> now what it does is closer to what I think of as a partition of a set
> i.e.
after > > head, sep, tail = str1.partition(str2) > > we have that > > str1 == head + sep + tail > > By analogy I would expect that after In this case, what's being partitioned isn't set1, but the union of the two sets. > set2, set3, set4 = set1.partition(...) > > we would have > > set1 == set2 | set3 | set4 > > because that is (one property of) a "partition" of set1. However this > is not the meaning you intend. set1 | set2 = only1 | common | only2 [only a quarter serious] In a way, perhaps this ought to be an operator, and return a set of three (frozen)sets. "partition" is not the ideal name for this method, but the actual operation itself is very useful. I have often done this, mostly on dict views rather than sets. In my head, it's an obvious operation, to split a pair of sets into three, spoiled only by lack of a good name. Perhaps "split" is a reasonable name? only1, both, only2 = set1.split(set2) Hmmm... seems reasonable to me. -- Steven From stephen at xemacs.org Sat Jul 6 17:34:51 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 07 Jul 2013 00:34:51 +0900 Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1 In-Reply-To: <51D81DEE.9030505@pearwood.info> References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz> <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com> <51D81DEE.9030505@pearwood.info> Message-ID: <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > "partition" is not the ideal name for this method, +1 It would completely confuse anybody who did want a partition. > but the actual operation itself is very useful. I have often done > this, mostly on dict views rather than sets. In my head, it's an > obvious operation, to split a pair of sets into three, spoiled only > by lack of a good name. > > Perhaps "split" is a reasonable name? > > only1, both, only2 = set1.split(set2) -1 Set splitting is an intractable problem. https://en.wikipedia.org/wiki/Set_splitting_problem That wouldn't bother me all that much except that I can imagine all kinds of ways to split sets that have little to do with boolean algebra (starting with Dedekind cuts, you see how messy this will get). This particular operation is quite a ways down my list of candidates. I propose "join".[1] If we consider set1 and set2 as implicitly defining the partition of some universe into set1 and complement of set1, and respectively the partition of the same universe into set2 and its complement, then what you have here is the lattice-theoretic join of the two partitions (the coarsest common refinement), with (as before) "everything else in the universe" being implied (ie, left out of the result). This also generalizes well to more than two sets. I suspect that anybody who wants a true partition join will define a partition class, and will define their own join. So this slight abuse of terminology probably won't cause that much trouble. Footnotes: [1] Or maybe it's the meet? I never can keep the two straight.... From spaghettitoastbook at gmail.com Sat Jul 6 18:40:06 2013 From: spaghettitoastbook at gmail.com (SpaghettiToastBook .) Date: Sat, 6 Jul 2013 12:40:06 -0400 Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations" In-Reply-To: References: Message-ID: (Sent again with the list CC'd) I like the PEP overall, and it seems to cover everything. 
I'm not sure that the function definition changes should be included due
to backward incompatibility. I do have one question about this line:

"- ``lambda *args, last: ...`` no longer requires ``last`` to be a
keyword only argument"

What exactly is backward incompatible about this change?

Also, on line 34, "keywords" should be "keyword" instead.

-- SpaghettiToastBook

On Sat, Jul 6, 2013 at 12:30 AM, Joshua Landau wrote:
> The PEP is attached. I'm not sure if I've covered the basics, but it's a
> try.
>
> If anyone knows how to get the patch (from the bug report) working, or
> where to find http://code.python.org/python/users/twouters/starunpack
> after code.python.org was deleted in favour of hg.python.org (which
> seems not to have it), it'd be nice to know.

From paddy3118 at gmail.com  Sat Jul  6 18:46:30 2013
From: paddy3118 at gmail.com (Paddy3118)
Date: Sat, 6 Jul 2013 09:46:30 -0700 (PDT)
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2,
 set1 & set2, set2 - set1
In-Reply-To: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
Message-ID: <5e8057ca-c1e3-45b4-9455-39ca6516224f@googlegroups.com>

On Wednesday, 3 July 2013 21:50:35 UTC+1, Paddy3118 wrote:
>
> I found myself repeating something that I know I have used before,
> several times: I get two sets of results, maybe sets of the passing
> tests when a design has changed, and I need to work out what has
> changed:
>
> 1. What passed the first time round.
> 2. What passed both times.
> 3. What passed only the second time round.
>
> I usually use something like the set equations in the title to do this
> but I recognise that this requires both sets to be traversed at least
> three times, which seems wasteful.
>
> I wondered if there was an algorithm to partition the two sets of data
> into three as above, but cutting down on the number of set traversals?
>
> I also wondered that if such an algorithm existed, would it be useful
> enough to be worth incorporating into the Python library?
>
> Maybe defined as:
>
>     exclusively1, common, exclusively2 = set1.partition(set2)

- Others who have used this construct have now come to light, which was
one of my goals.
- There is an algorithm that can save on the set traversals and that
might be quicker when implemented in C - another of my goals.

We are currently arguing about the name, which I must admit does need to
be agreed, but usually takes a disproportionate amount of time to agree.
(Especially when we all think that of course my solution is obviously the
best ;-)

I re-read the Wikipedia article on Venn diagrams as I am indeed asking
for a split of two sets into the regions depicted in a two-variable Venn
diagram. The punny name *vennerate* gains some credibility, but that
awful pun just has to die a quick death. *vennsplit*, *venndiv*,
*vennpartition*, *vpartition* are new name candidates on the same theme
but without the pun.

The division of two sets that I am asking for is related to truth tables:

    set1 set2 : Resultant
    ==== ==== : =========
     0    0   : Items not in set1 and not in set2; - Dumb!
     1    0   : Items in set1 and not in set2; exclusively1
     0    1   : Items not in set1 and in set2; exclusively2
     1    1   : Items in set1 and in set2; common

The above leads to a generalization for more than two sets; for example
three sets and a change in nomenclature for the resultants would lead to:

    (bin001, bin010, bin011, bin100, bin101, bin110, bin111) =
        set0.partition(set1, set2)

...But three sets and more should, I think, be abandoned as too
complicated for little gain.

From steve at pearwood.info  Sat Jul  6 19:26:29 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 07 Jul 2013 03:26:29 +1000
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2,
 set1 & set2, set2 - set1
In-Reply-To: <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
 <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz>
 <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com>
 <51D81DEE.9030505@pearwood.info> <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <51D85345.3020707@pearwood.info>

On 07/07/13 01:34, Stephen J. Turnbull wrote:
> Steven D'Aprano writes:
>
>  > "partition" is not the ideal name for this method,
>
> +1
>
> It would completely confuse anybody who did want a partition.
>
>  > but the actual operation itself is very useful. I have often done
>  > this, mostly on dict views rather than sets. In my head, it's an
>  > obvious operation, to split a pair of sets into three, spoiled only
>  > by lack of a good name.
>  >
>  > Perhaps "split" is a reasonable name?
>  >
>  > only1, both, only2 = set1.split(set2)
>
> -1
>
> Set splitting is an intractable problem.
> https://en.wikipedia.org/wiki/Set_splitting_problem

Paddy's suggested method is a concrete, conceptually simple operation on
two finite, discrete sets, not some theoretical problem[1] from
complexity theory. If we're going to reject method names because they
have some vague relation to some obscure corner of mathematics that 99%
of programmers will never have heard of, let alone care about, I think
we're soon going to run out of good names.

> That wouldn't bother me all that much except that I can imagine all
> kinds of ways to split sets that have little to do with boolean
> algebra (starting with Dedekind cuts, you see how messy this will
> get).

The string split method implements *one specific way* of splitting
strings, by partitioning on some given delimiter. There are other ways of
splitting, say by keeping the delimiter, or by partitioning the string in
groups of N characters, or between pairs of brackets, etc. We don't
reject the name "split" for strings just because there are alternative
ways to split, and we shouldn't reject a simple, descriptive,
understandable name "split" for sets just because there are other ways to
split sets.

> I propose "join".[1]
...
> [1] Or maybe it's the meet? I never can keep the two straight....

I don't think that either is appropriate. Join and meet are operations on
a single set, not a pair of them: the join of a set is the least upper
bound (effectively, the maximum) and the meet is the greatest lower bound
(effectively, the minimum) of the set. They are not operations on two
sets.

http://en.wikipedia.org/wiki/Join_and_meet

Alternatively, join and meet can be defined as binary operations on
elements of the set, rather than on the set itself.
But in any case, I don't think that a method that takes two sets as input
and returns three sets should be called a "join". In plain English, when
you join two things you get one thing, not three. And if we're going to
reject set.partition because it doesn't behave quite the same as
str.partition, then we should reject set.join because it doesn't behave
anything even slightly like str.join, which is *far* more well-known than
str.partition.

This is clearly a convenience method. There's already a fast way to
calculate the result; it just takes three calls instead of one. This
would be a little faster and more convenient but it wouldn't change what
we can do with sets. I already have a utility function in my toolbox to
calculate this, so it would be a Nice To Have if it were a built-in set
method, but not if it means spending three weeks arguing about the method
name :-)

[1] Not that I mean to imply that there is necessarily no concrete
application for this problem. But being intractable, it is unlikely to be
proposed or accepted as a set method.

-- 
Steven

From ron3200 at gmail.com  Sat Jul  6 19:51:20 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 06 Jul 2013 12:51:20 -0500
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2,
 set1 & set2, set2 - set1
In-Reply-To: <5e8057ca-c1e3-45b4-9455-39ca6516224f@googlegroups.com>
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
 <5e8057ca-c1e3-45b4-9455-39ca6516224f@googlegroups.com>
Message-ID:

On 07/06/2013 11:46 AM, Paddy3118 wrote:
> We are currently arguing about the name, which I must admit does need to
> be agreed, but usually takes a disproportionate amount of time to agree.
> (Especially when we all think that of course my solution is obviously
> the best ;-)

A good practice when it comes to names here, if there isn't an obvious
name to use, is to decide on a temporary name while the exact behaviour is
being worked out. Later, if there is some consensus for adding the feature
to Python's library, collect name suggestions and then have an informal
poll/discussion for the best name/spelling out of those suggestions.

That has worked well in the past and avoids getting bogged down early on
because of the name.

Cheers,
Ron

From joshua.landau.ws at gmail.com  Sat Jul  6 20:32:33 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Sat, 6 Jul 2013 19:32:33 +0100
Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking
 generalizations"
In-Reply-To:
References:
Message-ID:

On 6 July 2013 17:40, SpaghettiToastBook . wrote:
> I like the PEP overall, and it seems to cover everything.

Ty.

> I'm not sure
> that the function definition changes should be included due to
> backward incompatibility.

I decided when writing this that I'd just go and include everything,
seeing as it's easier to remove things I've included than add things I
haven't.

> I do have one question about this line:
>
> "- ``lambda *args, last: ...`` no longer requires ``last`` to be a
> keyword only argument"
>
> What exactly is backward incompatible about this change?

To be particular, every extension to syntax is backward incompatible.
Imagine any case where you have:

    try:
        exec("now-valid-syntax")
    except SyntaxError:
        expected_to_happen()

That's obviously far-fetched, but the point gets across. Any code that
requires it to fail when "last" isn't passed in will break. For example,
code that "builds" arguments to the function using input from the user
(Ranger might do this, I'm not sure). It's also a bit far-fetched, but it
might count.
It might also change the behaviour of certain types of currying. Probably
the most important change is that "(lambda *args, last=None: print(args,
last))(1, 2, 3, 4, 5)" prints "(1, 2, 3, 4, 5) None", whereas with the PEP
as it is, it will print "(1, 2, 3, 4) 5". This is quite a nonobvious
change, so I should add it to the PEP.

> Also, on line 34, "keywords" should be "keyword" instead.

Additionally, "A further extension to comprehensions is a logical and
necessary extension." (line 108) is quite a silly thing to write.

I'll upload these fixes in-bulk when I update the PEP.

From joshua.landau.ws at gmail.com  Sat Jul  6 20:54:11 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Sat, 6 Jul 2013 19:54:11 +0100
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
 <51D7BA0F.1070506@pearwood.info>
Message-ID:

On 6 July 2013 08:07, Andrew Barnert wrote:
> On Jul 5, 2013, at 23:32, Steven D'Aprano wrote:
>
>> On 06/07/13 06:25, Terry Reedy wrote:
>>
>>> When Alex said that it was not possible to determine if the start
>>> value is a number, he was talking in the context of old style classes
>>> where the type of every user class was 'Class' and the type of every
>>> user instance was 'Instance' (or something like that). In Python 3,
>>> with ABCs, isinstance(start, Number) would solve the problem as long
>>> as the requirement were documented.
>>
>> For the record, it has always been possible to check if something is a
>> number:
>>
>>     try:
>>         x + 0
>>     except TypeError:
>>         print "x is not a number"
>
> This isn't a very good rule for "is a number". You can add 0 to numpy
> arrays, for example, and they're not numbers.
>
> But I think it is actually a good rule for "is summable". If you've got
> something that's not a number, but 0+x makes sense, summing probably
> also makes sense. Conversely, if you create some type that is numeric,
> but isn't addable to 0, you wouldn't be surprised if you couldn't sum it.

Hyelll nooo (imagine I said that with distortedly high pitch while wearing
a hat).

What about summing vectors? You can't tell me that doesn't make sense. Why
on earth would you need to implement +0 for vectors? What about summing
counters?

I don't like your "dogma" about summables. It's summable *if I want to sum
it, and it makes sense*. So stop trying to stop me.

*Runs away crying*

From zuo at chopin.edu.pl  Sat Jul  6 22:06:09 2013
From: zuo at chopin.edu.pl (Jan Kaliszewski)
Date: Sat, 06 Jul 2013 22:06:09 +0200
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2,
 set1 & set2, set2 - set1
In-Reply-To: <51D85345.3020707@pearwood.info>
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
 <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz>
 <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com>
 <51D81DEE.9030505@pearwood.info> <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp>
 <51D85345.3020707@pearwood.info>
Message-ID: <873e84077caffb95dee71997e672e504@chopin.edu.pl>

06.07.2013 19:26, Steven D'Aprano wrote:

> On 07/07/13 01:34, Stephen J. Turnbull wrote:
>> Steven D'Aprano writes:
>>
>>  > "partition" is not the ideal name for this method,
>>
>> +1
>>
>> It would completely confuse anybody who did want a partition.
>>
>>  > but the actual operation itself is very useful. I have often done
>>  > this, mostly on dict views rather than sets.
>>  > In my head, it's an
>>  > obvious operation, to split a pair of sets into three, spoiled only
>>  > by lack of a good name.
>>  >
>>  > Perhaps "split" is a reasonable name?
>>  >
>>  > only1, both, only2 = set1.split(set2)
>>
>> -1
>>
>> Set splitting is an intractable problem.
>> https://en.wikipedia.org/wiki/Set_splitting_problem
[...]
>> I propose "join".[1]
> ...
>> [1] Or maybe it's the meet? I never can keep the two straight....
>
> I don't think that either is appropriate. Join and meet are
> operations on a single set, not a pair of them
[...]

Other ideas for the method name:

* trisect
* trisection
* cover
* over
* overlap
* interfere

Cheers.
*j

From grosser.meister.morti at gmx.net  Sat Jul  6 22:17:42 2013
From: grosser.meister.morti at gmx.net (Mathias Panzenböck)
Date: Sat, 06 Jul 2013 22:17:42 +0200
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <20130702211209.6dbde663@sergey>
References: <20130702211209.6dbde663@sergey>
Message-ID: <51D87B66.2030305@gmx.net>

Maybe what you actually want would be for list.extend to accept *args?

    >>> l = []
    >>> l.extend(*([[1,2,3]]*1000000))

Or something similar. I think "sum" is about mathematical sums. This would
be list concatenation and not building sums. After all, what does addition
in the context of lists even mean? In the context of sets it might be
meaningful, but for lists?

From zuo at chopin.edu.pl  Sat Jul  6 22:38:55 2013
From: zuo at chopin.edu.pl (Jan Kaliszewski)
Date: Sat, 06 Jul 2013 22:38:55 +0200
Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking
 generalizations"
In-Reply-To:
References:
Message-ID:

While I like all the literal/comprehension/etc.-related proposals, those
related to function definitions seem to me to be dubious.

I am not convinced by the argument of unification with the assignment LHS
syntax.

Assignment LHS and function parameter processing are inherently different
in Python anyway:

* the former does not (and arguably cannot) include **keyword unpacking
  or any notion of parameter names,
* the latter sets *args as a tuple, not as a list.

Making it possible to define some positional arguments after *args does
not seem to be a big win, and would destroy the nice simplicity of the
way you specify keyword-only arguments.

***

Forbidding keyword arguments before *args in function calls does not seem
so bad, but still it is a serious backwards incompatibility... And why
would we actually want to forbid it?

Regards.
*j

From joshua.landau.ws at gmail.com  Sat Jul  6 23:35:08 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Sat, 6 Jul 2013 22:35:08 +0100
Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking
 generalizations"
In-Reply-To:
References:
Message-ID:

On 6 July 2013 21:38, Jan Kaliszewski wrote:
> While I like all the literal/comprehension/etc.-related proposals, those
> related to function definitions seem to me to be dubious.
>
> I am not convinced by the argument of unification with the assignment
> LHS syntax.
>
> Assignment LHS and function parameter processing are inherently
> different in Python anyway:

Agreed, but similarity still lowers cognitive load.

> Making it possible to define some positional arguments after *args does
> not seem to be a big win, and would destroy the nice simplicity of the
> way you specify keyword-only arguments.

My personal interpretation is that it simplifies things; if you want
keyword-only arguments, you use a lone star. Otherwise you don't get them.
I find that comprehensively simpler, especially as it is more in tune with assignment.

I'm not actually sure what levels of backward-incompatibility are deemed reasonable between releases, so I'm not sure whether there's any point arguing for this. I also seem to have magically created this - it seems not to be in the original implementation. Hence if no-one supports the idea, and since I'm not very attached to it, there's no loss in letting it go.

It'd be easier if I had a running version of the implementation to test against (I wouldn't just make things up), but as I said above I'm finding it difficult to figure out.

> Forbidding keyword arguments before *args in function calls does not seem so
> bad, but still it is a serious backwards incompatibility... And why would we
> actually want to forbid it?

I included it because my understanding is that it was in the original patch.

I'm not sure why anyone would want to forbid it, other than it being easier to write the patch that way. Compatibility aside, I'm not sure why anyone would want to keep it either, though.

From joshua.landau.ws at gmail.com Sat Jul 6 23:40:17 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Sat, 6 Jul 2013 22:40:17 +0100
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: <873e84077caffb95dee71997e672e504@chopin.edu.pl>
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz> <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com> <51D81DEE.9030505@pearwood.info> <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp> <51D85345.3020707@pearwood.info> <873e84077caffb95dee71997e672e504@chopin.edu.pl>
Message-ID:

On 6 July 2013 21:06, Jan Kaliszewski wrote:
> Other ideas for the method name:
>
> * trisect
> * trisection
> * cover
> * over
> * overlap
> * interfere

.group
.group_by
.split_with

(I still don't get what people have against my version though. A 2-way partition makes sense)

From joshua.landau.ws at gmail.com Sat Jul 6 23:41:41 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Sat, 6 Jul 2013 22:41:41 +0100
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <51D87B66.2030305@gmx.net>
References: <20130702211209.6dbde663@sergey> <51D87B66.2030305@gmx.net>
Message-ID:

On 6 July 2013 21:17, Mathias Panzenböck wrote:
> After all, what does addition in the context of lists even mean?

What it currently does.

What is everyone so confused about?

From guido at python.org Sun Jul 7 00:06:48 2013
From: guido at python.org (Guido van Rossum)
Date: Sat, 6 Jul 2013 15:06:48 -0700
Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations"
In-Reply-To: References: Message-ID:

On Sat, Jul 6, 2013 at 2:35 PM, Joshua Landau wrote:
> On 6 July 2013 21:38, Jan Kaliszewski wrote:
>> Forbidding keyword arguments before *args in function calls does not seem so
>> bad, but still it is a serious backwards incompatibility... And why would we
>> actually want to forbid it?
>
> I included it because my understanding is that it was in the original patch.
>
> I'm not sure why anyone would want to forbid it, other than it being
> easier to write the patch that way. Compatibility aside, I'm not sure
> why anyone would want to keep it either, though.

In this case, compatibility trumps everything, and we should keep it for sure.
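To make the ordering in question concrete, a minimal sketch -- ``f`` and ``rest`` are stand-in names, not anything from the patch, and both calls are accepted by current Python:

    f(1, 2, *rest, key=3)    # keyword argument after the * unpacking (allowed since 2.6)
    f(1, 2, key=3, *rest)    # keyword argument before the * unpacking -- also legal today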
But even if we had a choice, my experience tells me that it's a good thing to keep, because nobody can remember the rules of what goes before what. -- --Guido van Rossum (python.org/~guido) From joshua.landau.ws at gmail.com Sun Jul 7 00:20:17 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sat, 6 Jul 2013 23:20:17 +0100 Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations" In-Reply-To: References: Message-ID: On 6 July 2013 23:06, Guido van Rossum wrote: > On Sat, Jul 6, 2013 at 2:35 PM, Joshua Landau > wrote: >> On 6 July 2013 21:38, Jan Kaliszewski wrote: >>> Forbidding keyword arguments before *args in function calls does not seem so >>> bad, but still it is a serious backwards incompatibility... And why would we >>> actually want to forbid it? >> >> I included it because my understanding is that it was in the original patch. >> >> I'm not sure why anyone would want to forbid it, other than it being >> easier to write the patch that way. Compatibility aside, I'm not sure >> why anyone would want to keep it either, though. > > In this case, compatibility trumps everything, and we should keep it for sure. > > But even if we had a choice, my experience tells me that it's a good > thing to keep, because nobody can remember the rules of what goes > before what. Then should we expand to allow arbitrary mixing of keyword and positional arguments (which sounds reasonable if we want to allow keyword arguments before *args, and also treat *args like any positional argument)? From shane at umbrellacode.com Sun Jul 7 00:47:22 2013 From: shane at umbrellacode.com (Shane Green) Date: Sat, 6 Jul 2013 15:47:22 -0700 Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1 In-Reply-To: <873e84077caffb95dee71997e672e504@chopin.edu.pl> References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz> <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com> <51D81DEE.9030505@pearwood.info> <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp> <51D85345.3020707@pearwood.info> <873e84077caffb95dee71997e672e504@chopin.edu.pl> Message-ID: <55DEFBE1-E504-42C0-BA69-EEA7BF98B243@umbrellacode.com> subsets? On Jul 6, 2013, at 1:06 PM, Jan Kaliszewski wrote: > 06.07.2013 19:26, Steven D'Aprano wrote: > >> On 07/07/13 01:34, Stephen J. Turnbull wrote: >>> Steven D'Aprano writes: >>> >>> > "partition" is not the ideal name for this method, >>> >>> +1 >>> >>> It would completely confuse anybody who did want a partition. >>> >>> > but the actual operation itself is very useful. I have often done >>> > this, mostly on dict views rather than sets. In my head, it's an >>> > obvious operation, to split a pair of sets into three, spoiled only >>> > by lack of a good name. >>> > >>> > Perhaps "split" is a reasonable name? >>> > >>> > only1, both, only2 = set1.split(set2) >>> >>> -1 >>> >>> Set splitting is an intractable problem. >>> https://en.wikipedia.org/wiki/Set_splitting_problem > [...] >>> I propose "join".[1] >> ... >>> [1] Or maybe it's the meet? I never can keep the two straight.... >> >> I don't think that either is appropriate. Join and meet are >> operations on a single set, not a pair of them > [...] > > Other ideas for the method name: > > * trisect > * trisection > * cover > * over > * overlap > * interfere > > Cheers. 
> *j > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From joshua.landau.ws at gmail.com Sun Jul 7 01:03:40 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sun, 7 Jul 2013 00:03:40 +0100 Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations" In-Reply-To: References: Message-ID: On 6 July 2013 23:20, Joshua Landau wrote: > Then should we expand to allow arbitrary mixing of keyword and > positional arguments (which sounds reasonable if we want to allow > keyword arguments before *args, and also treat *args like any > positional argument)? To give more hints as to what I am saying: Original Proposal:: Function calls may accept an unbound number of ``*`` and ``**`` unpackings, which are allowed anywhere that positional and keyword arguments are allowed respectively. In approximate pseudo-notation:: function( argument or *args, argument or *args, ..., kwargument or **kwargs, kwargument or **kwargs, ... ) This has been rejected, primarily as it is not worth the backwards-incompatibility. Status Quo:: Function calls may accept an unbound number of ``*`` and ``**`` unpackings. Keyword-arguments must follow positional arguments, and ``**`` unpackings must also follow ``*`` unpackings. In approximate pseudo-notation:: function( argument or *args, argument or *args, ..., kwargument or *args, kwargument or *args, ..., kwargument or **kwargs, kwargument or **kwargs, ... ) Looser rulings:: Function calls may accept an unbound number of ``*`` and ``**`` unpackings. Arguments can now occur in any position in a function call. As usual, keyword arguments always go to their respective keys and positional arguments are then placed into the remaining positional slots. In approximate pseudo-notation:: function( argument or keyword_argument or *args or **kwargs, argument or keyword_argument or *args or **kwargs, ... ) From joshua.landau.ws at gmail.com Sun Jul 7 01:11:56 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sun, 7 Jul 2013 00:11:56 +0100 Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1 In-Reply-To: <55DEFBE1-E504-42C0-BA69-EEA7BF98B243@umbrellacode.com> References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz> <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com> <51D81DEE.9030505@pearwood.info> <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp> <51D85345.3020707@pearwood.info> <873e84077caffb95dee71997e672e504@chopin.edu.pl> <55DEFBE1-E504-42C0-BA69-EEA7BF98B243@umbrellacode.com> Message-ID: On 6 July 2013 23:47, Shane Green wrote: > subsets? ovxrfurqqvat? From shane at umbrellacode.com Sun Jul 7 01:24:09 2013 From: shane at umbrellacode.com (Shane Green) Date: Sat, 6 Jul 2013 16:24:09 -0700 Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1 In-Reply-To: References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz> <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com> <51D81DEE.9030505@pearwood.info> <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp> <51D85345.3020707@pearwood.info> <873e84077caffb95dee71997e672e504@chopin.edu.pl> <55DEFBE1-E504-42C0-BA69-EEA7BF98B243@umbrellacode.com> Message-ID: +1 On Jul 6, 2013, at 4:11 PM, Joshua Landau wrote: > ovxrfurqqvat? 
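To make the three rule sets sketched in the previous message concrete, here is a rough illustration -- ``f`` is a hypothetical signature, and the multiple-unpacking calls are proposed syntax rather than anything current Python accepts:

    def f(a, b, c=0, *args, key=None):
        return a, b, c, args, key

    f(*[1, 2], 3, *[4, 5])     # proposed under all three variants: several * unpackings
    f(1, 2, key=3, *[4, 5])    # already legal today: * unpacking after a keyword argument
    # f(1, key=3, 2)           # a bare positional after a keyword argument --
                               # only the "looser rulings" would accept this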
From guido at python.org Sun Jul 7 01:34:55 2013
From: guido at python.org (Guido van Rossum)
Date: Sat, 6 Jul 2013 16:34:55 -0700
Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations"
In-Reply-To: References: Message-ID:

What do you mean by Status Quo? No version of Python supports multiple *args or a regular positional arg after *args or after kw=arg.

The only flexibility that was added "recently" (in 2.6) is that kw=arg may now follow *args.

On Sat, Jul 6, 2013 at 4:03 PM, Joshua Landau wrote:
> On 6 July 2013 23:20, Joshua Landau wrote:
>> Then should we expand to allow arbitrary mixing of keyword and
>> positional arguments (which sounds reasonable if we want to allow
>> keyword arguments before *args, and also treat *args like any
>> positional argument)?
>
> To give more hints as to what I am saying:
>
> Original Proposal::
>
> Function calls may accept an unbound number of ``*`` and ``**``
> unpackings, which are allowed anywhere that positional and keyword
> arguments are allowed respectively. In approximate pseudo-notation::
>
> function(
> argument or *args, argument or *args, ...,
> kwargument or **kwargs, kwargument or **kwargs, ...
> )
>
> This has been rejected, primarily as it is not worth the
> backwards-incompatibility.
>
> Status Quo::
>
> Function calls may accept an unbound number of ``*`` and ``**``
> unpackings. Keyword-arguments must follow positional arguments, and
> ``**`` unpackings must also follow ``*`` unpackings. In approximate
> pseudo-notation::
>
> function(
> argument or *args, argument or *args, ...,
> kwargument or *args, kwargument or *args, ...,
> kwargument or **kwargs, kwargument or **kwargs, ...
> )
>
> Looser rulings::
>
> Function calls may accept an unbound number of ``*`` and ``**``
> unpackings. Arguments can now occur in any position in a function
> call. As usual, keyword arguments always go to their respective keys
> and positional arguments are then placed into the remaining positional
> slots. In approximate pseudo-notation::
>
> function(
> argument or keyword_argument or *args or **kwargs,
> argument or keyword_argument or *args or **kwargs,
> ...
> )
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

--
--Guido van Rossum (python.org/~guido)

From joshua.landau.ws at gmail.com Sun Jul 7 01:55:46 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Sun, 7 Jul 2013 00:55:46 +0100
Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations"
In-Reply-To: References: Message-ID:

On 7 July 2013 00:34, Guido van Rossum wrote:
> What do you mean by Status Quo? No version of Python supports multiple
> *args or a regular positional arg after *args or after kw=arg.
>
> The only flexibility that was added "recently" (in 2.6) is that kw=arg
> may now follow *args.

It's the one that changes the least -- given that people seem to have accepted [1] multiple unpacking, and that you seem to have already (in the issue tracker) accepted [2] "foo(*a, b, c)"†, should we continue with the restriction that "Keyword-arguments must follow positional arguments, and ``**`` unpackings must also follow ``*`` unpackings"?

That has the fewest changes, but I believe that, given [1] and [2], these restrictions are either insufficient (hence the rejected "Original Proposal" from the PEP) or confusing (hence the additional "Looser rulings").

aka. I didn't mean Status Quo as "change nothing", but "change nothing other than those two things we already seem to like".

† And, by extension, one could also assume you support "foo(**a, b=...)"

From abarnert at yahoo.com Sun Jul 7 02:30:40 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 6 Jul 2013 17:30:40 -0700
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D7BA0F.1070506@pearwood.info>
Message-ID:

On Jul 6, 2013, at 11:54, Joshua Landau wrote:
> On 6 July 2013 08:07, Andrew Barnert wrote:
>> On Jul 5, 2013, at 23:32, Steven D'Aprano wrote:
>>>
>>> try:
>>>     x + 0
>>> except TypeError:
>>>     print "x is not a number"
>>
>> This isn't a very good rule for "is a number". You can add 0 to numpy arrays, for example, and they're not numbers.
>>
>> But I think it is actually a good rule for "is summable". If you've got something that's not a number, but 0+x makes sense, summing probably also makes sense. Conversely, if you create some type that is numeric, but isn't addable to 0, you wouldn't be surprised if you couldn't sum it.
>
> Hyelll nooo (imagine I said that with distortedly high pitch while
> wearing a hat).
>
> What about summing vectors? You can't tell me that doesn't make sense.
> Why on earth would you need to implement +0 for vectors?

Maybe because it makes perfect sense to treat 0 as a null vector? Or just because it comes for free (and does the right thing) if you're using numpy.array, or complex or quaternion, etc.?

> What about summing counters?

You're replying to "if you create some type that is numeric, but isn't addable to 0, you wouldn't be surprised if you couldn't sum it." I can easily see considering a vector numeric, but a counter?

Anyway, I'm not sure that I agree with Alex Martelli that sum should be restricted -- but if it should be, I think 0+x is a much better rule than isinstance(x, Number).

From guido at python.org Sun Jul 7 02:36:45 2013
From: guido at python.org (Guido van Rossum)
Date: Sat, 6 Jul 2013 17:36:45 -0700
Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations"
In-Reply-To: References: Message-ID:

I guess I have to read the whole PEP first. :-) That will be after July 9th (the Dropbox developer conference).

On Sat, Jul 6, 2013 at 4:55 PM, Joshua Landau wrote:
> On 7 July 2013 00:34, Guido van Rossum wrote:
>> What do you mean by Status Quo? No version of Python supports multiple
>> *args or a regular positional arg after *args or after kw=arg.
>>
>> The only flexibility that was added "recently" (in 2.6) is that kw=arg
>> may now follow *args.
>
> It's the one that changes the least -- given that people seem to have
> accepted [1] multiple unpacking, and that you seem to have already (in
> the issue tracker) accepted [2] "foo(*a, b, c)"†, should we continue
> with the restriction that "Keyword-arguments must follow positional
> arguments, and ``**`` unpackings must also follow ``*`` unpackings"?
>
> That has the fewest changes, but I believe that, given [1] and [2], these
> restrictions are either insufficient (hence the rejected
> "Original Proposal" from the PEP) or confusing (hence the additional
> "Looser rulings").
>
> aka. I didn't mean Status Quo as "change nothing", but "change nothing
> other than those two things we already seem to like".
>
> † And, by extension, one could also assume you support "foo(**a, b=...)"

--
--Guido van Rossum (python.org/~guido)

From joshua.landau.ws at gmail.com Sun Jul 7 03:08:15 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Sun, 7 Jul 2013 02:08:15 +0100
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D7BA0F.1070506@pearwood.info>
Message-ID:

On 7 July 2013 01:30, Andrew Barnert wrote:
> On Jul 6, 2013, at 11:54, Joshua Landau wrote:
>> On 6 July 2013 08:07, Andrew Barnert wrote:
>>> On Jul 5, 2013, at 23:32, Steven D'Aprano wrote:
>>>>
>>>> try:
>>>>     x + 0
>>>> except TypeError:
>>>>     print "x is not a number"
>>>
>>> This isn't a very good rule for "is a number". You can add 0 to numpy arrays, for example, and they're not numbers.
>>>
>>> But I think it is actually a good rule for "is summable". If you've got something that's not a number, but 0+x makes sense, summing probably also makes sense. Conversely, if you create some type that is numeric, but isn't addable to 0, you wouldn't be surprised if you couldn't sum it.
>>
>> Hyelll nooo (imagine I said that with distortedly high pitch while
>> wearing a hat).
>>
>> What about summing vectors? You can't tell me that doesn't make sense.
>> Why on earth would you need to implement +0 for vectors?
>
> Maybe because it makes perfect sense to treat 0 as a null vector? Or just because it comes for free (and does the right thing) if you're using numpy.array, or complex or quaternion, etc.?

I said why would you *need* to, not why would you. You can if you want.

>> What about summing counters?
>
> You're replying to "if you create some type that is numeric, but isn't addable to 0, you wouldn't be surprised if you couldn't sum it."

Well, also to:

> But I think [being addable to 0] is actually a good rule for "is summable"

> I can easily see considering a vector numeric, but a counter?
>
> Anyway, I'm not sure that I agree with Alex Martelli that sum should be restricted -- but if it should be, I think 0+x is a much better rule than isinstance(x, Number).

Fair 'nuff. That wasn't clear from what you said, though.

From ron3200 at gmail.com Sun Jul 7 04:14:54 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 06 Jul 2013 21:14:54 -0500
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D87B66.2030305@gmx.net>
Message-ID:

On 07/06/2013 04:41 PM, Joshua Landau wrote:
> On 6 July 2013 21:17, Mathias Panzenböck wrote:
>> After all, what does addition in the context of lists even mean?
>
> What it currently does.
>
> What is everyone so confused about?

They aren't confused. It just isn't a clear-cut issue.

Ideally... And being "ideal" isn't practical in this case because it will need too many changes.

Take the following example.

    def add_to_values(vs, v):
        return [n + v for n in vs]

Now what do you suppose this should do?

Well, it depends on what vs and v are.
It might add a value to each item in a list of values, it might add a value to each byte in a bytes string, it might concatenate a string to each string in a list, or it might join a sequence to each sequence in a list. Sounds reasonable, doesn't it?

Now consider that in some companies, programmers are required to take great care to be sure that the routines that they write can't do the wrong thing, with lots of testing to back that up.

That simple routine "ideally" should be dead simple, but now it requires some added care to be sure it can't do the wrong thing. Which could also slow it down. :/

Generalised routines are very nice and can save a lot of work, but it is easier to add behaviours than it is to limit unwanted behaviours. The problem here is that the different behaviours use a similar operator at a very low level.

But, can we change this? Probably not any time soon. It would mean changing __add__, __iadd__, __mul__, __rmul__, __imul__, and possibly a few others for a lot of different objects to get a clean separation of the behaviours. And we would need new symbols and method names to replace those.

So the question becomes how we can be more specific in a case like this and avoid the extra conditional expression. This is one way...

>>> def add_value_to_many(value, many):
...     return [int.__add__(x, value) for x in many]
...
>>> add_value_to_many(4, [3, 6, 0])
[7, 10, 4]
>>> add_value_to_many("abc", "efg")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in add_value_to_many
  File "<stdin>", line 2, in <listcomp>
TypeError: descriptor '__add__' requires a 'int' object but received a 'str'

It rejects the unwanted cases without the test. But different operators would have been a bit nicer.

Cheers,
Ron

From steve at pearwood.info Sun Jul 7 04:23:00 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 07 Jul 2013 12:23:00 +1000
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz> <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com> <51D81DEE.9030505@pearwood.info> <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp> <51D85345.3020707@pearwood.info> <873e84077caffb95dee71997e672e504@chopin.edu.pl>
Message-ID: <51D8D104.2090201@pearwood.info>

On 07/07/13 07:40, Joshua Landau wrote:

> (I still don't get what people have against my version though. A 2-way
> partition makes sense)

For the record, your suggestion was to add a convenience method:

    set1.partition(set2) => (set1 - set2, set1 & set2)

and then call it as a two-liner:

    only1, both = set1.partition(set2)
    only2 = set2 - set1

instead of Paddy's suggestion (with the method name left unknown):

    only1, both, only2 = set1.???????(set2)

I don't think much of your suggestion. It doesn't solve the stated use-case, where you want a three-way partition of two sets. And I'm having difficulty in thinking of practical examples where I might only want two out of the three "Venn subsets". Since this is a convenience method that doesn't add any new functionality, it needs to be *more* rather than *less* convenient than what it replaces.
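For concreteness, the full three-way operation everyone is trying to name could be sketched in pure Python as below, touching each input only once -- ``venn`` is just a placeholder name, not a proposal:

    def venn(set1, set2):
        only1, both = set(), set()
        for x in set1:                   # one pass over set1; lookups in set2 are O(1)
            (both if x in set2 else only1).add(x)
        only2 = {x for x in set2 if x not in set1}   # one pass over set2
        return only1, both, only2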
--
Steven

From steve at pearwood.info Sun Jul 7 04:32:40 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 07 Jul 2013 12:32:40 +1000
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D87B66.2030305@gmx.net>
Message-ID: <51D8D348.4050704@pearwood.info>

On 07/07/13 12:14, Ron Adam wrote:

> Take the following example.
>
> def add_to_values(vs, v):
>     return [n + v for n in vs]

I don't actually see the point of this example, but I'm willing to bear with you.

> Now what do you suppose this should do?
>
> Well, it depends on what vs and v are. It might add a value to each item in a list of values, it might add a value to each byte in a bytes string, it might concatenate a string to each string in a list, or it might join a sequence to each sequence in a list. Sounds reasonable, doesn't it?
>
> Now consider that in some companies, programmers are required to take great care to be sure that the routines that they write can't do the wrong thing, with lots of testing to back that up.

Can you give an actual example of the above add_to_values function doing the wrong thing?

> Generalised routines are very nice and can save a lot of work, but it is easier to add behaviours than it is to limit unwanted behaviours. The problem here is that the different behaviours use a similar operator at a very low level.
>
> But, can we change this? Probably not any time soon. It would mean changing __add__, __iadd__, __mul__, __rmul__, __imul__, and possibly a few others for a lot of different objects to get a clean separation of the behaviours. And we would need new symbols and method names to replace those.
>
> So the question becomes how we can be more specific in a case like this and avoid the extra conditional expression. This is one way...
>
>>>> def add_value_to_many(value, many):
> ...     return [int.__add__(x, value) for x in many]
> ...
>>>> add_value_to_many(4, [3, 6, 0])
> [7, 10, 4]
>>>> add_value_to_many("abc", "efg")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<stdin>", line 2, in add_value_to_many
>   File "<stdin>", line 2, in <listcomp>
> TypeError: descriptor '__add__' requires a 'int' object but received a 'str'
>
> It rejects the unwanted cases without the test. But different operators would have been a bit nicer.

This doesn't make sense to me. Above, you just said that "concatenate a string to each string in a list" was reasonable, and here you prohibit it.

--
Steven

From joshua.landau.ws at gmail.com Sun Jul 7 04:35:24 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Sun, 7 Jul 2013 03:35:24 +0100
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D87B66.2030305@gmx.net>
Message-ID:

On 7 July 2013 03:14, Ron Adam wrote:
>
> On 07/06/2013 04:41 PM, Joshua Landau wrote:
>>
>> On 6 July 2013 21:17, Mathias Panzenböck
>> wrote:
>>>
>>> After all, what does addition in the context of lists even mean?
>>
>> What it currently does.
>>
>> What is everyone so confused about?
>
> They aren't confused. It just isn't a clear-cut issue.
>
> Ideally... And being "ideal" isn't practical in this case because it will
> need too many changes.
>
> Take the following example.
>
> def add_to_values(vs, v):
>     return [n + v for n in vs]
>
> Now what do you suppose this should do?
>
> Well, it depends on what vs and v are. It might add a value to each item
> in a list of values, it might add a value to each byte in a bytes string, it
> might concatenate a string to each string in a list, or it might join a
> sequence to each sequence in a list. Sounds reasonable, doesn't it?
>
> Now consider that in some companies, programmers are required to take great
> care to be sure that the routines that they write can't do the wrong thing,
> with lots of testing to back that up.

So why are they using a duck-typed language? It would imply that they would never be allowed to use operators, call functions they are given or... do anything. It seems hard to work like that.

From joshua.landau.ws at gmail.com Sun Jul 7 04:47:33 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Sun, 7 Jul 2013 03:47:33 +0100
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: <51D8D104.2090201@pearwood.info>
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz> <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com> <51D81DEE.9030505@pearwood.info> <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp> <51D85345.3020707@pearwood.info> <873e84077caffb95dee71997e672e504@chopin.edu.pl> <51D8D104.2090201@pearwood.info>
Message-ID:

On 7 July 2013 03:23, Steven D'Aprano wrote:
> On 07/07/13 07:40, Joshua Landau wrote:
>
>> (I still don't get what people have against my version though. A 2-way
>> partition makes sense)
>
> For the record, your suggestion was to add a convenience method:
>
> set1.partition(set2) => (set1 - set2, set1 & set2)
>
> and then call it as a two-liner:
>
> only1, both = set1.partition(set2)
> only2 = set2 - set1
>
> instead of Paddy's suggestion (with the method name left unknown):
>
> only1, both, only2 = set1.???????(set2)
>
> I don't think much of your suggestion. It doesn't solve the stated use-case,
> where you want a three-way partition of two sets.

Only it does, as the problem was that it was "wasteful" to do the extra passes over the list - which mine solves. If you really need a convenience function for a three-way partition it makes sense to put it in a function and call "partition3(set1, set2)". It looks better as a function, too, as the arguments "act" on each other, rather than being unidirectional as the method-based syntax implies.

> And I'm having difficulty
> in thinking of practical examples where I might only want two out of the
> three "Venn subsets".

I've already given one. When you have a set of items and you only want to deal with some of them, passing the rest on to the next in the chain, you'd want to do .partition(). There are a lot of circumstances of this sort.

Personally, I've no idea why you'd want all three. I'm willing, though, to accept that you have a good reason.

> Since this is a convenience method that doesn't add any
> new functionality, it needs to be *more* rather than *less* convenient than
> what it replaces.

I'm quite surprised you think it's less convenient than the default method. The main reason it was brought up was because of the "wastefulness". If it's really so bad to be forced to write a function, given that the original efficiency problems are solved, may I ask why we still have to write:

    d = dict1.copy()
    d.update(dict2)
    func(d)

to pass a function the combination of two dictionaries (in my experience a far more needed utility)? I think that ship has sailed long ago†.
† This may change soon if we get to do "func({**dict1, **dict2})", but that's not the primary motive for the change so w/e.

From mertz at gnosis.cx Sun Jul 7 04:53:09 2013
From: mertz at gnosis.cx (David Mertz)
Date: Sat, 6 Jul 2013 19:53:09 -0700
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: <51D8D104.2090201@pearwood.info>
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz> <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com> <51D81DEE.9030505@pearwood.info> <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp> <51D85345.3020707@pearwood.info> <873e84077caffb95dee71997e672e504@chopin.edu.pl> <51D8D104.2090201@pearwood.info>
Message-ID:

On Sat, Jul 6, 2013 at 7:23 PM, Steven D'Aprano wrote:
> For the record, your suggestion was to add a convenience method:
> set1.partition(set2) => (set1 - set2, set1 & set2)
> only1, both = set1.partition(set2)
> only2 = set2 - set1
>
> instead of Paddy's suggestion (with the method name left unknown):
>
> only1, both, only2 = set1.???????(set2)

I dislike the appearance of creating a method on the 'set' type. It creates an asymmetry between the respective sets that doesn't really express the sense of the symmetrical operation.

However, I *do* think it would be worth having a faster (i.e. C-coded) way of doing the three-way partitioning. Moreover, there's really no reason that such a function couldn't operate on collections (or iterators) other than sets, and being general seems more useful. Therefore, I would suggest adding a C-coded function in the STDLIB--probably in 'collections' to do this. E.g.

    from collections import segment  # tentative name, see various suggestions in thread
    only1, only2, intersection = segment(iter1, iter2)

In behavior, this should do the same thing as the below (just faster):

    def segment(iter1, iter2):
        set1, set2 = map(set, (iter1, iter2))
        return set1 - set2, set2 - set1, set1 & set2

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

From abarnert at yahoo.com Sun Jul 7 05:40:37 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 6 Jul 2013 20:40:37 -0700
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz> <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com> <51D81DEE.9030505@pearwood.info> <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp> <51D85345.3020707@pearwood.info> <873e84077caffb95dee71997e672e504@chopin.edu.pl> <51D8D104.2090201@pearwood.info>
Message-ID: <3CF51AC2-20F1-42CB-B976-818178284BE1@yahoo.com>

On Jul 6, 2013, at 19:53, David Mertz wrote:
> However, I *do* think it would be worth having a faster (i.e. C-coded) way of doing the three-way partitioning. Moreover, there's really no reason that such a function couldn't operate on collections (or iterators) other than sets, and being general seems more useful.

If we don't have "intersection" and "difference" functions that work on any pair of iterables, it seems strange to have this function do so.
And I think there's a good reason we don't have them, and that same reason applies here: there actually _is_ an inherent asymmetry to all of these algorithms. One of the iterables has to be a set (or at least a Set with fast lookup); the other can be anything, with no change in semantics or performance. In fact, if the large iterable is an iterator, there could even be a substantial improvement in space if the result were three iterators. (Obviously O(M) is much better than O(N+M) when M << N.)

However, going again by the parallel with the methods we already have, if we don't have itertools.set_difference we probably don't need itertools.whatever_this_is, just a method on set.

From ron3200 at gmail.com Sun Jul 7 06:01:12 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 06 Jul 2013 23:01:12 -0500
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <51D8D348.4050704@pearwood.info>
References: <20130702211209.6dbde663@sergey> <51D87B66.2030305@gmx.net> <51D8D348.4050704@pearwood.info>
Message-ID:

On 07/06/2013 09:32 PM, Steven D'Aprano wrote:
> On 07/07/13 12:14, Ron Adam wrote:
>
>> Take the following example.
>>
>> def add_to_values(vs, v):
>>     return [n + v for n in vs]
>
> I don't actually see the point of this example, but I'm willing to bear
> with you.

Yes, it's oversimplified to highlight a concept rather than have a specific programming problem to solve.

>> Now what do you suppose this should do?
>>
>> Well, it depends on what vs and v are. It might add a value to each
>> item in a list of values, it might add a value to each byte in a bytes
>> string, it might concatenate a string to each string in a list, or it
>> might join a sequence to each sequence in a list. Sounds reasonable,
>> doesn't it?
>>
>> Now consider that in some companies, programmers are required to take
>> great care to be sure that the routines that they write can't do the
>> wrong thing, with lots of testing to back that up.
>>
>> That simple routine "ideally" should be dead simple, but now it requires
>> some added care to be sure it can't do the wrong thing. Which could also
>> slow it down. :/
>
> Can you give an actual example of the above add_to_values function doing
> the wrong thing?

It was kept simple to demonstrate a concept. What I was trying to demonstrate is it has to do with the context it's used in, and not something wrong with the example itself. It will do what it says it will do.

Sometimes we want very general behaviour, and sometimes we don't want that. Both are good, but it should be easy to do both in a simple way. That is the point I was trying to make.

>> Generalised routines are very nice and can save a lot of work, but it is
>> easier to add behaviours than it is to limit unwanted behaviours. The
>> problem here is that the different behaviours use a similar operator at a
>> very low level.
>>
>> But, can we change this? Probably not any time soon. It would mean
>> changing __add__, __iadd__, __mul__, __rmul__, __imul__, and possibly a
>> few others for a lot of different objects to get a clean separation of
>> the behaviours. And we would need new symbols and method names to
>> replace those.
>>
>> So the question becomes how we can be more specific in a case like this
>> and avoid the extra conditional expression. This is one way...
>>
>>>>> def add_value_to_many(value, many):
>> ...     return [int.__add__(x, value) for x in many]
>> ...
>>>>> add_value_to_many(4, [3, 6, 0])
>> [7, 10, 4]
>>>>> add_value_to_many("abc", "efg")
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "<stdin>", line 2, in add_value_to_many
>>   File "<stdin>", line 2, in <listcomp>
>> TypeError: descriptor '__add__' requires a 'int' object but received a 'str'
>>
>> It rejects the unwanted cases without the test. But different operators
>> would have been a bit nicer.
>
> This doesn't make sense to me. Above, you just said that "concatenate a
> string to each string in a list" was reasonable, and here you prohibit it.

You missed the question mark right after "reasonable". Yes, it's an example of how to do that if you need to do it. Otherwise you need to add an isinstance() check or something else to get that.

Cheers,
Ron

From ron3200 at gmail.com Sun Jul 7 06:02:28 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 06 Jul 2013 23:02:28 -0500
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D87B66.2030305@gmx.net>
Message-ID:

On 07/06/2013 09:35 PM, Joshua Landau wrote:
>> Now consider that in some companies, programmers are required to take great
>> care to be sure that the routines that they write can't do the wrong thing,
>> with lots of testing to back that up.
> So why are they using a duck-typed language? It would imply that they
> would never be allowed to use operators, call functions they are given
> or... do anything. It seems hard to work like that.

That is the other extreme, sometimes you want both.

Cheers,
Ron

From paddy3118 at gmail.com Sun Jul 7 07:28:23 2013
From: paddy3118 at gmail.com (Paddy3118)
Date: Sat, 6 Jul 2013 22:28:23 -0700 (PDT)
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz> <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com> <51D81DEE.9030505@pearwood.info> <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp> <51D85345.3020707@pearwood.info> <873e84077caffb95dee71997e672e504@chopin.edu.pl> <51D8D104.2090201@pearwood.info>
Message-ID:

On Sunday, 7 July 2013 03:53:09 UTC+1, David Mertz wrote:
>
> from collections import segment  # tentative name, see various
> suggestions in thread
> only1, only2, intersection = segment(iter1, iter2)
>
> In behavior, this should do the same thing as the below (just faster):
>
> def segment(iter1, iter2):
>     set1, set2 = map(set, (iter1, iter2))
>     return set1 - set2, set2 - set1, set1 & set2

Hi David, I really am not sure about generalizing the interface to iterators and then immediately turning them into sets in the implementation. I think the functionality can be naturally explained as an operation on two sets and should be restricted to sets. The caller should have to map other types to sets explicitly.

If it turns out that the implemented algorithm in C would work just as well with one of the arguments being any finite iterator and the other needing to be a set then we could still stick with requiring two sets or change to a format of:

    set_only, common, iter_only = some_set.????(some_iterator)
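A rough sketch of that asymmetric form, in case it helps the discussion -- the receiver must be a set, the argument may be any finite iterable, and ``split3`` is only a placeholder for the still-unnamed method:

    def split3(the_set, iterable):
        common, iter_only = set(), set()
        for x in iterable:                # single pass over the iterable
            (common if x in the_set else iter_only).add(x)
        return the_set - common, common, iter_only   # (set_only, common, iter_only)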
From mertz at gnosis.cx Sun Jul 7 07:45:12 2013
From: mertz at gnosis.cx (David Mertz)
Date: Sat, 6 Jul 2013 22:45:12 -0700
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz> <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com> <51D81DEE.9030505@pearwood.info> <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp> <51D85345.3020707@pearwood.info> <873e84077caffb95dee71997e672e504@chopin.edu.pl> <51D8D104.2090201@pearwood.info>
Message-ID:

Maybe the generalization isn't worthwhile. I was thinking that maybe a more general version should keep order in types that have order to start with, so I confess I'm not certain what the "correct" interface would be.

But even if it were only for sets, I like the idea of a plain function much better than a method of a set, even if the only arguments it accepted were sets.

On Sat, Jul 6, 2013 at 10:28 PM, Paddy3118 wrote:
>
> On Sunday, 7 July 2013 03:53:09 UTC+1, David Mertz wrote:
>>
>> from collections import segment  # tentative name, see various
>> suggestions in thread
>> only1, only2, intersection = segment(iter1, iter2)
>>
>> In behavior, this should do the same thing as the below (just faster):
>>
>> def segment(iter1, iter2):
>>     set1, set2 = map(set, (iter1, iter2))
>>     return set1 - set2, set2 - set1, set1 & set2
>
> Hi David, I really am not sure about generalizing the interface to
> iterators and then immediately turning them into sets in the
> implementation. I think the functionality can be naturally explained as an
> operation on two sets and should be restricted to sets. The caller should
> have to map other types to sets explicitly.
>
> If it turns out that the implemented algorithm in C would work just as
> well with one of the arguments being any finite iterator and the other
> needing to be a set then we could still stick with requiring two sets or
> change to a format of:
>
> set_only, common, iter_only = some_set.????(some_iterator)

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

From ronaldoussoren at mac.com Sun Jul 7 08:26:56 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Sun, 7 Jul 2013 08:26:56 +0200
Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations"
In-Reply-To: References: Message-ID: <36609008-BACD-4C17-B5F9-E8396B9FA654@mac.com>

On 6 Jul, 2013, at 6:30, Joshua Landau wrote:
> The PEP is attached. I'm not sure if I've covered the basics, but it's a try.
>
> If anyone knows how to get the patch (from the bug report) working, or
> where to find http://code.python.org/python/users/twouters/starunpack
> after code.python.org was deleted in favour of hg.python.org (which
> seems not to have it), it'd be nice to know.

As you already noted in your proposal, the proposed changes to function definitions are not backward compatible:

    def func(*args, foo): pass

Currently 'foo' is a required keyword argument, but with your change it would just be another positional argument. How would you define keyword-only arguments with your proposal?
The only alternative I could come up with is an extension of how you currently define keyword arguments without having a '*args' argument: def func(*args, *, foo): pass This however is currently not valid (SyntaxError) and would therefore make it a lot harder to write functions with keyword-only arguments that work both before and after your proposed change. Ronald > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From joshua.landau.ws at gmail.com Sun Jul 7 08:57:21 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sun, 7 Jul 2013 07:57:21 +0100 Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations" In-Reply-To: <36609008-BACD-4C17-B5F9-E8396B9FA654@mac.com> References: <36609008-BACD-4C17-B5F9-E8396B9FA654@mac.com> Message-ID: On 7 July 2013 07:26, Ronald Oussoren wrote: > > On 6 Jul, 2013, at 6:30, Joshua Landau wrote: > >> The PEP is attached. I'm not sure if I've covered the basics, but it's a try. >> >> If anyone knows how to get the patch (from the bug report) working, or >> where to find http://code.python.org/python/users/twouters/starunpack >> after code.python.org was deleted in favour of hg.python.org (which >> seems not to have it), it'd be nice to know. > > As you already noted in your proposal the proposed changes to function > definitions are not backward compatible: I wrote the PEP, but I tried not to add any ideas of my own (other than clarifications). Whilst I failed at that -- function signature changes weren't actually in the original implementation -- it's worth keeping me separate from those who proposed the ideas and did the bulk of the work for them. I just liked the idea so am trying to nudge it a bit with this. > def func(*args, foo): pass > > Currently 'foo' is a required keyword argument, with your change it > would be just another positional only argument. How would you define > keyword-only arguments with your proposal? The only alternative I could > come up with is an extension of how you currently define keyword arguments > without having a '*args' argument: > > def func(*args, *, foo): pass > > This however is currently not valid (SyntaxError) and would therefore > make it a lot harder to write functions with keyword-only arguments that > work both before and after your proposed change. As I replied to someone else, "I also seem to have magically created this - it seems not to be in the original implementation. Hence if no-one supports the idea, and since I'm not very attached to it, there's no loss in letting it go." That's a good counter-argument, and no-one seems to support changes to function definitions, so I'll go with the flow and remove it. It's not needed for the rest of the PEP to make sense. I'll probably update the PEP tomorrow, for some skewed definition of tomorrow (I don't really have a sleep-cycle ATM) but how to define function-call grammar is still undefined so I'll lay out the two alternatives that currently make sense to me. From me at dpk.io Sun Jul 7 12:29:03 2013 From: me at dpk.io (David Kendal) Date: Sun, 7 Jul 2013 11:29:03 +0100 Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes) Message-ID: Hi, Python provides a way to iterate characters of a string by using the string as an iterable. 
But there's no way to iterate over Unicode graphemes (a cluster of characters consisting of a base character plus a number of combining marks and other modifiers -- or what the human eye would consider to be one "character").

I think this ought to be provided either in the unicodedata library (unicodedata.itergraphemes(string)), which exposes the character database information needed to make this work, or as a method on the built-in str type (str.itergraphemes() or str.graphemes()).

Below is my own implementation of this as a generator, as an example and for reference.

---
import unicodedata

def itergraphemes(string):
    def ismodifier(char): return unicodedata.category(char)[0] == 'M'
    start = 0
    for end, char in enumerate(string):
        if not ismodifier(char) and not start == end:
            yield string[start:end]
            start = end
    yield string[start:]
---

Thanks,
dpk

From python at mrabarnett.plus.com Sun Jul 7 16:31:38 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Sun, 07 Jul 2013 15:31:38 +0100
Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes)
In-Reply-To: References: Message-ID: <51D97BCA.5060609@mrabarnett.plus.com>

On 07/07/2013 11:29, David Kendal wrote:
> Hi,
>
> Python provides a way to iterate characters of a string by using the string as an iterable. But there's no way to iterate over Unicode graphemes (a cluster of characters consisting of a base character plus a number of combining marks and other modifiers -- or what the human eye would consider to be one "character").
>
> I think this ought to be provided either in the unicodedata library (unicodedata.itergraphemes(string)), which exposes the character database information needed to make this work, or as a method on the built-in str type (str.itergraphemes() or str.graphemes()).
>
> Below is my own implementation of this as a generator, as an example and for reference.
>
> ---
> import unicodedata
>
> def itergraphemes(string):
>     def ismodifier(char): return unicodedata.category(char)[0] == 'M'
>     start = 0
>     for end, char in enumerate(string):
>         if not ismodifier(char) and not start == end:
>             yield string[start:end]
>             start = end
>     yield string[start:]
> ---
>
The definition of a grapheme cluster is actually a little more complicated than that. See here:
http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

From paddy3118 at gmail.com Sun Jul 7 23:07:40 2013
From: paddy3118 at gmail.com (Paddy3118)
Date: Sun, 7 Jul 2013 14:07:40 -0700 (PDT)
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com>
Message-ID: <1219dca3-7cab-476b-a35e-13ad312d472f@googlegroups.com>

On Wednesday, 3 July 2013 21:50:35 UTC+1, Paddy3118 wrote:
>
> I found myself repeating something that I know I have used before, several
> times: I get two sets of results, maybe sets of the passing tests when a
> design has changed, and I need to work out what has changed, so I work out:
>
> 1. What passed first time round
> 2. What passed both times.
> 3. What passed only the second time round.
> I usually use something like the set equations in the title to do this but
> I recognise that this requires both sets to be traversed at least three
> times, which seems wasteful.
>
> I wondered if there was an algorithm to partition the two sets of data
> into three as above, but cutting down on the number of set traversals?
>
> I also wondered that if such an algorithm existed, would it be useful
> enough to be worth incorporating into the Python library?
>
> Maybe defined as:
>
> exclusively1, common, exclusively2 = set1.partition(set2)

I've done a related blog entry Set divisions/partitions in which I try for a more general algorithm.

From eric at trueblade.com Sun Jul 7 23:07:50 2013
From: eric at trueblade.com (Eric V. Smith)
Date: Sun, 07 Jul 2013 17:07:50 -0400
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
In-Reply-To: References: <027388da-f7da-45c5-8181-510b1a51ef96@googlegroups.com> <51D4D994.7020000@pearwood.info> <51D6656C.1090506@canterbury.ac.nz> <9a2a0b6b-88e3-4c18-8446-c6f3fb7ed6f1@googlegroups.com> <51D81DEE.9030505@pearwood.info> <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp> <51D85345.3020707@pearwood.info> <873e84077caffb95dee71997e672e504@chopin.edu.pl> <51D8D104.2090201@pearwood.info>
Message-ID: <51D9D8A6.4050309@trueblade.com>

On 7/7/2013 1:45 AM, David Mertz wrote:
> Maybe the generalization isn't worthwhile. I was thinking that maybe a
> more general version should keep order in types that have order to start
> with, so I confess I'm not certain what the "correct" interface would be.
>
> But even if it were only for sets, I like the idea of a plain function
> much better than a method of a set, even if the only arguments it
> accepted were sets.
URL: From zuo at chopin.edu.pl Mon Jul 8 02:56:01 2013 From: zuo at chopin.edu.pl (Jan Kaliszewski) Date: Mon, 08 Jul 2013 02:56:01 +0200 Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations" In-Reply-To: References: Message-ID: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl> On 07.07.2013 01:03, Joshua Landau wrote: > Function calls may accept an unbound number of ``*`` and ``**`` > unpackings. Arguments can now occur in any position in a function > call. As usual, keyword arguments always go to their respective keys > and positional arguments are then placed into the remaining > positional > slots. In approximate pseudo-notation:: > > function( > argument or keyword_argument or *args or **kwargs, > argument or keyword_argument or *args or **kwargs, > ... What do you exactly mean by "remaining positional slots"? Please note that the current behaviour is to raise TypeError when several (more than 1) arguments match the same parameter slot. IMHO it must be kept. Another question is related to this matter as well: if we adopt the idea of more than one **kwargs in function call -- what about key duplication? I.e. whether: fun(**{'a': 1}, **{'a': 2}) ...should raise TypeError as well, or should it be equivalent to fun(a=2)? My first thought was that it should raise TypeError -- prohibition of parameter duplication is a simple and well settled rule for Python function calls. On second thought: it could be relaxed a bit if we agreed about another rule that would be simple enough, e.g.: "for anything *after* the first '**kwargs' (or maybe also bare '**,'?) another rule is applied: later arguments override earlier (looking from left to right), as in dict(...)/.update(...) or as in {**foo, **bar} in literals (if the rest of the PEP is accepted). Cheers. *j From joshua.landau.ws at gmail.com Mon Jul 8 04:58:51 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Mon, 8 Jul 2013 03:58:51 +0100 Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations" In-Reply-To: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl> References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl> Message-ID: On 8 July 2013 01:56, Jan Kaliszewski wrote: > On 07.07.2013 01:03, Joshua Landau wrote: > >> Function calls may accept an unbound number of ``*`` and ``**`` >> unpackings. Arguments can now occur in any position in a function >> call. As usual, keyword arguments always go to their respective keys >> and positional arguments are then placed into the remaining positional >> slots. In approximate pseudo-notation:: >> >> function( >> argument or keyword_argument or *args or **kwargs, >> argument or keyword_argument or *args or **kwargs, >> ... > > > What do you exactly mean by "remaining positional slots"? Please note > that the current behaviour is to raise TypeError when several (more > than 1) arguments match the same parameter slot. IMHO it must be kept. You're right -- I've never gotten that error before, so this is actually new to me. That is a nicer solution, and it keeps things clean. > Another question is related to this matter as well: if we adopt > the idea of more than one **kwargs in function call -- what about > key duplication? I.e. whether: > > fun(**{'a': 1}, **{'a': 2}) > > ...should raise TypeError as well, or should it be equivalent to > fun(a=2)? > > My first thought was that it should raise TypeError -- prohibition > of parameter duplication is a simple and well settled rule for Python > function calls. 
My first opinion would be that if relaxation is something people find
useful, it would be suited to a separate proposal; it seems outside of
this PEP's scope, à mon avis.

Given:

    >>> {1:"original", 1:"override"}
    {1: 'override'}

the most consistent behaviours would be what are in the PEP already,
and I think that's worth keeping.

---

Thinking about examples, the two cases ("status quo" rules [1] and
relaxed rules [2]) would allow things like:

    def f(a, b, c=0, d=0, e=0): ...

"Status Quo" rules:

    f(a, e=e, d=d, *[b, c])

Relaxed rules only:

    f(a, e=e, d=d, b, c)

I brought up the idea for the Relaxed rules because the priority rules
for arguments are somewhat complicated when you add in the ability to
have multiple *args and **kwargs, and remove the restriction of *args
after positionals and **kwargs after positionals. However, considering
that the Relaxed rules are never actually useful AFAICT (there's no
real reason to define positionals after keywords), this would be a
simplification to the specification alone. That'll make it easier to
learn the rules, I believe, but simply saying "write your arguments in
a sane order" should do more than enough to cover it anyway.

Personally, the rule from the issue itself (positionals, then keywords)
is the simplest, but I agree with Guido that it's not worth breaking
backwards compatibility. In a sense, then, the best way to describe the
"Status Quo" is: Positionals, then Keywords -- but *if you must* you
are allowed to put "*args" after keywords.

I'm still undecided, so I'll leave this for others to comment on.

An updated version of the PEP that removes the changes to function
definitions and discusses the alternatives for function calls is
attached. I haven't double-checked it, so it may be a bit rougher
around the edges.

-------------- next part --------------
PEP: XXX
Title: Additional Unpacking Generalizations
Version: $Revision$
Last-Modified: $Date$
Author: Joshua Landau
Discussions-To: python-ideas at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 29-Jun-2013
Python-Version: 3.4
Post-History: #TODO


Abstract
========

This PEP proposes extended usages of the ``*`` iterable unpacking
operator to allow unpacking in more positions, an arbitrary number of
times, and in several additional circumstances.

Specifically:

Arbitrarily positioned unpacking operators::

    >>> print(*[1], *[2], 3)
    1 2 3
    >>> dict(**{'x': 1}, y=2, **{'z': 3})
    {'x': 1, 'y': 2, 'z': 3}

Function calls currently have the restriction that keyword arguments
must follow positional arguments and ``**`` unpackings must
additionally follow ``*`` unpackings. Because of the new levity for
``*`` and ``**`` unpackings, it may be advisable to lift some or all
of these restrictions.

As is currently the case, if an argument is given multiple times - such
as a positional argument given both positionally and by keyword - a
TypeError is raised.
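For illustration, the existing duplicate-argument rule in current
Python (a simplified example of today's behaviour, not part of the
proposal)::

    def f(x): pass

    f(1, x=2)    # raises TypeError: x receives a value twice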
Unpacking is proposed to be allowed inside tuples, lists, sets,
dictionaries and comprehensions::

    >>> *range(4), 4
    (0, 1, 2, 3, 4)
    >>> [*range(4), 4]
    [0, 1, 2, 3, 4]
    >>> {*range(4), 4}
    {0, 1, 2, 3, 4}
    >>> {'x': 1, **{'y': 2}}
    {'x': 1, 'y': 2}

    >>> ranges = [range(i) for i in range(5)]
    >>> [*item for item in ranges]
    [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]


Rationale
=========

Current usage of the ``*`` iterable unpacking operator features
unnecessary restrictions that can harm readability.

Unpacking multiple times has an obvious rationale. When you want to
unpack several iterables into a function definition or follow an unpack
with more positional arguments, the most natural way would be to
write::

    function(**kw_arguments, **more_arguments)

    function(*arguments, argument)

Simple examples where this is useful are ``print`` and ``str.format``.
Instead, you could be forced to write::

    kwargs = dict(kw_arguments)
    kwargs.update(more_arguments)
    function(**kwargs)

    args = list(arguments)
    args.append(arg)
    function(*args)

or, if you know to do so::

    from collections import ChainMap
    function(**ChainMap(more_arguments, kw_arguments))

    from itertools import chain
    function(*chain(args, [arg]))

which add unnecessary line-noise and, with the first methods, cause
duplication of work.

There are two primary rationales for unpacking inside of containers.
Firstly, there is a symmetry of assignment, where ``fst, *other, lst =
elems`` and ``elems = fst, *other, lst`` are approximate inverses,
ignoring the specifics of types. This, in effect, simplifies the
language by removing special cases.

Secondly, it vastly simplifies types of "addition" such as combining
dictionaries, and does so in an unambiguous and well-defined way::

    combination = {**first_dictionary, "x": 1, "y": 2}

instead of::

    combination = first_dictionary.copy()
    combination.update({"x": 1, "y": 2})

which is especially important in contexts where expressions are
preferred. This is also useful as a more readable way of summing many
lists, such as ``my_list + list(my_tuple) + list(my_range)`` which is
now equivalent to just ``[*my_list, *my_tuple, *my_range]``.

The addition of unpacking to comprehensions is a logical extension.
Its usage will primarily be a neat replacement for ``[i for j in
2D_list for i in j]``, as the more readable ``[*l for l in 2D_list]``.
Other uses are possible, but expected to occur rarely.


Specification
=============

Function calls may accept an unbound number of ``*`` and ``**``
unpackings. Function calls currently have the restriction that keyword
arguments must follow positional arguments and ``**`` unpackings must
additionally follow ``*`` unpackings. Because of the new levity for
``*`` and ``**`` unpackings, it may be advisable to lift some or all of
these restrictions.

As is currently the case, if an argument is given multiple times - such
as a positional argument given both positionally and by keyword - a
TypeError is raised.

If the restrictions are kept, a function call will look like this::

    function(
        argument or *args, argument or *args, ...,
        kwargument or *args, kwargument or *args, ...,
        kwargument or **kwargs, kwargument or **kwargs, ...
    )

If they are removed completely, a function call will look like this::

    function(
        argument or keyword_argument or *args or **kwargs,
        argument or keyword_argument or *args or **kwargs,
        ...
    )

Tuples, lists, sets and dictionaries will allow unpacking. This will
act as if the elements from the unpacked item were inserted in order at
the site of unpacking, much as happens in unpacking in a function call.
Dictionaries require ``**`` unpacking; all the others require ``*``
unpacking. A dictionary's keys remain in a right-to-left priority
order, so ``{**{'a': 1}, 'a': 2, **{'a': 3}}`` evaluates to
``{'a': 3}``.

Comprehensions, by simple extension, will support unpacking. As before,
dictionaries require ``**`` unpacking, all the others require ``*``
unpacking, and key priorities are unchanged. Examples include::

    {*[1, 2, 3], 4, 5}

    (*e for e in [[1], [3, 4, 5], [2]])

    {**dictionary for dictionary in (globals(), locals())}

    {**locals(), "override": None}


Disadvantages
=============

If the current restrictions for function call arguments (keyword
arguments must follow positional arguments and ``**`` unpackings must
additionally follow ``*`` unpackings) are kept, the allowable orders
for arguments in a function call are more complicated than before. The
simplest explanation for the rules may be "positional arguments come
first and keyword arguments follow, but ``*`` unpackings are allowed
after keyword arguments".

If the current restrictions are lifted, there are no obvious gains to
code, as the only new orders that are allowed look silly: ``f(a, e=e,
d=d, b, c)`` being a simple example.

Whilst ``*elements, = iterable`` causes ``elements`` to be a list,
``elements = *iterable,`` causes ``elements`` to be a tuple. The reason
for this may not be obvious at first glance, and may confuse people
unfamiliar with the construct.


Implementation
==============

An implementation for an old version of Python 3 is found at Issue 2292
on the bug tracker [1]_, although several changes would be needed:

- It has yet to be updated to the most recent Python version

- It features a now redundant replacement for "yield from" which should
  be removed

- It also loses support for calling functions with keyword arguments
  before positional arguments, which is an unnecessary
  backwards-incompatible change

- If the restrictions on the order of arguments in a function call are
  partially or fully lifted, they would need to be included


References
==========

.. [1] Issue 2292, "Missing `*`-unpacking generalizations",
   Thomas Wouters (http://bugs.python.org/issue2292)

.. [2] Discussion on Python-ideas list, "list / array comprehensions
   extension", Alexander Heger
   (http://mail.python.org/pipermail/python-ideas/2011-December/013097.html)


Copyright
=========

This document has been placed in the public domain.

From joshua.landau.ws at gmail.com  Mon Jul  8 05:02:40 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Mon, 8 Jul 2013 04:02:40 +0100
Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations"
In-Reply-To:
References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl>
Message-ID:

On 8 July 2013 03:58, Joshua Landau wrote:
> On 8 July 2013 01:56, Jan Kaliszewski wrote:
>> Another question is related to this matter as well: if we adopt
>> the idea of more than one **kwargs in function call -- what about
>> key duplication? I.e. whether:
>>
>>     fun(**{'a': 1}, **{'a': 2})
>>
>> ...should raise TypeError as well, or should it be equivalent to
>> fun(a=2)?
>>
>> My first thought was that it should raise TypeError -- prohibition
>> of parameter duplication is a simple and well-settled rule for Python
>> function calls. On second thought: it could be relaxed a bit if we
>> agreed about another rule that would be simple enough, e.g.: "for
>> anything *after* the first '**kwargs' (or maybe also bare '**,'?)
>> another rule is applied: later arguments override earlier (looking
>> from left to right), as in dict(...)/.update(...) or as in
>> {**foo, **bar} in literals (if the rest of the PEP is accepted).
> My first opinion would be that if relaxation is something people find
> useful, it would be suited to a separate proposal; it seems outside of
> this PEP's scope, à mon avis.

Also, surely if the PEP goes through it would be easy enough to write:

    func(**{**kwargs, **overlapping_kwargs})

which is a more explicit, less special-case method. It's less
efficient, but it should handle the simple cases. I'm not sure if it
makes intuitive sense though.

From steve at pearwood.info  Mon Jul  8 09:22:46 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 8 Jul 2013 17:22:46 +1000
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2,
 set1 & set2, set2 - set1
In-Reply-To:
References: <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp>
 <51D85345.3020707@pearwood.info>
 <873e84077caffb95dee71997e672e504@chopin.edu.pl>
 <51D8D104.2090201@pearwood.info> <51D9D8A6.4050309@trueblade.com>
Message-ID: <20130708072246.GB32148@ando>

On Sun, Jul 07, 2013 at 02:37:56PM -0700, David Mertz wrote:
> On Jul 7, 2013 2:09 PM, "Eric V. Smith" wrote:
> >
> > On 7/7/2013 1:45 AM, David Mertz wrote:
> > > Maybe the generalization isn't worthwhile. I was thinking that maybe a
> > > more general version should keep order in types that have order to start
> > > with, so I confess I'm not certain what the "correct" interface would
> be.
> > >
> > > But even if it were only for sets, I like the idea of a plain function
> > > much better than a method of a set, even if the only arguments it
> > > accepted were sets.
> >
> > If it were added, I think a classmember on set would be reasonable.
>
> I agree.

A class member? Do you mean a class *method*?

I think it would be freaky and weird if I did this:

some_set.venn_split(second_set, another_set)

(for lack of a better name) and the value of some_set was ignored. Class
methods are okay for things like alternate constructors, but I don't
think they are appropriate here.

-- 
Steven

From shane at umbrellacode.com  Mon Jul  8 09:47:50 2013
From: shane at umbrellacode.com (Shane Green)
Date: Mon, 8 Jul 2013 00:47:50 -0700
Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2,
 set1 & set2, set2 - set1
In-Reply-To: <20130708072246.GB32148@ando>
References: <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp>
 <51D85345.3020707@pearwood.info>
 <873e84077caffb95dee71997e672e504@chopin.edu.pl>
 <51D8D104.2090201@pearwood.info> <51D9D8A6.4050309@trueblade.com>
 <20130708072246.GB32148@ando>
Message-ID:

Yeah, it needs to be on the set/dictionary types, otherwise the question
of equality versus identity comes into play. Not to mention that the
behaviour of shared elements gets pretty ambiguous: if there is one of
them in one set and two in the other, do two of them go in common, or
one in common and one in xor2? I think it makes sense as a standard
method of the set type, if it makes sense at all; and also that
"vennsubs" might be a good name. It could accept any iterable composed
of hashable (that's what makes a viable key, right?) items.

On Jul 8, 2013, at 12:22 AM, Steven D'Aprano wrote:

> On Sun, Jul 07, 2013 at 02:37:56PM -0700, David Mertz wrote:
>> On Jul 7, 2013 2:09 PM, "Eric V. Smith" wrote:
>>>
>>> On 7/7/2013 1:45 AM, David Mertz wrote:
>>>> Maybe the generalization isn't worthwhile.
I was thinking that maybe a >>>> more general version should keep order in types that have order to start >>>> with, so I confess I'm not certain what the "correct" interface would >> be. >>>> >>>> But even if it were only for sets, I like the idea of a plain function >>>> much better than a method of a set, even if the only arguments it >>>> accepted were sets. >>> >>> If it were added, I think a classmember on set would be reasonable. >> >> I agree. > > A class member? Do you mean a class *method*? > > I think it would be freaky and weird if I did this: > > some_set.venn_split(second_set, another_set) > > (for lack of a better name) and the value of some_set was ignored. Class > methods are okay for things like alternate constructors, but I don't > think they are appropriate here. > > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From eric at trueblade.com Mon Jul 8 10:24:39 2013 From: eric at trueblade.com (Eric V. Smith) Date: Mon, 08 Jul 2013 04:24:39 -0400 Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1 In-Reply-To: <20130708072246.GB32148@ando> References: <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp> <51D85345.3020707@pearwood.info> <873e84077caffb95dee71997e672e504@chopin.edu.pl> <51D8D104.2090201@pearwood.info> <51D9D8A6.4050309@trueblade.com> <20130708072246.GB32148@ando> Message-ID: <51DA7747.3000906@trueblade.com> On 7/8/2013 3:22 AM, Steven D'Aprano wrote: > On Sun, Jul 07, 2013 at 02:37:56PM -0700, David Mertz wrote: >> On Jul 7, 2013 2:09 PM, "Eric V. Smith" wrote: >>> >>> On 7/7/2013 1:45 AM, David Mertz wrote: >>>> Maybe the generalization isn't worthwhile. I was thinking that maybe a >>>> more general version should keep order in types that have order to start >>>> with, so I confess I'm not certain what the "correct" interface would >> be. >>>> >>>> But even if it were only for sets, I like the idea of a plain function >>>> much better than a method of a set, even if the only arguments it >>>> accepted were sets. >>> >>> If it were added, I think a classmember on set would be reasonable. >> >> I agree. > > A class member? Do you mean a class *method*? I did mean classmethod, thanks. Or maybe staticmethod, I haven't really thought it through. The point being, it need not be an instance method. > I think it would be freaky and weird if I did this: > > some_set.venn_split(second_set, another_set) > > (for lack of a better name) and the value of some_set was ignored. Class > methods are okay for things like alternate constructors, but I don't > think they are appropriate here. set.venn_split(second_set, another_set) It's no more surprising than this code not using the values from d: >>> d = {'a':1, 'b':2} >>> d.fromkeys([3, 4, 5]) {3: None, 4: None, 5: None} versus: >>> dict.fromkeys([3, 4, 5]) {3: None, 4: None, 5: None} -- Eric. 
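For concreteness, the two spellings being debated might look roughly
like this (the name venn_split is the thread's placeholder, not a real
set method; this is a sketch only):

    class Set(set):
        def venn_split(self, other):
            # instance-method spelling: s1.venn_split(s2)
            common = self & other
            return self - common, common, other - common

        @staticmethod
        def venn_split_static(set1, set2):
            # static/class-style spelling: Set.venn_split_static(s1, s2);
            # the receiving instance, if any, plays no role in the result.
            common = set1 & set2
            return set1 - common, common, set2 - common

    # Either spelling returns ({1}, {2, 3}, {4}) for {1, 2, 3} and {2, 3, 4}.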
From p.f.moore at gmail.com Mon Jul 8 10:31:59 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 8 Jul 2013 09:31:59 +0100 Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1 In-Reply-To: <51DA7747.3000906@trueblade.com> References: <87r4fbvh0k.fsf@uwakimon.sk.tsukuba.ac.jp> <51D85345.3020707@pearwood.info> <873e84077caffb95dee71997e672e504@chopin.edu.pl> <51D8D104.2090201@pearwood.info> <51D9D8A6.4050309@trueblade.com> <20130708072246.GB32148@ando> <51DA7747.3000906@trueblade.com> Message-ID: On 8 July 2013 09:24, Eric V. Smith wrote: > > I think it would be freaky and weird if I did this: > > > > some_set.venn_split(second_set, another_set) > > > > (for lack of a better name) and the value of some_set was ignored. Class > > methods are okay for things like alternate constructors, but I don't > > think they are appropriate here. > > set.venn_split(second_set, another_set) > > It's no more surprising than this code not using the values from d: > > >>> d = {'a':1, 'b':2} > >>> d.fromkeys([3, 4, 5]) > {3: None, 4: None, 5: None} > > versus: > > >>> dict.fromkeys([3, 4, 5]) > {3: None, 4: None, 5: None} Surely the point is that in s.venn_split(s1, s2) the *value* of s might be ignored, but the *type* of s should be the type of the results? So subclassing works "as expected" (for some value of the word "expected" :-)) Paul. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Jul 8 11:38:32 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 8 Jul 2013 19:38:32 +1000 Subject: [Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1 In-Reply-To: <51DA7747.3000906@trueblade.com> References: <873e84077caffb95dee71997e672e504@chopin.edu.pl> <51D8D104.2090201@pearwood.info> <51D9D8A6.4050309@trueblade.com> <20130708072246.GB32148@ando> <51DA7747.3000906@trueblade.com> Message-ID: <20130708093832.GC32148@ando> On Mon, Jul 08, 2013 at 04:24:39AM -0400, Eric V. Smith wrote: > On 7/8/2013 3:22 AM, Steven D'Aprano wrote: > > On Sun, Jul 07, 2013 at 02:37:56PM -0700, David Mertz wrote: > >> On Jul 7, 2013 2:09 PM, "Eric V. Smith" wrote: > >>> > >>> On 7/7/2013 1:45 AM, David Mertz wrote: > >>>> Maybe the generalization isn't worthwhile. I was thinking that maybe a > >>>> more general version should keep order in types that have order to start > >>>> with, so I confess I'm not certain what the "correct" interface would > >> be. > >>>> > >>>> But even if it were only for sets, I like the idea of a plain function > >>>> much better than a method of a set, even if the only arguments it > >>>> accepted were sets. > >>> > >>> If it were added, I think a classmember on set would be reasonable. > >> > >> I agree. > > > > A class member? Do you mean a class *method*? > > I did mean classmethod, thanks. Or maybe staticmethod, I haven't really > thought it through. The point being, it need not be an instance method. Strictly speaking, you're right. Being Python, you could make this any sort of callable you like. But what would be the point of making it something other than either an instance method or a function? It's an operation that requires two set arguments. The obvious way to handle it is as either a function with signature func(set1, set2) or as a method with signature method(self, other). There are no points for "Most Unusual and Imaginative API". (A third alternative would be an operator, __op__(self, other), but I'm not seriously suggesting that.) 
> > I think it would be freaky and weird if I did this: > > > > some_set.venn_split(second_set, another_set) > > > > (for lack of a better name) and the value of some_set was ignored. Class > > methods are okay for things like alternate constructors, but I don't > > think they are appropriate here. > > set.venn_split(second_set, another_set) > > It's no more surprising than this code not using the values from d: > > >>> d = {'a':1, 'b':2} > >>> d.fromkeys([3, 4, 5]) > {3: None, 4: None, 5: None} > > versus: > > >>> dict.fromkeys([3, 4, 5]) > {3: None, 4: None, 5: None} It is standard behaviour for alternate constructors to ignore the value of the instance, hence they are usually classmethods. A constructor normally depends on the class, not the instance: dict(args), not {}(args). And the args generally don't include an instance of the class you are trying to make. They can, of course, but generally we have things like Decimal.from_float(arg). Even when you call the constructor from an instance, you don't want the instance itself to make any difference to the result, the result should be specified entirely by the arguments. So constructors should ignore the instance, and should be classmethods. None of these things apply to "venn_split". We're not talking about a constructor, but a set operation that requires two set arguments. One is conventionally taken to be `self`, the other explicitly given: class set: def venn_split(self, other): ... not: class set: @classmethod def venn_split(cls, set1, set2): # ignore the value of cls ... Making this "venn-split" operation a class or static method would be as weird as making (say) str.split a class method: "a,b,c,d".split(",") => raise TypeError, too few arguments "a,b,c,d".split("a,b,c,d", ",") => ["a", "b", "c", "d"] That's a strange API. This is Python, you can do it, but you shouldn't. -- Steven From mal at egenix.com Mon Jul 8 13:54:22 2013 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 08 Jul 2013 13:54:22 +0200 Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes) In-Reply-To: References: Message-ID: <51DAA86E.4010903@egenix.com> On 07.07.2013 12:29, David Kendal wrote: > Hi, > > Python provides a way to iterate characters of a string by using the string as an iterable. But there's no way to iterate over Unicode graphemes (a cluster of characters consisting of a base character plus a number of combining marks and other modifiers -- or what the human eye would consider to be one "character"). > > I think this ought to be provided either in the unicodedata library, (unicodedata.itergraphemes(string)) which exposes the character database information needed to make this work, or as a method on the built-in str type. (str.itergraphemes() or str.graphemes()) > > Below is my own implementation of this as a generator, as an example and for reference. > > --- > import unicodedata > > def itergraphemes(string): > def ismodifier(char): return unicodedata.category(char)[0] == 'M' > start = 0 > for end, char in enumerate(string): > if not ismodifier(char) and not start == end: > yield string[start:end] > start = end > yield string[start:] > --- Sounds like a good idea. Could you open a ticket for this to hash out the details ? Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jul 08 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/
________________________________________________________________________
2013-07-16: Python Meeting Duesseldorf ...                 8 days to go

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From me at dpk.io  Mon Jul  8 20:26:27 2013
From: me at dpk.io (David Kendal)
Date: Mon, 8 Jul 2013 19:26:27 +0100
Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes /
 str.graphemes)
In-Reply-To: <51DAA86E.4010903@egenix.com>
References: <51DAA86E.4010903@egenix.com>
Message-ID:

On 8 Jul 2013, at 12:54, M.-A. Lemburg wrote:

> Sounds like a good idea.
>
> Could you open a ticket for this to hash out the details ?

Done!

> Thanks,
> --
> Marc-Andre Lemburg
> eGenix.com

dpk

From me at dpk.io  Mon Jul  8 20:27:31 2013
From: me at dpk.io (David Kendal)
Date: Mon, 8 Jul 2013 19:27:31 +0100
Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes /
 str.graphemes)
In-Reply-To:
References: <51DAA86E.4010903@egenix.com>
Message-ID:

On 8 Jul 2013, at 19:26, David Kendal wrote:

>> Could you open a ticket for this to hash out the details ?
>
> Done!

Ooops. Should have included a link, sorry.

dpk

From bruce at leapyear.org  Mon Jul  8 20:52:39 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Mon, 8 Jul 2013 11:52:39 -0700
Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes /
 str.graphemes)
In-Reply-To:
References:
Message-ID:

On Sun, Jul 7, 2013 at 3:29 AM, David Kendal wrote:

> Python provides a way to iterate characters of a string by using the
> string as an iterable. But there's no way to iterate over Unicode graphemes
> (a cluster of characters consisting of a base character plus a number of
> combining marks and other modifiers -- or what the human eye would consider
> to be one "character").
>
> I think this ought to be provided either in the unicodedata library,
> (unicodedata.itergraphemes(string)) which exposes the character database
> information needed to make this work, or as a method on the built-in str
> type. (str.itergraphemes() or str.graphemes())

A common case is wanting to extract the current grapheme or move forward
or backward one. Please consider these other use cases rather than just
adding an iterator.

g = unicodedata.grapheme_cluster(str, i)  # extracts cluster that
includes index i (i may be in the middle of the cluster)
i = unicodedata.grapheme_start(str, i)  # if i is the start of the
cluster, returns i; otherwise backs up to the start of the cluster
i = unicodedata.previous_cluster(str, i)  # moves i to the first index of
the previous cluster; returns None if no previous cluster in the string
i = unicodedata.next_cluster(str, i)  # moves i to the first index of the
next cluster; returns None if no next cluster in the string

I think these belong in unicodedata, not str.
--- Bruce
I'm hiring: http://www.geekwork.com/opportunity/1225-job-software-developer-cadencemd
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
Learn how hackers think: http://j.mp/gruyere-security

From mertz at gnosis.cx  Mon Jul  8 22:02:37 2013
From: mertz at gnosis.cx (David Mertz)
Date: Mon, 8 Jul 2013 13:02:37 -0700
Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes /
 str.graphemes)
In-Reply-To:
References:
Message-ID:

I think the API Bruce suggests, along with its module location in
'unicodedata', makes more sense than the iterator only.

But it seems to me that it would still be useful to explicitly break a
string into its component clusters with a similar function. E.g.:

graphemes = unicodedata.grapheme_clusters(str)  # Returns an iterator of
strings, often single characters
for g in graphemes: ...

It wouldn't be very hard to implement 'grapheme_clusters' in terms of the
API Bruce suggests, but I feel like it should have a standard name and API
along with those others. Actually, I guess the implementation is just:

def grapheme_clusters(s):
    for i in range(len(s)):
        if i == unicodedata.grapheme_start(s, i):
            yield unicodedata.grapheme_cluster(s, i)

On Mon, Jul 8, 2013 at 11:52 AM, Bruce Leban wrote:

>
> On Sun, Jul 7, 2013 at 3:29 AM, David Kendal wrote:
>
> Python provides a way to iterate characters of a string by using the
>> string as an iterable. But there's no way to iterate over Unicode graphemes
>> (a cluster of characters consisting of a base character plus a number of
>> combining marks and other modifiers -- or what the human eye would consider
>> to be one "character").
>>
>> I think this ought to be provided either in the unicodedata library,
>> (unicodedata.itergraphemes(string)) which exposes the character database
>> information needed to make this work, or as a method on the built-in str
>> type. (str.itergraphemes() or str.graphemes())
>
> A common case is wanting to extract the current grapheme or move forward
> or backward one. Please consider these other use cases rather than just
> adding an iterator.
>
> g = unicodedata.grapheme_cluster(str, i)  # extracts cluster that
> includes index i (i may be in the middle of the cluster)
> i = unicodedata.grapheme_start(str, i)  # if i is the start of the
> cluster, returns i; otherwise backs up to the start of the cluster
> i = unicodedata.previous_cluster(str, i)  # moves i to the first index of
> the previous cluster; returns None if no previous cluster in the string
> i = unicodedata.next_cluster(str, i)  # moves i to the first index of the
> next cluster; returns None if no next cluster in the string
>
> I think these belong in unicodedata, not str.
>
> --- Bruce
> I'm hiring: http://www.geekwork.com/opportunity/1225-job-software-developer-cadencemd
> Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
> Learn how hackers think: http://j.mp/gruyere-security

-- 
Keeping medicines from the bloodstreams of the sick; food from the bellies
of the hungry; books from the hands of the uneducated; technology from the
underdeveloped; and putting advocates of freedom in prisons. Intellectual
property is to the 21st century what the slave trade was to the 16th.
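To see the intended behaviour end to end, here is a quick session using
David Kendal's itergraphemes() sketch from earlier in the thread
(assuming that function is in scope; U+0301 is a combining acute
accent, and the output below is hand-checked against that
implementation):

    >>> s = 'cafe\u0301'              # "café" spelled with a combining accent
    >>> len(s)                        # five code points...
    5
    >>> len(list(itergraphemes(s)))   # ...but four graphemes
    4
    >>> list(itergraphemes(s))[3] == 'e\u0301'
    True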
From sergemp at mail.ru  Mon Jul  8 22:22:34 2013
From: sergemp at mail.ru (Sergey)
Date: Mon, 8 Jul 2013 23:22:34 +0300
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
 <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey>
 <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com>
 <20130705094341.7c1c84de@sergey>
 <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com>
Message-ID: <20130708232234.72de4688@sergey>

On Jul 5, 2013 Andrew Barnert wrote:

>> Yes, you can! You just need to implement a fast __iadd__ or __add__
>> for your type to make it O(n) summable.
>
> Well, in Python 3.3, or even 2.3, you just need to implement a
> fast __add__. So, if that's an acceptable answer, then we don't need
> to do anything.
[...]
>> And you can always do that, can't you?
>
> No. You can't implement a fast __iadd__ for tuple.

Well... Yes, I can! I can't make __iadd__ faster, because tuple has
no __iadd__, however I can make a faster __add__. But as long as sum()
is the only (?) function suffering from this problem it was easier to
do that optimization in sum() instead.

> If that's going to be extendable to other types

Are there any other types (except strings, that are blocked anyway)?
Looks like tuple is the only built-in type having no fast __iadd__,
and sum() is the only function suffering from that. So, to make sum()
"fast for everything by default" we only need to make sum use __iadd__
and write a special case for sum+tuples.

>>   To take advantage of a faster __iadd__ implementation sum() uses it
>>   if possible. sum() has linear time complexity for built-in types
>>   and all types providing constant-time `__iadd__` or `__add__`.
>
> Note that integer addition isn't actually constant, it's linear on
> the word size of the integers. Which means sum isn't linear for
> integers.

That still makes sum O(N) with N being the total number of bytes in
all integers. Or how do you suggest explaining sum() complexity in the
docs? That text seems fair enough for me. Something like:

  sum() has complexity O(N)*C where N is the total number of elements
  and C is the complexity of a single __iadd__ operation, or __add__
  operation if __iadd__ is not supported.

looks too cumbersome.

>> Personally I don't think such implementation would be very useful.
>
> Given that nearly every program ever written in Lisp and its
> successors makes heavy use of such an implementation, I submit
> that your intuition may be wrong.

They don't have much choice. And they're not using sum(). We're
talking about python, and discussing use of sum() in python for such
lists in particular. It's just you said:

> I'm hostile to any attempt to encourage people to treat sum() as
> the preferred way to concatenate large amounts of data, because
> that will surely lead them into bad habits and will end with them
> trying to sum() a lot of tuples or linked lists or something and
> getting O(n**2) performance.

Which implies that using such implementation with sum could lead to
O(N**2) performance. But it could not, because such implementation
is not addable. So, no problem.

To have a problem you must modify your implementation. And if you're
changing it anyway, you have the power to solve this problem too.
Explicitly writing about sum() complexity in documentation makes sure
that you're aware of possible slowness.

>> (It's certainly unusual for python's "batteries included" philosophy).
> What does "batteries included" have to do with excluding data types?

Excluding data types? What do you mean?

I wasn't sure what you meant when you said about problems with linked
lists, so I was thinking that you mean something like "if one day
linked lists get their way into python standard libraries and people
will try using sum() on them..." That's why I said that such
implementation would be too skimped.

>> What I would call a linked-list is a separate type where your nodes
>> are just its internal representation.
>
> If you make the lists and nodes separate types, and the nodes
> private, you have to create yet a third type, like the list::iterator
> type in C++

If there would be a need for it, why not? Or maybe I won't need it
(then I get something like collections.deque, a good tool by the way).

>> I don't like the idea that `a` implicitly changes when I change `b`.
>
> Then you don't like linked lists. Saying "There is a way to make
> it work, just make it do something different, which is O(N) instead
> of O(1) and won't work with the algorithms you want to use" is not an
> answer.

That wasn't saying "just make it do something different". That was
saying "you can have linked lists in python, that are O(N) summable".

> No, you can make one operation fast, at the expense of making
> every other operation slow. That's not a good tradeoff.

In that implementation I'm making many operations fast. Basically,
I make ALL inplace-operations fast. :) As a bonus I get a guarantee
that I won't implicitly modify another variable while modifying this
one. For someone it's not a good tradeoff, for others it is.

> And again, you've also ignored the fact that, performance aside,
> it _breaks every algorithm people use mutable linked lists for_.

Those algos don't use sum(). And lists they use are not summable.
To get a problem you need different lists. So it does not matter.

>> But you CAN have a fast __iadd__ even for your simple a.next case!
>> You only need to initialize `tail` before calling sum() and update
>> it inside __iadd__
>
> Initialize tail where exactly? In a global variable?

Wherever you want. In a global variable, or in a class variable near
'next'.

-- 

From bruce at leapyear.org  Mon Jul  8 22:26:50 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Mon, 8 Jul 2013 13:26:50 -0700
Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes /
 str.graphemes)
In-Reply-To:
References:
Message-ID:

On Mon, Jul 8, 2013 at 1:02 PM, David Mertz wrote:

> I think the API Bruce suggests, along with its module location in
> 'unicodedata', makes more sense than the iterator only.
>
> But it seems to me that it would still be useful to explicitly break a
> string into its component clusters with a similar function. E.g.:
>
> graphemes = unicodedata.grapheme_clusters(str)  # Returns an iterator of
> strings, often single characters
> for g in graphemes: ...
>
> It wouldn't be very hard to implement 'grapheme_clusters' in terms of the
> API Bruce suggests, but I feel like it should have a standard name and API
> along with those others. Actually, I guess the implementation is just:
>
> def grapheme_clusters(s):
>     for i in range(len(s)):
>         if i == unicodedata.grapheme_start(s, i):
>             yield unicodedata.grapheme_cluster(s, i)
>

Yes, I still think the iterator is useful. I'd use the following
implementation instead as the above is going to find the start of each
multi-char grapheme multiple times.
def grapheme_clusters(s):
    if len(s):
        i = 0
        while i is not None:
            yield unicodedata.grapheme_cluster(s, i)
            i = unicodedata.next_cluster(s, i)

This does "if len(s)" at the top rather than just "if s" so it raises if
passed a non-iterable like None rather than silently accepting it.

--- Bruce

From python at mrabarnett.plus.com  Mon Jul  8 22:41:20 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Mon, 08 Jul 2013 21:41:20 +0100
Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes /
 str.graphemes)
In-Reply-To:
References:
Message-ID: <51DB23F0.909@mrabarnett.plus.com>

On 08/07/2013 21:26, Bruce Leban wrote:
>
> On Mon, Jul 8, 2013 at 1:02 PM, David Mertz
> wrote:
>
>     I think the API Bruce suggests, along with its module location in
>     'unicodedata', makes more sense than the iterator only.
>
>     But it seems to me that it would still be useful to explicitly break
>     a string into its component clusters with a similar function. E.g.:
>
>     graphemes = unicodedata.grapheme_clusters(str)  # Returns an
>     iterator of strings, often single characters
>     for g in graphemes: ...
>
>     It wouldn't be very hard to implement 'grapheme_clusters' in terms
>     of the API Bruce suggests, but I feel like it should have a standard
>     name and API along with those others. Actually, I guess the
>     implementation is just:
>
>     def grapheme_clusters(s):
>         for i in range(len(s)):
>             if i == unicodedata.grapheme_start(s, i):
>                 yield unicodedata.grapheme_cluster(s, i)
>
> Yes, I still think the iterator is useful. I'd use the following
> implementation instead as the above is going to find the start of each
> multi-char grapheme multiple times.
>
>     def grapheme_clusters(s):
>         if len(s):
>             i = 0
>             while i is not None:
>                 yield unicodedata.grapheme_cluster(s, i)
>                 i = unicodedata.next_cluster(s, i)
>
> This does "if len(s)" at the top rather than just "if s" so it
> raises if passed a non-iterable like None rather than silently accepting it.
>
If it's any help, the alternative regex implementation at:

http://pypi.python.org/pypi/regex

supports matching graphemes, although that bit is written in C.

From antony.lee at berkeley.edu  Mon Jul  8 23:27:44 2013
From: antony.lee at berkeley.edu (Antony Lee)
Date: Mon, 8 Jul 2013 14:27:44 -0700
Subject: [Python-ideas] Allow Enum members to refer to each other during
 execution of body
Message-ID:

Currently, during the execution of the body of the Enum declaration,
member names are bound to the values, not to the Enum members
themselves. For example

class StateMachine(Enum):
    A = {}
    B = {1: A}  # e.g. a transition table

StateMachine.B[1] == {}, when one could have expected
StateMachine.B[1] == StateMachine.A

It seems to me that a behavior where member names are bound to the
members instead of being bound to the values is more useful, as one can
easily retrieve the values from the members but not the other way round
(at least during the execution of class body).

Initially, I thought that this could be changed by modifying _EnumDict,
so that its __setitem__ method sets the member in the dict, instead of
the value, but in fact this doesn't work because while the values are
being set in the _EnumDict the class itself doesn't exist yet (and for
good reason: the __init__ and __new__ methods may be defined later but
there is no way to know that).
However, a possible solution could to momentarily create Enum members as instances of some dummy class, and then later, after execution of class body has completed, change the members' class to the actual Enum and initialize them as needed (if an __init__ or a __new__ are actually defined). Well, there are limitations with this approach (e.g. the members are not fully initialized before class body finishes to execute) but this seems better than the current behavior(?) Best, Antony -------------- next part -------------- An HTML attachment was scrubbed... URL: From ron3200 at gmail.com Mon Jul 8 23:58:22 2013 From: ron3200 at gmail.com (Ron Adam) Date: Mon, 08 Jul 2013 16:58:22 -0500 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130708232234.72de4688@sergey> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> Message-ID: On 07/08/2013 03:22 PM, Sergey wrote: >>> >>(It's certainly unusual for python's ?batteries included? philosophy). >> > >> >What does "batteries included" have to do with excluding data types? > Excluding data types? What do you mean? > > I wasn't sure what you meant when you said about problems with linked > lists, so I was thinking that you mean something like "if one day > linked lists get their way into python standard libraries and people > will try using sum() on them..." That's why I said that such > implementation would be too skimped. We need to keep in mind how python programmer will use the tools, "Batteries", we add to the library. Python programmers do currently write programs with linked lists. And they use what ever is in the library they think is useful... the whole point of "batteries included". -Ron From ethan at stoneleaf.us Tue Jul 9 02:12:35 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 08 Jul 2013 17:12:35 -0700 Subject: [Python-ideas] Allow Enum members to refer to each other during execution of body In-Reply-To: References: Message-ID: <51DB5573.5070004@stoneleaf.us> On 07/08/2013 02:27 PM, Antony Lee wrote: > Currently, during the execution of the body of the Enum declaration, member names are bound to the values, not to the > Enum members themselves. For example > > class StateMachine(Enum): > A = {} > B = {1: A} # e.g. a transition table > > StateMachine.B[1] == {}, when one could have expected StateMachine.B[1] == StateMachine.A > > It seems to me that a behavior where member names are bound to the members instead of being bound to the values is more > useful, as one can easily retrieve the values from the members but not the other way round (at least during the > execution of class body). > > Initially, I thought that this could be changed by modifying _EnumDict, so that its __setitem__ method sets the member > in the dict, instead of the value, but in fact this doesn't work because while the values are being set in the _EnumDict > the class itself doesn't exist yet (and for good reason: the __init__ and __new__ methods may be defined later but there > is no way to know that). 
However, a possible solution could to momentarily create Enum members as instances of some > dummy class, and then later, after execution of class body has completed, change the members' class to the actual Enum > and initialize them as needed (if an __init__ or a __new__ are actually defined). Well, there are limitations with this > approach (e.g. the members are not fully initialized before class body finishes to execute) but this seems better than > the current behavior(?) Part of the problem here would be maintaining the linkage when the temp enum object from _EnumDict was translated into an actual Enum member. One possible work around is to store the name of the member instead: class StateMachine(Enum): A = {} B = {1:'A'} then the other methods can either dereference the name with an __getitem__ look-up, or the class can be post-processed with a decorator to change the strings back to actual members... hmmm, maybe a post_process hook in the metaclass would make sense? -- ~Ethan~ From haoyi.sg at gmail.com Tue Jul 9 02:39:50 2013 From: haoyi.sg at gmail.com (Haoyi Li) Date: Tue, 9 Jul 2013 08:39:50 +0800 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> Message-ID: > Python programmers do currently write programs with linked lists. And they use what ever is in the library they think is useful... the whole point of "batteries included". On the other hand, python programmers do currently write programs with normal lists, and sum(), and using sum() on lists would be useful. They're BOTH in the standard library too! In fact, this seems like a perfect argument for having sum() work on lists! The objection seems like optimizing for a hypothetical, potential use case far in the future at the expense of a concrete use case now. "if you can't make it do EVERYTHING, then we shouldn't make it do ANYTHING or people will get confused!" (paraphrased and exaggerated for theatrical effect). This is all on the assumption that you consider flattening list-of-lists a concrete use case. I for one find it annoying that i have to write a verbose long thingy every time i need to flatten lists, and I have probably needed to flatten lists about a dozen times in the last 3 months and used linked lists not-at-all. Maybe you write a ton of pure-functional algorithms making great use of the persistence of singly-linked-lists for performant non-mutating head-updates, and never use vanilla lists in the code you write, and so having sum() work on linked lists is of great import. On Tue, Jul 9, 2013 at 5:58 AM, Ron Adam wrote: > > > On 07/08/2013 03:22 PM, Sergey wrote: > >> >>(It's certainly unusual for python's ?batteries included? philosophy). >>>> >>> > >>> >What does "batteries included" have to do with excluding data types? >>> >> Excluding data types? What do you mean? >> >> I wasn't sure what you meant when you said about problems with linked >> lists, so I was thinking that you mean something like "if one day >> linked lists get their way into python standard libraries and people >> will try using sum() on them..." That's why I said that such >> implementation would be too skimped. 
>> > > We need to keep in mind how python programmer will use the tools, > "Batteries", we add to the library. > > Python programmers do currently write programs with linked lists. And > they use what ever is in the library they think is useful... the whole > point of "batteries included". > > -Ron > > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Tue Jul 9 02:45:56 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 8 Jul 2013 17:45:56 -0700 (PDT) Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130708232234.72de4688@sergey> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> Message-ID: <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> First, let me summarize my position, because you're tangling up my separate points, and even tangling other people's points up with mine. I'm -1 on adding special-casing for tuples that would not be available for any other immutable type. I'm -0.75 on adding flexible special-casing that could be extended to other types from Python. I'm -1 on encouraging people to treat sum as the obvious way to concatenate any kind of sequence. I'm -0 on adding explicit code to sum to raise when start is a non-number (as Alex Martelli wanted), a sequence, or something that can't do 0+start. I'm between +1 and -0 on changing sum to use copy.copy(start) and __iadd__, depending on whether there are any types that seem like they should be obviously summable and would gain from this change. Adding a bunch of numpy.arrays starting from 0 seems like a reasonable use for sum, as does adding a bunch of timedeltas to a datetime; would either of those benefit from the __iadd__ optimization? If so, I'd be behind that. From: Sergey Sent: Monday, July 8, 2013 1:22 PM > On Jul 5, 2013 Andrew Barnert wrote: > >>> Yes, you can! You just need to implement a fast __iadd__ or __add__ >>> for your type to make it O(n) summable. >> >> Well, in Python 3.3, or even 2.3, you just need to implement a >> fast __add__. So, if that's an acceptable answer, then we don't > need >> to do anything. > [...] >>> And you can always do that,?can't you? >> >> No. You can't implement a fast __iadd__ for tuple. > > Well... Yes, I can! No, you can't. You can do something different, but only by modifying the C source to sum. > I can't make __iadd__ faster, because tuple has > no __iadd__, however I can make a faster __add__. No, you can't make tuple.__add__ faster either. (If you can, please submit a patch, because that would be useful completely independent of sum.) > But as long as sum() > is the only (?) function suffering from this problem it was easier to > do that optimization in sum() instead. No, because that requires putting an optimization into sum for _any_ type that's fast-summable but not by using its __add__ or __iadd__, and that's not feasible. >> If that's going to be extendable to other types > > Are there any other types (except strings, that are blocked anyway)? Yes. Every single immutable type. Immutable types do not have __iadd__, for the obvious reason. 
Whether they're builtin, stdlib, third-party, or part of the application, they will not be fast-summable, and there will be no way to make them fast-summable without patching the interpreter. blist.btuple has the same problem.?You wanted sets to be addable? Then frozenset would have this problem. Strings obviously have this problem, and any third-party immutable string type will as well. So, if you're suggesting that sum can be fast for anything reasonable, you're just wrong. Adding a custom optimization for tuple only helps tuple; it does not help immutable types in general.?And that's why I think it's a bad idea to add custom code to sum for tuples. >>> Personally I don't think such implementation would be very useful. >> >> Given that nearly every program every written in Lisp and its >> successors makes heavy use of such an implementation, I submit >> that your intuition may be wrong. > > They don't have much choice. And they're not using sum(). Of course they have a choice. You think you can't use dynamic arrays, doubly-linked lists, balanced trees, etc. in Lisp? They use cons lists because there are some algorithms that are natural and fast with cons lists?and because those algorithms work well with appropriate fold/reduce-type functions.? And Python's sum (like Common Lisp's sum) is a reduce-type function. In fact, other than the optimization for numbers, sum(iterable, start=0)?really is just reduce(operator.__add__, iterable, start). > We're talking about python, and discussing use of sum() in python for > such lists in particular. No. You insisted that every collection type is O(N) summable with your design. Others argued that this isn't true.?You asked for an example. I offered cons lists. Instead of accepting that, you began arguing that nobody would ever want to use such a type, and then suggesting that if you just changed the implementation and all of the performance characteristics of the type, it would become O(N) summable. Now you've come around to the very position you were arguing against. If you agree that your design is not a general solution for al sequences, then all of your misunderstandings about cons lists are a red herring, and we can drop them. >It's just you said: > >> I'm hostile to any attempt to encourage people to treat sum() as >> the preferred way to concatenate large amounts of data, because >> that will surely lead them into bad habits and will end with them >> trying to sum() a lot of tuples or linked lists or something and >> getting O(n**2) performance. First, that wasn't me; please try to keep your attributions straight. But I agree with the sentiment. There is an efficient way to concatenate a lot of cons lists (basically, just fold backward instead of forward), but sum is not it. Similarly, there is an efficient way to concatenate a lot of tuples or other immutable types, but sum is not it. So, whoever said that is right?encouraging people to treat sum() as the preferred way to concatenate large amounts of data is a bad idea. Here's how to concatenate a bunch of cons lists in linear rather than quadratic time (on the resulting list length):?Walk to the end of the first list, link the next pointer to the start of the second list, walk to the end of that, etc. Easy and efficient, and completely impossible to implement in terms of a repeated __add__ or __iadd__ on the head node (or separate list object). > Which implies that using such implementation with sum could lead to > O(N**2) performance. 
But it could not, because such implementation > is not addable. So, no problem. > > To have a problem you must modify your implementation. And if you're > changing it anyway, you have the power to solve this problem too. No, you don't. It's very easy to add an O(N) extend or __add__ to a cons list type, but it's very hard to make it O(N) summable with the sum function. >>> (It's certainly unusual for python's ?batteries included? > philosophy). >> >> What does "batteries included" have to do with excluding data > types? > > Excluding data types? What do you mean? > > I wasn't sure what you meant when you said about problems with linked > lists, so I was thinking that you mean something like "if one day > linked lists get their way into python standard libraries and people > will try using sum() on them..." That's why I said that such > implementation would be too skimped. That's nonsense. If Python were to add something like a cons list?which I don't think would ever happen, but if it did?it would certainly not be a defective version that wasn't useful with the algorithms that people want cons lists for. "Batteries included" doesn't imply that every collection has to implement the same API as list with the same performance guarantees. You clearly don't understand how people use cons lists, and therefore you don't understand why your suggested "improvement" makes them much weaker. >>> What I would call a linked-list is a separate type where your nodes >>> are just its internal representation. >> >> If you make the lists and nodes separate types, and the nodes >> private,?you have to create yet a third type,?like the list::iterator >> type in C++ > > If there would be a need for it, why not? Agreed. That makes the APIs a little more complicated (you need to a list and a list::iterator instead of just a node), but that's not a big deal. And, with (list, list::iterator) being equivalent to a node, it leads to exactly the same issues as you started with in having just a node type. > Or maybe I won't need it > (then I get something like collections.deque, a good tool by the way). Yes, deque is a great tool, but it's not the same tool as a linked list, and doesn't support the same algorithms. (Note that C++ has both of them, as separate types.)? >>> I don't like the idea that `a` implicitly changes when I change > `b`. >> >> Then you don't like linked lists. Saying "There is a way to make >> it work, just make it do something different, which is O(N) instead >> of O(1) and won't work with the algorithms you want to use" is not > an >> answer. > > That wasn't saying "just make it do something different". That was > saying "you can have linked lists in python, that are O(N) summable". Which is exactly the point you were arguing against. If you now agree with everyone else, fine. There are types that can be efficiently concatenated, but not with sum. That's why everyone else thinks you shouldn't encourage people to use sum for general concatenation. >>> But you CAN have a fast __iadd__ even for your simple a.next case! >>> You only need to initialize `tail` before calling sum() and update >>> it inside __iadd__ >> >> Initialize tail where exactly? In a global variable? > > Wherever you want. In a global variable, or in a class variable near > 'next'. Using a global variable (or a class attribute, which is the same thing) means that sum isn't reentrant, or thread-safe, or generator-safe. Trying to sum two completely different iterables at the same time will break. 
From haoyi.sg at gmail.com Tue Jul 9 03:03:27 2013
From: haoyi.sg at gmail.com (Haoyi Li)
Date: Tue, 9 Jul 2013 09:03:27 +0800
Subject: [Python-ideas] Allow Enum members to refer to each other during execution of body
In-Reply-To: <51DB5573.5070004@stoneleaf.us>
References: <51DB5573.5070004@stoneleaf.us>
Message-ID:

> then the other methods can either dereference the name with an __getitem__ look-up, or the class can be post-processed with a decorator to change the strings back to actual members... hmmm, maybe a post_process hook in the metaclass would make sense?

Having real strings be part of an enum's data members is a pretty common thing, and working through and trying to identify the linkage-strings from normal-strings seems very magical to me. Is there some metaclass-magic way to intercept the usage of A, to instead put the enum instance there?

Also, for this to be useful for your described use case (state machines, yay!) you'd probably want to be able to define back/circular references, which I think isn't currently possible. The obvious thing to do would be to somehow make the RHS of the assignments lazy, which would allow out-of-order and circular assignments with a very nice, unambiguous:

    class StateMachine(Enum):
        "Useless ping-pong state machine"
        A = {1: B}
        B = {1: A}

But short of using macros to do an AST transform, I don't know if such a thing is possible at all.

-Haoyi

On Tue, Jul 9, 2013 at 8:12 AM, Ethan Furman wrote:
> On 07/08/2013 02:27 PM, Antony Lee wrote:
>> Currently, during the execution of the body of the Enum declaration,
>> member names are bound to the values, not to the Enum members themselves.
>> For example
>>
>>     class StateMachine(Enum):
>>         A = {}
>>         B = {1: A}  # e.g. a transition table
>>
>> StateMachine.B[1] == {}, when one could have expected StateMachine.B[1]
>> == StateMachine.A
>>
>> It seems to me that a behavior where member names are bound to the
>> members instead of being bound to the values is more useful, as one can
>> easily retrieve the values from the members but not the other way round
>> (at least during the execution of class body).
>>
>> Initially, I thought that this could be changed by modifying _EnumDict,
>> so that its __setitem__ method sets the member in the dict, instead of
>> the value, but in fact this doesn't work because while the values are
>> being set in the _EnumDict the class itself doesn't exist yet (and for
>> good reason: the __init__ and __new__ methods may be defined later but
>> there is no way to know that). However, a possible solution could be to
>> momentarily create Enum members as instances of some dummy class, and
>> then later, after execution of class body has completed, change the
>> members' class to the actual Enum and initialize them as needed (if an
>> __init__ or a __new__ are actually defined). Well, there are limitations
>> with this approach (e.g. the members are not fully initialized before
>> class body finishes executing) but this seems better than the current
>> behavior(?)
>
> Part of the problem here would be maintaining the linkage when the temp
> enum object from _EnumDict was translated into an actual Enum member.
>
> One possible work around is to store the name of the member instead:
>
>     class StateMachine(Enum):
>         A = {}
>         B = {1: 'A'}
>
> then the other methods can either dereference the name with an __getitem__
> look-up, or the class can be post-processed with a decorator to change the
> strings back to actual members... hmmm, maybe a post_process hook in the
> metaclass would make sense?
>
> --
> ~Ethan~
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
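To illustrate the decorator half of that quoted idea, a rough sketch; resolve_members is a hypothetical helper, not part of the enum module, and it only handles dict values whose strings name members:

    from enum import Enum

    def resolve_members(cls):
        # Post-process the finished class: anywhere a member's dict
        # value holds a string naming another member, swap in the
        # member itself.
        for member in cls:
            if not isinstance(member.value, dict):
                continue
            for key, value in list(member.value.items()):
                if isinstance(value, str) and value in cls.__members__:
                    member.value[key] = cls[value]
        return cls

    @resolve_members
    class StateMachine(Enum):
        A = {}
        B = {1: 'A'}

    assert StateMachine.B.value[1] is StateMachine.A

Back references and cycles fall out for free, since the rewriting happens after every member exists.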
From abarnert at yahoo.com Tue Jul 9 03:02:49 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 8 Jul 2013 18:02:49 -0700 (PDT)
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey>
Message-ID: <1373331769.70011.YahooMailNeo@web184703.mail.ne1.yahoo.com>

From: Haoyi Li
Sent: Monday, July 8, 2013 5:39 PM

> I for one find it annoying that i have to write a verbose long thingy every time i need to flatten lists

What verbose long thingy? You just need to write:

    flatten = itertools.chain.from_iterable

Or, if you use more-itertools:

    from more_itertools import flatten

Someone suggested moving this to builtins as "chain" or "flatten" or similar. I'm at least +0 on this.

At any rate, it's about as simple as can be.

Because it gives you an iterable, you don't pay the cost (in time or space) of building a list unless you need to, which means in many use cases it's actually much faster than sum, at the cost of being a little bit slower for lists when you do need a list.

It also flattens any iterable of sequences, or even any iterable of iterables, in the same time: linear in the total size. With no custom optimizations, it works with tuples, blist.sortedlists, cons lists, or _anything else you can throw at it_.

And it's obvious what it does. If you sum three sortedlists, do you get the first list's elements, then the second, then the third, or a merge of the three? I don't know. If you chain or flatten three sortedlists, it's obvious which one you get.
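A tiny usage sketch of that spelling (the sample data is made up):

    import itertools

    flatten = itertools.chain.from_iterable

    lists = [[1, 2], [3], [4, 5, 6]]
    print(list(flatten(lists)))           # [1, 2, 3, 4, 5, 6]
    print(list(flatten(((1,), (2, 3)))))  # [1, 2, 3] -- tuples work unchanged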
From haoyi.sg at gmail.com Tue Jul 9 03:25:21 2013
From: haoyi.sg at gmail.com (Haoyi Li)
Date: Tue, 9 Jul 2013 09:25:21 +0800
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <1373331769.70011.YahooMailNeo@web184703.mail.ne1.yahoo.com>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373331769.70011.YahooMailNeo@web184703.mail.ne1.yahoo.com>
Message-ID:

> Someone suggested moving this to builtins as "chain" or "flatten" or
> similar. I'm at least +0 on this.

That would be nice too; I consider both imports and third-party modules to be part of the length-of-code for boilerplate-accounting purposes (not sure if others do), and having an additional `import itertools; flatten = blahblah` brings it up to 3 lines of code to flatten a list by that technique. My files aren't very long, so I can't just do it once at the top of everything.py and use it throughout my project. Three lines of code to flatten a list! Or two lines with a third-party package! May as well use a for loop and an accumulator.

> And it's obvious what it does. If you sum three sortedlists, do you get
> the first list's elements, then the second, then the third, or a merge of
> the three? I don't know. If you chain or flatten three sortedlists, it's
> obvious which one you get.

Does python even *have* a sortedlist in the standard library? If they're third-party-module classes, then having to do some thinking to see what the builtins do to them seems acceptable to me; it's up to the sortedlist guy to document that "yeah you can sum my sortedlists and it'll do XXX". I mean you can probably *already* sum() them, and it'll do the same thing, just probably slower.

I would also really like a nice object-oriented sortedlist to be part of the standard library, instead of that nasty C-style heapq stuff. (A toy sketch of such a wrapper follows below.)

I'd still rather we make use of the nice new generic dispatch stuff to make our builtins re-usable, rather than having a set of crufty only-really-works-on-builtins functions cluttering the *global* namespaces, and having to manually pick from *another* bunch of non-generic custom functions to do the same (conceptual) thing to different types. I don't think that's going to happen anytime before Python 4.0 though. This sort of "yeah, function_x() doesn't work on y, you have to use special_function_z() to concat ys and thingy_z() to concat ws" is workable, but reminds me of my bad-old php days.

-Haoyi

On Tue, Jul 9, 2013 at 9:02 AM, Andrew Barnert wrote:
> From: Haoyi Li
> Sent: Monday, July 8, 2013 5:39 PM
>
> > I for one find it annoying that i have to write a verbose long thingy
> > every time i need to flatten lists
>
> What verbose long thingy? You just need to write:
>
>     flatten = itertools.chain.from_iterable
>
> Or, if you use more-itertools:
>
>     from more_itertools import flatten
>
> Someone suggested moving this to builtins as "chain" or "flatten" or
> similar. I'm at least +0 on this.
>
> At any rate, it's about as simple as can be.
>
> Because it gives you an iterable, you don't pay the cost (in time or
> space) of building a list unless you need to, which means in many use cases
> it's actually much faster than sum, at the cost of being a little bit
> slower for lists when you do need a list.
>
> It also flattens any iterable of sequences, or even any iterable of
> iterables, in the same time: linear in the total size. With no custom
> optimizations, it works with tuples, blist.sortedlists, cons lists, or
> _anything else you can throw at it_.
>
> And it's obvious what it does. If you sum three sortedlists, do you get
> the first list's elements, then the second, then the third, or a merge of
> the three? I don't know. If you chain or flatten three sortedlists, it's
> obvious which one you get.
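As a reference point for that wish, a bare-bones object-oriented wrapper over bisect; SortedList is a toy sketch (inserts still shift O(n) elements, which is exactly why a tree-backed type keeps coming up):

    import bisect

    class SortedList:
        def __init__(self, iterable=()):
            self._items = sorted(iterable)

        def add(self, value):
            # O(log n) search, O(n) shift -- fine for small lists only.
            bisect.insort(self._items, value)

        def __getitem__(self, index):
            return self._items[index]

        def __len__(self):
            return len(self._items)

        def __iter__(self):
            return iter(self._items)

    s = SortedList([3, 1, 2])
    s.add(0)
    print(list(s), s[0])  # [0, 1, 2, 3] 0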
From ron3200 at gmail.com Tue Jul 9 03:30:54 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Mon, 08 Jul 2013 20:30:54 -0500
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey>
Message-ID:

On 07/08/2013 07:39 PM, Haoyi Li wrote:
> Python programmers do currently write programs with linked lists. And
> they use whatever is in the library they think is useful... the whole
> point of "batteries included".
>
> On the other hand, python programmers do currently write programs with
> normal lists, and sum(), and using sum() on lists would be useful. They're
> BOTH in the standard library too! In fact, this seems like a perfect
> argument for having sum() work on lists!

On *another* hand. These are complementary points, not opposing points. ;-)

> The objection seems like optimizing for a hypothetical, potential use case
> far in the future at the expense of a concrete use case now. "if you can't
> make it do EVERYTHING, then we shouldn't make it do ANYTHING or people will
> get confused!" (paraphrased and exaggerated for theatrical effect).

Extra drama noted!

> This is all on the assumption that you consider flattening list-of-lists a
> concrete use case. I for one find it annoying that i have to write a
> verbose long thingy every time i need to flatten lists, ...

I think this is the reason Alex Martelli regretted using __add__ to concatenate strings. And I agree. A method would have worked very well:

    s1.cat(s2, s3, s4)

Or if they were long string literals:

    "".cat("... s1 ...", "... s2 ...", "... s3 ...")

This would work very well in many situations and is a fine alternative to implicit string concatenation in my opinion. Although it doesn't solve the "don't iterate a string's characters when flattening a bunch of iterable objects" problem.

Cheers, Ron

> ...and I have probably
> needed to flatten lists about a dozen times in the last 3 months and used
> linked lists not-at-all.
>
> Maybe you write a ton of pure-functional algorithms making great use of the
> persistence of singly-linked-lists for performant non-mutating
> head-updates, and never use vanilla lists in the code you write, and so
> having sum() work on linked lists is of great import.

From steve at pearwood.info Tue Jul 9 03:48:38 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 09 Jul 2013 11:48:38 +1000
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <20130708232234.72de4688@sergey>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey>
Message-ID: <51DB6BF6.9030608@pearwood.info>

On 09/07/13 06:22, Sergey wrote:
> On Jul 5, 2013 Andrew Barnert wrote:
>
>>> Yes, you can! You just need to implement a fast __iadd__ or __add__
>>> for your type to make it O(n) summable.
>>
>> Well, in Python 3.3, or even 2.3, you just need to implement a
>> fast __add__. So, if that's an acceptable answer, then we don't need
>> to do anything.
> [...]
>>> And you can always do that, can't you?
>> No. You can't implement a fast __iadd__ for tuple.
>
> Well... Yes, I can! I can't make __iadd__ faster, because tuple has
> no __iadd__, however I can make a faster __add__.

And how do you expect to do that? Tuples are immutable, you have to create a new tuple. So when adding a sequence of N tuples together, you end up making and throwing away N-1 intermediate results.

> But as long as sum()
> is the only (?) function suffering from this problem it was easier to
> do that optimization in sum() instead.

That's the big question though. Is summing a sequence of tuples important and common enough to justify special-casing it in sum? Just how many special cases can be justified?

>> If that's going to be extendable to other types
>
> Are there any other types (except strings, that are blocked anyway)?
>
> Looks like tuple is the only built-in type having no fast __iadd__,

I don't think so:

    py> for type in (str, bytes, bytearray, tuple, frozenset, object):
    ...     print(type.__name__, hasattr(type, '__iadd__'))
    ...
    str False
    bytes False
    bytearray True
    tuple False
    frozenset False
    object False

Okay, you can't sum() frozensets at all, but there are at least three types that support + that don't support __iadd__ (str, bytes, tuple), and by default anything inheriting from object will not have __iadd__ either.

> and sum() is the only function suffering from that. So, to make sum()
> "fast for everything by default" we only need to make sum use __iadd__
> and write a special case for sum+tuples.
>
>>> To make advantage of faster __iadd__ implementation sum() uses it
>>> if possible. sum() has linear time complexity for built-in types
>>> and all types providing constant-time `__iadd__` or `__add__`.
>
>> Note that integer addition isn't actually constant, it's linear on
>> the word size of the integers. Which means sum isn't linear for
>> integers.

I don't think that strictly matters, since the N in O(N) we're talking about is the number of ints being added, not the number of bits in each int. What is true is that adding a pair of ints is only constant-time if they are sufficiently small (and maybe not even then).

--
Steven
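To see the quadratic behaviour described above, a quick throwaway benchmark (timings are illustrative and machine-dependent; sum_tuples is a hypothetical helper):

    import timeit

    def sum_tuples(n):
        # N additions, each copying the whole intermediate tuple.
        return sum(((i,) for i in range(n)), ())

    for n in (1000, 2000, 4000):
        t = timeit.timeit(lambda: sum_tuples(n), number=5)
        print(n, round(t, 3))  # roughly 4x per doubling of n -> O(n**2)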
From steve at pearwood.info Tue Jul 9 04:16:43 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 09 Jul 2013 12:16:43 +1000
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey>
Message-ID: <51DB728B.2040709@pearwood.info>

On 09/07/13 10:39, Haoyi Li wrote:
> Python programmers do currently write programs with linked lists. And
> they use whatever is in the library they think is useful... the whole
> point of "batteries included".
>
> On the other hand, python programmers do currently write programs with
> normal lists, and sum(), and using sum() on lists would be useful. They're
> BOTH in the standard library too! In fact, this seems like a perfect
> argument for having sum() work on lists!

sum() does work on lists.

The fact that sum(lists) has had quadratic performance since sum was first introduced in Python 2.3, and I've *never* seen anyone complain about it being slow, suggests very strongly that this is not a use-case that matters.

I would bet that people simply do not sum large numbers of lists.

> The objection seems like optimizing for a hypothetical, potential use case
> far in the future at the expense of a concrete use case now. "if you can't
> make it do EVERYTHING, then we shouldn't make it do ANYTHING or people will
> get confused!" (paraphrased and exaggerated for theatrical effect).

No no no. The objection is that complicating the implementation of a function in order to optimize a use-case that doesn't come up in real-world use is actually harmful. Maintaining sum will be harder, for the sake of some benefit that very possibly nobody will actually receive.

I don't care that sum() is O(N**2) on strings, linked lists, tuples, lists. I don't think we should care. Sergey thinks we should care, and is willing to change the semantics of sum AND include as many special cases as needed in order to "guarantee" that sum will be "always fast". I don't believe that guarantee can possibly hold, and I'm dubious about the change in semantics and what I see as the standard library making misleading performance guarantees.

> This is all on the assumption that you consider flattening list-of-lists a
> concrete use case. I for one find it annoying that i have to write a
> verbose long thingy every time i need to flatten lists, and I have probably
> needed to flatten lists about a dozen times in the last 3 months and used
> linked lists not-at-all.

Flattening sequences is not sum. You have to consider the question, what counts as an atomic type? Should you recursively flatten sub-sub-lists? What to do with lists that contain a reference to themselves? sum() is correspondingly trivial, you just add the items, end of story. Writing a general purpose flattener is hard, and horribly easy to either under- or over-engineer. I have in my private software collection about half a dozen half-finished flatten() utility functions, because periodically I think it "might be useful", but because I've never actually needed the damn thing, I'm never motivated to complete it.

Non-recursively flattening a list of lists is trivial:

    flattened = []
    for sublist in lists:
        flattened.extend(sublist)

Three short lines of code. You could write that a dozen times a day, every day for a year, and not come close to the amount of discussion we've had on this sum() thread :-)

--
Steven

From haoyi.sg at gmail.com Tue Jul 9 04:23:16 2013
From: haoyi.sg at gmail.com (Haoyi Li)
Date: Tue, 9 Jul 2013 10:23:16 +0800
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <51DB728B.2040709@pearwood.info>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info>
Message-ID:

> Three short lines of code. You could write that a dozen times a day, every
> day for a year, and not come close to the amount of discussion we've had on
> this sum() thread :-)

That's very true; on the other hand, I don't have to go back and debug the discussions we've had in this thread, and I hope to have many long years of writing code ahead of me before I pass on =)

-Haoyi

On Tue, Jul 9, 2013 at 10:16 AM, Steven D'Aprano wrote:
> On 09/07/13 10:39, Haoyi Li wrote:
>> Python programmers do currently write programs with linked lists. And
>> they use whatever is in the library they think is useful... the whole
>> point of "batteries included".
>>
>> On the other hand, python programmers do currently write programs with
>> normal lists, and sum(), and using sum() on lists would be useful. They're
>> BOTH in the standard library too! In fact, this seems like a perfect
>> argument for having sum() work on lists!
>
> sum() does work on lists.
>
> The fact that sum(lists) has had quadratic performance since sum was first
> introduced in Python 2.3, and I've *never* seen anyone complain about it
> being slow, suggests very strongly that this is not a use-case that matters.
>
> I would bet that people simply do not sum large numbers of lists.
>
>> The objection seems like optimizing for a hypothetical, potential use case
>> far in the future at the expense of a concrete use case now. "if you can't
>> make it do EVERYTHING, then we shouldn't make it do ANYTHING or people
>> will get confused!" (paraphrased and exaggerated for theatrical effect).
>
> No no no. The objection is that complicating the implementation of a
> function in order to optimize a use-case that doesn't come up in real-world
> use is actually harmful. Maintaining sum will be harder, for the sake of
> some benefit that very possibly nobody will actually receive.
>
> I don't care that sum() is O(N**2) on strings, linked lists, tuples,
> lists. I don't think we should care. Sergey thinks we should care, and is
> willing to change the semantics of sum AND include as many special cases as
> needed in order to "guarantee" that sum will be "always fast". I don't
> believe that guarantee can possibly hold, and I'm dubious about the change
> in semantics and what I see as the standard library making misleading
> performance guarantees.
>
>> This is all on the assumption that you consider flattening list-of-lists a
>> concrete use case. I for one find it annoying that i have to write a
>> verbose long thingy every time i need to flatten lists, and I have
>> probably needed to flatten lists about a dozen times in the last 3 months
>> and used linked lists not-at-all.
>
> Flattening sequences is not sum. You have to consider the question, what
> counts as an atomic type? Should you recursively flatten sub-sub-lists?
> What to do with lists that contain a reference to themselves? sum() is
> correspondingly trivial, you just add the items, end of story. Writing a
> general purpose flattener is hard, and horribly easy to either under- or
> over-engineer. I have in my private software collection about half a dozen
> half-finished flatten() utility functions, because periodically I think it
> "might be useful", but because I've never actually needed the damn thing,
> I'm never motivated to complete it.
>
> Non-recursively flattening a list of lists is trivial:
>
>     flattened = []
>     for sublist in lists:
>         flattened.extend(sublist)
>
> Three short lines of code. You could write that a dozen times a day, every
> day for a year, and not come close to the amount of discussion we've had on
> this sum() thread :-)
>
> --
> Steven
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
From ethan at stoneleaf.us Tue Jul 9 04:46:18 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 08 Jul 2013 19:46:18 -0700
Subject: [Python-ideas] Allow Enum members to refer to each other during execution of body
In-Reply-To: References: <51DB5573.5070004@stoneleaf.us>
Message-ID: <51DB797A.9060002@stoneleaf.us>

On 07/08/2013 06:03 PM, Haoyi Li wrote:
> On 07/08/2013 Ethan Furman wrote:
>>
>> then the other methods can either dereference the name with an __getitem__
>> look-up, or the class can be post-processed with a decorator to change the
>> strings back to actual members... hmmm, maybe a post_process hook in the
>> metaclass would make sense?
>
> Having real strings be part of an enum's data members is a pretty common
> thing, and working through and trying to identify the linkage-strings from
> normal-strings seems very magical to me. Is there some metaclass-magic way
> to intercept the usage of A, to instead put the enum instance there?

The post-processing routines would be specific to this enumeration, so they should know which strings are enum references and which aren't.

> Also, for this to be useful for your described use case (state machines,
> yay!) you'd probably want to be able to define back/circular references,
> which I think isn't currently possible. The obvious thing to do would be
> to somehow make the RHS of the assignments lazy, which would allow
> out-of-order and circular assignments with a very nice, unambiguous:
>
>     class StateMachine(Enum):
>         "Useless ping-pong state machine"
>         A = {1: B}
>         B = {1: A}
>
> But short of using macros to do an AST transform, I don't know if such a
> thing is possible at all.

Starting off with strings and post-processing would handle it nicely.

--
~Ethan~

From haoyi.sg at gmail.com Tue Jul 9 05:36:04 2013
From: haoyi.sg at gmail.com (Haoyi Li)
Date: Tue, 9 Jul 2013 11:36:04 +0800
Subject: [Python-ideas] Why can't we pickle module objects?
Message-ID:

    >>> import pickle
    >>> pickle.dumps(pickle.dumps)
    'cpickle\ndumps\np0\n.'
    >>> pickle.dumps(pickle)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Runtimes\Python\lib\pickle.py", line 1374, in dumps
        Pickler(file, protocol).dump(obj)
      File "C:\Runtimes\Python\lib\pickle.py", line 224, in dump
        self.save(obj)
      File "C:\Runtimes\Python\lib\pickle.py", line 306, in save
        rv = reduce(self.proto)
      File "C:\Runtimes\Python\lib\copy_reg.py", line 70, in _reduce_ex
        raise TypeError, "can't pickle %s objects" % base.__name__
    TypeError: can't pickle module objects

I know that you can't, but why not? You can pickle class objects and function objects in modules, and their static path is stored so that when you unpickle it, the correct module is imported and the function is retrieved from that module. It seems odd and inconsistent that you can't pickle the module object itself; can't it just store itself as a name, and have loads() import it and return the resultant module object?

This isn't entirely of academic interest; I'm working with some code which is meant to pickle/unpickle arbitrary things, and occasionally it blows up when I accidentally pass in a module. It's always possible to work around the TypeErrors by just changing the stuff I pass in for serializing from the module to the exact function/class I want, but it seems like needless pain.

Is there any good reason we don't let people pickle module objects using the same technique that we use to pickle classes and top-level functions?

-Haoyi
From guido at python.org Tue Jul 9 05:43:44 2013
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Jul 2013 20:43:44 -0700
Subject: [Python-ideas] Why can't we pickle module objects?
In-Reply-To: References: Message-ID:

I don't think there is any particularly *good* reason, although it would probably require some changes to the unpickling code specifically to support this case (currently when given the name 'foo.bar' it tries to import foo but never bar, IIRC).

A bad reason might be that enabling this would make some people think that pickling a module would somehow store the code. But pickling classes and functions already has that disadvantage, so I don't think it's the real reason.

Maybe you should just try your hands at a patch? Possibly you'd end up discovering the real reason. :-)

--Guido

On Mon, Jul 8, 2013 at 8:36 PM, Haoyi Li wrote:
>>>> import pickle
>>>> pickle.dumps(pickle.dumps)
> 'cpickle\ndumps\np0\n.'
>>>> pickle.dumps(pickle)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Runtimes\Python\lib\pickle.py", line 1374, in dumps
>     Pickler(file, protocol).dump(obj)
>   File "C:\Runtimes\Python\lib\pickle.py", line 224, in dump
>     self.save(obj)
>   File "C:\Runtimes\Python\lib\pickle.py", line 306, in save
>     rv = reduce(self.proto)
>   File "C:\Runtimes\Python\lib\copy_reg.py", line 70, in _reduce_ex
>     raise TypeError, "can't pickle %s objects" % base.__name__
> TypeError: can't pickle module objects
>
> I know that you can't, but why not? You can pickle class objects and
> function objects in modules, and their static path is stored so that when
> you unpickle it, the correct module is imported and the function is
> retrieved from that module. It seems odd and inconsistent that you can't
> pickle the module object itself; can't it just store itself as a name, and
> have loads() import it and return the resultant module object?
>
> This isn't entirely of academic interest; I'm working with some code which
> is meant to pickle/unpickle arbitrary things, and occasionally it blows up
> when I accidentally pass in a module. It's always possible to work around
> the TypeErrors by just changing the stuff I pass in for serializing from the
> module to the exact function/class I want, but it seems like needless pain.
>
> Is there any good reason we don't let people pickle module objects using the
> same technique that we use to pickle classes and top-level functions?
>
> -Haoyi
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

--
--Guido van Rossum (python.org/~guido)

From abarnert at yahoo.com Tue Jul 9 07:04:36 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 8 Jul 2013 22:04:36 -0700
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373331769.70011.YahooMailNeo@web184703.mail.ne1.yahoo.com>
Message-ID:

On Jul 8, 2013, at 18:25, Haoyi Li wrote:

> I would also really like a nice object-oriented sortedlist to be part of the standard library, instead of that nasty C-style heapq stuff.

I agree. heapq isn't really a sorted list because it's neither index-accessible nor key-accessible.
And bisect isn't really a sorted list because everything but lookup is O(N). And both of them have C-style APIs to boot. AFAIK, two different implementations of sorted collections have been offered for the stdlib, and both were rejected because they tried to offer the wrong thing: blist was sold as a replacement for the existing list and tuple, which also happens to give you sorted list/dict/set types for free, while the other one (I forget the name) was offered as a standard binary tree library, which also happens to give you sorted list/dict/set types for free. I would be very happy to get sorted list/dict/set types added to the stdlib, ideally using the blist interface plus the views and key-slicing stuff from bintrees. And I don't care which implementation we get, as long as it's logarithmic, and there's a fast native accelerator for CPython alongside pure python code that works well in PyPy. There are multiple packages on PyPI that are 90% of the way there. I think it's just a matter of picking one, and either rallying the author to make the changes and commit to maintenance, or taking it over and committing to maintenance yourself. Either way, if you pull it off, I'll be your best friend forever and ever. (Your macro library is very cool, but a collections.sortedlist, I would use every day of my life, which makes it even cooler.) > I'd still rather we make use of the nice new generic dispatch stuff to make our builtins re-usable, rather than having a set of crufty only-really-works-on-builtins functions cluttering the *global* namespaces, and having to manually pick from *another* bunch of non-generic custom functions to do the same (conceptual) thing to different types. The point of using chain instead of sum to concatenate sequences or iterators is that it automatically works for every kind of iterable you can imagine, so it doesn't need any type-switching or generics or anything of the sort. Also, generic dispatch wouldn't really help here, as we need to dispatch on the type(s) that the iterable yields, not the type of the iterable itself. > I don't think that's going to happen anytime before Python 4.0 though. This sort of "yeah, function_x() doesn't work on y, you have to use special_function_z() to concat ys and thingy_z() to concat ws" is workable, but reminds me of my bad-old php days. That's exactly why I don't like expanding sum. At best, it can be the obvious way to concatenate mutable sequences with fast __iadd__ plus some builtin immutable sequences but not others... that's a mess. A chain/flatten/concat/whatever function can be (and already is, with two wasted lines) the obvious way to concatenate any iterables, no matter what. Meanwhile, sum is the obvious way to sum things that are obviously summable (numbers, matrices, etc.), and nothing else. > > -Haoyi > > > > On Tue, Jul 9, 2013 at 9:02 AM, Andrew Barnert wrote: >> From: Haoyi Li >> Sent: Monday, July 8, 2013 5:39 PM >> >> >> > I for one find it annoying that i have to write a verbose long thingy every time i need to flatten lists >> >> What verbose long thingy? You just need to write: >> >> flatten = itertools.chain.from_iterable >> >> Or, if you use more-itertools: >> >> from more_itertools import flatten >> >> Someone suggested moving this to builtins as "chain" or "flatten" or similar. I'm at least +0 on this. >> >> At any rate, it's about as simple as can be. 
>> Because it gives you an iterable, you don't pay the cost (in time or space) of building a list unless you need to, which means in many use cases it's actually much faster than sum, at the cost of being a little bit slower for lists when you do need a list.
>>
>> It also flattens any iterable of sequences, or even any iterable of iterables, in the same time: linear in the total size. With no custom optimizations, it works with tuples, blist.sortedlists, cons lists, or _anything else you can throw at it_.
>>
>> And it's obvious what it does. If you sum three sortedlists, do you get the first list's elements, then the second, then the third, or a merge of the three? I don't know. If you chain or flatten three sortedlists, it's obvious which one you get.

From stephen at xemacs.org Tue Jul 9 07:30:55 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 09 Jul 2013 14:30:55 +0900
Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes)
In-Reply-To: References: Message-ID: <87a9lwuwog.fsf@uwakimon.sk.tsukuba.ac.jp>

Bruce Leban writes:
> On Sun, Jul 7, 2013 at 3:29 AM, David Kendal wrote:
>> But there's no way to iterate over Unicode graphemes
>
> A common case is wanting to extract the current grapheme or move
> forward or backward one. Please consider these other use cases
> rather than just adding an iterator.
>
>     g = unicodedata.grapheme_cluster(str, i)
>     # extracts cluster that includes index i (i may be in the middle
>     # of the cluster)

Why is indexing a string and returning a grapheme a common case? I would think the common case would be indexing or iterating over a grapheme sequence. At least, if we provided such a type, it would be.[1]

Also, for 20 years I've worked with Emacs/Mule, which has a multibyte internal representation of characters, and so does a lot of byte index <-> character index conversion in the internals. I would like to avoid imposing that confusion on application programmers, unless they really need it for some reason.

Footnotes:
[1] Well, of course a lot of applications would continue to work with strs, just as today some applications work directly with bytes even though the content is readable text that could sensibly be translated to str. What I mean is that I expect that indexing str to get a grapheme would be rare in applications if grapheme iterators and arrays were available.
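A naive sketch of such an iterator using only combining-class data; itergraphemes here is illustrative, and real grapheme clustering needs the full UAX #29 rules (ZWJ sequences, Hangul jamo, etc.):

    import unicodedata

    def itergraphemes(s):
        # Group each base character with the combining marks that
        # follow it. Deliberately incomplete: combining class alone
        # does not cover all grapheme cluster boundaries.
        cluster = ''
        for ch in s:
            if cluster and unicodedata.combining(ch) == 0:
                yield cluster
                cluster = ''
            cluster += ch
        if cluster:
            yield cluster

    print(list(itergraphemes('e\u0301a\u0300')))
    # ['e\u0301', 'a\u0300'] -- two clusters, rendered 'é' and 'à'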
From stephen at xemacs.org Tue Jul 9 08:23:13 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 09 Jul 2013 15:23:13 +0900
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373331769.70011.YahooMailNeo@web184703.mail.ne1.yahoo.com>
Message-ID: <878v1guu9a.fsf@uwakimon.sk.tsukuba.ac.jp>

Andrew Barnert writes:

> Meanwhile sum is the obvious way to sum things that are obviously
> summable (numbers, matrices, etc.), and nothing else.

My intuition matches yours, but I find this argument (and the rest of the arguments that say that "generic sum() is unobvious and wrong") logically unsatisfactory. It would be nice if you could provide a plausible definition of "summable" other than "__add__() is implemented". I don't have one. :-(

From mal at egenix.com Tue Jul 9 09:03:07 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 09 Jul 2013 09:03:07 +0200
Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes)
In-Reply-To: References: <51DAA86E.4010903@egenix.com>
Message-ID: <51DBB5AB.3030407@egenix.com>

On 08.07.2013 20:27, David Kendal wrote:
> On 8 Jul 2013, at 19:26, David Kendal wrote:
>
>>> Could you open a ticket for this to hash out the details?
>>
>> Done!
>
> Ooops. Should have included a link, sorry.

Thanks.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jul 09 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2013-07-16: Python Meeting Duesseldorf ...                 7 days to go

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From mal at egenix.com Tue Jul 9 09:16:43 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 09 Jul 2013 09:16:43 +0200
Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes)
In-Reply-To: References: Message-ID: <51DBB8DB.20107@egenix.com>

On 08.07.2013 20:52, Bruce Leban wrote:
> On Sun, Jul 7, 2013 at 3:29 AM, David Kendal wrote:
>
>> Python provides a way to iterate characters of a string by using the
>> string as an iterable. But there's no way to iterate over Unicode graphemes
>> (a cluster of characters consisting of a base character plus a number of
>> combining marks and other modifiers -- or what the human eye would consider
>> to be one "character").
>>
>> I think this ought to be provided either in the unicodedata library,
>> (unicodedata.itergraphemes(string)) which exposes the character database
>> information needed to make this work, or as a method on the built-in str
>> type. (str.itergraphemes() or str.graphemes())
>
> A common case is wanting to extract the current grapheme or move forward or
> backward one. Please consider these other use cases rather than just adding
> an iterator.
>
>     g = unicodedata.grapheme_cluster(str, i)
>       # extracts cluster that includes index i (i may be in the middle of the cluster)
>     i = unicodedata.grapheme_start(str, i)
>       # if i is the start of the cluster, returns i; otherwise backs up to the start of the cluster
>     i = unicodedata.previous_cluster(str, i)
>       # moves i to the first index of the previous cluster; returns None if no previous cluster in the string
>     i = unicodedata.next_cluster(str, i)
>       # moves i to the first index of the next cluster; returns None if no next cluster in the string
>
> I think these belong in unicodedata, not str.

FWIW: Here's a pre-PEP I once wrote for these things:

http://mail.python.org/pipermail/python-dev/2001-July/015938.html

At the time there was little interest, so I dropped the idea.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jul 09 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...
http://python.egenix.com/
________________________________________________________________________
2013-07-16: Python Meeting Duesseldorf ...                 7 days to go

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From masklinn at masklinn.net Tue Jul 9 10:31:27 2013
From: masklinn at masklinn.net (Masklinn)
Date: Tue, 9 Jul 2013 10:31:27 +0200
Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes)
In-Reply-To: <87a9lwuwog.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87a9lwuwog.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <6E42B456-AB01-4623-8043-2E71EAF7A732@masklinn.net>

On 2013-07-09, at 07:30 , Stephen J. Turnbull wrote:
> Bruce Leban writes:
>
>> On Sun, Jul 7, 2013 at 3:29 AM, David Kendal wrote:
>>> But there's no way to iterate over Unicode graphemes
>>
>> A common case is wanting to extract the current grapheme or move
>> forward or backward one. Please consider these other use cases
>> rather than just adding an iterator.
>>
>>     g = unicodedata.grapheme_cluster(str, i)
>>     # extracts cluster that includes index i (i may be in the middle
>>     # of the cluster)
>
> Why is indexing a string and returning a grapheme a common case?

I don't know about that, but I do know NSString provides two messages for that (one takes an index in a string and returns the corresponding grapheme boundaries -- rangeOfComposedCharacterSequenceAtIndex:; and the other takes a range and returns the range of all composing graphemes -- rangeOfComposedCharacterSequencesForRange:). Of course that might just be because it does not provide a higher-level iterator on graphemes.

From abarnert at yahoo.com Tue Jul 9 10:51:41 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 9 Jul 2013 01:51:41 -0700 (PDT)
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <878v1guu9a.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373331769.70011.YahooMailNeo@web184703.mail.ne1.yahoo.com> <878v1guu9a.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <1373359901.40418.YahooMailNeo@web184702.mail.ne1.yahoo.com>

From: Stephen J. Turnbull
Sent: Monday, July 8, 2013 11:23 PM

> Andrew Barnert writes:
>
>> Meanwhile sum is the obvious way to sum things that are obviously
>> summable (numbers, matrices, etc.), and nothing else.
>
> My intuition matches yours, but I find this argument (and the rest of
> the arguments that say that "generic sum() is unobvious and wrong")
> logically unsatisfactory. It would be nice if you could provide a
> plausible definition of "summable" other than "__add__() is
> implemented". I don't have one. :-(

As I see it, there are three possibilities.

1. sum is not appropriate when __add__ means concatenation rather than adding. If you'd use PySequence_Concat/sq_concat rather than PyNumber_Add/nb_add in porting to the C API, or if you'd use a different operator in Python 4 if concatenation stopped being written as __add__, then you shouldn't use sum.
The problem is usually, you're not writing C API code or Python 4, you're writing Python 3, so it's not always obvious what the facts are. But I don't think it's ever that hard to figure out. If we used & for concatenation, list.__add__ and str.__add__ would no longer exist, but np.matrix.__add__, datetime.__add__, and quaternion.__add__ would.

2. sum is not appropriate iff chain.from_iterable makes sense.* Needing a list doesn't make chain unusable here any more than it does with map, zip, etc.; just pass the result to list. But "You can't iterate ints" or "I'm treating these np.matrix 3-vectors as atomic objects, not collections" does mean chain is unusable, so sum is the answer.

3. sum is not appropriate when 0 doesn't make sense as a start value. Summing things means, by default, starting with 0 and adding repeatedly. You can provide a non-default start value, but it should be "similar to 0" or "compatible with 0" in some way. Note that you can add 0 to an int, a float, a quaternion, a numpy.matrix, and all kinds of other types that "act like numbers". And that means that testing 0+start or start+0 is a pretty good test for summable. This is admittedly imperfect,** but it's pretty close, and very concrete and simple.

Personally, I'm leaning toward 2. If you come up with a type that is addable, and isn't iterable (or does the wrong thing when iterated), why not, it's summable. Better to let some corner-case false positives slip in than to reject some corner-case false negatives (as with 3), or to make them just impossible to decide (as with 1).

* This is a little too loose. Surely there are types where __add__ is not addition, but also not iterable concatenation, right? But I don't think sum is an attractive nuisance there, unlike in the case with concatenation. Meanwhile, what about strings? Actually, chain makes perfect sense with strings; it's just that usually, you're just going to want to pass the iterable to ''.join, and if you're doing that, you can just pass the original strings to ''.join. So, no problem there.

** Most notably, I think adding a sequence of timedelta objects with a datetime start makes sense, and you can't add 0 to a datetime. Really, what you need here is a way to say that start + 0 * peek(iterable) is also acceptable, not just start + 0 (and you can justify that more rigorously in terms of fields), but that's nearly impossible to implement, and way too complicated to explain. So, option 3 will reject some valid types.
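To make option 3 concrete, a checked wrapper could apply that start-value test up front; checked_sum is a hypothetical helper, not a proposed API:

    def checked_sum(iterable, start=0):
        # Option 3's test: a summable start must tolerate adding 0.
        try:
            start + 0
        except TypeError:
            raise TypeError('start value %r does not look summable' % (start,))
        total = start
        for item in iterable:
            total = total + item
        return total

    checked_sum([1.5, 2.5])        # 4.0
    checked_sum([(1,), (2,)], ())  # TypeError: () + 0 is rejected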
From shibturn at gmail.com Tue Jul 9 11:22:21 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Tue, 09 Jul 2013 10:22:21 +0100
Subject: [Python-ideas] Why can't we pickle module objects?
In-Reply-To: References: Message-ID:

You can work around it fairly easily with copyreg:

    >>> import sys, copyreg, pickle
    >>> def reduce_mod(m):
    ...     assert sys.modules[m.__name__] is m
    ...     return rebuild_mod, (m.__name__,)
    ...
    >>> def rebuild_mod(name):
    ...     __import__(name)
    ...     return sys.modules[name]
    ...
    >>> copyreg.pickle(type(sys), reduce_mod)
    >>> pickle.loads(pickle.dumps(sys))
    <module 'sys' (built-in)>

--
Richard

From abarnert at yahoo.com Tue Jul 9 11:32:12 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 9 Jul 2013 02:32:12 -0700 (PDT)
Subject: [Python-ideas] Sorted collections again?! (Was: Fast sum() for non-numbers)
In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373331769.70011.YahooMailNeo@web184703.mail.ne1.yahoo.com>
Message-ID: <1373362332.36085.YahooMailNeo@web184703.mail.ne1.yahoo.com>

From: Haoyi Li
Sent: Monday, July 8, 2013 6:25 PM

> I would also really like a nice object-oriented sortedlist to be part of the standard library, instead of that nasty C-style heapq stuff.

See this:

http://stupidpythonideas.blogspot.com/2013/07/sorted-collections-in-stdlib.html

I stopped short of writing up a complete proposal, because (a) there are a lot of open questions, and (b) I'd like to hear from the authors of some of the existing PyPI libraries. But the short version is:

* [Mutable]Sorted[Mapping|Set|List] and possibly Sorted[Item|Key|Value]View ABCs, which inherit from the non-Sorted ones and add methods like key_slice.
* Sorted[Dict|Set|List] concrete classes, all sharing the same implementation, ideally borrowed from an existing PyPI library. (Open questions: red-black trees, or something else? can you get timsort-like behavior for already-sorted source data, or for large updates?)
* Constructor matches [dict|set|list], but with an optional key parameter. (Open question: SortedDict(a=1, B=2, c=3, key=str.lower) ambiguity.)

From sergemp at mail.ru Tue Jul 9 11:35:30 2013
From: sergemp at mail.ru (Sergey)
Date: Tue, 9 Jul 2013 12:35:30 +0300
Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries?
In-Reply-To: References: <20130702211209.6dbde663@sergey>
Message-ID: <20130709123530.2afa1adf@sergey>

On Jul 5, 2013 Stefan Behnel wrote:

> No, please. Using sum() on lists is not more than a hack that
> seems to be a cool idea but isn't. Seriously - what's the sum of
> lists? Intuitively, it makes no sense at all to say sum(lists).

It's the same as it is now. What else can you think of when you see: [1, 2, 3] + [4, 5]?

Seriously, why are there so many holy wars about this? I'm not asking to rewrite cpython in Java or C#. I'm not adding a bunch of new functions, I'm not even changing the signatures of existing functions.
-Haoyi -Haoyi On Tue, Jul 9, 2013 at 5:22 PM, Richard Oudkerk wrote: > You can work around it fairly easily with copyreg: > > >>> import sys, copyreg, pickle > >>> def reduce_mod(m): > ... assert sys.modules[m.__name__] is m > ... return rebuild_mod, (m.__name__,) > ... > >>> def rebuild_mod(name): > ... __import__(name) > ... return sys.modules[name] > ... > >>> copyreg.pickle(type(sys), reduce_mod) > >>> pickle.loads(pickle.dumps(sys)**) > > > -- > Richard > > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ubershmekel at gmail.com Tue Jul 9 14:24:14 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Tue, 9 Jul 2013 15:24:14 +0300 Subject: [Python-ideas] Why can't we pickle module objects? In-Reply-To: References: Message-ID: On Tue, Jul 9, 2013 at 1:20 PM, Haoyi Li wrote: > [...] > I guess the (hypothetical) patch would cover all versions {2.7, 3.3} X > {pickle, cPickle}? > > It would be for 3.4 only as 2.7 and 3.3 are in feature freeze. Personally I stopped enjoying pickling classes because of the pains in renaming the source after saving the pickle file, so my opinion would be to leave pickle alone. Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From ron3200 at gmail.com Tue Jul 9 15:47:06 2013 From: ron3200 at gmail.com (Ron Adam) Date: Tue, 09 Jul 2013 08:47:06 -0500 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: <20130709123530.2afa1adf@sergey> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> Message-ID: On 07/09/2013 04:35 AM, Sergey wrote: > On Jul 5, 2013 Stefan Behnel wrote: > >> >No, please. Using sum() on lists is not more than a hack that >> >seems to be a cool idea but isn't. Seriously - what's the sum of >> >lists? Intuitively, it makes no sense at all to say sum(lists). > It's the same as it is now. What else can you think about when you > see: [1, 2, 3] + [4, 5] ? > > Seriously, why there's so much holy wars about that? I'm not asking > to rewrite cpython on Java or C#. I'm not adding a bunch of new > functions, I'm not even changing signatures of existing functions. It's the nature of this particular news group. We focus on improving python, and that includes new things and improving old things, but also includes discussing any existing or potential problems. You will almost always get a mix of approval and disapproval on just about every thing here. It's not a war, it's just different people having different opinions. Quite often that leads to finding better ways to do things, and in the long run, helps avoid adding features and changes that could be counter productive to python. > It's just among hundreds of existing functions I took one and made it > faster for some use-cases. That's all! It's just a minor optimization > patch. If it only makes an existing function faster and doesn't change any other behaviour, and all the tests still pass for it. Just create an issue on the tracker, with the patch posted there, and it will probably be accepted after it goes through a much more focused review process. But discussing it here will invite a lot of opinions about how it works, how it shouldn't work, what would work better, and etc... It's what this board if for. ;-) Cheers, Ron > If instead I optimized e.g. ConfigParser [1] then nobody would care. 
> Then why so many people care about this one? > > -- [1] http://bugs.python.org/issue7113 From sergemp at mail.ru Tue Jul 9 15:42:35 2013 From: sergemp at mail.ru (Sergey) Date: Tue, 9 Jul 2013 16:42:35 +0300 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID: <20130709164235.7fe21a7d@sergey> On Jul 8, 2013 Andrew Barnert wrote: > I'm -1 on adding special-casing for tuples that would not be > available for any other immutable type. Ok, let's be honest, I don't like that special case too. :( But when I had two options: 1. Make sum faster for everything BUT tuples and write in a manual: ... sum() is fast for all built-in types except `tuple`. For tuples you have to manually convert it to list, i.e. instead of: sum(list_of_tuples, tuple()) you have to write: tuple(sum(map(list,list_of_tuples),[])) or tuple(itertools.chain.from_iterable(list_of_tuples)) ... 2. Implement just one (!) special case for the only type in python needing it and write: ... sum() is fast for all built-in types! ... I chose #2. Tuple is one of the most frequently used types in python, and it's the only type that needs such a special case. Until someone writes a better solution: Practicality beats purity. That was my motivation. > No, you can't. You can do something different, but only by > modifying the C source to sum. > [...] >> I can't make __iadd__ faster, because tuple has >> no __iadd__, however I can make a faster __add__. > > No, you can't make tuple.__add__ faster either. (If you can, > please submit a patch, because that would be useful completely > independent of sum.) Theoretically it's possible to rewrite a tuple type to internally use list type for storage, and additionally have a `localcount` variable saying how many items from that list belong to this tuple. Then __add__ for such tuple would just create a new tuple with exactly same internal list (with incremented ref.count) extended. This way, technically, __add__ would modify all the tuples around, but due to internal `localcount` variable you won't notice that. Would you like such a patch instead? Would you want to write it? ;) It's just this patch only optimizes add, which is ONLY needed for many sequential adds, i.e. for sum(), so I thought that it would be MUCH easier to add it to sum instead. >> Are there any other types (except strings, that are blocked anyway)? > > Yes. Every single immutable type. Which is just one type ? tuple. There's no other summable standard types in python having O(N) __add__, is it? > blist.btuple has the same problem. Has it? I took just a brief look in its source and it seems that it already uses blist internally, so it can implement fast __add__ on its own (i.e. using the idea I described above). >?You wanted sets to be addable? Then frozenset would have this > problem. But they're not addable, so still not a problem. :) > Strings obviously have this problem, and any third-party immutable > string type will as well. And strings are blocked by sum(), so no performance problems for them. (Third-party immutable strings?) 
> So, if you're suggesting that sum can be fast for anything
> reasonable, you're just wrong.

I suggested two ways to do that. First, one can use the approach
above, i.e. use a mutable type internally. Second, for internal
cpython types we have a simpler option to implement the optimization
directly in sum(). And there may be many others, specific to the
types in question.

>> We're talking about python, and discussing use of sum() in python
>> for such lists in particular.
>
> No. You insisted that every collection type is O(N) summable with
> your design. Others argued that this isn't true. You asked for an
> example. I offered cons lists.

I don't remember saying that every collection type in the world is
O(N) summable, but ok. Would you agree that all summable built-in
collection types of python become O(N) summable with "my design"?
I.e. they were not O(N) summable before my patch, but they are O(N)
after it. Then why don't you like the patch?

Because somewhere in the world there could theoretically exist some
other types that are still not O(N) summable? Maybe, then we (or
their authors) will deal with them later (and I tried to show you
the options for your singly linked list example). After all, they
were not O(N) summable before this patch anyway. But they MAY BECOME
O(N) summable after it.

> If you agree that your design is not a general solution for all
> sequences, then all of your misunderstandings about cons lists are
> a red herring, and we can drop them.

I guess you misunderstand "my design" (or whatever you call that).

Let's put it like that. Currently:
  The only way to make a type O(N) summable is to implement a fast
  __add__ for it.
So I suggested:
  It is often easier to implement a fast __iadd__ than a fast
  __add__. So let's change sum so that it takes advantage of
  __iadd__ if it exists.
Then someone said:
  You still cannot make sum fast for everything, i.e. for tuples
I understood that as:
  If you're already changing sum() you should make it fast for
  tuples too, so that we could say "sum() is fast now".
So I replied:
  Yes, that patch does not meet the "sum() is fast now" goal, because
  there's one more type, `tuple`, that is still slow. So, if we want
  to make sum fast for all built-in types, we must make it fast for
  tuples too. Here's a small patch that just adds a special case
  for tuples, as that is the only type that needs it. This patch
  can also be extended to other types, e.g. lists and strings.

Yes, authors of custom types won't have that simple option. But we
have it, so why not use it, if it's MUCH easier than the alternatives?

>> It's just you said:
> [...]
> First, that wasn't me; please try to keep your attributions straight.

Oops, sorry, my mistake.

> But I agree with the sentiment. There is an efficient way to
> concatenate a lot of cons lists (basically, just fold backward
> instead of forward), but sum is not it.

Hm... If you implement a fast __radd__ instead of __add__ sum would
use that, wouldn't it? Is that the easy way you were looking for?

> So, whoever said that is right -- encouraging people to treat sum()
> as the preferred way to concatenate large amounts of data is a bad
> idea.

Then what way do you suggest to be preferred? For example how would
you do that in py4k? I guess you would prefer sum() to reject lists
and tuples as it now rejects strings and use some other function for
them instead? Or what? What is YOUR preferred way?

> Agreed. That makes the APIs a little more complicated (you need
> a list and a list::iterator instead of just a node), but that's not
> a big deal.
> And, with (list, list::iterator) being equivalent to a
> node, it leads to exactly the same issues as you started with in
> having just a node type.

We have 'list' and 'listiterator', 'tuple' and 'tupleiterator', 'set'
and 'setiterator'. Nothing unusual here. And no issues about them.

> Yes, deque is a great tool, but it's not the same tool as a linked
> list, and doesn't support the same algorithms.

Not all of them, but some. I.e. if you used your cons-lists as a
queue or a stack, then deque is a good replacement.

>> That wasn't saying "just make it do something different". That was
>> saying "you can have linked lists in python, that are O(N) summable".
>
> Which is exactly the point you were arguing against. If you now
> agree with everyone else, fine. There are types that can be
> efficiently concatenated, but not with sum. That's why everyone
> else thinks you shouldn't encourage people to use sum for general
> concatenation.

Really, I don't understand that point. Are you saying that sum()
must remain slow for FREQUENTLY USED standard types just because
there MAY BE some other types for which it would still be slow?

> Using a global variable (or a class attribute, which is the
> same thing) means that sum isn't reentrant, or thread-safe, or
> generator-safe.

Is it now? No? Then what changes? (BTW, it's your __iadd__ that
becomes non-thread-safe, not sum)

PS: Talking about all collections in the world, theoretically it may
be possible to extend my "special case" to all collection types in
the world, but there's a small issue with that idea that I don't know
how to handle...

From sergemp at mail.ru  Tue Jul  9 15:54:49 2013
From: sergemp at mail.ru (Sergey)
Date: Tue, 9 Jul 2013 16:54:49 +0300
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <51DB6BF6.9030608@pearwood.info>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
	<51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey>
	<1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com>
	<20130705094341.7c1c84de@sergey>
	<1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com>
	<20130708232234.72de4688@sergey> <51DB6BF6.9030608@pearwood.info>
Message-ID: <20130709165449.6b124367@sergey>

On Jul 9, 2013 Steven D'Aprano wrote:

>> Well... Yes, I can! I can't make __iadd__ faster, because tuple has
>> no __iadd__, however I can make a faster __add__.
>
> And how do you expect to do that? Tuples are immutable, you have
> to create a new tuple. So when adding a sequence of N tuples
> together, you end up making and throwing away N-1 intermediate
> results.

For example, rewrite tuple to internally store its values in a list,
and have a `localcount` variable saying how many items from that list
belong to this tuple. Then __add__ could extend that list and reuse
it for the new tuple.

But this patch looks too complex to me; implementing such a list
storage directly in sum was much easier.

>> But as long as sum() is the only (?) function suffering from this
>> problem it was easier to do that optimization in sum() instead.
>
> That's the big question though. Is summing a sequence of tuples
> important and common enough to justify special-casing it in sum? Just
> how many special cases can be justified?

Personally, I think this use-case is not too important. But look at
it from another perspective: is this special case such a bad trade-off
for being able to say in the manuals:
  ...
  sum() is fast for all built-in types.
  ...

>> Looks like tuple is the only built-in type having no fast __iadd__,
>
> I don't think so:
> [...]
> Okay, you can't sum() frozensets at all, but there's at least
> three types that support + that don't support __iadd__ (str, bytes,
> tuple), and by default anything inheriting from object will not have
> __iadd__ either.

Note that str and bytes are rejected by sum, so again we end up with
just one type: tuple. And things inheriting from object will probably
have a fast enough __add__ anyway. :)

From guido at python.org  Tue Jul  9 15:55:21 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 9 Jul 2013 06:55:21 -0700
Subject: [Python-ideas] Why can't we pickle module objects?
In-Reply-To: 
References: 
Message-ID: 

As I said I don't see much harm in it.

On Tue, Jul 9, 2013 at 5:24 AM, Yuval Greenfield wrote:
> On Tue, Jul 9, 2013 at 1:20 PM, Haoyi Li wrote:
>>
>> [...]
>> I guess the (hypothetical) patch would cover all versions {2.7, 3.3} X
>> {pickle, cPickle}?
>>
>
> It would be for 3.4 only as 2.7 and 3.3 are in feature freeze.
>
> Personally I stopped enjoying pickling classes because of the pains of
> renaming the source after saving the pickle file, so my opinion would be
> to leave pickle alone.
>
> Yuval
>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

--
--Guido van Rossum (python.org/~guido)

From ronaldoussoren at mac.com  Tue Jul  9 16:00:59 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Tue, 9 Jul 2013 16:00:59 +0200
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <20130709165449.6b124367@sergey>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
	<51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey>
	<1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com>
	<20130705094341.7c1c84de@sergey>
	<1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com>
	<20130708232234.72de4688@sergey> <51DB6BF6.9030608@pearwood.info>
	<20130709165449.6b124367@sergey>
Message-ID: 

On 9 Jul, 2013, at 15:54, Sergey wrote:

> On Jul 9, 2013 Steven D'Aprano wrote:
>
>>> Well... Yes, I can! I can't make __iadd__ faster, because tuple has
>>> no __iadd__, however I can make a faster __add__.
>>
>> And how do you expect to do that? Tuples are immutable, you have
>> to create a new tuple. So when adding a sequence of N tuples
>> together, you end up making and throwing away N-1 intermediate
>> results.
>
> For example, rewrite tuple to internally store its values in a list,
> and have a `localcount` variable saying how many items from that list
> belong to this tuple. Then __add__ could extend that list and reuse
> it for the new tuple.

That's not going to happen: not only does it break backward
compatibility for users of the C API, it has nasty side effects and
is incorrect.

Nasty side effect:

   a = (1,)
   b = (2,) * 1000
   c = a + b
   del b
   del c

With the internal list 'a' keeps alive the extra storage used for 'c'.

Incorrect:

   a = (1,)
   b = a + (2,)
   c = a + (3,)

Now 'b' and 'c' can't possibly both share storage with 'a'.
Ronald

From ronaldoussoren at mac.com  Tue Jul  9 16:07:57 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Tue, 9 Jul 2013 16:07:57 +0200
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <20130709164235.7fe21a7d@sergey>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
	<51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey>
	<1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com>
	<20130705094341.7c1c84de@sergey>
	<1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com>
	<20130708232234.72de4688@sergey>
	<1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com>
	<20130709164235.7fe21a7d@sergey>
Message-ID: <3BA7735D-3823-48D9-A0FB-EE0E574FA142@mac.com>

On 9 Jul, 2013, at 15:42, Sergey wrote:

> On Jul 8, 2013 Andrew Barnert wrote:
>
>> I'm -1 on adding special-casing for tuples that would not be
>> available for any other immutable type.
>
> Ok, let's be honest, I don't like that special case either. :(
> But when I had two options:
>
> 1. Make sum faster for everything BUT tuples and write in a manual:
>    ...
>    sum() is fast for all built-in types except `tuple`. For tuples
>    you have to manually convert it to list, i.e. instead of:
>      sum(list_of_tuples, tuple())
>    you have to write:
>      tuple(sum(map(list,list_of_tuples),[]))
>    or
>      tuple(itertools.chain.from_iterable(list_of_tuples))
>    ...
>
> 2. Implement just one (!) special case for the only type in python
>    needing it and write:
>    ...
>    sum() is fast for all built-in types!
>    ...
>
> I chose #2. Tuple is one of the most frequently used types in python,
> and it's the only type that needs such a special case.
>
> Until someone writes a better solution: Practicality beats purity.
> That was my motivation.

The better solution is to not use sum. I haven't looked at your patch,
but does it deal with all edge cases (such as calling sum on a
heterogeneous list that happens to have a tuple as its first item)?
Trying to special-case sum for a sequence of tuples is (IMHO too)
magical, and getting all the details right makes the code more
complicated.

Sum is primarily intended for summing sequences of numeric values;
that it works on lists and tuples as well is a more or less unintended
side-effect, and btw. the documentation for sum explicitly says it is
meant to sum numbers:

 sum(iterable[, start]) -> value

 Returns the sum of an iterable of numbers (NOT strings) plus the value
 of parameter 'start' (which defaults to 0).  When the iterable is
 empty, returns start.

Ronald

From ron3200 at gmail.com  Tue Jul  9 16:39:38 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Tue, 09 Jul 2013 09:39:38 -0500
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <1373359901.40418.YahooMailNeo@web184702.mail.ne1.yahoo.com>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
	<51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey>
	<1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com>
	<20130705094341.7c1c84de@sergey>
	<1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com>
	<20130708232234.72de4688@sergey>
	<1373331769.70011.YahooMailNeo@web184703.mail.ne1.yahoo.com>
	<878v1guu9a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<1373359901.40418.YahooMailNeo@web184702.mail.ne1.yahoo.com>
Message-ID: 

On 07/09/2013 03:51 AM, Andrew Barnert wrote:
> From: Stephen J. Turnbull
>
> Sent: Monday, July 8, 2013 11:23 PM
>
>> Andrew Barnert writes:
>>
>>> Meanwhile sum is the obvious way to sum things that are obviously
>>> summable (numbers, matrices, etc.), and nothing else.
>>
>> My intuition matches yours, but I find this argument (and the rest of
>> the arguments that say that "generic sum() is unobvious and wrong")
>> logically unsatisfactory. It would be nice if you could provide a
>> plausible definition of "summable" other than "__add__() is
>> implemented". I don't have one. :-(
>
>
> As I see it, there are three possibilities.
>
> 1. sum is not appropriate when __add__ means concatenation rather than
> adding. If you'd use PySequence_Concat/sq_concat rather than
> PyNumber_Add/nb_add in porting to the C API, or if you'd use a different
> operator in Python 4 if concatenation stopped being written as __add__,
> then you shouldn't use sum. The problem is, usually you're not writing C
> API code or Python 4, you're writing Python 3, so it's not always
> obvious what the facts are. But I don't think it's ever that hard to
> figure out. If we used & for concatenation, list.__add__ and str.__add__
> would no longer exist, but np.matrix.__add__, datetime.__add__, and
> quaternion.__add__ would.

What if we could specify the degree of specialisation of an operator?
Kind of like a cast operation, but one that works on the operator
instead of the value?

     v = a +(int) b            # v = int.__add__(a, b)
     v = a +(abc.Numbers) b    # ?
     v = a +(str) b            # v = str.__add__(a, b)
     v = a +(abc.Container) b  # ?

The default operation would be...

     v = a +(type(a)) b        # v = type(a).__add__(a, b)
     # or
     v = a.__add__(b)

I don't think ABC's are defined/refined enough to make this work as
they are, but could they be?  (I'm not that familiar with ABC's yet.)

Ron

> 2. sum is not appropriate iff chain.from_iterable makes sense.* Needing
> a list doesn't make chain unusable here any more than it does with map,
> zip, etc.; just pass the result to list. But "You can't iterate ints" or
> "I'm treating these np.matrix 3-vectors as atomic objects, not
> collections" does mean chain is unusable, so sum is the answer.
>
> 3. sum is not appropriate when 0 doesn't make sense as a start value.
> Summing things means, by default, starting with 0 and adding repeatedly.
> You can provide a non-default start value, but it should be "similar to
> 0" or "compatible with 0" in some way. Note that you can add 0 to an
> int, a float, a quaternion, a numpy.matrix, and all kinds of other types
> that "act like numbers". And that means that testing 0+start or start+0
> is a pretty good test for summable. This is admittedly imperfect,** but
> it's pretty close, and very concrete and simple.
>
> Personally, I'm leaning toward 2. If you come up with a type that is
> addable, and isn't iterable (or does the wrong thing when iterated), why
> not, it's summable. Better to let some corner-case false positives slip
> in than to reject some corner-case false negatives (as with 3), or to
> make them just impossible to decide (as with 1).
>
>
> * This is a little too loose. Surely there are types where __add__ is
> not addition, but also not iterable concatenation, right? But I don't
> think sum is an attractive nuisance there, unlike in the case with
> concatenation. Meanwhile, what about strings? Actually, chain makes
> perfect sense with strings; it's just that usually, you're just going to
> want to pass the iterable to ''.join, and if you're doing that, you can
> just pass the original strings to ''.join. So, no problem there.
>
> ** Most notably, I think adding a sequence of timedelta objects with a
> datetime start makes sense, and you can't add 0 to a datetime.
> Really, what you need here is a way to say that
> start + 0 * peek(iterable) is also acceptable, not just
> start + 0 -- and you can justify that more rigorously in terms of
> fields -- but that's nearly impossible to implement, and way too
> complicated to explain. So, option 3 will reject some valid types.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

From steve at pearwood.info  Tue Jul  9 18:13:54 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 10 Jul 2013 02:13:54 +1000
Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries?
In-Reply-To: <20130709123530.2afa1adf@sergey>
References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey>
Message-ID: <51DC36C2.8000509@pearwood.info>

On 09/07/13 19:35, Sergey wrote:
> On Jul 5, 2013 Stefan Behnel wrote:
>
>> No, please. Using sum() on lists is not more than a hack that
>> seems to be a cool idea but isn't. Seriously - what's the sum of
>> lists? Intuitively, it makes no sense at all to say sum(lists).
>
> It's the same as it is now. What else can you think about when you
> see: [1, 2, 3] + [4, 5] ?

Some of us think that using + for concatenation is an abuse of
terminology, or at least an unfortunate choice of operator, and are
wary of anything that encourages that terminology.

Nevertheless, you are right: in Python 3 both + and sum of lists are
well-defined. At the moment sum is defined in terms of __add__. You
want to change it to be defined in terms of __iadd__. That is a
semantic change that needs to be considered carefully; it is not just
an optimization.

> Seriously, why are there so many holy wars about that? I'm not asking
> to rewrite cpython in Java or C#. I'm not adding a bunch of new
> functions, I'm not even changing signatures of existing functions.

Because sometimes people are cautious and conservative about new
ideas. Better to be cautious, and miss out on some new function for a
version or two, than to rush into it, and then regret it if it turns
out to be a bad idea.

I have been very careful to say that I am only a little bit against
this idea, -0 not -1. I am uncomfortable about changing the semantics
to use __iadd__ instead of __add__, because I expect that this will
change the behaviour of sum() for non-builtins. I worry about
increased complexity making maintenance harder for no good reason.
It's the "for no good reason" that concerns me: you could answer some
of my objections if you showed:

- real code written by people who sum() large (more than 100) numbers
of lists;

- real code with comments like "this is a work-around for sum() being
slow on lists";

- bug reports or other public complaints by people (other than you)
complaining that sum(lists) is slow;

or similar. That would prove that people do call sum on lists. But at
the moment, I can only judge based on my own experience, both in
writing code and dealing with questions on comp.lang.python, and I
don't see many people doing sum(lists) or sum(tuples).

Earlier in this discussion, you posted benchmarks for the patched sum
using Python 2.7. Would you be willing to do it again for 3.3? And
confirm that the Python test suite continues to pass?
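[For reference, a sketch of the kind of micro-benchmark being
discussed. It only measures stock sum() against itertools.chain on an
unpatched interpreter -- i.e. the quadratic baseline rather than the
patch itself -- and exact numbers will vary by machine:]

    import timeit

    for n in (100, 1000):
        setup = ("from itertools import chain; "
                 "lists = [[0] * 10 for _ in range(%d)]" % n)
        t_sum = timeit.timeit("sum(lists, [])", setup=setup, number=10)
        t_chain = timeit.timeit("list(chain.from_iterable(lists))",
                                setup=setup, number=10)
        print(n, t_sum, t_chain)

[Going from 100 to 1000 lists, the sum() column grows roughly 100x
while the chain() column grows roughly 10x -- the quadratic-versus-
linear behaviour under discussion.]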
Thank you, -- Steven From ckaynor at zindagigames.com Tue Jul 9 18:42:55 2013 From: ckaynor at zindagigames.com (Chris Kaynor) Date: Tue, 9 Jul 2013 09:42:55 -0700 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <3BA7735D-3823-48D9-A0FB-EE0E574FA142@mac.com> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey> <3BA7735D-3823-48D9-A0FB-EE0E574FA142@mac.com> Message-ID: On Tue, Jul 9, 2013 at 7:07 AM, Ronald Oussoren wrote: > The better solution is to not use sum. I haven't looked at your patch, > but does it deal with all edge cases (such as calling sum on a heterogenous > list that happens to have a tuple as its first item)? Trying to > special-case > sum for a sequence of tuples is (IMHO too) magical, and getting all details > right makes the code more complicated. > > Sum is primarily intented for summing sequences of numeric values, that > it works on list and tuples as well is a more or less unintended > side-effect, > and btw. the documentation for sum explicitly says its mentioned to sum > numbers: > > sum(iterable[, start]) -> value > > Returns the sum of an iterable of numbers (NOT strings) plus the value > of parameter 'start' (which defaults to 0). When the iterable is > empty, returns start. I wonder if the best solution, if sum is only intended for use on numbers, would be to move it to the math module, rather than being a built-in function. No other changes would need to be made, although the special case for strings could likely be removed then as it would become fairly obvious that the string case is not reasonably supported. I would imagine that this would not take place until Python 4 (due to the large amount of existing code it would break), and I am not really proposing it, but it would seem to be logical. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bruce at leapyear.org Tue Jul 9 18:51:07 2013 From: bruce at leapyear.org (Bruce Leban) Date: Tue, 9 Jul 2013 09:51:07 -0700 Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes) In-Reply-To: <87a9lwuwog.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87a9lwuwog.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Jul 8, 2013 at 10:30 PM, Stephen J. Turnbull wrote: > > Why is indexing a string and returning a grapheme a common case? I > would think the common case would be indexing or iterating over a > grapheme sequence. At least, if we provided such a type, it would > be.[1] > If you want to do any operation on the clusters other than in iteration order, without indexed access you're going to end up doing list(grapheme_clusters(...)) first to give you indexed access. Maybe that's the right thing to do sometimes but I wouldn't force it on people. The string already provides indexed access but I need to know cluster boundaries. Note that str.find returns an int, not the found string. What do I do with that index if I can't extract clusters in the middle? Imagine you're writing code that works on English words. Would the only api you provide be one that iterates over the words? How would you write the function that finds the word after 'the' in a string? 
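[A rough sketch of the kind of indexed access being described here,
using a deliberately simplified cluster rule -- combining marks only,
not full UAX #29 grapheme clusters. unicodedata.combining() is a real
stdlib function; the helper names are invented:]

    import unicodedata

    def cluster_starts(s):
        # a cluster starts at any character that is not a combining mark
        return [i for i, ch in enumerate(s) if not unicodedata.combining(ch)]

    def cluster_at(s, index):
        # return (start, end) of the cluster containing position `index`
        starts = cluster_starts(s)
        starts.append(len(s))              # sentinel for the last cluster
        for start, end in zip(starts, starts[1:]):
            if start <= index < end:
                return start, end
        raise IndexError(index)

    s = "cafe\u0301 society"               # 'cafe' + combining acute
    i = s.find("\u0301")                   # an index inside a cluster
    print(cluster_at(s, i))                # (3, 5): the whole cluster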
--- Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Tue Jul 9 19:11:06 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 9 Jul 2013 10:11:06 -0700 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130709164235.7fe21a7d@sergey> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey> Message-ID: On Jul 9, 2013, at 6:42, Sergey wrote: > On Jul 8, 2013 Andrew Barnert wrote: > >> I'm -1 on adding special-casing for tuples that would not be >> available for any other immutable type. > > Ok, let's be honest, I don't like that special case too. :( > But when I had two options: > > 1. Make sum faster for everything BUT tuples and write in a manual: > ... > sum() is fast for all built-in types except `tuple`. For tuples > you have to manually convert it to list, i.e. instead of: > sum(list_of_tuples, tuple()) > you have to write: > tuple(sum(map(list,list_of_tuples),[])) > or > tuple(itertools.chain.from_iterable(list_of_tuples)) > ... > > 2. Implement just one (!) special case for the only type in python > needing it and write: > ... > sum() is fast for all built-in types! > ... > > I chose #2. Tuple is one of the most frequently used types in python, > and it's the only type that needs such a special case. > > Until someone writes a better solution: Practicality beats purity. > That was my motivation. #3 is for the docs to say what they currently say--sum is fast for numbers--but change the "not strings" to "not sequences", and maybe add a note saying _how_ to concatenate sequences (usually join for strings and chain for everything else). This makes sense with or without an __iadd__ patch (as long as __iadd__ is useful for some number-like types--as I said before, I expect it might be, but I don't actually know). >> No, you can't. You can do something different, but only by >> modifying the C source to sum. >> [...] >>> I can't make __iadd__ faster, because tuple has >>> no __iadd__, however I can make a faster __add__. >> >> No, you can't make tuple.__add__ faster either. (If you can, >> please submit a patch, because that would be useful completely >> independent of sum.) > > Theoretically it's possible to rewrite a tuple type to internally use > list type for storage, and additionally have a `localcount` variable > saying how many items from that list belong to this tuple. Then > __add__ for such tuple would just create a new tuple with exactly > same internal list (with incremented ref.count) extended. This way, > technically, __add__ would modify all the tuples around, but due to > internal `localcount` variable you won't notice that. I was going to point out the side effects of such a change, but someone beat me to it. > Would you like such a patch instead? Would you want to write it? ;) > > It's just this patch only optimizes add, which is ONLY needed for > many sequential adds, i.e. for sum(), so I thought that it would be > MUCH easier to add it to sum instead. And it's even easier to add neither. >>> Are there any other types (except strings, that are blocked anyway)? >> >> Yes. Every single immutable type. 
> Which is just one type -- tuple. There are no other summable standard
> types in Python having O(N) __add__, are there?

Does nobody ever use types from the stdlib, third-party libs, or their
own code in your world?

Builtin types are not generally magical. A function that works well
for all builtin types, but not types in the stdlib, is a recipe for
confusion.

>> So, if you're suggesting that sum can be fast for anything
>> reasonable, you're just wrong.
>
> I suggested two ways to do that. First, one can use the approach
> above, i.e. use a mutable type internally. Second, for internal
> cpython types we have a simpler option to implement the optimization
> directly in sum(). And there may be many others, specific to the
> types in question.

Using a mutable type internally will almost always have side effects,
or at least complicate the implementation. What you're suggesting is
that theoretically, for some different language that placed a very
strange and sometimes hard to meet requirement on all types, sum could
be fast for all types. That doesn't mean sum can be fast for all
Python types, because Python doesn't, and shouldn't, have such a
requirement.

And again, what would be the benefit? That sum could become the
obvious way to do concatenation instead of just summing? That's not
even desirable, much less worth bending over backward for.

>>> We're talking about python, and discussing use of sum() in python
>>> for such lists in particular.
>>
>> No. You insisted that every collection type is O(N) summable with
>> your design. Others argued that this isn't true. You asked for an
>> example. I offered cons lists.
>
> I don't remember saying that every collection type in the world is
> O(N) summable, but ok. Would you agree that all summable built-in
> collection types of python become O(N) summable with "my design"?

Yes, but if it's impossible to add a new collection type that works
like tuple, that makes python worse, not better.

> I.e. they were not O(N) summable before my patch, but they are O(N)
> after it. Then why don't you like the patch?
>
> Because somewhere in the world there could theoretically exist some
> other types that are still not O(N) summable?

No, because all over the world there already actually exist such
types, and because it's trivial--and useful--to create a new one.

> Maybe, then we (or
> their authors) will deal with them later (and I tried to show you
> the options for your singly linked list example).

And you only came up with options that destroy vital features of the
type, making it useless for most people who would want to use it.

But, more importantly, it is very simple to design and implement new
collection types in Python today. Adding a new requirement that's hard
enough to reach--one that you haven't been able to pull off for the
first example you were given--implies that it would no longer be easy.

>> If you agree that your design is not a general solution for all
>> sequences, then all of your misunderstandings about cons lists are
>> a red herring, and we can drop them.
>
> I guess you misunderstand "my design" (or whatever you call that).

Your design is the design explicitly described in your email: sum uses
__iadd__, and has special casing for at least one type.

The argument against this is that it makes an existing mild attractive
nuisance much worse, by implying that sum is actually fast for
concatenation in general. Your counter was that sum can be actually
fast for concatenation in general. That's not true.
If you're now saying that it can't, only for certain types, then you need a new justification for why it isn't an attractive nuisance. The one you seem to be implying is "everybody expects non-builtin types to be less usable than builtins", which is wrong. > Let's put it like that. Currently: > The only way to make a type O(N) summable is to implement fast > __add__ for it. > So I suggested: > It is often easier to implement fast __iadd__ than fast __add__. > So let's change sum so that it took advantage of __iadd__ if it > exists. So far, so good--but again, it would really help your case to find number-like types that this is true for. I already suggested some (np.matrix, for example). > Then someone said: > You still cannot make sum fast for everything, i.e. for tuples > I understood that as: > If you already changing sum() you should make it fast for > tuples too, so that we could say "sum() is fast now". Then you misunderstood it. Tuples were offered as one example, out of many, that won't be fast. Solving that one example by adding a special case in the C code doesn't help the general problem unless you're prepared to do the same for every other example, which is impossible. > So I replied: > Yes, that patch does not meet "sum() is fast now" goal, because > there's one more type `tuple` that is still slow. So, if we want > to make sum fast for all built-in types, we must make it fast for > tuples too. Here's a small patch, that just adds a special case > for tuples, as that is the only type that needs it. This patch > can be also extended to other types, e.g. lists and strings. > > Yes, authors of custom types won't have that simple option. But we > have it Who is this "we" here? Most users of Python use custom types. That's inherent in an OO language. The stdlib is full of custom types, and so are most third party libs and most applications. > , so why not use it, if it's MUCH easier than alternatives? The obvious alternative--just not doing it--is much easier. > >>> It's just you said: >> [...] >> First, that wasn't me; please try to keep your attributions straight. > > Oops, sorry, my mistake. > >> But I agree with the sentiment. There is an efficient way to >> concatenate a lot of cons lists (basically, just fold backward >> instead of forward), but sum is not it. > > Hm... If you implement fast __radd__ instead of __add__ sum would > use that, wouldn't it? Is that the easy way you were looking for? First, how do you propose that sum find out whether adding or radding is faster for a given type? More importantly: that wouldn't actually do anything in this case. The key to the optimization is doing the sequence of adds in reverse order, not flipping each add. >> So, whoever said that is right?encouraging people to treat sum() >> as the preferred way to concatenate large amounts of data is a bad >> idea. > > Then, what way you suggest to be preferred? For example how would you > do that in py4k? I guess you would prefer sum() to reject lists and > tuples as it now rejects strings and use some other function for them > instead? Or what? What is YOUR preferred way? I've already answered this, but I'll repeat it. sum is not the obvious way to concatenate sequences today, and my preferred way is for that to stay true. So, I'm: * +1 on sum using __iadd__ if it helps actual sums, -0 if it only helps list concatenation. * -1 on special casing tuple. * -1 on changing the docs to imply that sum should be used for sequences. 
* +0 on changing the docs to say "not sequences" instead of strings,
and maybe even expand on it by showing how to concatenate sequences
properly.

* -0 on explicitly rejecting sequences or non-numbers or whatever in
sum (largely because it's too hard to determine--and explain--what it
should try to reject, but also because I don't think it's a common
enough problem to be worth a change).

* +0 on moving chain.from_iterable to builtins and renaming it.

In other words, I don't think summing tuples is enough of an
attractive nuisance that it's worth bending over backward to prevent
it--but that doesn't mean we should bend over backward to improve
something people shouldn't be doing, especially since that would make
summing many other types into an attractive nuisance that doesn't
exist today.

>> Agreed. That makes the APIs a little more complicated (you need
>> a list and a list::iterator instead of just a node), but that's not
>> a big deal. And, with (list, list::iterator) being equivalent to a
>> node, it leads to exactly the same issues as you started with in
>> having just a node type.
>
> We have 'list' and 'listiterator', 'tuple' and 'tupleiterator', 'set'
> and 'setiterator'. Nothing unusual here. And no issues about them.

But they aren't the same kind of thing at all. I don't want to explain
the differences between what C++ calls iterators and what Python calls
iterators unless it's necessary. But briefly, a std::list::iterator is
a mutable reference to a node. Exposing that type means--as I've
already said twice--that you end up with exactly the same problems you
have exposing the node directly. If you don't understand why that's
true, that's fine, but please stop ignoring it completely.

>> Yes, deque is a great tool, but it's not the same tool as a linked
>> list, and doesn't support the same algorithms.
>
> Not all of them, but some. I.e. if you used your cons-lists as a
> queue or a stack, then deque is a good replacement.

Well, yes, but a dynamic array like Python's list is also a perfectly
good stack. So what?

I honestly can't tell at this point whether you're being deliberately
obtuse, or just don't understand the basics of why we have different
data structures in the first place.

>>> That wasn't saying "just make it do something different". That was
>>> saying "you can have linked lists in python, that are O(N) summable".
>>
>> Which is exactly the point you were arguing against. If you now
>> agree with everyone else, fine. There are types that can be
>> efficiently concatenated, but not with sum. That's why everyone
>> else thinks you shouldn't encourage people to use sum for general
>> concatenation.
>
> Really, I don't understand that point. Are you saying that sum()
> must remain slow for FREQUENTLY USED standard types just because
> there MAY BE some other types for which it would still be slow?

You're twisting the emphasis drastically, but basically yes. Today,
sum is not the best way to concatenate sequences. Making it work
better for some sequences but not others would mean it's still not the
best way to concatenate sequences, but it would _appear_ to be. That's
the very definition of an attractive nuisance.

>> Using a global variable (or a class attribute, which is the
>> same thing) means that sum isn't reentrant, or thread-safe, or
>> generator-safe.
>
> Is it now? No? Then what changes?

Yes, it is now. So that's what changes.
Again, this is something general and basic--operations that use global
variables are not reentrant--and I can't tell whether you're being
deliberately obtuse or whether you really don't understand that.

From tjreedy at udel.edu  Tue Jul  9 23:17:31 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 09 Jul 2013 17:17:31 -0400
Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes /
	str.graphemes)
In-Reply-To: 
References: <87a9lwuwog.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: 

On 7/9/2013 12:51 PM, Bruce Leban wrote:

> If you want to do any operation on the clusters other than in iteration
> order, without indexed access you're going to end up doing
> list(grapheme_clusters(...)) first to give you indexed access. Maybe
> that's the right thing to do sometimes but I wouldn't force it on
> people. The string already provides indexed access but I need to know
> cluster boundaries.

I think the best alternative to a list subclass of grapheme substrings
(a subclass so it can add methods) might be a GraphemeSeq wrapper class
that contains a string (perhaps in a known normal form) and a list of
indexes to grapheme start positions. That would also allow
grapheme-oriented methods.

If not already done, either or both of these would be good pypi
modules.

--
Terry Jan Reedy

From me at dpk.io  Wed Jul 10 01:15:23 2013
From: me at dpk.io (David Kendal)
Date: Wed, 10 Jul 2013 00:15:23 +0100
Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes /
	str.graphemes)
In-Reply-To: 
References: <87a9lwuwog.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <2627AA56-B8D3-4F9C-8ABE-A95108B711FF@dpk.io>

On 9 Jul 2013, at 17:51, Bruce Leban wrote:

> If you want to do any operation on the clusters other than in iteration
> order, without indexed access you're going to end up doing
> list(grapheme_clusters(...)) first to give you indexed access. Maybe that's
> the right thing to do sometimes but I wouldn't force it on people. The
> string already provides indexed access but I need to know cluster
> boundaries.

There's no reason the iterator returned can't be of a new type that
allows indexing with the subscript operator.

> --- Bruce

dpk
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 841 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: 

From joshua at landau.ws  Wed Jul 10 01:40:27 2013
From: joshua at landau.ws (Joshua Landau)
Date: Wed, 10 Jul 2013 00:40:27 +0100
Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes /
	str.graphemes)
In-Reply-To: <2627AA56-B8D3-4F9C-8ABE-A95108B711FF@dpk.io>
References: <87a9lwuwog.fsf@uwakimon.sk.tsukuba.ac.jp>
	<2627AA56-B8D3-4F9C-8ABE-A95108B711FF@dpk.io>
Message-ID: 

On 10 July 2013 00:15, David Kendal wrote:
> On 9 Jul 2013, at 17:51, Bruce Leban wrote:
>
>> If you want to do any operation on the clusters other than in iteration
>> order, without indexed access you're going to end up doing
>> list(grapheme_clusters(...)) first to give you indexed access. Maybe that's
>> the right thing to do sometimes but I wouldn't force it on people. The
>> string already provides indexed access but I need to know cluster
>> boundaries.
>
> There's no reason the iterator returned can't be of a new type that allows
> indexing with the subscript operator.

I've only loosely followed this thread but that sounds like a really
weird idea to me. The standard is to have an object with the
properties you want that can be coerced to an iterator through its
__iter__ method.
Maybe that's what you meant, though.

>>> range(133)[32]
32
>>> iter(range(133))[32]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'range_iterator' object is not subscriptable

From me at dpk.io  Wed Jul 10 03:06:13 2013
From: me at dpk.io (David Kendal)
Date: Wed, 10 Jul 2013 02:06:13 +0100
Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes /
	str.graphemes)
In-Reply-To: 
References: <87a9lwuwog.fsf@uwakimon.sk.tsukuba.ac.jp>
	<2627AA56-B8D3-4F9C-8ABE-A95108B711FF@dpk.io>
Message-ID: <5EAB14A6-262B-4887-927C-748845C7201C@dpk.io>

On 10 Jul 2013, at 00:40, Joshua Landau wrote:

> I've only loosely followed this thread but that sounds like a really
> weird idea to me. The standard is to have an object with the
> properties you want that can be coerced to an iterator through its
> __iter__ method. Maybe that's what you meant, though.

Well, right. I meant "a new type" like dict.keys() and dict.values()
are "view types" on a dictionary that provide iterator interfaces.
This would just be a "grapheme view" on a string.

dpk

From antony.lee at berkeley.edu  Wed Jul 10 04:27:06 2013
From: antony.lee at berkeley.edu (Antony Lee)
Date: Tue, 9 Jul 2013 19:27:06 -0700
Subject: [Python-ideas] Allow Enum members to refer to each other during
	execution of body
In-Reply-To: 
References: 
Message-ID: 

"Part of the problem here would be maintaining the linkage when the
temp enum object from _EnumDict was translated into an actual Enum
member."
I implemented the required behavior here: https://github.com/anntzer/enum
Instead of creating a new enum object from the temp object stored in
_EnumDict, I directly change the class of the temp object to its actual
value once that class is built, thus keeping references correct (see
test_backward_reference).
However, this behavior breaks down if the Enum class also inherits from
a type with a different layout (e.g., int), because I can't change the
class of such objects. In fact, even

class A(int): pass
class B(int): pass
A().__class__ = B

fails, which is puzzling to me... I understand that you can't transform
an instance of an int subclass into an instance of a str subclass, but
here both classes should have the same layout... On the other hand
IntEnums shouldn't need that kind of behavior anyway, so I just kept
the old implementation for them (for any class whose instances can't be
instantiated by object.__new__(cls), in fact).
I haven't worked on forward references but this should not be too hard
to implement either: just add a __missing__ to _EnumDict (cf. the
discussion on implicit enums) that creates temporary placeholder
members on the fly. When these members are actually defined, initialize
them. When the class body finishes executing, check that all
placeholders have been initialized, throwing an error otherwise.
Antony

2013/7/8 Antony Lee

> Currently, during the execution of the body of the Enum declaration,
> member names are bound to the values, not to the Enum members themselves.
> For example
>
> class StateMachine(Enum):
>     A = {}
>     B = {1: A} # e.g. a transition table
>
> StateMachine.B[1] == {}, when one could have expected StateMachine.B[1] ==
> StateMachine.A
>
> It seems to me that a behavior where member names are bound to the members
> instead of being bound to the values is more useful, as one can easily
> retrieve the values from the members but not the other way round (at least
> during the execution of class body).
>
> Initially, I thought that this could be changed by modifying _EnumDict, so
> that its __setitem__ method sets the member in the dict, instead of the
> value, but in fact this doesn't work because while the values are being set
> in the _EnumDict the class itself doesn't exist yet (and for good reason:
> the __init__ and __new__ methods may be defined later but there is no way
> to know that). However, a possible solution could be to momentarily create
> Enum members as instances of some dummy class, and then later, after
> execution of class body has completed, change the members' class to the
> actual Enum and initialize them as needed (if an __init__ or a __new__ are
> actually defined). Well, there are limitations with this approach (e.g.
> the members are not fully initialized before the class body finishes
> executing) but this seems better than the current behavior(?)
>
> Best,
>
> Antony
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From joshua at landau.ws  Wed Jul 10 08:09:53 2013
From: joshua at landau.ws (Joshua Landau)
Date: Wed, 10 Jul 2013 07:09:53 +0100
Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries?
In-Reply-To: <51DC36C2.8000509@pearwood.info>
References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey>
	<51DC36C2.8000509@pearwood.info>
Message-ID: 

On 9 July 2013 17:13, Steven D'Aprano wrote:
> On 09/07/13 19:35, Sergey wrote:
>>
>> On Jul 5, 2013 Stefan Behnel wrote:
>>
>>> No, please. Using sum() on lists is not more than a hack that
>>> seems to be a cool idea but isn't. Seriously - what's the sum of
>>> lists? Intuitively, it makes no sense at all to say sum(lists).
>>
>>
>> It's the same as it is now. What else can you think about when you
>> see: [1, 2, 3] + [4, 5] ?
>
>
> Some of us think that using + for concatenation is an abuse of terminology,
> or at least an unfortunate choice of operator, and are wary of anything that
> encourages that terminology.
>
> Nevertheless, you are right: in Python 3 both + and sum of lists are
> well-defined. At the moment sum is defined in terms of __add__. You want to
> change it to be defined in terms of __iadd__. That is a semantic change that
> needs to be considered carefully; it is not just an optimization.

I agree it's not totally backward-compatible, but AFAICT that's only
for broken code. __iadd__ should always just be a faster, in-place
__add__* and so this change should never cause problems in properly
written code. That makes it anything but a semantic change. It's the
same way people discuss the order of __hash__ calls on updates to code
but no-one calls it a *semantics* change.

> I am uncomfortable about changing the semantics to use
> __iadd__ instead of __add__, because I expect that this will change the
> behaviour of sum() for non-builtins.

Other than broken stuff, any guesses as to what? I'm trying to think
of maybe an IO thing (directories where __add__ makes a new "directory
viewer" and __iadd__ does a "cd") but none of them actually *change*
behaviour.

> I worry about increased complexity
> making maintenance harder for no good reason. It's the "for no good reason"
> that concerns me: you could answer some of my objections if you showed:

The move to __iadd__, in my opinion, is such a trivial thing that
"maintainability" shouldn't be a concern. Overriding for multiple
types is definitely going to cause a hazard, but this is adding like 1
line to the codebase.
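[A minimal sketch of those semantics in Python -- the function name is
invented, and the real proposal patches the C implementation. The
first addition uses plain +, which returns a fresh object, so the
caller's start value is never mutated; later steps use +=, which falls
back to __add__ automatically when __iadd__ is not defined:]

    def sum_iadd(iterable, start=0):
        it = iter(iterable)
        try:
            result = start + next(it)   # plain __add__: makes a new object
        except StopIteration:
            return start
        for item in it:
            result += item              # __iadd__ if available, else __add__
        return result

[Given that, sum_iadd([[1], [2]], []) returns [1, 2] without ever
mutating the empty list passed as start.]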
> - bug reports or other public complaints by people (other than you)
> complaining that sum(lists) is slow;

I don't think that is a good measure -- I've personally found cases
where "sum" looks nicer but isn't the best algorithm yet I've never
complained because 2-3 lines is really not that big a deal and it
*felt* like sum *had* to be O(n**2). I largely don't think of
sum(list_of_lists) as a nice looking construct, but that could just be
a learnt opinion and I'd never think of "sum(list_of_lists, [])" as
counterintuitive. I might think "OMG INEFFICIENCY" for a long time
coming, but I find it so hard to agree with those of you who say it
doesn't make sense.

I also think that holding back potential cases where __iadd__ is
better (which is every __iadd__) because you think a fast
"sum(list_of_lists, [])" would encourage that construct is a bit
silly. Just say "that's not the best way to do it because it's not
generic enough -- whereas chain.from_iterable is" if you really feel
that way. This is especially true if others agree that
"chain.from_iterable" is deserving of __builtins__.

> Earlier in this discussion, you posted benchmarks for the patched sum
> using Python 2.7. Would you be willing to do it again for 3.3? And
> confirm that the Python test suite continues to pass?

Seconded.

* Doesn't work for anything other than mutable, addable objects

From p.f.moore at gmail.com  Wed Jul 10 08:33:13 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 10 Jul 2013 07:33:13 +0100
Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries?
In-Reply-To: 
References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey>
	<51DC36C2.8000509@pearwood.info>
Message-ID: 

On 10 July 2013 07:09, Joshua Landau wrote:

> I agree it's not totally backward-compatible, but AFAICT that's only
> for broken code. __iadd__ should always just be a faster, in-place
> __add__* and so this change should never cause problems in properly
> written code. That makes it anything but a semantic change. It's the
> same way people discuss the order of __hash__ calls on updates to code
> but no-one calls it a *semantics* change.
>

It will stop working for tuples (which have no __iadd__). Or were you
suggesting trying __iadd__ and falling back to __add__ (that's more
complex, obviously, and I don't think I'd assume it's "trivial" extra
complexity) or special-casing tuples (that's even more complex, and
doesn't solve the problem for other immutables)?

Paul
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ronaldoussoren at mac.com  Wed Jul 10 08:34:28 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Wed, 10 Jul 2013 08:34:28 +0200
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: 
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info>
	<51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey>
	<1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com>
	<20130705094341.7c1c84de@sergey>
	<1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com>
	<20130708232234.72de4688@sergey>
	<1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com>
	<20130709164235.7fe21a7d@sergey>
Message-ID: <605747C3-0089-46DC-80E8-2B404431E128@mac.com>

On 9 Jul, 2013, at 19:11, Andrew Barnert wrote:

> On Jul 9, 2013, at 6:42, Sergey wrote:
>
>> On Jul 8, 2013 Andrew Barnert wrote:
>>
>>> I'm -1 on adding special-casing for tuples that would not be
>>> available for any other immutable type.
>>
>> Ok, let's be honest, I don't like that special case either. :(
>> But when I had two options:
>>
>> 1. Make sum faster for everything BUT tuples and write in a manual:
>>    ...
>>    sum() is fast for all built-in types except `tuple`. For tuples
>>    you have to manually convert it to list, i.e. instead of:
>>      sum(list_of_tuples, tuple())
>>    you have to write:
>>      tuple(sum(map(list,list_of_tuples),[]))
>>    or
>>      tuple(itertools.chain.from_iterable(list_of_tuples))
>>    ...
>>
>> 2. Implement just one (!) special case for the only type in python
>>    needing it and write:
>>    ...
>>    sum() is fast for all built-in types!
>>    ...
>>
>> I chose #2. Tuple is one of the most frequently used types in python,
>> and it's the only type that needs such a special case.
>>
>> Until someone writes a better solution: Practicality beats purity.
>> That was my motivation.
>
> #3 is for the docs to say what they currently say--sum is fast for
> numbers--but change the "not strings" to "not sequences", and maybe
> add a note saying _how_ to concatenate sequences (usually join for
> strings and chain for everything else).

Good idea, I hope Guido hasn't noticed that his time machine was gone
for a while ;-)

   sum(iterable[, start])

   Sums start and the items of an iterable from left to right and
   returns the total. start defaults to 0. The iterable's items are
   normally numbers, and the start value is not allowed to be a string.

   For some use cases, there are good alternatives to sum(). The
   preferred, fast way to concatenate a sequence of strings is by
   calling ''.join(sequence). To add floating point values with
   extended precision, see math.fsum(). To concatenate a series of
   iterables, consider using itertools.chain().

This is from http://docs.python.org/3/library/functions.html (and is
in the python 2.7 version as well)

 Ronald

From joshua at landau.ws  Wed Jul 10 08:58:26 2013
From: joshua at landau.ws (Joshua Landau)
Date: Wed, 10 Jul 2013 07:58:26 +0100
Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries?
In-Reply-To: 
References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey>
	<51DC36C2.8000509@pearwood.info>
Message-ID: 

On 10 July 2013 07:33, Paul Moore wrote:
> On 10 July 2013 07:09, Joshua Landau wrote:
>>
>> I agree it's not totally backward-compatible, but AFAICT that's only
>> for broken code. __iadd__ should always just be a faster, in-place
>> __add__* and so this change should never cause problems in properly
>> written code. That makes it anything but a semantic change. It's the
>> same way people discuss the order of __hash__ calls on updates to code
>> but no-one calls it a *semantics* change.
>
> It will stop working for tuples (which have no __iadd__). Or were you
> suggesting trying __iadd__ and falling back to __add__ (that's more
> complex, obviously, and I don't think I'd assume it's "trivial" extra
> complexity) or special-casing tuples (that's even more complex, and
> doesn't solve the problem for other immutables)?

Surely it just does the equivalent of "a += b", which handles fallback
for you. People don't write "a.__iadd__(b)" as much as they don't
write "a.__add__(b)". Python's C API has an inplace addition that
mimics this, btw.

And I'm not supporting special-casing anyway.

From flying-sheep at web.de  Wed Jul 10 14:04:13 2013
From: flying-sheep at web.de (Philipp A.)
Date: Wed, 10 Jul 2013 14:04:13 +0200
Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes /
	str.graphemes)
In-Reply-To: <5EAB14A6-262B-4887-927C-748845C7201C@dpk.io>
References: <87a9lwuwog.fsf@uwakimon.sk.tsukuba.ac.jp>
	<2627AA56-B8D3-4F9C-8ABE-A95108B711FF@dpk.io>
	<5EAB14A6-262B-4887-927C-748845C7201C@dpk.io>
Message-ID: 

2013/7/10 David Kendal

> Well, right. I meant "a new type" like dict.keys() and dict.values()
> are "view types" on a dictionary that provide iterator interfaces.
> This would just be a "grapheme view" on a string.

i think that's the way to go. who would want dozens of new functions in
unicodedata?

how about something like the following? it can easily be extended to get
a reverse iterator.

setting its pos and calling find_grapheme or __next__ or previous allows
for bruce's usecases.

class GraphemeIterator:
    def __init__(self, string, start=0):
        self.string = string
        self.pos = start

    def __iter__(self):
        return self

    def __next__(self):
        _, next_pos, grapheme = self.find_grapheme()
        self.pos = next_pos
        return grapheme

    def previous(self):
        prev_pos, _, grapheme = self.find_grapheme(backwards=True)
        self.pos = prev_pos
        return grapheme

    def find_grapheme(self, i=None, *, backwards=False):
        """finds next complete grapheme in string, starting at position i

        if backwards is not set, finds grapheme starting at i, or the next
        one if i is in the middle of one
        if it is set, it finds the grapheme which i points to, even if
        that's the middle.
        if str[i] is the beginning of a grapheme, backwards finds the one
        before it.
        """
        if i is None:
            i = self.pos
        ...
        return (start, end, grapheme)

def find_grapheme(string, i, backwards=False):
    """ convenience function for oneshotting it """
    return GraphemeIterator(string, i).find_grapheme(backwards=backwards)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From joshua at landau.ws  Wed Jul 10 14:10:37 2013
From: joshua at landau.ws (Joshua Landau)
Date: Wed, 10 Jul 2013 13:10:37 +0100
Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes /
	str.graphemes)
In-Reply-To: 
References: <87a9lwuwog.fsf@uwakimon.sk.tsukuba.ac.jp>
	<2627AA56-B8D3-4F9C-8ABE-A95108B711FF@dpk.io>
	<5EAB14A6-262B-4887-927C-748845C7201C@dpk.io>
Message-ID: 

On 10 July 2013 13:04, Philipp A. wrote:
> 2013/7/10 David Kendal
>
>> Well, right. I meant "a new type" like dict.keys() and dict.values()
>> are "view types" on a dictionary that provide iterator interfaces.
>> This would just be a "grapheme view" on a string.
>
> i think that's the way to go. who would want dozens of new functions in
> unicodedata?

You've missed both of our points. Consider:

>>> {}.keys()
dict_keys([])
>>> iter({}.keys())
<dict_keyiterator object at 0x...>

There are good reasons why a "view" should not be its iterator.

> how about something like the following? it can easily be extended to get
> a reverse iterator.
>
> setting its pos and calling find_grapheme or __next__ or previous allows
> for bruce's usecases.
From joshua at landau.ws Wed Jul 10 14:10:37 2013 From: joshua at landau.ws (Joshua Landau) Date: Wed, 10 Jul 2013 13:10:37 +0100 Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes) In-Reply-To: References: <87a9lwuwog.fsf@uwakimon.sk.tsukuba.ac.jp> <2627AA56-B8D3-4F9C-8ABE-A95108B711FF@dpk.io> <5EAB14A6-262B-4887-927C-748845C7201C@dpk.io> Message-ID:

On 10 July 2013 13:04, Philipp A. wrote:
> 2013/7/10 David Kendal
>
> Well, right. I meant "a new type" like dict.keys() and dict.values() are
> "view types" on a dictionary that provide iterator interfaces. This would
> just be a "grapheme view" on a string.
>
> i think that's the way to go. who would want dozens of new functions in
> unicodedata?

You've missed both of our points. Consider:

    >>> {}.keys()
    dict_keys([])
    >>> iter({}.keys())
    <dict_keyiterator object at 0x...>

There are good reasons why a "view" should not be its iterator.

> how about something like the following? it can easily be extended to get a
> reverse iterator.
>
> setting its pos and calling find_grapheme or __next__ or previous allows
> for bruce's usecases.
>
> class GraphemeIterator:
>     def __init__(self, string, start=0):
>         self.string = string
>         self.pos = start
>
>     def __iter__(self):
>         return self
>
>     def __next__(self):
>         _, next_pos, grapheme = self.find_grapheme()
>         self.pos = next_pos
>         return grapheme
>
>     def previous(self):
>         prev_pos, _, grapheme = self.find_grapheme(backwards=True)
>         self.pos = prev_pos
>         return grapheme
>
>     def find_grapheme(self, i=None, *, backwards=False):
>         """finds next complete grapheme in string, starting at position i
>
>         if backwards is not set, finds grapheme starting at i, or the
>         next one if i is in the middle of one.
>         if it is set, it finds the grapheme which i points to, even if
>         that's the middle. if str[i] is the beginning of a grapheme,
>         backwards finds the one before it.
>         """
>         if i is None:
>             i = self.pos
>         ...
>         return (start, end, grapheme)
>
> def find_grapheme(string, i, backwards=False):
>     """ convenience function for oneshotting it """
>     return GraphemeIterator(string, i).find_grapheme(backwards=backwards)

From ronaldoussoren at mac.com Wed Jul 10 16:26:47 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Wed, 10 Jul 2013 16:26:47 +0200 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> Message-ID: <543AFB7F-6751-4601-977B-A817C68B7BB2@mac.com>

On 10 Jul, 2013, at 8:33, Paul Moore wrote:
>
> On 10 July 2013 07:09, Joshua Landau wrote:
> I agree it's not totally backward-compatible, but AFAICT that's only
> for broken code. __iadd__ should always just be a faster, in-place
> __add__ and so this change should never cause problems in properly
> written code. That makes it anything but a semantic change. It's the
> same way people discuss the order of __hash__ calls on updates to code
> but no-one calls it a *semantics* change.
>
> It will stop working for tuples (which have no __iadd__). Or were you suggesting trying __iadd__ and falling back to __add__ (that's more complex, obviously, and I don't think I'd assume it's "trivial" extra complexity) or special-casing tuples (that's even more complex, and doesn't solve the problem for other immutables)?

Both "+=" in Python and its C API equivalent (PyNumber_InPlaceAdd) perform an in place addition (__iadd__) when possible and fall back to using normal addition (__add__) when the in place method is not supported. Thus, barring bugs and creative misuse of in place operators, using += instead of + in sum shouldn't affect the result of sum.

Still-unconvinced-about-the-usefulness-ly yours,

Ronald

From sergemp at mail.ru Wed Jul 10 18:10:07 2013 From: sergemp at mail.ru (Sergey) Date: Wed, 10 Jul 2013 19:10:07 +0300 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <51DB728B.2040709@pearwood.info> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> Message-ID: <20130710191007.5f525fb3@sergey>

On Jul 9, 2013 Steven D'Aprano wrote:
> The fact that sum(lists) has had quadratic performance since sum
> was first introduced in Python 2.3, and I've *never* seen anyone
> complain about it being slow, suggests very strongly that this is not
> a use-case that matters.

Never seen?
Are you sure? ;)

> http://article.gmane.org/gmane.comp.python.general/658630
> From: Steven D'Aprano @ 2010-03-29
> In practical terms, does anyone actually ever use sum on more than a
> handful of lists? I don't believe this is more than a hypothetical
> problem.

This is definitely not the first time people ask to fix this O(N**2) "surprise". Maybe if the problem appears year after year it is not so hypothetical? sum() was the first answer suggested for many list-of-lists questions [1], and sometimes it wasn't even obvious for people why it might be slow [2]. IMHO, many of those who actually spot the slowness will just think "heh, python is so slow", and won't blame sum() for the bug.

Why do you think the sum() code has an explicit comment about using sum for lists of lists? Why does test_builtin.py check this case? Isn't it because this is the common case? Or do you mean that nobody suggested a patch before? Well, think about it like that:
1. How many people in the world use python?
2. How many of them need to add lists?
3. How many of them are careful enough to notice that sum is slow?
4. How many of them are experienced enough to blame sum for that?
5. How many of those are smart enough to understand how it can be fixed, not just worked around by using another function?
6. How many of those are skilled enough to dig into python code and able to fix the bug there?
7. How many of those have enough free time to come here and start asking to accept the patch?
Not too many, right? And among those someone should be the first. Well, I am. :) How many do you need?

> No no no. The objection is that complicating the implementation of
> a function in order to optimize a use-case that doesn't come up in
> real-world use is actually harmful. Maintaining sum will be harder,
> for the sake of some benefit that very possibly nobody will actually
> receive.

Are you sure it complicates the implementation? Here's how the sum function looks NOW, without my patches (well, it looks different in C, but the idea is the same):

    def sum(seq, start=0):
        it = iter(seq)
        if isinstance(start, str):
            raise TypeError(
                "sum() can't sum strings [use ''.join(seq) instead]")
        if isinstance(start, bytes):
            raise TypeError(
                "sum() can't sum bytearray [use b''.join(seq) instead]")

        # SPECIAL CASES
        if isinstance(start, int):
            # int(x, overflow) stands in for C's PyLong_AsLongAndOverflow
            i_result = int(start, overflow)
            if not overflow:
                try:
                    start = None
                    while start is None:
                        item = next(it)
                        if isinstance(item, int):
                            b = int(item, overflow)
                            x = i_result + b
                            if not overflow and ((x^i_result) >= 0 or (x^b) >= 0):
                                i_result = x
                                continue
                        start = i_result
                        start = start + item
                except StopIteration:
                    return i_result

        if isinstance(start, float):
            f_result = float(start)
            try:
                start = None
                while start is None:
                    item = next(it)
                    if isinstance(item, float):
                        f_result += float(item)
                        continue
                    if isinstance(item, int):
                        value = int(item, overflow)
                        if not overflow:
                            f_result += float(value)
                            continue
                    start = f_result
                    start = start + item
            except StopIteration:
                return f_result
        # END OF SPECIAL CASES

        result = start
        try:
            while True:
                item = next(it)
                result = result + item
        except StopIteration:
            return result

So simple and obvious. :) My original patch changes the part:

    while True:
        item = next(it)
        result = result + item

to:

    result = result + item
    while True:
        item = next(it)
        result += item

(In python that effectively adds one line, in C that's 6 lines.) This does not really complicate the existing code.
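For readers who want to poke at the idea without touching C, here is a minimal runnable sketch of what that first patch does -- plain Python, none of the int/float fast paths, just the one semantic point (a single + so start is never mutated, then += for everything else):

    def sum_iadd(iterable, start=0):
        it = iter(iterable)
        try:
            result = start + next(it)   # + creates a new object; start is untouched
        except StopIteration:
            return start
        for item in it:
            result += item              # falls back to __add__ if no __iadd__
        return result

    print(sum_iadd([1, 2, 3]))             # 6
    print(sum_iadd([[1], [2], [3]], []))   # [1, 2, 3], in linear time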
An alternative to that patch is one more special case for "known slow types", i.e. lists and tuples. That was my second patch. In python that would be like:

    if isinstance(start, list) or isinstance(start, tuple):
        optimize_for = type(start)
        l_result = list(start)
        try:
            while True:
                item = next(it)
                if not isinstance(item, optimize_for):
                    start = optimize_for(l_result)
                    start = start + item
                    break
                l_result.extend(item)
        except StopIteration:
            return optimize_for(l_result)

Yes, that's not just one line, but does it really complicate the existing code that much?

Theoretically this code could be extended to any iterable. So, *theoretically* it could be:

    if start_is_iterable:
        optimize_for = type(start)
        ... same code here ...

but there's a technical problem in the line:

    start = optimize_for(l_result)

This line works for lists and tuples. It will even work for set and frozenset, if needed. But the problem is that I cannot guarantee that it will work for an arbitrary type. For example it does not work for strings (even worse, it works, but not as I would want it to). In my patch I also showed how strings can be handled for that case.

Basically, this very single line is the only line holding me back from making my "special case" into a "general case" for all iterables in the world. So if somebody knows how to solve this problem -- any suggestions welcome.

> I don't care that sum() is O(N**2) on strings, linked lists,
> tuples, lists. I don't think we should care. Sergey thinks we should
> care, and is willing to change the semantics of sum AND include as
> many special cases as needed in order to "guarantee" that sum will be
> "always fast". I don't believe that guarantee can possibly hold, and

Just ONE special case is needed -- the one for iterables. And yes, the "guarantee" can hold, because it only affects built-in types, and my patch covers all of them, even those that are not supported by sum() yet. But it's also good for third-party types, because it gives them an option to implement a fast __iadd__. They don't have this option now.

> Flattening sequences is not sum. You have to consider ...

Yet people think [1] that sum() is useful for that. Every year somebody comes and tries to use sum(), and often someone else says "Hey, don't use sum(), it's slow". "BECAUSE IT'S SLOW!" All that talk about "sum() is not designed for...", "it's just because + is used for concatenation...", "sum() should not be used...", "you have to consider..." -- all of these are just excuses to explain why the bug is there. Maybe it's time to stop searching for excuses and finally fix the bug? Especially if it's so easy to fix it.

--
[1] Some questions about lists flattening:
http://stackoverflow.com/questions/716477/join-list-of-lists-in-python
http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
http://stackoverflow.com/questions/406121/flattening-a-shallow-list-in-python
[2] "explain for a noob python learner please: Is O(n^2) good or bad in this case?"
http://article.gmane.org/gmane.comp.python.general/441831

> A fast implementation would probably allocate the output list just
> once and then stream the values into place with a simple index.

That's what I hoped "sum" would do, but instead it barfs with a type error.

From steve at pearwood.info Wed Jul 10 18:29:15 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Jul 2013 02:29:15 +1000 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries?
In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> Message-ID: <51DD8BDB.6050101@pearwood.info> On 10/07/13 16:09, Joshua Landau wrote: > On 9 July 2013 17:13, Steven D'Aprano wrote: [...] >> Nevertheless, you are right, in Python 3 both + and sum of lists is >> well-defined. At the moment sum is defined in terms of __add__. You want to >> change it to be defined in terms of __iadd__. That is a semantic change that >> needs to be considered carefully, it is not just an optimization. > > I agree it's not totally backward-compatible, but AFAICT that's only > for broken code. __iadd__ should always just be a faster, in-place > __add__ and so this change should never cause problems in properly > written code. "Always"? Immutable objects cannot define __iadd__ as an in-place __add__. In any case, sum() currently does not modify the start argument in-place. >That makes it anything but a semantic change. __iadd__ is optional for classes that support addition. Failure to define an __iadd__ method does not make your class broken. Making __iadd__ mandatory to support sum would be a semantic change, since there will be objects (apart from strs and bytes, which are special-cased) that support addition with + but will no longer be summable since they don't define __iadd__. Even making __iadd__ optional will potentially break working code. Python doesn't *require* that __iadd__ perform the same operation as __add__. That is the normal expectation, of course, but it's not enforced. (How could it be?) We might agree that objects where __add__ and __iadd__ do different things are "broken" in some sense, but you're allowed to write broken code, and Python should (in principle) avoid making it even more broken by changing behaviour unnecessarily. But maybe the right answer there is simply "don't call sum if you don't want __iadd__ called". -- Steven From ethan at stoneleaf.us Wed Jul 10 18:50:45 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 10 Jul 2013 09:50:45 -0700 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> Message-ID: <51DD90E5.1020505@stoneleaf.us> On 07/09/2013 11:09 PM, Joshua Landau wrote: > On 9 July 2013 17:13, Steven D'Aprano wrote: >> On 09/07/13 19:35, Sergey wrote: >>> >>> On Jul 5, 2013 Stefan Behnel wrote: >>> >>>> No, please. Using sum() on lists is not more than a hack that >>>> seems to be a cool idea but isn't. Seriously - what's the sum of >>>> lists? Intuitively, it makes no sense at all to say sum(lists). >>> >>> >>> It's the same as it is now. What else can you think about when you >>> see: [1, 2, 3] + [4, 5] ? >> >> >> Some of us think that using + for concatenation is an abuse of terminology, >> or at least an unfortunate choice of operator, and are wary of anything that >> encourages that terminology. >> >> Nevertheless, you are right, in Python 3 both + and sum of lists is >> well-defined. At the moment sum is defined in terms of __add__. You want to >> change it to be defined in terms of __iadd__. That is a semantic change that >> needs to be considered carefully, it is not just an optimization. > > I agree it's not totally backward-compatible, but AFAICT that's only > for broken code. __iadd__ should always just be a faster, in-place > __add__ and so this change should never cause problems in properly > written code. 
> That makes it anything but a semantic change. It's the
> same way people discuss the order of __hash__ calls on updates to code
> but no-one calls it a *semantics* change.

Currently, sum() does not modify its arguments. You (or whoever) are suggesting that it should modify one of them. That makes it a semantic change, and a bad one.

-1

--
~Ethan~

From steve at pearwood.info Wed Jul 10 19:20:38 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Jul 2013 03:20:38 +1000 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: <51DD90E5.1020505@stoneleaf.us> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD90E5.1020505@stoneleaf.us> Message-ID: <51DD97E6.2070105@pearwood.info>

On 11/07/13 02:50, Ethan Furman wrote:
> Currently, sum() does not modify its arguments.
>
> You (or whoever) are suggesting that it should modify one of them.
>
> That makes it a semantic change, and a bad one.
>
> -1

Actually, Sergey's suggestion is a bit more clever than that. I haven't tested his C version, but the intention of his pure-Python demo code is to make a temporary list, modify the temporary list in place for speed, and then convert to whatever type is needed. That will avoid modifying any of the arguments[1]. So credit to Sergey for avoiding that trap.

[1] Except possibly in truly pathological cases which I for one don't care about.

--
Steven

From flying-sheep at web.de Wed Jul 10 19:39:48 2013 From: flying-sheep at web.de (Philipp A.) Date: Wed, 10 Jul 2013 19:39:48 +0200 Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes) In-Reply-To: References: <87a9lwuwog.fsf@uwakimon.sk.tsukuba.ac.jp> <2627AA56-B8D3-4F9C-8ABE-A95108B711FF@dpk.io> <5EAB14A6-262B-4887-927C-748845C7201C@dpk.io> Message-ID:

2013/7/10 Joshua Landau joshua at landau.ws

> >>> {}.keys()
> dict_keys([])
> >>> iter({}.keys())
> <dict_keyiterator object at 0x...>
>
> There are good reasons why a "view" should not be its iterator.

you're right, but one would expect the view's __getitem__(i) method to return the ith grapheme, which implies constant-time access. and we can only support linear-time access to that (i.e. by iterating stuff) if we don't want to build a complex index.

so should we do a view object that only allows something like my find_grapheme and iteration?
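something along these lines, perhaps -- a minimal sketch of a re-iterable view with no __getitem__, using a deliberately crude combining-mark splitter as a stand-in for real UAX #29 segmentation:

    import unicodedata

    def _split_graphemes(s):
        # crude stand-in: a base character plus any following combining marks
        out, cur = [], ""
        for ch in s:
            if cur and unicodedata.combining(ch):
                cur += ch
            else:
                if cur:
                    out.append(cur)
                cur = ch
        if cur:
            out.append(cur)
        return out

    class GraphemeView:
        """Re-iterable like dict.keys(): every __iter__ call returns a
        fresh iterator over the string's graphemes."""
        def __init__(self, string):
            self._string = string
        def __iter__(self):
            return iter(_split_graphemes(self._string))

    v = GraphemeView("cafe\u0301")
    assert list(v) == list(v)   # iterating twice works, unlike a bare iterator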
From steve at pearwood.info Wed Jul 10 19:49:12 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Jul 2013 03:49:12 +1000 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130710191007.5f525fb3@sergey> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> Message-ID: <51DD9E98.5030203@pearwood.info>

On 11/07/13 02:10, Sergey wrote:
> On Jul 9, 2013 Steven D'Aprano wrote:
>
>> The fact that sum(lists) has had quadratic performance since sum
>> was first introduced in Python 2.3, and I've *never* seen anyone
>> complain about it being slow, suggests very strongly that this is not
>> a use-case that matters.
>
> Never seen? Are you sure? ;)
>> http://article.gmane.org/gmane.comp.python.general/658630
>> From: Steven D'Aprano @ 2010-03-29
>> In practical terms, does anyone actually ever use sum on more than a
>> handful of lists? I don't believe this is more than a hypothetical
>> problem.

Yes, and I stand by what I wrote back then.

> Not too many, right? And among those someone should be the first.
> Well, I am. :) How many do you need?
>
>> No no no. The objection is that complicating the implementation of
>> a function in order to optimize a use-case that doesn't come up in
>> real-world use is actually harmful. Maintaining sum will be harder,
>> for the sake of some benefit that very possibly nobody will actually
>> receive.
>
> Are you sure it complicates the implementation?

No, I'm not sure.

> Here's how sum function looks NOW, without my patches
> (well, it looks different in C, but idea is the same):
[snip code]

> Alternative to that patch is one more special case for "known slow
> types" i.e. lists and tuples. That was my second patch. In python that
> would be like:
>     if isinstance(start, list) or isinstance(start, tuple):
>         optimize_for = type(start)
>         l_result = list(start)
>         try:
>             while True:
>                 item = next(it)
>                 if not isinstance(item, optimize_for):
>                     start = optimize_for(l_result)
>                     start = start + item
>                     break
>                 l_result.extend(item)
>         except StopIteration:
>             return optimize_for(l_result)
>
> Yes, that's not just one line, but does it really complicates
> existing code that much?

Of course, I understand that this is not the actual C code your patch contains. But I can see at least three problems with the above Python version, and I assume your C version will have the same flaws.

1) You test start using isinstance(start, list), but it should be "type(start) is list". If start is a subclass of list that overrides __add__ (or __iadd__), you should call the overridden methods. But your code does not, it calls list.extend instead. (Same applies for tuples.)

2) You assume that summing a sequence must return the type of the start argument. But that is not correct. See example below.

3) This can use twice as much memory as the current implementation. You build a temporary list containing the result, then you make a copy using the original type. If the result is very large, you might run out of memory trying to make the copy.

So there are three potential problems with your patch. One will potentially cause code that currently works to fail with MemoryError. The other two will potentially cause code to return different results. These are exactly the sort of subtle, and unintended, changes in behaviour that I consider bugs.

Here is an example of a multi-type sum:

    py> class A(list):
    ...     def __add__(self, other):
    ...         return type(self)(super().__add__(other))
    ...     def __radd__(self, other):
    ...         return type(self)(other) + self
    ...
    py> result = sum([[1], [2], A([3]), [4]], [])
    py> type(result)
    <class '__main__.A'>

It looks to me that your code will return a list instead of an A.

By the way, Sergey, I should say that even though I have been hard on your suggestion, I do thank you for spending the time on this and value your efforts.
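P.S. Problem (1) is easy to demonstrate even without the patch. A rough sketch of what the extend-based fast path effectively does to a list subclass (the class L and its print are illustration only):

    class L(list):
        def __add__(self, other):
            print("L.__add__ called")
            return L(list(self) + list(other))

    start = L([1])
    added = start + [2]    # prints "L.__add__ called"

    # The patched fast path never calls it:
    tmp = list(start)      # isinstance(start, list) is True, so copy...
    tmp.extend([2])        # ...and use list.extend, not L.__add__
    rebuilt = L(tmp)       # the type comes back; the override's behaviour doesn't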
-- Steven From abarnert at yahoo.com Wed Jul 10 19:47:59 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 10 Jul 2013 10:47:59 -0700 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130710191007.5f525fb3@sergey> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> Message-ID: <8D61E0C1-943F-4BAD-A9DA-A519F0955853@yahoo.com> Let's split this into parts. What you have is (a) a patch to make iaddable types sum faster, (b) a patch to make tuples sum faster, and (c) 37 different half-baked ideas for how to maybe come up with a patch to make either all iterables or all addable types sum faster, most of which you later deny having ever offered. If you drop all the stuff related to sequences, and come up with a numeric case that's helped by (a), I'm pretty sure you could get that approved with little to no objection, and then come back to the rest later. And there probably are such types. Someone suggested some C-implemented quaternion library. I suggested numpy.matrix. Someone else suggested a different vector class. And there are plenty of other obvious things to try--huge pygmp ints, a BCD class, ... All you have to do is try them out and show that at least one of them gets a significant speedup. Once you get (a) accepted (or fail to find any good use for it besides lists, which I think is less likely), come back to the rest, and pick an argument and stick to it. Either (b) is good on its own because tuples are special and there's no reason to make it general, or tuples aren't special and it should be general and therefore you have a concrete proposal that actually works as expected on a variety of types that you've actually tested. On Jul 10, 2013, at 9:10, Sergey wrote: > On Jul 9, 2013 Steven D'Aprano wrote: > >> No no no. The objection is that complicating the implementation of >> a function in order to optimize a use-case that doesn't come up in >> real-world use is actually harmful. Maintaining sum will be harder, >> for the sake of some benefit that very possibly nobody will actually >> receive. > > Are you sure it complicates the implementation? The iadd case doesn't complicate it much. But the tuple case, halfway hacked up toward a more general solution that you haven't quite envisioned, does. Meanwhile, the fact that ints and floats _also_ complicate things isn't a good argument here. Adding ints is the paradigm use case for sum, so it's arguably worth extra complexity. Also, the fact that it's been maintained for over a decade, through the int/ long unification and py3k, means nobody has to guess whether it will be a maintenance burden. > Alternative to that patch is one more special case for "known slow > types" i.e. lists and tuples. That was my second patch. In python that > would be like: > if isinstance(start, list) or isinstance(start, tuple): > optimize_for = type(start) > l_result = list(start) > try: > while True: > item = next(it) > if not isinstance(item, optimize_for): > start = optimize_for(l_result) > start = start + item > break > l_result.extend(item) > except StopIteration: > return optimize_for(l_result) > > Yes, that's not just one line, but does it really complicates > existing code that much? 
> > Theoretically this code could be extended to any iterable. So, > *theoretically* it could be: > if start_is_iterable: > optimize_for = type(start) > ... same code here ... No it can't. Or, rather, it would be a very bad idea. Matrices would be unraveled and concatenated into vectors instead of added, types that aren't addable for good reason like dicts and files would do various different odd things instead of raising, intentionally lazy (and potentially infinite) types would be forced strict, types that can append faster than list like deque would slow down (as would set if it became addable) and/or use much more memory, ... > but there's a technical problem in the line: > start = optimize_for(l_result) > This line works for lists and tuples. It will even work for set and > frozenset, if needed. But the problem is that I cannot guarantee that > it will work for arbitrary type. For example it does not work for > strings (even worse, it works, but not as I would want it to). > In my patch I also showed how strings can be handled for that case. > > Basically, this very single line is the only line holding me from > making my "special case" into "general case" for all iterables in > the world. And that's lucky, because as a general case it would be a very bad thing. >> I don't care that sum() is O(N**2) on strings, linked lists, >> tuples, lists. I don't think we should care. Sergey thinks we should >> care, and is willing to change the semantics of sum AND include as >> many special cases as needed in order to "guarantee" that sum will be >> "always fast". I don't believe that guarantee can possibly hold, and > > Just ONE special case is needed ? the one for iterables. And yes, > the "guarantee" can hold, because it only affects built-in types, > and my patch covers all of them, even those that are not supported > by sum() yet. A guarantee that only holds for builtin types and cannot be extended in any way to other types is more misleading than useful. Today, a few people misuse sum on sequences and get quadratic behavior. That's exactly what you want to fix. But with your change, many more people would use sum on sequences and get quadratic behavior, because the whole point of your patch is to make sum the obvious way to concatenate sequences. And, while today, it's a bug in their code that can be explained with a simple link to the docs, with your change, it will be a bug in python requiring a workaround explained by a link to some FAQ or blog post. And if that isn't what you intend, then you don't intend for sum to be the obvious way to concatenate sequences, and your patch is just bending over backward to make buggy code run better. You can't have it both ways. > But it's also good for third-party types, because it > gives them an option to implement a fast __iadd__. They don't have > this option now. The first patch alone does that for all fast-iaddable types, including non-sequences as well as list-like sequences. The second patch adds nothing to it, nor do any of your attempts to generalize it. It only works for mutable types that can iadd faster than they can add. You've already been given examples of types that isn't true for--any immutable type, cons lists, etc. Even if you could find a theoretical answer for those cases--which so far you haven't, but there's no harm in continuing to try--that won't actually help anyone using an actual python interpreter rather than some vague theoretically possible one. >> Flattening sequences is not sum. You have to consider ... 
> > Yet people think [1] that sum() is useful for that. People also think that "if ('foo' or 'bar') in baz" is useful. It comes up every few days, not just a few times a year like summing tuples. It's even the basis of a running joke on StackOverflow. Does this mean we should change Python so it does what they expect? And again, your patch doesn't solve the problem, it makes it worse. If people mistakenly think that sum is useful for tuples, they're also going to think that it's useful for all kinds of other sequences. They'll still be wrong--but now the docs, and their early experiences, will tell them otherwise. From joshua at landau.ws Wed Jul 10 20:08:37 2013 From: joshua at landau.ws (Joshua Landau) Date: Wed, 10 Jul 2013 19:08:37 +0100 Subject: [Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes) In-Reply-To: References: <87a9lwuwog.fsf@uwakimon.sk.tsukuba.ac.jp> <2627AA56-B8D3-4F9C-8ABE-A95108B711FF@dpk.io> <5EAB14A6-262B-4887-927C-748845C7201C@dpk.io> Message-ID: On 10 July 2013 18:39, Philipp A. wrote: > 2013/7/10 Joshua Landau joshua at landau.ws > >>>> {}.keys() > dict_keys([]) >>>> iter({}.keys()) > > > There are good reasons why a ?view? should not be its iterator. > > you?re right, but one would expect the view?s __getitem__(i) method to > return the ith grapheme, which implies constant-time access. and we can only > support linear-time access to that (i.e. by iterating stuff) if we don?t > want to build a complex index. > > so should we do a view object that only allows something like my > find_grapheme and iteration? I haven't followed much of this because it's not very relevant to me now. I just thought it extremely odd to have an interface inconsistent with Python's standard. However, if what you want is something that works akin to a IOWrapper, then I'm wrong and an iterator that has lots of methods is actually already standard. Hence I've changed my mind. That may not have been what you expected me to say. From joshua at landau.ws Wed Jul 10 20:21:08 2013 From: joshua at landau.ws (Joshua Landau) Date: Wed, 10 Jul 2013 19:21:08 +0100 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: <51DD8BDB.6050101@pearwood.info> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> Message-ID: On 10 July 2013 17:29, Steven D'Aprano wrote: > On 10/07/13 16:09, Joshua Landau wrote: >> >> On 9 July 2013 17:13, Steven D'Aprano wrote: > > [...] > >>> Nevertheless, you are right, in Python 3 both + and sum of lists is >>> well-defined. At the moment sum is defined in terms of __add__. You want >>> to >>> change it to be defined in terms of __iadd__. That is a semantic change >>> that >>> needs to be considered carefully, it is not just an optimization. >> >> >> I agree it's not totally backward-compatible, but AFAICT that's only >> for broken code. __iadd__ should always just be a faster, in-place >> __add__ and so this change should never cause problems in properly >> written code. > > > "Always"? Immutable objects cannot define __iadd__ as an in-place __add__. > > In any case, sum() currently does not modify the start argument in-place. Now you're just (badly) playing semantics. If I say that gills are always like lungs except they work underwater, would you contradict me by stating that mammals don't have gills? >> That makes it anything but a semantic change. > > __iadd__ is optional for classes that support addition. 
Failure to define an > __iadd__ method does not make your class broken. > > Making __iadd__ mandatory to support sum would be a semantic change, since > there will be objects (apart from strs and bytes, which are special-cased) > that support addition with + but will no longer be summable since they don't > define __iadd__. Why are you saying these things? I never suggested anything like that. > Even making __iadd__ optional will potentially break working code. Python > doesn't *require* that __iadd__ perform the same operation as __add__. That > is the normal expectation, of course, but it's not enforced. (How could it > be?) We might agree that objects where __add__ and __iadd__ do different > things are "broken" in some sense, but you're allowed to write broken code, > and Python should (in principle) avoid making it even more broken by > changing behaviour unnecessarily. But maybe the right answer there is simply > "don't call sum if you don't want __iadd__ called". Python has previously had precedents where broken code does not get to dictate the language as long as that code was very rare. This is more than very rare. Additionally, Python does (unclearly, but it does do so) define __iadd__ to be an inplace version of __add__, so the code isn't just ?broken? -- it's broken. From abarnert at yahoo.com Wed Jul 10 22:14:20 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 10 Jul 2013 13:14:20 -0700 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: <51DD97E6.2070105@pearwood.info> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD90E5.1020505@stoneleaf.us> <51DD97E6.2070105@pearwood.info> Message-ID: <7EB44A1F-AB87-482E-A5A7-81F82F822C28@yahoo.com> On Jul 10, 2013, at 10:20, Steven D'Aprano wrote: > On 11/07/13 02:50, Ethan Furman wrote: > >> Currently, sum() does not modify its arguments. >> >> You (or whoever) are suggesting that it should modify one of them. >> >> That makes it a semantic change, and a bad one. >> >> -1 > > > Actually, Sergey's suggestion is a bit more clever than that. I haven't tested his C version, but the intention of his pure-Python demo code is to make a temporary list, modify the temporary list in place for speed, and then convert to whatever type is needed. That will avoid modifying any of the arguments[1]. So credit to Sergey for avoiding that trap. Actually, he has two versions. The first does a + once and then a += repeatedly on the result. This solves the problem neatly (except with empty iterables, but that's trivial to fix, and I think his C code actually doesn't have that problem...). There's no overhead, it automatically falls back to __add__ if __iadd__ is missing, and the only possible semantic differences are for types that are already broken. The second makes a list of the argument (which means copying it if it's already a list), then calls extend repeatedly on the result, then converts back. This doesn't solve the problem in many cases, does the wrong thing in many others, and always adds overhead. And that's exactly why I think it's worth splitting into separate pieces. It's very easy for people to see problems with the second version and wrongly assume they also apply to the first (and the way he presents and argues for his ideas doesn't help). As far as I know, nobody has yet found any problem with the first version, except for the fact that it would encourage people to use sum on lists. 
I don't think that's a serious problem--the docs already say not to do it--and if it's a useful optimization for any number-like types, I think it's worth having.

It's the second version, together with all of the attempts to make it fully general for any concatenable type--or, alternatively, to argue that only builtin concatenable types matter--that I have a problem with.

From joshua at landau.ws Wed Jul 10 22:45:37 2013 From: joshua at landau.ws (Joshua Landau) Date: Wed, 10 Jul 2013 21:45:37 +0100 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: <7EB44A1F-AB87-482E-A5A7-81F82F822C28@yahoo.com> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD90E5.1020505@stoneleaf.us> <51DD97E6.2070105@pearwood.info> <7EB44A1F-AB87-482E-A5A7-81F82F822C28@yahoo.com> Message-ID:

On 10 July 2013 21:14, Andrew Barnert wrote:
> [Sergey] has two versions.
>
> The first does a + once and then a += repeatedly on the result. This solves the problem neatly (except with empty iterables, but that's trivial to fix, and I think his C code actually doesn't have that problem...). There's no overhead, it automatically falls back to __add__ if __iadd__ is missing, and the only possible semantic differences are for types that are already broken.
>
> The second makes a list of the argument (which means copying it if it's already a list), then calls extend repeatedly on the result, then converts back. This doesn't solve the problem in many cases, does the wrong thing in many others, and always adds overhead.
>
> And that's exactly why I think it's worth splitting into separate pieces. It's very easy for people to see problems with the second version and wrongly assume they also apply to the first (and the way he presents and argues for his ideas doesn't help).
>
> As far as I know, nobody has yet found any problem with the first version, except for the fact that it would encourage people to use sum on lists. I don't think that's a serious problem--the docs already say not to do it--and if it's a useful optimization for any number-like types, I think it's worth having.
>
> It's the second version, together with all of the attempts to make it fully general for any concatenable type--or, alternatively, to argue that only builtin concatenable types matter--that I have a problem with.

If Sergey doesn't do this separation, would it be fine if I did it? I like the idea for __iadd__ sum, and I don't want Sergey to block progress on the issue.
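And to get ahead of the "find a numeric case" objection: here is a toy sketch (not from any patch) of a mutable number-like type where the += loop would skip allocating intermediate objects -- any actual speedup claim would of course still need measuring:

    class Acc:
        """Toy number-like accumulator: __add__ copies, __iadd__ mutates."""
        def __init__(self, values):
            self.values = list(values)
        def __add__(self, other):
            return Acc(a + b for a, b in zip(self.values, other.values))
        def __iadd__(self, other):
            for i, b in enumerate(other.values):
                self.values[i] += b
            return self

    data = [Acc([1.0] * 10000) for _ in range(100)]
    # today's sum(): 100 __add__ calls, 100 intermediate Acc allocations
    # patched sum(): one __add__, then 99 in-place __iadd__ calls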
From ron3200 at gmail.com Wed Jul 10 23:00:31 2013 From: ron3200 at gmail.com (Ron Adam) Date: Wed, 10 Jul 2013 16:00:31 -0500 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <51DD9E98.5030203@pearwood.info> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info> Message-ID: On 07/10/2013 12:49 PM, Steven D'Aprano wrote: > On 11/07/13 02:10, Sergey wrote: >> On Jul 9, 2013 Steven D'Aprano wrote: >> >>> The fact that sum(lists) has had quadratic performance since sum >>> was first introduced in Python 2.3, and I've *never* seen anyone >>> complain about it being slow, suggests very strongly that this is not >>> a use-case that matters. >> >> Never seen? Are you sure? ;) >>> http://article.gmane.org/gmane.comp.python.general/658630 >>> From: Steven D'Aprano @ 2010-03-29 >>> In practical terms, does anyone actually ever use sum on more than a >>> handful of lists? I don't believe this is more than a hypothetical >>> problem. > > Yes, and I stand by what I wrote back then. Just curious, how does your sum compare with fsum() in the math module? (Yes, I know it's specialised for floats?) It says that much in the docs. fsum(...) fsum(iterable) Return an accurate floating point sum of values in the iterable. Assumes IEEE-754 floating point arithmetic. Have you looked at it? Cheers, Ron From ethan at stoneleaf.us Wed Jul 10 23:03:52 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 10 Jul 2013 14:03:52 -0700 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD90E5.1020505@stoneleaf.us> <51DD97E6.2070105@pearwood.info> <7EB44A1F-AB87-482E-A5A7-81F82F822C28@yahoo.com> Message-ID: <51DDCC38.2030603@stoneleaf.us> On 07/10/2013 01:45 PM, Joshua Landau wrote: > On 10 July 2013 21:14, Andrew Barnert wrote: >> [Sergey] has two versions. >> >> The first does a + once and then a += repeatedly on the result. This solves the problem neatly (except with empty iterables, but that's trivial to fix, and I think his C code actually doesn't have that problem...). There's no overhead, it automatically falls back to __add__ if __iadd__ is missing, and the only possible semantic differences are for types that are already broken. >> >> The second makes a list of the argument (which means copying it if it's already a list), then calls extend repeatedly on the result, then converts back. This doesn't solve the problem in many cases, does the wrong thing in many others, and always adds overhead. >> >> And that's exactly why I think it's worth splitting into separate pieces. It's very easy for people to see problems with the second version and wrongly assume they also apply to the first (and the way he presents and argues for his ideas doesn't help). >> >> As far as I know, nobody has yet found any problem with the first version, except for the fact that it would encourage people to use sum on lists. I don't think that's a serious problem--the docs already say not to do it--and if it's a useful optimization for any number-like types, I think it's worth having. 
>> >> It's the second version, together with all of the attempts to make it fully generally for any concatenable type--or, alternatively, to argue that only builtin concatenable types matter--that I have a problem with. > > If Sergey doesn't do this separation, would it be fine if I did it? I > like the idea for __iadd__ sum, and I don't want Sergey block progress > on the issue. Make a patch and add it to the tracker. A word of warning/advice: keep the __add__ fallback or it won't fly. __iadd__ is /optional/. If the new sum() suddenly stops working on classes it worked fine with before, it will not be accepted. Mind you, I haven't checked if PyNumber_InPlaceAdd will fall back to PyNumber_Add on its own. -- ~Ethan~ From joshua at landau.ws Wed Jul 10 23:27:49 2013 From: joshua at landau.ws (Joshua Landau) Date: Wed, 10 Jul 2013 22:27:49 +0100 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: <51DDCC38.2030603@stoneleaf.us> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD90E5.1020505@stoneleaf.us> <51DD97E6.2070105@pearwood.info> <7EB44A1F-AB87-482E-A5A7-81F82F822C28@yahoo.com> <51DDCC38.2030603@stoneleaf.us> Message-ID: On 10 July 2013 22:03, Ethan Furman wrote: > Make a patch and add it to the tracker. Gah! Now I need to learn C... :P > A word of warning/advice: keep the __add__ fallback or it won't fly. > __iadd__ is /optional/. If the new sum() suddenly stops working on classes > it worked fine with before, it will not be accepted. Mind you, I haven't > checked if PyNumber_InPlaceAdd will fall back to PyNumber_Add on its own. It does. From joshua at landau.ws Wed Jul 10 23:36:16 2013 From: joshua at landau.ws (Joshua Landau) Date: Wed, 10 Jul 2013 22:36:16 +0100 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: <51DDCC38.2030603@stoneleaf.us> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD90E5.1020505@stoneleaf.us> <51DD97E6.2070105@pearwood.info> <7EB44A1F-AB87-482E-A5A7-81F82F822C28@yahoo.com> <51DDCC38.2030603@stoneleaf.us> Message-ID: On 10 July 2013 22:03, Ethan Furman wrote: > On 07/10/2013 01:45 PM, Joshua Landau wrote: >> >> On 10 July 2013 21:14, Andrew Barnert wrote: >>> >>> [Sergey] has two versions. >>> >>> The first does a + once and then a += repeatedly on the result. This >>> solves the problem neatly (except with empty iterables, but that's trivial >>> to fix, and I think his C code actually doesn't have that problem...). >>> There's no overhead, it automatically falls back to __add__ if __iadd__ is >>> missing, and the only possible semantic differences are for types that are >>> already broken. >>> >>> The second makes a list of the argument (which means copying it if it's >>> already a list), then calls extend repeatedly on the result, then converts >>> back. This doesn't solve the problem in many cases, does the wrong thing in >>> many others, and always adds overhead. >>> >>> And that's exactly why I think it's worth splitting into separate pieces. >>> It's very easy for people to see problems with the second version and >>> wrongly assume they also apply to the first (and the way he presents and >>> argues for his ideas doesn't help). >>> >>> As far as I know, nobody has yet found any problem with the first >>> version, except for the fact that it would encourage people to use sum on >>> lists. 
>>> I don't think that's a serious problem--the docs already say not to
>>> do it--and if it's a useful optimization for any number-like types, I think
>>> it's worth having.
>>>
>>> It's the second version, together with all of the attempts to make it
>>> fully general for any concatenable type--or, alternatively, to argue that
>>> only builtin concatenable types matter--that I have a problem with.
>>
>> If Sergey doesn't do this separation, would it be fine if I did it? I
>> like the idea for __iadd__ sum, and I don't want Sergey to block progress
>> on the issue.
>
> Make a patch and add it to the tracker.

Actually, there is already a bug on the tracker at http://bugs.python.org/issue18305 and the response was "discuss it on Python-Ideas".

Hence, I want to discuss it on Python ideas. So should I spawn it off onto a separate thread about *just* the __iadd__ enhancement?

From ethan at stoneleaf.us Wed Jul 10 23:53:53 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 10 Jul 2013 14:53:53 -0700 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD90E5.1020505@stoneleaf.us> <51DD97E6.2070105@pearwood.info> <7EB44A1F-AB87-482E-A5A7-81F82F822C28@yahoo.com> <51DDCC38.2030603@stoneleaf.us> Message-ID: <51DDD7F1.9040708@stoneleaf.us>

On 07/10/2013 02:27 PM, Joshua Landau wrote:
> On 10 July 2013 22:03, Ethan Furman wrote:
>> Make a patch and add it to the tracker.
>
> Gah! Now I need to learn C... :P

Heh. I'm in a similar boat trying to update the json module. :/

--
~Ethan~

From joshua at landau.ws Thu Jul 11 00:02:05 2013 From: joshua at landau.ws (Joshua Landau) Date: Wed, 10 Jul 2013 23:02:05 +0100 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: <51DDD79A.80401@stoneleaf.us> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD90E5.1020505@stoneleaf.us> <51DD97E6.2070105@pearwood.info> <7EB44A1F-AB87-482E-A5A7-81F82F822C28@yahoo.com> <51DDCC38.2030603@stoneleaf.us> <51DDD79A.80401@stoneleaf.us> Message-ID:

On 10 July 2013 22:52, Ethan Furman wrote:
> On 07/10/2013 02:36 PM, Joshua Landau wrote:
>> On 10 July 2013 22:03, Ethan Furman wrote:
>>> On 07/10/2013 01:45 PM, Joshua Landau wrote:
>>>> If Sergey doesn't do this separation, would it be fine if I did it? I
>>>> like the idea for __iadd__ sum, and I don't want Sergey to block progress
>>>> on the issue.
>>>
>>> Make a patch and add it to the tracker.
>>
>> Actually, there is already a bug on the tracker at
>> http://bugs.python.org/issue18305 and the response was "discuss it on
>> Python-Ideas".
>>
>> Hence, I want to discuss it on Python ideas. So should I spawn it off
>> onto a separate thread about *just* the __iadd__ enhancement?
>
> A separate thread on Python Ideas is probably appropriate, but you can add
> your __iadd__ only patch to that issue. I would think it would have a
> better chance of acceptance since it would be a smaller change.

A cursory glance yields that that is the original patch.

From sergemp at mail.ru Wed Jul 10 23:58:42 2013 From: sergemp at mail.ru (Sergey) Date: Thu, 11 Jul 2013 00:58:42 +0300 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries?
In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> Message-ID: <20130711005842.13ea7ec1@sergey>

On Jul 9, 2013 Ron Adam wrote:
>> Seriously, why are there so many holy wars about that? I'm not asking
>> to rewrite cpython on Java or C#. I'm not adding a bunch of new
>> functions, I'm not even changing signatures of existing functions.
>
> It's the nature of this particular news group. We focus on improving
> python, and that includes new things and improving old things, but also
> includes discussing any existing or potential problems.
>
> You will almost always get a mix of approval and disapproval on just about
> every thing here. It's not a war, it's just different people having
> different opinions.
>
> Quite often that leads to finding better ways to do things, and in the long
> run, helps avoid adding features and changes that could be counter
> productive to python.

I must agree that I was indeed inspired with some new ideas during this discussion. It's just that those "inspirations" come in a very non-constructive form of "it makes no sense", "cannot always be fast", "you can't", "everyone else thinks you shouldn't", etc.

Or is that a lifehack [1] in action? I.e. "You can't make it fast for that type. Oh, you can? Then you can't make it fast for that type. Oh, you did that too? But you can't make it fast for all the types!" What if I can? ;)

It's just instead of discussing what is the best way to fix a slowness, I'm spending most time trying to convince people that slowness should be fixed.
- sum is slow for lists, let's fix that!
- you shouldn't use sum...
- why can't I use sum?
- because it's slow
- then let's fix that!
- you shouldn't use sum...
I haven't thought that somebody can truly believe that something should be slow, and will find one excuse after another instead of just fixing the slowness.

> If it only makes an existing function faster and doesn't change any other
> behaviour, and all the tests still pass for it, just create an issue on
> the tracker, with the patch posted there, and it will probably be accepted
> after it goes through a much more focused review process.

I've done that first. [2] And there I was asked to write here. :)

> But discussing it here will invite a lot of opinions about how it works,
> how it shouldn't work, what would work better, and etc...

And about what *I* shouldn't do, what *I* can't and what *I* need. As if I'm the bug that should be fixed. :(

> It's what this board is for. ;-)

--
[1] http://bash.org/?152037
[2] http://bugs.python.org/issue18305

From abarnert at yahoo.com Thu Jul 11 00:11:28 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 10 Jul 2013 15:11:28 -0700 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD90E5.1020505@stoneleaf.us> <51DD97E6.2070105@pearwood.info> <7EB44A1F-AB87-482E-A5A7-81F82F822C28@yahoo.com> <51DDCC38.2030603@stoneleaf.us> <51DDD79A.80401@stoneleaf.us> Message-ID: <27202BA8-AA6E-4435-8552-E455A63F36FB@yahoo.com>

On Jul 10, 2013, at 15:02, Joshua Landau wrote:
> On 10 July 2013 22:52, Ethan Furman wrote:
>> On 07/10/2013 02:36 PM, Joshua Landau wrote:
>>> On 10 July 2013 22:03, Ethan Furman wrote:
>>>> On 07/10/2013 01:45 PM, Joshua Landau wrote:
>>>>> If Sergey doesn't do this separation, would it be fine if I did it?
I >>>>> like the idea for __iadd__ sum, and I don't want Sergey block progress >>>>> on the issue. >>>> >>>> >>>> >>>> Make a patch and add it to the tracker. >>> >>> >>> Actually, there is already a bug on the tracker at >>> http://bugs.python.org/issue18305 and the response was "discuss it on >>> Python-Ideas". >>> >>> Hence, I want to discuss it on Python ideas. So should I spawn it off >>> onto a seperate thread about *just* the __iadd__ enhancement? >> >> >> A separate thread on Python Ideas is probably appropriate, but you can add >> your __iadd__ only patch to that issue. I would think it would have a >> better chance of acceptance since it would be a smaller change. > > A cursory glance yields that that is the original patch. Exactly. I believe Sergey's first patch already gets the __iadd__ thing exactly right. Of course it's worth reviewing the patch, and testing it, and writing a pure Python version that other implementations can use, and discussing whether there should be any doc changes, and finding cases that are clearly summing number-like things that benefit (seriously, why has nobody who's +1 on this done the simple test with numpy.matrix yet?) so nobody can complain that it's useless, ... But you don't need to write C to do any of that. From ethan at stoneleaf.us Wed Jul 10 23:52:26 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 10 Jul 2013 14:52:26 -0700 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD90E5.1020505@stoneleaf.us> <51DD97E6.2070105@pearwood.info> <7EB44A1F-AB87-482E-A5A7-81F82F822C28@yahoo.com> <51DDCC38.2030603@stoneleaf.us> Message-ID: <51DDD79A.80401@stoneleaf.us> On 07/10/2013 02:36 PM, Joshua Landau wrote: > On 10 July 2013 22:03, Ethan Furman wrote: >> On 07/10/2013 01:45 PM, Joshua Landau wrote: >>> >>> On 10 July 2013 21:14, Andrew Barnert wrote: >>>> >>>> [Sergey] has two versions. >>>> >>>> The first does a + once and then a += repeatedly on the result. This >>>> solves the problem neatly (except with empty iterables, but that's trivial >>>> to fix, and I think his C code actually doesn't have that problem...). >>>> There's no overhead, it automatically falls back to __add__ if __iadd__ is >>>> missing, and the only possible semantic differences are for types that are >>>> already broken. >>>> >>>> The second makes a list of the argument (which means copying it if it's >>>> already a list), then calls extend repeatedly on the result, then converts >>>> back. This doesn't solve the problem in many cases, does the wrong thing in >>>> many others, and always adds overhead. >>>> >>>> And that's exactly why I think it's worth splitting into separate pieces. >>>> It's very easy for people to see problems with the second version and >>>> wrongly assume they also apply to the first (and the way he presents and >>>> argues for his ideas doesn't help). >>>> >>>> As far as I know, nobody has yet found any problem with the first >>>> version, except for the fact that it would encourage people to use sum on >>>> lists. I don't think that's a serious problem--the docs already say not to >>>> do it--and if it's a useful optimization for any number-like types, I think >>>> it's worth having. 
>>>>
>>>> It's the second version, together with all of the attempts to make it
>>>> fully general for any concatenable type--or, alternatively, to argue that
>>>> only builtin concatenable types matter--that I have a problem with.
>>>
>>> If Sergey doesn't do this separation, would it be fine if I did it? I
>>> like the idea for __iadd__ sum, and I don't want Sergey to block progress
>>> on the issue.
>>
>> Make a patch and add it to the tracker.
>
> Actually, there is already a bug on the tracker at
> http://bugs.python.org/issue18305 and the response was "discuss it on
> Python-Ideas".
>
> Hence, I want to discuss it on Python ideas. So should I spawn it off
> onto a separate thread about *just* the __iadd__ enhancement?

A separate thread on Python Ideas is probably appropriate, but you can add your __iadd__ only patch to that issue. I would think it would have a better chance of acceptance since it would be a smaller change.

--
~Ethan~

From abarnert at yahoo.com Thu Jul 11 00:35:10 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 10 Jul 2013 15:35:10 -0700 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: <20130711005842.13ea7ec1@sergey> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <20130711005842.13ea7ec1@sergey> Message-ID: <3ACB64FF-7A58-4AD2-AD9F-3BCB0AE5707C@yahoo.com>

On Jul 10, 2013, at 14:58, Sergey wrote:
> On Jul 9, 2013 Ron Adam wrote:
>
> Or is that a lifehack [1] in action? I.e. "You can't make it fast for
> that type. Oh, you can? Then you can't make it fast for that type.
> Oh, you did that too? But you can't make it fast for all the types!"
> What if I can? ;)

But you haven't found a workable solution for all of the other types people have brought up--or even for a single one of them. So that's a serious misrepresentation.

> It's just instead of discussing what is the best way to fix a slowness,
> I'm spending most time trying to convince people that slowness should
> be fixed.
> - sum is slow for lists, let's fix that!
> - you shouldn't use sum...
> - why can't I use sum?
> - because it's slow

But that's not why you shouldn't use sum. So, you're trying to answer people who (a) are wrong, and (b) aren't on this list instead of answering the people who are actually here.

Besides, being fast for list and tuple but slow for other collection types would be an attractive nuisance. Your only response to that has been to claim that it can be fast for every possible collection type, but it can't. You haven't gotten it to work. And there are good reasons to believe it's not just hard, but impossible. So, if that's true, where does it leave you? You can't argue that sum is the obvious way to concatenate collections. You either have to say that it's the obvious way to concatenate only builtin collections, and something else should be used for everything else, or you have to argue that the benefits to novices who do the wrong thing with tuples outweigh the costs. Or, of course, accept that it's not a good idea.

> - then let's fix that!
> - you shouldn't use sum...
> I haven't thought that somebody can truly believe that something should
> be slow, and will find one excuse after another instead of just fixing
> the slowness.

Calling recv(1) over and over on a socket is slow. We could fix that by adding an implicit buffer to all socket objects. Can you believe that someone might object to that patch?
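(For the record, the explicit opt-in for that one already exists. A rough sketch, assuming a reachable server, of the usual fix -- wrap the socket once instead of changing recv() for everyone:)

    import socket

    sock = socket.create_connection(("example.com", 80))
    sock.sendall(b"HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")

    # instead of looping on sock.recv(1):
    f = sock.makefile("rb")    # buffered, file-like view of the socket
    status = f.readline()      # reads from the buffer, not 1 byte per syscall
    print(status)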
>> If it only makes an existing function faster and doesn't change any other >> behaviour, and all the tests still pass for it. Just create an issue on >> the tracker, with the patch posted there, and it will probably be accepted >> after it goes through a much more focused review process. > > I've done that first [2]. And there I was asked to write here. :) > >> But discussing it here will invite a lot of opinions about how it works, >> how it shouldn't work, what would work better, and etc... > > And about what *I* shouldn't do, what *I* can't and what *I* need. > As if I'm the bug that should be fixed. :( > >> It's what this board is for. ;-) > > -- > [1] http://bash.org/?152037 > [2] http://bugs.python.org/issue18305 > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From sergemp at mail.ru Thu Jul 11 00:47:38 2013 From: sergemp at mail.ru (Sergey) Date: Thu, 11 Jul 2013 01:47:38 +0300 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB6BF6.9030608@pearwood.info> <20130709165449.6b124367@sergey> Message-ID: <20130711014738.7609f82a@sergey> On Jul 9, 2013 Ronald Oussoren wrote: >> For example, rewrite tuple to internally store its values in a >> list, and have `localcount` variable saying how many items from >> that list belong to this tuple. Then __add__ could extend that >> list and reuse it for new tuple. > > That's not going to happen, not only breaks that backward compatibility > for users of the C API, It depends on implementation details, it's possible to keep it backward compatible. BTW, what C API do you expect to break? > it has nasty side effects and is incorrect. > Nasty side effect: > a = (1,) > b = (2,) * 1000 > c = a + b > del b > del c > With the internal list 'a' keeps alive the extra storage used for 'c'. Yes, technically it's possible to implement tuple so that it would realloc internal list to save some ram, but why? List does not do that when you remove elements from it, why should tuple do that? On the other hand: a = (1,) * 1000 b = a + (2,3) c = b + (4,5) And you have 3 variables for the price of one. Lots of memory saved! > Incorrect: > a = (1,) > b = a + (2,) > c = a + (3,) > Now 'b' and 'c' can't possibly both share storage with 'a'. Nothing incorrect here, of course __add__ should handle that, and if it cannot reuse list it would copy it. As it does now. Such tuple would never allocate more RAM than before, often it should use either same RAM or less. In some rare cases it may not free some RAM, that it could free before. But I'm not sure they're worth the effort fixing. PS: I don't think anybody wants to see that implementation anyway, I guess it was just an attempt to bug me with "you can't". ;)
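To make the shared-storage idea concrete, here is a rough pure-Python sketch. The `_buf` and `_count` names are made up for illustration; this is not how CPython's tuple is implemented, and a real C version would differ:

    class SharedTuple:
        """Illustrative only: a tuple-like object sharing one growable buffer."""
        def __init__(self, items=()):
            self._buf = list(items)        # storage, possibly shared later
            self._count = len(self._buf)   # how many buffer items are ours

        @classmethod
        def _from_buf(cls, buf, count):
            self = cls.__new__(cls)
            self._buf = buf
            self._count = count
            return self

        def __add__(self, other):
            other = list(other)
            if self._count == len(self._buf):
                # We own the buffer's tail: extend it in place and share it.
                self._buf.extend(other)
                return SharedTuple._from_buf(self._buf, len(self._buf))
            # Tail already claimed by an earlier addition: fall back to copying.
            return SharedTuple(self._buf[:self._count] + other)

        def __iter__(self):
            return iter(self._buf[:self._count])

This matches the behaviour described above: an addition that owns the end of the buffer extends it in place, any other addition falls back to copying.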
From anntzer.lee at gmail.com Thu Jul 11 00:47:51 2013 From: anntzer.lee at gmail.com (Antony Lee) Date: Wed, 10 Jul 2013 15:47:51 -0700 (PDT) Subject: [Python-ideas] Allow Enum members to refer to each other during execution of body In-Reply-To: References: <51DB5573.5070004@stoneleaf.us> Message-ID: <6fc3f1ea-e643-40c4-aa2c-6e0d42bd7b6e@googlegroups.com> Forward references are now implemented (https://github.com/anntzer/enum). They require an explicit declaration, à la class C(Enum, declaration=...): B = ... A = {1: B} B = {1: A} I had implemented a version where the initial declaration wasn't needed, but as mentioned in previous enum-related threads this can create many problems. For example, consider class C(Enum): A = {1: B}; B = {1: A} @property def also_value(self): return self.value how is Python supposed to know that when it tries to resolve "B" in the class dict, it must create a new member, but when it tries to resolve "property" in the class dict and doesn't find it, it must look in the enclosing scope? You can decide that a name lookup creates a new member if the name isn't defined in the enclosing scope either (I implemented this using sys._getframe in a previous commit of my fork) but this leads to other (somewhat contrived) problems: x = 1 class C(Enum): y = x # <- what is this supposed to mean? x = 2 Note that even AST macros don't (fully) solve this issue because you can't really even know the list of all names that are defined in the class body: x = 1 def inject(**kwargs): for k, v in kwargs.items(): sys._getframe(1).f_locals[k] = v # interestingly using dict.update does not trigger the user-defined __setitem__ class C(Enum): y=x # <- ??? inject(x=2) Antony On Monday, July 8, 2013 6:03:27 PM UTC-7, Haoyi Li wrote: > > > then the other methods can either dereference the name with an > __getitem__ look-up, or the class can be post-processed with a decorator to > change the strings back to actual members... hmmm, maybe a post_process > hook in the metaclass would make sense? > > Having real strings be part of the enums data members is a pretty common > thing, and working through and trying to identify the linkage-strings from > normal-strings seems very magical to me. Is there some metaclass-magic way > to intercept the usage of A, to instead put the enum instance there? > > Also, for this to be useful for your described use case, (state machines > yay!) you'd probably want to be able to define back/circular references, > which I think isn't currently possible. The obvious thing to do would be to > somehow make the RHS of the assignments lazy, which would allow > out-of-order and circular assignments with a very nice, unambiguous: > > class StateMachine(Enum): > "Useless ping-pong state machine" > A = {1: B} > B = {1: A} > > But short of using macros to do an AST transform, I don't know if such a > thing is possible at all. > > -Haoyi > > > On Tue, Jul 9, 2013 at 8:12 AM, Ethan Furman > > wrote: > >> On 07/08/2013 02:27 PM, Antony Lee wrote: > >>> Currently, during the execution of the body of the Enum declaration, >>> member names are bound to the values, not to the >>> Enum members themselves. For example >>> >>> class StateMachine(Enum): >>> A = {} >>> B = {1: A} # e.g.
a transition table >>> >>> StateMachine.B[1] == {}, when one could have expected StateMachine.B[1] >>> == StateMachine.A >>> >>> It seems to me that a behavior where member names are bound to the >>> members instead of being bound to the values is more >>> useful, as one can easily retrieve the values from the members but not >>> the other way round (at least during the >>> execution of class body). >>> >>> Initially, I thought that this could be changed by modifying _EnumDict, >>> so that its __setitem__ method sets the member >>> in the dict, instead of the value, but in fact this doesn't work because >>> while the values are being set in the _EnumDict >>> the class itself doesn't exist yet (and for good reason: the __init__ >>> and __new__ methods may be defined later but there >>> is no way to know that). However, a possible solution could be to >>> momentarily create Enum members as instances of some >>> dummy class, and then later, after execution of class body has >>> completed, change the members' class to the actual Enum >>> and initialize them as needed (if an __init__ or a __new__ are actually >>> defined). Well, there are limitations with this >>> approach (e.g. the members are not fully initialized before class body >>> finishes to execute) but this seems better than >>> the current behavior(?) >>> >> >> Part of the problem here would be maintaining the linkage when the temp >> enum object from _EnumDict was translated into an actual Enum member. >> >> One possible work around is to store the name of the member instead: >> >> class StateMachine(Enum): >> A = {} >> B = {1:'A'} >> >> then the other methods can either dereference the name with an >> __getitem__ look-up, or the class can be post-processed with a decorator to >> change the strings back to actual members... hmmm, maybe a post_process >> hook in the metaclass would make sense? >> >> -- >> ~Ethan~ >> _______________________________________________ >> Python-ideas mailing list >> Python... at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Jul 11 03:03:40 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Jul 2013 11:03:40 +1000 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info> Message-ID: <51DE046C.4070301@pearwood.info> On 11/07/13 07:00, Ron Adam wrote: > > > On 07/10/2013 12:49 PM, Steven D'Aprano wrote: >> On 11/07/13 02:10, Sergey wrote: >>> On Jul 9, 2013 Steven D'Aprano wrote: >>> >>>> The fact that sum(lists) has had quadratic performance since sum >>>> was first introduced in Python 2.3, and I've *never* seen anyone >>>> complain about it being slow, suggests very strongly that this is not >>>> a use-case that matters. >>> >>> Never seen? Are you sure? ;) >>>> http://article.gmane.org/gmane.comp.python.general/658630 >>>> From: Steven D'Aprano @ 2010-03-29 >>>> In practical terms, does anyone actually ever use sum on more than a >>>> handful of lists? I don't believe this is more than a hypothetical >>>> problem.
>> >> Yes, and I stand by what I wrote back then. > > > Just curious, how does your sum compare with fsum() in the math module? math.fsum is a high-precision floating point sum, keeping extra precision that the built-in loses. Compare these: data = [1e-100, 1e100, 1e-100, -1e100]*1000 sum(data) math.fsum(data) The exact value for the sum is 2e-97. -- Steven From ethan at stoneleaf.us Thu Jul 11 03:52:24 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 10 Jul 2013 18:52:24 -0700 Subject: [Python-ideas] Allow Enum members to refer to each other during execution of body In-Reply-To: <6fc3f1ea-e643-40c4-aa2c-6e0d42bd7b6e@googlegroups.com> References: <51DB5573.5070004@stoneleaf.us> <6fc3f1ea-e643-40c4-aa2c-6e0d42bd7b6e@googlegroups.com> Message-ID: <51DE0FD8.4050301@stoneleaf.us> On 07/10/2013 03:47 PM, Antony Lee wrote: > > Forward references are now implemented (https://github.com/anntzer/enum). They require an explicit declaration, à la Do they work with a custom __new__ ? __init__ ? -- ~Ethan~ From steve at pearwood.info Thu Jul 11 04:36:59 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Jul 2013 12:36:59 +1000 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> Message-ID: <51DE1A4B.4070108@pearwood.info> On 11/07/13 04:21, Joshua Landau wrote: > On 10 July 2013 17:29, Steven D'Aprano wrote: >> On 10/07/13 16:09, Joshua Landau wrote: >>> >>> On 9 July 2013 17:13, Steven D'Aprano wrote: >> >> [...] >> >>>> Nevertheless, you are right, in Python 3 both + and sum of lists is >>>> well-defined. At the moment sum is defined in terms of __add__. You want >>>> to >>>> change it to be defined in terms of __iadd__. That is a semantic change >>>> that >>>> needs to be considered carefully, it is not just an optimization. >>> >>> >>> I agree it's not totally backward-compatible, but AFAICT that's only >>> for broken code. __iadd__ should always just be a faster, in-place >>> __add__ and so this change should never cause problems in properly >>> written code. >> >> >> "Always"? Immutable objects cannot define __iadd__ as an in-place __add__. >> >> In any case, sum() currently does not modify the start argument in-place. > > Now you're just (badly) playing semantics. If I say that gills are > always like lungs except they work underwater, would you contradict me > by stating that mammals don't have gills? I don't actually understand your objection. You made a general statement that __iadd__ should ALWAYS be an in-place add, and I pointed out that this cannot be the case for immutable classes. What is your objection? Surely you're not suggesting that immutable classes can define in-place __iadd__? That's not a rhetorical question, I do not understand the point you are trying to make. Perhaps you should be explicit, rather than argue by analogy. [Aside: it's a poor analogy. Gills are not like lungs, they differ greatly in many ways, e.g. fluid flow is unidirectional in gills, bidirectional in lungs, the interface to the blood system is counter-current in gills, with an efficiency of about 80%, versus concurrent in lungs, with an efficiency around 25%. There are other significant differences too, and lungs evolved independently of gills, which is why lungfish have both.] >>> That makes it anything but a semantic change. >> >> __iadd__ is optional for classes that support addition.
Failure to define an >> __iadd__ method does not make your class broken. >> >> Making __iadd__ mandatory to support sum would be a semantic change, since >> there will be objects (apart from strs and bytes, which are special-cased) >> that support addition with + but will no longer be summable since they don't >> define __iadd__. > > Why are you saying these things? I never suggested anything like that. You want to change sum from using __add__ to __iadd__. That means that there are two possibilities: for a class to be summable, either __iadd__ is mandatory, or it is optional with a fallback to __add__. I considered both possibilities, and they both result in changes to the behaviour of sum, that is, a semantic change. If __iadd__ becomes mandatory, then some currently summable classes will become non-summable. If __iadd__ becomes optional, but preferred over __add__, then some currently summable classes will change their behaviour (although you call those classes "broken"). In either case, this is a semantic change to sum, which is what you explicitly denied. I think that it is a reasonable position to take that we should not care about "broken" classes that define __iadd__ differently to __add__. I'm not sure that I agree, but regardless, it is a reasonable position. But arguing that the proposed change from __add__ to __iadd__ is not a semantic change to sum is simply unreasonable. >> Even making __iadd__ optional will potentially break working code. Python >> doesn't *require* that __iadd__ perform the same operation as __add__. That >> is the normal expectation, of course, but it's not enforced. (How could it >> be?) We might agree that objects where __add__ and __iadd__ do different >> things are "broken" in some sense, but you're allowed to write broken code, >> and Python should (in principle) avoid making it even more broken by >> changing behaviour unnecessarily. But maybe the right answer there is simply >> "don't call sum if you don't want __iadd__ called". > > Python has previously had precedents where broken code does not get to > dictate the language as long as that code was very rare. This is more > than very rare. Additionally, Python does (unclearly, but it does do > so) define __iadd__ to be an inplace version of __add__, so the code > isn't just 'broken' -- it's broken. Not so. The docs for __iadd__ and other augmented assignment operators state: "These methods are called to implement the augmented arithmetic assignments (+=, -=, *=, /=, //=, %=, **=, <<=, >>=, &=, ^=, |=). These methods should attempt to do the operation in-place (modifying self) and return the result (which could be, but does not have to be, self)." So, according to the docs, "x += y" might modify x in place and return a different instance, or even a completely different value. It is normal, and expected, for "x += y" to be the same as "x = x + y", but not compulsory. Python will fall back on the usual __add__ if __iadd__ is not defined, but (say) if you define your own DSL where += has some distinct meaning, you are free to define it to do something completely different. You consider it broken if a class defines += differently to +. I consider it unusual, but permitted. I believe the docs support my interpretation. http://docs.python.org/release/3.1/reference/datamodel.html#object.__iadd__ E.g. I have a DSL where = reassigns to a data structure, += appends to an existing one, and + is not defined at all. You can say "x += value" but not "x = x + value". It makes sense in context.
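A minimal sketch of such a class (hypothetical, not the actual DSL):

    class Recorder:
        """+= appends an entry; + is deliberately not defined."""
        def __init__(self):
            self.entries = []
        def __iadd__(self, entry):
            self.entries.append(entry)
            return self

Today such an object simply isn't summable (no __add__, so sum raises TypeError); a sum that preferred __iadd__ throughout would instead silently mutate the start argument.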
As I said, I am prepared to consider that the right answer to this is "well don't call sum on your data structure then", but it is a change in behaviour, not just an optimization. -- Steven From joshua at landau.ws Thu Jul 11 05:05:51 2013 From: joshua at landau.ws (Joshua Landau) Date: Thu, 11 Jul 2013 04:05:51 +0100 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: <51DE1A4B.4070108@pearwood.info> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> <51DE1A4B.4070108@pearwood.info> Message-ID: On 11 July 2013 03:36, Steven D'Aprano wrote: > On 11/07/13 04:21, Joshua Landau wrote: >> >> On 10 July 2013 17:29, Steven D'Aprano wrote: >>> >>> On 10/07/13 16:09, Joshua Landau wrote: >>>> >>>> >>>> On 9 July 2013 17:13, Steven D'Aprano wrote: >>> >>> >>> [...] >>> >>>>> Nevertheless, you are right, in Python 3 both + and sum of lists is >>>>> well-defined. At the moment sum is defined in terms of __add__. You >>>>> want >>>>> to >>>>> change it to be defined in terms of __iadd__. That is a semantic change >>>>> that >>>>> needs to be considered carefully, it is not just an optimization. >>>> >>>> >>>> >>>> I agree it's not totally backward-compatible, but AFAICT that's only >>>> for broken code. __iadd__ should always just be a faster, in-place >>>> __add__ and so this change should never cause problems in properly >>>> written code. >>> >>> >>> >>> "Always"? Immutable objects cannot define __iadd__ as an in-place >>> __add__. >>> >>> In any case, sum() currently does not modify the start argument in-place. >> >> >> Now you're just (badly) playing semantics. If I say that gills are >> always like lungs except they work underwater, would you contradict me >> by stating that mammals don't have gills? > > > I don't actually understand your objection. You made a general statement > that __iadd__ should ALWAYS be an in-place add, Which it should. > and I pointed out that this > cannot be the case for immutable classes. What is your objection? Surely > you're not suggesting that immutable classes can define in-place __iadd__? Of course not. > That's not a rhetorical question, I do not understand the point you are > trying to make. Perhaps you should be explicit, rather than argue by > analogy. Rather, I shall explain by analogy :P. I said that __iadd__ should always be a faster __add__. This is saying that all X have property Y, aka. the analogy of stating that all gills are like lungs (except [that] they work underwater). You objected by saying there are some things that cannot implement __iadd__. This has *no correlation* to the previous statement - that was about the properties that all __iadd__ have. That is why I made an analogy of you contradicting me by saying that mammals don't have gills. So basically, I made no claim in any way about what objects have __iadd__, but about what __iadd__ does (which is of course for only those circumstances where it applies -- I know that's a tautology but this whole sub-discussion seems to be one). > [Aside: it's a poor analogy. Gills are not like lungs, they differ greatly > in many ways, This isn't really relevant, but alas they *are* like lungs. Sure, it's an imperfect relation, but that's why I said "like lungs" and not "are lungs". > e.g.
fluid flow is unidirectional in gills, bidirectional in > lungs, the interface to the blood system is counter-current in gills, with > an efficiency of about 80%, versus concurrent in lungs, with an efficiency > around 25%. There are other significant differences too, and lungs evolved > independently of gills, which is why lungfish have both.] I honestly didn't know that. Interesting. >>>> That makes it anything but a semantic change. >>> >>> >>> __iadd__ is optional for classes that support addition. Failure to define >>> an >>> __iadd__ method does not make your class broken. >>> >>> Making __iadd__ mandatory to support sum would be a semantic change, >>> since >>> there will be objects (apart from strs and bytes, which are >>> special-cased) >>> that support addition with + but will no longer be summable since they >>> don't >>> define __iadd__. >> >> >> Why are you saying these things? I never suggested anything like that. > > > You want to change sum from using __add__ to __iadd__. That means that there > are two possibilities: for a class to be summable, either __iadd__ is > mandatory, or it is optional with a fallback to __add__. I considered both > possibilities, and they both result in changes to the behaviour of sum, that > is, a semantic change. > > If __iadd__ becomes mandatory, then some currently summable classes will > become non-summable. I don't believe that was ever suggested; there is a good reason "+=" falls back on "+" by default. > If __iadd__ becomes optional, but preferred over __add__, then some > currently summable classes will change their behaviour (although you call > those classes "broken"). That is what I was doing - calling them broken. > In either case, this is a semantic change to sum, which is what you > explicitly denied. I'm not sure not supporting broken code counts as a semantic change. That is what I was debating. > I think that it is a reasonable position to take that we should not care > about "broken" classes that define __iadd__ differently to __add__. I'm not > sure that I agree, but regardless, it is a reasonable position. But arguing > that the proposed change from __add__ to __iadd__ is not a semantic change > to sum is simply unreasonable. But that is what I am doing :P. If a spec is undefined, you don't require results to be consistent. This is what would happen. That changes nothing, as far as I am concerned -- and hence is not a semantic change. >>> Even making __iadd__ optional will potentially break working code. Python >>> doesn't *require* that __iadd__ perform the same operation as __add__. >>> That >>> is the normal expectation, of course, but it's not enforced. (How could >>> it >>> be?) We might agree that objects where __add__ and __iadd__ do different >>> things are "broken" in some sense, but you're allowed to write broken >>> code, >>> and Python should (in principle) avoid making it even more broken by >>> changing behaviour unnecessarily. But maybe the right answer there is >>> simply >>> "don't call sum if you don't want __iadd__ called". >> >> >> Python has previously had precedents where broken code does not get to >> dictate the language as long as that code was very rare. This is more >> than very rare. Additionally, Python does (unclearly, but it does do >> so) define __iadd__ to be an inplace version of __add__, so the code >> isn't just 'broken' -- it's broken. > > > Not so.
The docs for __iadd__ and other augmented assignment operators > state: > > "These methods are called to implement the augmented arithmetic assignments > (+=, -=, *=, /=, //=, %=, **=, <<=, >>=, &=, ^=, |=). These methods should > attempt to do the operation in-place (modifying self) and return the result > (which could be, but does not have to be, self)." > > So, according to the docs, "x += y" might modify x in place and return a > different instance, or even a completely different value. It is normal, and > expected, for "x += y" to be the same as "x = x + y", but not compulsory. > Python will fall back on the usual __add__ if __iadd__ is not defined, but > (say) if you define your own DSL where += has some distinct meaning, you are > free to define it to do something completely different. I read it differently. I am not sure why one would do anything other than return self, but I also read "to implement the augmented arithmetic assignments" and "should attempt to do the operation in-place (modifying self) and return the result". The final qualifier only applies to circumstances, as far as I can glean, that are still *attempted in-place additions*. Good examples could be when in-place attempts fail and it would be faster to do __add__ right there and then. Other good examples are taking a while to come, but this is quite a niche area. > You consider it broken if a class defines += differently to +. I consider it > unusual, but permitted. I believe the docs support my interpretation. > > http://docs.python.org/release/3.1/reference/datamodel.html#object.__iadd__ > > E.g. I have a DSL where = reassigns to a data structure, += appends to an > existing one, and + is not defined at all. You can say "x += value" but not > "x = x + value". It makes sense in context. As I said, I am prepared to > consider that the right answer to this is "well don't call sum on your data > structure then", but it is a change in behaviour, not just an optimization. That is... really quite a good argument. I think I may have to think on that final point, but you've probably just about won it. Why didn't you just say this from the start? From joshua at landau.ws Thu Jul 11 05:29:54 2013 From: joshua at landau.ws (Joshua Landau) Date: Thu, 11 Jul 2013 04:29:54 +0100 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> <51DE1A4B.4070108@pearwood.info> Message-ID: On 11 July 2013 04:05, Joshua Landau wrote: > On 11 July 2013 03:36, Steven D'Aprano wrote: >> E.g. I have a DSL where = reassigns to a data structure, += appends to an >> existing one, and + is not defined at all. You can say "x += value" but not >> "x = x + value". It makes sense in context. As I said, I am prepared to >> consider that the right answer to this is "well don't call sum on your data >> structure then", but it is a change in behaviour, not just an optimization. > > That is... really quite a good argument. I think I may have to think > on that final point, but you've probably just about won it. Why didn't > you just say this from the start? This was so close to winning me over, but think about this: 1) __iadd__ is always, for working classes, an inplace __add__. The fact that some classes miss out one, the other or both is irrelevant to this. 2) The new sum requires __add__ for step 1. Hence you *couldn't* sum the DSLs.
I think you would have to write a class which cheats for the first step, but I'm not sure that that is any better than broken code. I can at least assure you there are exactly 0 instances of this in the wild as of now. 3) Things that have __add__ but not __iadd__ experience no behaviour change, so the inverse of the DSLs don't change anything either. From stephen at xemacs.org Thu Jul 11 07:27:05 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 11 Jul 2013 14:27:05 +0900 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> <51DE1A4B.4070108@pearwood.info> Message-ID: <87txk1u0nq.fsf@uwakimon.sk.tsukuba.ac.jp> Joshua Landau writes: > > [Aside: it's a poor analogy. Gills are not like lungs, they > > differ greatly in many ways, > > This isn't really relevant, but alas they *are* like lungs. Sure, > it's an imperfect relation, but that's why I said "like lungs" and > not "are lungs". No, it is relevant. You provide a very high-level specification ("extract oxygen from ambient fluid"), and then make a slightly lower-level distinction ("ambient fluid is air vs. water"). But as far as I can see, many of the issues that make "sum(iterable of sequences)" an unattractive API are far lower-level ("counter-current vs. concurrent"). ISTM that the choice of *not* constraining the definitions of __iop__ to be efficient, in-place versions of the corresponding __op__ was deliberate. So you need to take into account differences that are potential and ill-defined -- and when Steven provides a real use case at this level, you start preparing to concede. > > If __iadd__ becomes optional, but preferred over __add__, then > > some currently summable classes will change their behaviour > > (although you call those classes "broken"). > > That is what I was doing - calling them broken. > > > In either case, this is a semantic change to sum, which is what you > > explicitly denied. > > I'm not sure not supporting broken code counts as a semantic change. > That is what I was debating. It does count; it's a language change. It is not a bug-fix in which the implementation is brought into line with the language definition. CPython has historically taken the position that if the language definition is ambiguous, a change in CPython behavior requires that *the language definition be changed* to clarify that the changed behavior is the mandated behavior. Note also that CPython is intended to be a reference implementation. Therefore existing behavior has a special significance unless explicitly specified to be implementation-dependent. Not only applications, but other implementations, may depend on existing behavior. > But that is what I am doing :P. If a spec is undefined, you don't > require results to be consistent. No, that may be a bug in the spec: the spec is incomplete, but it does include requiring results to be consistent. That's why reference implementations exist, and why "modern" specs explicitly state that behavior is undefined, or that a certain construct "is an error [even if the implementation doesn't signal it]". > This is what would happen. That changes nothing, as far as I am > concerned -- and hence is not a semantic change. Well, I see that you do know what you're doing. My opinion (which is not authoritative) is that you are using a different definition of "semantic change" from the one used by Python (!= CPython).
> >> Python has previously had precedents where broken code does not > >> get to dictate the language as long as that code was very rare. I suspect Guido would not call code "broken" unless it depended on an actual bug in the implementation, or there was another way to do it that is TOOWTDI. Here there is another way to do it that is TOOWTDI *for the case you want to support* (not the code you consider broken). So this is a losing analogy for you. > > E.g. I have a DSL where = reassigns to a data structure, += > > appends to an existing one, and + is not defined at all. > > That is... really quite a good argument. I think I may have to think > on that final point, but you've probably just about won it. Why didn't > you just say this from the start? Because he thought his other arguments were even better. (another) Steve From abarnert at yahoo.com Thu Jul 11 07:44:25 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 10 Jul 2013 22:44:25 -0700 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> <51DE1A4B.4070108@pearwood.info> Message-ID: <224D736E-D044-468D-88ED-BDCC1CB18E8D@yahoo.com> On Jul 10, 2013, at 20:05, Joshua Landau wrote: >> E.g. I have a DSL where = reassigns to a data structure, += appends to an >> existing one, and + is not defined at all. You can say "x += value" but not >> "x = x + value". It makes sense in context. As I said, I am prepared to >> consider that the right answer to this is "well don't call sum on your data >> structure then", but it is a change in behaviour, not just an optimization. > > That is... really quite a good argument. I think I may have to think > on that final point, but you've probably just about won it. Why didn't > you just say this from the start? SymPy and another expression template library were already brought up earlier in the discussion. But I suspect most Python programmers have no experience with this kind of programming and/or have never heard of the libraries in question, and therefore had no idea what the point was until now. SymPy does actually work with the add-once-then-iadd-to-that patch. I just tested summing together a list of symbolic expressions, and the result was the correct expression. Which makes sense, because it doesn't use __iadd__ anywhere (and in fact "x += y" rebinds x to the expression "x+y"). Anyway, thanks to Steven for explaining the point, and not using an example that fails to show what's intended. :) From ronaldoussoren at mac.com Thu Jul 11 07:53:21 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 11 Jul 2013 07:53:21 +0200 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: <87txk1u0nq.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> <51DE1A4B.4070108@pearwood.info> <87txk1u0nq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 11 Jul, 2013, at 7:27, Stephen J. Turnbull wrote: >> >> >>> In either case, this is a semantic change to sum, which is what you >>> explicitly denied. >> >> I'm not sure not supporting broken code counts as a semantic change. >> That is what I was debating. > > It does count; it's a language change. It is not a bug-fix in which > the implementation is brought into line with the language definition. 
That doesn't mean that using += instead of + in sum isn't a valid change to make for 3.4. BTW. This thread has been rehashing the same arguments over and over again, and it's pretty likely that most core devs have stopped following this thread because of that. It's probably time for someone to write a summary of the discussion (what are the proposals and the arguments in favor and against them) Ronald From ronaldoussoren at mac.com Thu Jul 11 08:15:06 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 11 Jul 2013 08:15:06 +0200 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130711014738.7609f82a@sergey> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB6BF6.9030608@pearwood.info> <20130709165449.6b124367@sergey> <20130711014738.7609f82a@sergey> Message-ID: <4081F575-3271-48CC-A2FF-AD2B406FD32F@mac.com> On 11 Jul, 2013, at 0:47, Sergey wrote: > On Jul 9, 2013 Ronald Oussoren wrote: > >>> For example, rewrite tuple to internally store its values in a >>> list, and have `localcount` variable saying how many items from >>> that list belong to this tuple. Then __add__ could extend that >>> list and reuse it for new tuple. >> >> That's not going to happen, not only breaks that backward compatibility >> for users of the C API, > > It depends on implementation details, it's possible to keep it > backward compatible. BTW, what C API do you expect to break? If a tuple stores the values in a separate list the structure of a PyTupleObject changes. That structure is exposed to users of the C API, and hence changes shouldn't be made lightly. > >> it has nasty side effects and is incorrect. >> Nasty side effect: >> a = (1,) >> b = (2,) * 1000 >> c = a + b >> del b >> del c >> With the internal list 'a' keeps alive the extra storage used for 'c'. > > Yes, technically it's possible to implement tuple so that it would > realloc internal list to save some ram, but why? List does not do > that when you remove elements from it, why should tuple do that? Actually, list does resize when you remove items but only does so when the amount of free items gets too large (see list_resize in Object/listobject.c). Keeping memory blocks alive unnecessarily is bad because this can increase the amount of memory used by a script, without there being a clear reason for it when you inspect the python code. This has been one of the reasons for not making string slicing operations views on the entire string (that is, a method like aStr.partition could return objects that reference aStr for the character storage which could save memory but at the significant risk of keeping too much memory alive when aStr is discarded earlier than one of the return values). > > On the other hand: > a = (1,) * 1000 > b = a + (2,3) > c = b + (4,5) > And you have 3 variables for the price of one. Lots of memory saved! How do you do that? As far as I know Python isn't a quantum system, item 1000 of the hidden storage list can't be both 2 and 4 at the same time. > >> Incorrect: >> a = (1,) >> b = a + (2,) >> c = a + (3,) >> Now 'b' and 'c' can't possibly both share storage with 'a'. > > Nothing incorrect here, of course __add__ should handle that, and > if it cannot reuse list it would copy it. As it does now.
And then you introduce unpredictable performance for tuple addition; currently you can reason about the performance of code that does tuple addition and with this change you no longer can (it sometimes is fast, and sometimes it is slow). That's a major gotcha that will cause confusion (string addition also has this feature, and that optimization wouldn't have gotten in with the knowledge we now have). Anyways, I'm still +0 on using += in sum, and -1 on trying to special case particular types. Ronald From stephen at xemacs.org Thu Jul 11 09:26:10 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 11 Jul 2013 16:26:10 +0900 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> <51DE1A4B.4070108@pearwood.info> <87txk1u0nq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87ppuptv59.fsf@uwakimon.sk.tsukuba.ac.jp> Ronald Oussoren writes: > > It does count; it's a language change. It is not a bug-fix in which > > the implementation is brought into line with the language definition. > > That doesn't mean that using += instead of + in sum isn't a valid change > to make for 3.4. Agreed, it might be. I was just addressing Joshua's statement that: > >> I'm not sure not supporting broken code counts as a semantic change. > >> That is what I was debating. AFAIK, whether some code depending on current behavior is broken is irrelevant to Python's definition of "semantic change". > It's probably time for someone to write a summary of the discussion > (what are the proposals and the arguments in favor and against > them) +1 Steve From p.f.moore at gmail.com Thu Jul 11 09:37:59 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 11 Jul 2013 08:37:59 +0100 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> <51DE1A4B.4070108@pearwood.info> Message-ID: On 11 July 2013 04:05, Joshua Landau wrote: > > E.g. I have a DSL where = reassigns to a data structure, += appends to an > existing one, and + is not defined at all. You can say "x += value" but > not > "x = x + value". It makes sense in context. As I said, I am prepared to > > consider that the right answer to this is "well don't call sum on your > data > > structure then", but it is a change in behaviour, not just an > optimization. > > That is... really quite a good argument. I think I may have to think > on that final point, but you've probably just about won it. Why didn't > you just say this from the start? Another example - if I have an event class with an API modeled after the C# approach, += is used to add a listener. But + on events makes no sense and is undefined. This class is not currently summable, but would become summable. Again, the right answer is possibly "don't use sum on these objects", but it *is* a semantic change. Also, if "don't use sum here" is a valid statement in these cases, why is it so impossible for "don't use sum on containers" to be a valid argument in the current situation? Paul -------------- next part -------------- An HTML attachment was scrubbed...
URL: From oscar.j.benjamin at gmail.com Thu Jul 11 11:07:30 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 11 Jul 2013 10:07:30 +0100 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> Message-ID: On 10 July 2013 19:21, Joshua Landau wrote: > > Python has previously had precedents where broken code does not get to > dictate the language as long as that code was very rare. This is more > than very rare. Additionally, Python does (unclearly, but it does do > so) define __iadd__ to be an inplace version of __add__, so the code > isn't just 'broken' -- it's broken. Although I'm quoting Joshua above, there are many people here who have made the erroneous assertion that __iadd__ should always be equivalent to __add__ and that there isn't much code that could be semantically affected by the proposed change. Numpy arrays treat += differently from + in the sense that a += b coerces b to the same dtype as a and then adds in place whereas a + b uses Python style type promotion. This behaviour is by design and it is useful. It is also entirely appropriate (and not pathological) that someone would use sum() to add numpy arrays. An example where + and += give different results: >>> from numpy import array >>> a1 = array([1, 2, 3], dtype=int) >>> a1 array([1, 2, 3]) >>> a2 = array([.5, .5, .5], dtype=float) >>> a2 array([ 0.5, 0.5, 0.5]) >>> a1 + a2 array([ 1.5, 2.5, 3.5]) >>> a1 += a2 >>> a1 array([1, 2, 3]) Oscar From ron3200 at gmail.com Thu Jul 11 12:47:19 2013 From: ron3200 at gmail.com (Ron Adam) Date: Thu, 11 Jul 2013 05:47:19 -0500 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <51DE046C.4070301@pearwood.info> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info> <51DE046C.4070301@pearwood.info> Message-ID: On 07/10/2013 08:03 PM, Steven D'Aprano wrote: > On 11/07/13 07:00, Ron Adam wrote: >> >> >> On 07/10/2013 12:49 PM, Steven D'Aprano wrote: >>> On 11/07/13 02:10, Sergey wrote: >>>> On Jul 9, 2013 Steven D'Aprano wrote: >>>> >>>>> The fact that sum(lists) has had quadratic performance since sum >>>>> was first introduced in Python 2.3, and I've *never* seen anyone >>>>> complain about it being slow, suggests very strongly that this is not >>>>> a use-case that matters. >>>> >>>> Never seen? Are you sure? ;) >>>>> http://article.gmane.org/gmane.comp.python.general/658630 >>>>> From: Steven D'Aprano @ 2010-03-29 >>>>> In practical terms, does anyone actually ever use sum on more than a >>>>> handful of lists? I don't believe this is more than a hypothetical >>>>> problem.
I was thinking more on the lines of how it worked internally compared to sum. And how it handles different inputs. Of course it is quite a bit slower too. >>> timeit("fsum(r)", "from __main__ import fsum\nr=list(range(100))") 15.151492834091187 >>> timeit("sum(r)", "r=list(range(100))") 2.282749891281128 So fsum will take integers, and converts (or casts) them to floats. And bytes, as they are integers. >>> fsum(b'12345') 255.0 But not strings, even if they can be converted to floats. >>> float("12.0") 12.0 >>> fsum(['12.0', '13.0']) Traceback (most recent call last): File "", line 1, in TypeError: a float is required I would like sum to (eventually) be moved to the math module and have it's API and behaviour be the same as fsum. That would have the least surprises and it reduces the mental load when two similar functions act the same and can be found near each other in the library. Cheers, Ron From oscar.j.benjamin at gmail.com Thu Jul 11 12:59:52 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 11 Jul 2013 11:59:52 +0100 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info> <51DE046C.4070301@pearwood.info> Message-ID: On 11 July 2013 11:47, Ron Adam wrote: > On 07/10/2013 08:03 PM, Steven D'Aprano wrote: >> On 11/07/13 07:00, Ron Adam wrote: >>> >>> Just curious, how does your sum compare with fsum() in the math module? >> >> math.fsum is a high-precision floating point sum, keeping extra precision >> that the built-in loses. Compare these: [snip] > > I was thinking more on the lines of how it worked internally compared to > sum. And how it handles different inputs. Of course it is quite a bit > slower too. math.fsum converts all inputs to float and then adds them using (I assume) Kahan summation [1]. A demonstration: >>> class A: ... pass ... >>> a = A() >>> math.fsum([a]) Traceback (most recent call last): File "", line 1, in AttributeError: A instance has no attribute '__float__' >>> class A: ... def __float__(self): ... return -1 ... >>> a = A() >>> math.fsum([a]) Traceback (most recent call last): File "", line 1, in TypeError: nb_float should return float object >>> class A: ... def __float__(self): ... return -1.0 ... >>> a = A() >>> math.fsum([a]) -1.0 > >>>> timeit("fsum(r)", "from __main__ import fsum\nr=list(range(100))") > 15.151492834091187 > >>>> timeit("sum(r)", "r=list(range(100))") > 2.282749891281128 The above is not a fair comparison since fsum converts them to floats as you say. fsum is useless for integers since the additional computation is all about floating point precision. 
You should use a list of floats for a fair test: >>> timeit("fsum(r)", "from __main__ import fsum\nr=list(map(float, range(100)))") 3.432480121620099 >>> timeit("sum(r)", "from __main__ import fsum\nr=list(map(float, range(100)))") 1.6432468412557402 [1] http://en.wikipedia.org/wiki/Kahan_summation_algorithm Oscar From oscar.j.benjamin at gmail.com Thu Jul 11 13:05:58 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 11 Jul 2013 12:05:58 +0100 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info> <51DE046C.4070301@pearwood.info> Message-ID: On 11 July 2013 11:47, Ron Adam wrote: > > I would like sum to (eventually) be moved to the math module and have its > API and behaviour be the same as fsum. That would have the least surprises > and it reduces the mental load when two similar functions act the same and > can be found near each other in the library. Sum is *really* useful. You can use it for ints, for floats, for numpy arrays, for Decimals, for Fractions, for sympy.Expressions, for gmpy2... math.fsum is only useful for scalar floats but sum does so much more. There's no need to cripple it just to stop people from summing lists. Oscar From ronaldoussoren at mac.com Thu Jul 11 13:15:11 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 11 Jul 2013 13:15:11 +0200 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> Message-ID: On 11 Jul, 2013, at 11:07, Oscar Benjamin wrote: > On 10 July 2013 19:21, Joshua Landau wrote: >> >> Python has previously had precedents where broken code does not get to >> dictate the language as long as that code was very rare. This is more >> than very rare. Additionally, Python does (unclearly, but it does do >> so) define __iadd__ to be an inplace version of __add__, so the code >> isn't just 'broken' -- it's broken. > > Although I'm quoting Joshua above, there are many people here who have > made the erroneous assertion that __iadd__ should always be equivalent > to __add__ and that there isn't much code that could be semantically > affected by the proposed change. > > Numpy arrays treat += differently from + in the sense that a += b > coerces b to the same dtype as a and then adds in place whereas a + b > uses Python style type promotion. This behaviour is by design and it > is useful. It is also entirely appropriate (and not pathological) that > someone would use sum() to add numpy arrays. That, and Paul's example about using += for something else than addition, pretty much kills the idea of using += in the implementation of sum() as that would break too much code. Ronald From ron3200 at gmail.com Thu Jul 11 14:23:51 2013 From: ron3200 at gmail.com (Ron Adam) Date: Thu, 11 Jul 2013 07:23:51 -0500 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries?
In-Reply-To: <20130711005842.13ea7ec1@sergey> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <20130711005842.13ea7ec1@sergey> Message-ID: On 07/10/2013 04:58 PM, Sergey wrote: > On Jul 9, 2013 Ron Adam wrote: > >>> Seriously, why there's so much holy wars about that? I'm not asking >>> to rewrite cpython on Java or C#. I'm not adding a bunch of new >>> functions, I'm not even changing signatures of existing functions. >> >> It's the nature of this particular news group. We focus on improving >> python, and that includes new things and improving old things, but also >> includes discussing any existing or potential problems. >> >> You will almost always get a mix of approval and disapproval on just about >> every thing here. It's not a war, it's just different people having >> different opinions. >> >> Quite often that leads to finding better ways to do things, and in the long >> run, helps avoid adding features and changes that could be counter >> productive to python. > > I must agree that I was indeed inspired with some new ideas during > this discussion. It's just that those "inspirations" come in a very > non-constructive form of "it makes no sense", "cannot always be fast", > "you can't", "everyone else thinks you shouldn't", etc. > > Or is that a lifehack [1] in action? I.e. "You can't make it fast for > that type. Oh, you can? Then you can't make it fast for that type. > Oh, you did that too? But you can't make it fast for all the types!" > What if I can? ;) > > It's just instead of discussing what is the best way to fix a slowness, > I'm spending most time trying to convince people that slowness should > be fixed. > - sum is slow for lists, let's fix that! > - you shouldn't use sum... > - why can't I use sum? > - because it's slow > - then let's fix that! > - you shouldn't use sum... > I haven't thought that somebody can truly believe that something should > be slow, and will find one excuse after another instead of just fixing > the slowness. My advice is to not try so hard to change an individual's mind. Are you familiar with the informal voting system we use? Basically take a look through the discussion and look for +1, -1, or things in between or like those, and try to get a feel for how strong we feel as a group on the different suggestions. It's not perfect. What you are looking for is strong(er) indications one way or the other (more -1's or +1's), so you identify the suggestions that can be eliminated and the ones that are worth following up on. >> If it only makes an existing function faster and doesn't change any other >> behaviour, and all the tests still pass for it. Just create an issue on >> the tracker, with the patch posted there, and it will probably be accepted >> after it goes through a much more focused review process. > > I've done that first [2]. And there I was asked to write here. :) I haven't checked that issue, but I'm guessing you were referred to here either because there was a slight API change in your patch, or you were proposing an addition that would include a change, and so they suggested discussing it here first. Or they wanted you to discuss the possibility of making some changes while you are doing this patch. You can always go back with... "there was no consensus concerning... adding or removing... ", and ask to get just the speed increase parts approved. As long as it doesn't use ugly hacks or code that is difficult to maintain, it should be ok. (and doesn't have any other side effects.)
My preference on this is to not extend the API, but go ahead and make it faster if you can. Down the road, I'd like the API's of sum() and fsum() to match. And for them to only work on numbers, and possibly be extended to work on vectors. So.. make the numbers case faster, but probably don't bother changing the non numbers case. (It seems like this is the preferred view so far.) There might be some support for deprecating the non-numbers case. I'm not suggesting that *you* do that btw... see below. :-) >> But discussing it here will invite a lot of opinions about how it works, >> how it shouldn't work, what would work better, and etc... > > And about what *I* shouldn't do, what *I* can't and what *I* need. > As if I'm the bug that should be fixed. :( These things that are expressed as *YOU* usually mean ... *WE*. It's just an American English bad habit to overly use "you". Python has millions of users, so *WE* can't do a lot of things *WE* would like to do. :-/ Cheers, Ron >> It's what this board is for. ;-) > From ron3200 at gmail.com Thu Jul 11 15:20:53 2013 From: ron3200 at gmail.com (Ron Adam) Date: Thu, 11 Jul 2013 08:20:53 -0500 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info> <51DE046C.4070301@pearwood.info> Message-ID: On 07/11/2013 06:05 AM, Oscar Benjamin wrote: > On 11 July 2013 11:47, Ron Adam wrote: >> > >> >I would like sum to (eventually) be moved to the math module and have its >> >API and behaviour be the same as fsum. That would have the least surprises >> >and it reduces the mental load when two similar functions act the same and >> >can be found near each other in the library. > Sum is *really* useful. You can use it for ints, for floats, for numpy > arrays, for Decimals, for Fractions, for sympy.Expressions, for > gmpy2... > > math.fsum is only useful for scalar floats but sum does so much more. > There's no need to cripple it just to stop people from summing > lists. Isn't it just a matter of spelling it a bit different? sum(iters, []) # how often do you actually use this? vs... chain(iters) # better in most cases... list(chain(iters)) # when you actually need a combined list. I'd like to see chain as a builtin in any case. Or to look at it in another way... I wouldn't want to add the ability to sum items in a list to chain(). Note, because to get sum() to join lists requires it to be explicitly spelled with a starting list, it doesn't need to be changed. I'm +1 for doing that, only if there is a consensus for doing that. How do you feel about adding the ability of sum to sum vectors or lists of values to each other?
     sum([[x1, y1], [x2, y2], ...])   --->   [x1+x2, y1+y2]

Cheers,
   Ron

From oscar.j.benjamin at gmail.com  Thu Jul 11 16:31:48 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Thu, 11 Jul 2013 15:31:48 +0100
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info> <51DE046C.4070301@pearwood.info>
Message-ID:

On 11 July 2013 14:20, Ron Adam wrote:
>
> On 07/11/2013 06:05 AM, Oscar Benjamin wrote:
>> On 11 July 2013 11:47, Ron Adam wrote:
>
> chain(iters)          # better in most cases...

I think you mean chain.from_iterable rather than chain

> I'd like to see chain as a builtin in any case.

chain.from_iterable should be the builtin not chain.

> How do you feel about adding the ability of sum to sum vectors or lists
> of values to each other?
>
>     sum([[x1, y1], [x2, y2], ...]) ---> [x1+x2, y1+y2]

That's the beauty of it. sum() already sums anything you want as long
as __add__ is implemented. If I wanted to do the above with some
vectors I would probably use numpy arrays which have precisely the
__add__ method you want:

>>> import numpy as np
>>> arrays = [
...     np.array([1, 2, 3]),
...     np.array([2, 3, 4]),
...     np.array([3, 4, 5]),
... ]
>>> arrays
[array([1, 2, 3]), array([2, 3, 4]), array([3, 4, 5])]
>>> sum(arrays)
array([ 6,  9, 12])

This is possible because of the simplicity of the core algorithm in
sum() i.e. just calling 'total = total + item' in a loop. Anyone who
wants to use sum() with their own type can already do so. Earlier you
seemed to be advocating changing that by restricting the types that
sum() accepts.

Oscar

From abarnert at yahoo.com  Thu Jul 11 18:45:51 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 11 Jul 2013 09:45:51 -0700
Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries?
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info>
Message-ID:

On Jul 11, 2013, at 2:07, Oscar Benjamin wrote:

> Numpy arrays treat += differently from + in the sense that a += b
> coerces b to the same dtype as a and then adds in place whereas a + b
> uses Python style type promotion. This behaviour is by design and it
> is useful. It is also entirely appropriate (and not pathological) that
> someone would use sum() to add numpy arrays.

I forgot about this. I was positive on the first patch (+ first, then +=
for the rest) mainly because it speeds up sum for numpy.

You probably won't _often_ sum arrays of different dtypes... But if you
do, you certainly don't want the result to have the dtype resulting from
just coercing start.dtype and iter[0].dtype.

Of course this could be marked as a caveat for numpy--pass a scalar or
array of the right dtype for start, and you get the right answer, after
all. But if numpy is the only example of a use case for sum that's
reasonable today and gets faster with the patch, I don't think the
tradeoff is worth it. Since I've already asked the proponents for such
examples multiple times and gotten none beyond the ones I tried to find
myself, I'm assuming they are rare or unusual, so I'm -1 now.
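To make the += concern concrete, here is a deliberately naive model of a
sum() that accumulates with augmented assignment from the very first
element.  (The name sum_iadd is hypothetical; the patch under discussion
uses + for the first item and only then switches to +=, which avoids
mutating start but keeps the in-place behaviour Oscar describes for
numpy.)

    def sum_iadd(values, start=0):
        # Naive model: augmented assignment throughout.
        total = start
        for value in values:
            total += value  # for lists, list.__iadd__ extends in place
        return total

    start = [1, 2]
    result = sum_iadd([[3], [4]], start)
    # result == [1, 2, 3, 4], but start is now [1, 2, 3, 4] as well:
    # the caller's list has been mutated.  The built-in sum(), which
    # does 'total = total + value', would leave start == [1, 2].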
From abarnert at yahoo.com  Thu Jul 11 18:54:16 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 11 Jul 2013 09:54:16 -0700
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info> <51DE046C.4070301@pearwood.info>
Message-ID: <03CAA84F-3F36-4CF5-BB95-75A4CFCC4D66@yahoo.com>

On Jul 11, 2013, at 7:31, Oscar Benjamin wrote:

> On 11 July 2013 14:20, Ron Adam wrote:
>>
>> On 07/11/2013 06:05 AM, Oscar Benjamin wrote:
>>> On 11 July 2013 11:47, Ron Adam wrote:
>>
>> chain(iters)          # better in most cases...
>
> I think you mean chain.from_iterable rather than chain
>
>> I'd like to see chain as a builtin in any case.
>
> chain.from_iterable should be the builtin not chain.

This is the problem. We can't rename chain.from_iterable to chain (or,
equivalently, change the API of chain) while moving it without a lot of
confusion. So it seems like we probably need to name it something
completely different--concat, flatten, chainiter, ... But none of those
names feels right. Concat seems like it should just take two sequences,
not a sequence of sequences. Flatten is something you'd only reach for
when you're thinking of it as one nested sequence rather than a collection
of sequences. Chainiter is ugly. Chain really would be the perfect name if
it didn't already have the wrong connotation thanks to its years of life
in itertools.

Other than that, I love the idea. The right function for this task should
return an iterator, for the same reasons map and zip should, and also
because it completely avoids all of the issues with trying to define what
it means for different types by only handling iterables and treating them
as iterables. The docs for sum already hint that it's the obvious way to
concatenate sequences, but it should be more obvious.

If someone can come up with a good name (or just nudge my feeling on one
of the already proposed names), I'm definitely +1 on this.

From mertz at gnosis.cx  Thu Jul 11 19:02:03 2013
From: mertz at gnosis.cx (David Mertz)
Date: Thu, 11 Jul 2013 10:02:03 -0700
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <03CAA84F-3F36-4CF5-BB95-75A4CFCC4D66@yahoo.com>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info> <51DE046C.4070301@pearwood.info> <03CAA84F-3F36-4CF5-BB95-75A4CFCC4D66@yahoo.com>
Message-ID:

I'm not in love with it, but what about 'ichain()', following imap(),
izip(), ifilter(), etc.

On Thu, Jul 11, 2013 at 9:54 AM, Andrew Barnert wrote:

> On Jul 11, 2013, at 7:31, Oscar Benjamin wrote:
>
>> On 11 July 2013 14:20, Ron Adam wrote:
>>>
>>> On 07/11/2013 06:05 AM, Oscar Benjamin wrote:
>>>> On 11 July 2013 11:47, Ron Adam wrote:
>>>
>>> chain(iters)          # better in most cases...
>> I think you mean chain.from_iterable rather than chain
>>
>>> I'd like to see chain as a builtin in any case.
>>
>> chain.from_iterable should be the builtin not chain.
>
> This is the problem. We can't rename chain.from_iterable to chain (or,
> equivalently, change the API of chain) while moving it without a lot of
> confusion. So it seems like we probably need to name it something
> completely different--concat, flatten, chainiter, ... But none of those
> names feels right. Concat seems like it should just take two sequences,
> not a sequence of sequences. Flatten is something you'd only reach for
> when you're thinking of it as one nested sequence rather than a
> collection of sequences. Chainiter is ugly. Chain really would be the
> perfect name if it didn't already have the wrong connotation thanks to
> its years of life in itertools.
>
> Other than that, I love the idea. The right function for this task
> should return an iterator, for the same reasons map and zip should, and
> also because it completely avoids all of the issues with trying to
> define what it means for different types by only handling iterables and
> treating them as iterables. The docs for sum already hint that it's the
> obvious way to concatenate sequences, but it should be more obvious.
>
> If someone can come up with a good name (or just nudge my feeling on
> one of the already proposed names), I'm definitely +1 on this.

--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From ethan at stoneleaf.us  Thu Jul 11 18:42:43 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 11 Jul 2013 09:42:43 -0700
Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries?
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <20130711005842.13ea7ec1@sergey>
Message-ID: <51DEE083.4000908@stoneleaf.us>

On 07/11/2013 05:23 AM, Ron Adam wrote:
>
> So.. make the numbers case faster, but probably don't bother changing
> the non-numbers case.  (It seems like this is the preferred view so far.)

I thought the whole point was to make *non*-numbers faster?

--
~Ethan~

From ron3200 at gmail.com  Thu Jul 11 19:12:59 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Thu, 11 Jul 2013 12:12:59 -0500
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info> <51DE046C.4070301@pearwood.info>
Message-ID:

On 07/11/2013 09:31 AM, Oscar Benjamin wrote:
> On 11 July 2013 14:20, Ron Adam wrote:
>>
>> On 07/11/2013 06:05 AM, Oscar Benjamin wrote:
>>> On 11 July 2013 11:47, Ron Adam wrote:
>>
>> chain(iters)          # better in most cases...
> I think you mean chain.from_iterable rather than chain
>
>> I'd like to see chain as a builtin in any case.
>
> chain.from_iterable should be the builtin not chain.

I agree.

>> How do you feel about adding the ability of sum to sum vectors or lists
>> of values to each other?
>>
>>     sum([[x1, y1], [x2, y2], ...]) ---> [x1+x2, y1+y2]
>
> That's the beauty of it. sum() already sums anything you want as long
> as __add__ is implemented. If I wanted to do the above with some
> vectors I would probably use numpy arrays which have precisely the
> __add__ method you want:
>
>>>> import numpy as np
>>>> arrays = [
> ...     np.array([1, 2, 3]),
> ...     np.array([2, 3, 4]),
> ...     np.array([3, 4, 5]),
> ... ]
>>>> arrays
> [array([1, 2, 3]), array([2, 3, 4]), array([3, 4, 5])]
>>>> sum(arrays)
> array([ 6,  9, 12])
>
> This is possible because of the simplicity of the core algorithm in
> sum() i.e. just calling 'total = total + item' in a loop. Anyone who
> wants to use sum() with their own type can already do so. Earlier you
> seemed to be advocating changing that by restricting the types that
> sum() accepts.

I think this is what it should do.

I tried overriding list's __add__, but that didn't work as nicely.  It
needs to have a starting list with all zeros in it.  How does numpy get
around that?

Ron

From ron3200 at gmail.com  Thu Jul 11 19:19:48 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Thu, 11 Jul 2013 12:19:48 -0500
Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries?
In-Reply-To: <51DEE083.4000908@stoneleaf.us>
References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <20130711005842.13ea7ec1@sergey> <51DEE083.4000908@stoneleaf.us>
Message-ID:

On 07/11/2013 11:42 AM, Ethan Furman wrote:
>> So.. make the numbers case faster, but probably don't bother changing
>> the non-numbers case.  (It seems like this is the preferred view so far.)
>
> I thought the whole point was to make *non*-numbers faster?

Um... yep, like what it says in the title...

Somewhere I got the idea he was making the numbers case faster too.  Maybe
I was mistaken.

I need sum() more coffee, I think.  ;-)

From oscar.j.benjamin at gmail.com  Thu Jul 11 19:19:49 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Thu, 11 Jul 2013 18:19:49 +0100
Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries?
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info>
Message-ID:

On 11 July 2013 17:45, Andrew Barnert wrote:
> On Jul 11, 2013, at 2:07, Oscar Benjamin wrote:
>
>> Numpy arrays treat += differently from + in the sense that a += b
>> coerces b to the same dtype as a and then adds in place whereas a + b
>> uses Python style type promotion. This behaviour is by design and it
>> is useful. It is also entirely appropriate (and not pathological) that
>> someone would use sum() to add numpy arrays.
>
> I forgot about this. I was positive on the first patch (+ first, then +=
> for the rest) mainly because it speeds up sum for numpy.

Only by a constant factor. Summing numpy arrays with sum is O(N) either
way. If someone wants to speed that up they can use numpy to do so i.e.:

    total = np.zeros(shape, dtype=float)
    for a in arrays:
        total += a

is not significantly slower than sum(arrays) if the arrays themselves are
large.

> You probably won't _often_ sum arrays of different dtypes...
> But if you do, you certainly don't want the result to have the dtype
> resulting from just coercing start.dtype and iter[0].dtype.

It can easily happen:

    import numpy as np
    initial_velocity = np.array([1, 1, 1])  # Implicitly create an int array
    velocities = [initial_velocity]
    for n in range(1000):
        velocities.append(0.9 * velocities[-1])  # Append float arrays
    final_position = delta_t * sum(velocities)

With the proposed patch all 1000 arrays after the first would count as
zero in the final result so that the answer would be (delta_t *
array([1, 1, 1])) instead of (delta_t * array([10.0, 10.0, 10.0]))

> Of course this could be marked as a caveat for numpy--pass a scalar or
> array of the right dtype for start, and you get the right answer, after
> all.

I don't think it's acceptable to pass off a backward incompatible change
of this nature in a minor release. It's the worst kind of change since
there's no DeprecationWarning, no TypeError, just code that silently
produces the wrong result. The change might be small in some cases (and so
not immediately obvious) but then massive in others.

Oscar

From oscar.j.benjamin at gmail.com  Thu Jul 11 19:21:43 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Thu, 11 Jul 2013 18:21:43 +0100
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info> <51DE046C.4070301@pearwood.info>
Message-ID:

On 11 July 2013 18:12, Ron Adam wrote:
> I think this is what it should do.
>
> I tried overriding list's __add__, but that didn't work as nicely.  It
> needs to have a starting list with all zeros in it.  How does numpy get
> around that?

Like so:

>>> import numpy
>>> a = numpy.array([1, 2, 3])
>>> a
array([1, 2, 3])
>>> a + 0
array([1, 2, 3])
>>> a + 1
array([2, 3, 4])

Oscar

From ron3200 at gmail.com  Thu Jul 11 20:29:08 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Thu, 11 Jul 2013 13:29:08 -0500
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info> <51DE046C.4070301@pearwood.info>
Message-ID:

On 07/11/2013 12:21 PM, Oscar Benjamin wrote:
> On 11 July 2013 18:12, Ron Adam wrote:
>> I think this is what it should do.
>>
>> I tried overriding list's __add__, but that didn't work as nicely.  It
>> needs to have a starting list with all zeros in it.  How does numpy get
>> around that?
>
> Like so:
>
>>>> import numpy
>>>> a = numpy.array([1, 2, 3])
>>>> a
> array([1, 2, 3])
>>>> a + 0
> array([1, 2, 3])
>>>> a + 1
> array([2, 3, 4])
>
> Oscar

Right answer to the wrong question.

I was asking how numpy gets around sum() needing a starting 'array'
argument, not how numpy arrays work with '+'.
Cheers,
   Ron

From tjreedy at udel.edu  Thu Jul 11 21:20:01 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 11 Jul 2013 15:20:01 -0400
Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries?
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> <51DE1A4B.4070108@pearwood.info> <87txk1u0nq.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On 7/11/2013 1:53 AM, Ronald Oussoren wrote:
>
> On 11 Jul, 2013, at 7:27, Stephen J. Turnbull wrote:
>> It does count; it's a language change. It is not a bug-fix in which
>> the implementation is brought into line with the language definition.
>
> That doesn't mean that using += instead of + in sum isn't a valid change
> to make for 3.4.

Breaking code in the way this would, would require a PEP and a deprecation
cycle. I do not anticipate approval for a general change.

A specialized change such that sum(iterable_of_lists, []) would extend
rather than replace [] might be done, since the result would be equal to
the current result, just faster, and since [] must nearly always be passed
without aliases that depend on it not changing. Even that should have a
deprecation warning.

Tuples could be linearly summed in a list with .extend and then converted
at the end. I don't believe that would be a semantic change at all.

> BTW. This thread has been rehashing the same arguments over and over
> again, and it's pretty likely that most core devs have stopped following
> this thread

Right. I just happened to pick this post because you are also a core dev.

> because of that. It's probably time for someone to write a summary of
> the discussion (what are the proposals and the arguments in favor and
> against them)

--
Terry Jan Reedy

From alexander.belopolsky at gmail.com  Thu Jul 11 22:08:44 2013
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Thu, 11 Jul 2013 16:08:44 -0400
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info> <51DE046C.4070301@pearwood.info>
Message-ID:

On Thu, Jul 11, 2013 at 2:29 PM, Ron Adam wrote:

> I was asking how numpy gets around sum() needing a starting 'array'
> argument?

When you sum arrays, you don't need to start with an array:

>>> import numpy
>>> 0 + numpy.array([1, 2, 3])
array([1, 2, 3])
>>> sum([_, _, _])
array([3, 6, 9])

The default scalar start gets broadcast to the shape of the array.

>>> sum([numpy.zeros((2,2))], 42)
array([[ 42.,  42.],
       [ 42.,  42.]])

From subbarker at gmail.com  Thu Jul 11 22:39:49 2013
From: subbarker at gmail.com (Corey Sarsfield)
Date: Thu, 11 Jul 2013 15:39:49 -0500
Subject: [Python-ideas] Reference variable in assignment: x = foo(?)
Message-ID:

I've always found +=, -= and the like to be handy, but I had hoped, like
so many other things in python, there would be a generic form of this
functionality.

x += 5 could be expressed as x = ? + 5 perhaps.
From ethan at stoneleaf.us  Thu Jul 11 22:30:42 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 11 Jul 2013 13:30:42 -0700
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info> <51DE046C.4070301@pearwood.info>
Message-ID: <51DF15F2.1090007@stoneleaf.us>

On 07/11/2013 11:29 AM, Ron Adam wrote:
>
> On 07/11/2013 12:21 PM, Oscar Benjamin wrote:
>> On 11 July 2013 18:12, Ron Adam wrote:
>>> I think this is what it should do.
>>>
>>> I tried overriding list's __add__, but that didn't work as nicely.  It
>>> needs to have a starting list with all zeros in it.  How does numpy
>>> get around that?
>>
>> Like so:
>>
>>>>> import numpy
>>>>> a = numpy.array([1, 2, 3])
>>>>> a
>> array([1, 2, 3])
>>>>> a + 0
>> array([1, 2, 3])
>>>>> a + 1
>> array([2, 3, 4])
>
> Right answer to the wrong question.
>
> I was asking how numpy gets around sum() needing a starting 'array'
> argument? Not how numpy arrays work with '+'.

Because the default start is 0, and when you add 0 to a numpy array you
get back the same* numpy array.

*At least, a numpy array with all the same values.

--
~Ethan~

From michael.weylandt at gmail.com  Thu Jul 11 23:00:08 2013
From: michael.weylandt at gmail.com (R. Michael Weylandt)
Date: Thu, 11 Jul 2013 16:00:08 -0500
Subject: [Python-ideas] Reference variable in assignment: x = foo(?)
In-Reply-To:
References:
Message-ID:

On Thu, Jul 11, 2013 at 3:39 PM, Corey Sarsfield wrote:
> I've always found +=, -= and the like to be handy, but I had hoped like
> so many other things in python there would be a generic form of this
> functionality.
>
> x += 5 could be expressed as x = ? + 5 perhaps.

Can you flesh this out a bit further? Isn't x += 5 <--> x = x + 5
already defined unless a class specifically does something funny with
__iadd__?

Cheers,
Michael

From subbarker at gmail.com  Thu Jul 11 23:07:07 2013
From: subbarker at gmail.com (Corey Sarsfield)
Date: Thu, 11 Jul 2013 16:07:07 -0500
Subject: [Python-ideas] Reference variable in assignment: x = foo(?)
In-Reply-To:
References:
Message-ID:

I came up with the idea after having some code on dicts that looked like:

    a[b][c] = foo(a[b][c])

So in this case there are twice as many look-ups going on as there need
to be, even if a[b][c] were to be pulled out into x.

If I were to do:

    a[b][c] += 1

would it be doing the lookups twice behind the scenes?

On Thu, Jul 11, 2013 at 4:00 PM, R. Michael Weylandt wrote:
> On Thu, Jul 11, 2013 at 3:39 PM, Corey Sarsfield wrote:
>> I've always found +=, -= and the like to be handy, but I had hoped like
>> so many other things in python there would be a generic form of this
>> functionality.
>>
>> x += 5 could be expressed as x = ? + 5 perhaps.
>
> Can you flesh this out a bit further? Isn't x += 5 <--> x = x + 5
> already defined unless a class specifically does something funny with
> __iadd__?
>
> Cheers,
> Michael

--
Corey Sarsfield
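Corey's question can be answered empirically with a small, self-contained
check (LoggingDict is an illustrative name here, not anything from the
stdlib):

    class LoggingDict(dict):
        # Report every item read and write so the lookups can be counted.
        def __getitem__(self, key):
            print("get", key)
            return super().__getitem__(key)
        def __setitem__(self, key, value):
            print("set", key, value)
            super().__setitem__(key, value)

    a = {"b": LoggingDict(c=0)}
    a["b"]["c"] += 1
    # prints:
    #   get c
    #   set c 1
    # One read plus one write on the inner mapping; a["b"] itself is
    # only evaluated once.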
From anntzer.lee at gmail.com  Thu Jul 11 23:07:33 2013
From: anntzer.lee at gmail.com (Antony Lee)
Date: Thu, 11 Jul 2013 14:07:33 -0700 (PDT)
Subject: [Python-ideas] Allow Enum members to refer to each other during execution of body
In-Reply-To: <51DE0FD8.4050301@stoneleaf.us>
References: <51DB5573.5070004@stoneleaf.us> <6fc3f1ea-e643-40c4-aa2c-6e0d42bd7b6e@googlegroups.com> <51DE0FD8.4050301@stoneleaf.us>
Message-ID: <334b5e00-2f0b-4231-9b86-1e82105c5e28@googlegroups.com>

In the current version, they work with a custom __init__ (though of
course, as long as the actual arguments that need to be passed to __init__
are provided, the pre-declared members are just "empty").  They do not
work with a custom __new__ (not sure how I could make this work, given
that at declaration time an "empty" member needs to be created but we
don't know what arguments we need to pass to __new__...).

As a side effect, however, the whole patch adds a new requirement: custom
__new__s must be defined before the members themselves; otherwise they
won't be called, for the same reason as above: if I don't know what
__new__ is, I can't call it...

Antony

On Wednesday, July 10, 2013 6:52:24 PM UTC-7, stoneleaf wrote:
>
> On 07/10/2013 03:47 PM, Antony Lee wrote:
>>
>> Forward references are now implemented (https://github.com/anntzer/enum).
>> They require an explicit declaration, à la
>
> Do they work with a custom __new__ ? __init__ ?
>
> --
> ~Ethan~

From abarnert at yahoo.com  Thu Jul 11 23:35:55 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 11 Jul 2013 14:35:55 -0700
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info> <51DE046C.4070301@pearwood.info> <03CAA84F-3F36-4CF5-BB95-75A4CFCC4D66@yahoo.com>
Message-ID:

On Jul 11, 2013, at 10:02, David Mertz wrote:

> I'm not in love with it, but what about 'ichain()', following imap(),
> izip(), ifilter(), etc.

Given that Python 3 renamed those functions to map, zip, and filter while
moving them to builtins (and replacing the 2.x builtins of the same
names), I don't think that's reasonable.

> On Thu, Jul 11, 2013 at 9:54 AM, Andrew Barnert wrote:
>> On Jul 11, 2013, at 7:31, Oscar Benjamin wrote:
>>
>>> On 11 July 2013 14:20, Ron Adam wrote:
>>>>
>>>> On 07/11/2013 06:05 AM, Oscar Benjamin wrote:
>>>>> On 11 July 2013 11:47, Ron Adam wrote:
>>>>
>>>> chain(iters)          # better in most cases...
>>>
>>> I think you mean chain.from_iterable rather than chain
>>>
>>>> I'd like to see chain as a builtin in any case.
>>>
>>> chain.from_iterable should be the builtin not chain.
>>
>> This is the problem. We can't rename chain.from_iterable to chain (or,
>> equivalently, change the API of chain) while moving it without a lot of
>> confusion. So it seems like we probably need to name it something
>> completely different--concat, flatten, chainiter, ... But none of those
>> names feels right.
>> Concat seems like it should just take two sequences, not a sequence of
>> sequences. Flatten is something you'd only reach for when you're
>> thinking of it as one nested sequence rather than a collection of
>> sequences. Chainiter is ugly. Chain really would be the perfect name if
>> it didn't already have the wrong connotation thanks to its years of
>> life in itertools.
>>
>> Other than that, I love the idea. The right function for this task
>> should return an iterator, for the same reasons map and zip should, and
>> also because it completely avoids all of the issues with trying to
>> define what it means for different types by only handling iterables and
>> treating them as iterables. The docs for sum already hint that it's the
>> obvious way to concatenate sequences, but it should be more obvious.
>>
>> If someone can come up with a good name (or just nudge my feeling on
>> one of the already proposed names), I'm definitely +1 on this.

From abarnert at yahoo.com  Thu Jul 11 23:41:22 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 11 Jul 2013 14:41:22 -0700
Subject: [Python-ideas] Reference variable in assignment: x = foo(?)
In-Reply-To:
References:
Message-ID: <72C90407-C5D3-4672-9966-AF7F892AB9CA@yahoo.com>

On Jul 11, 2013, at 14:07, Corey Sarsfield wrote:

> I came up with the idea after having some code on dicts that looked like:
>
>     a[b][c] = foo(a[b][c])
>
> So in this case there are twice as many look-ups going on as there need
> to be, even if a[b][c] were to be pulled out into x.
>
> If I were to do:
>
>     a[b][c] += 1
>
> Would it be doing the lookups twice behind the scenes?

Effectively, the best it could possibly do is something like this:

    tmp = a[b]
    tmp.__setitem__('c', tmp.__getitem__('c').__iadd__(1))

So yes, there are two lookups.

But if a[b] is a dict... Who cares? The lookup is a hash--which is cached
after the first one--plus indexing into an array.

> On Thu, Jul 11, 2013 at 4:00 PM, R. Michael Weylandt wrote:
>> On Thu, Jul 11, 2013 at 3:39 PM, Corey Sarsfield wrote:
>>> I've always found +=, -= and the like to be handy, but I had hoped like
>>> so many other things in python there would be a generic form of this
>>> functionality.
>>>
>>> x += 5 could be expressed as x = ? + 5 perhaps.
>>
>> Can you flesh this out a bit further? Isn't x += 5 <--> x = x + 5
>> already defined unless a class specifically does something funny with
>> __iadd__?
>>
>> Cheers,
>> Michael
>
> --
> Corey Sarsfield
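For completeness, what CPython compiles a[b][c] += 1 to can be inspected
directly; a quick sketch (the exact opcode names vary between CPython
versions):

    import dis
    dis.dis(compile("a[b][c] += 1", "<demo>", "exec"))

The disassembly shows a[b] being evaluated once and then duplicated on the
stack (DUP_TOP_TWO in CPython 3.x), with the duplicate used for both the
item get and the item set.  So only the innermost subscript happens twice,
once to read and once to write, matching the expansion above.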
From ethan at stoneleaf.us  Fri Jul 12 00:00:21 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 11 Jul 2013 15:00:21 -0700
Subject: [Python-ideas] Allow Enum members to refer to each other during execution of body
In-Reply-To: <334b5e00-2f0b-4231-9b86-1e82105c5e28@googlegroups.com>
References: <51DB5573.5070004@stoneleaf.us> <6fc3f1ea-e643-40c4-aa2c-6e0d42bd7b6e@googlegroups.com> <51DE0FD8.4050301@stoneleaf.us> <334b5e00-2f0b-4231-9b86-1e82105c5e28@googlegroups.com>
Message-ID: <51DF2AF5.6050804@stoneleaf.us>

On 07/11/2013 02:07 PM, Antony Lee wrote:
> On Wednesday, July 10, 2013 6:52:24 PM UTC-7, stoneleaf wrote:
>> On 07/10/2013 03:47 PM, Antony Lee wrote:
>>>
>>> Forward references are now implemented (https://github.com/anntzer/enum).
>>
>> Do they work with a custom __new__ ? __init__ ?
>
> In the current version, they work with a custom __init__ (though of
> course, as long as the actual arguments that need to be passed to
> __init__ are provided, the pre-declared members are just "empty").  They
> do not work with a custom __new__ (not sure how I could make this work,
> given that at declaration time an "empty" member needs to be created but
> we don't know what arguments we need to pass to __new__...).
> As a side effect, however, the whole patch adds a new requirement:
> custom __new__s must be defined before the members themselves; otherwise
> they won't be called, for the same reason as above: if I don't know what
> __new__ is, I can't call it...

Hmm.

Well, at this point I can offer kudos for getting it this far, but that's
about it.  The use-case this addresses seems fairly rare, is definitely
not a typical enumeration, and can be solved fairly easily with some extra
post-processing code on a per-enumeration basis.

--
~Ethan~

From ben+python at benfinney.id.au  Fri Jul 12 01:17:02 2013
From: ben+python at benfinney.id.au (Ben Finney)
Date: Fri, 12 Jul 2013 09:17:02 +1000
Subject: [Python-ideas] Vote value range (was: Fast sum() for non-numbers - why so much worries?)
References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <20130711005842.13ea7ec1@sergey>
Message-ID: <7wa9lsoff5.fsf_-_@benfinney.id.au>

Ron Adam writes:

> Are you familiar with the informal voting system we use?  Basically,
> take a look through the discussion and look for +1, -1, or things in
> between or like those, and try to get a feel for how strongly we feel
> as a group about the different suggestions.

Remember to *discard* any vote outside the range -1.0...+1.0 since nobody
gets more than that on any single issue.  Hyperbole is a ValueError!

--
 \       "An idea isn't responsible for the people who believe in it." |
  `\                                 -- Donald Robert Perry Marquis |
_o__) |
Ben Finney

From joshua at landau.ws  Fri Jul 12 01:33:27 2013
From: joshua at landau.ws (Joshua Landau)
Date: Fri, 12 Jul 2013 00:33:27 +0100
Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries?
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info>
Message-ID:

On 11 July 2013 18:19, Oscar Benjamin wrote:
> On 11 July 2013 17:45, Andrew Barnert wrote:
>> You probably won't _often_ sum arrays of different dtypes... But if you
>> do, you certainly don't want the result to have the dtype resulting
>> from just coercing start.dtype and iter[0].dtype.
> It can easily happen:
>
>     import numpy as np
>     initial_velocity = np.array([1, 1, 1])  # Implicitly create an int array
>     velocities = [initial_velocity]
>     for n in range(1000):
>         velocities.append(0.9 * velocities[-1])  # Append float arrays
>     final_position = delta_t * sum(velocities)
>
> With the proposed patch all 1000 arrays after the first would count as
> zero in the final result so that the answer would be (delta_t *
> array([1, 1, 1])) instead of (delta_t * array([10.0, 10.0, 10.0]))

The points made since then have pushed me from significantly in favour to
significantly against.  I'm not sure what counts as "significantly
against", but without a proper deprecation cycle, PEP and whatnot (which
I'm not sure would be worth the benefits of the change) I'm absolutely
against this.

We can't rush a semantic change for code that's in popular usage.

Apologies, Sergey, but it seems I've left for the dark side.

From jsbueno at python.org.br  Fri Jul 12 01:57:07 2013
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Thu, 11 Jul 2013 20:57:07 -0300
Subject: [Python-ideas] Reference variable in assignment: x = foo(?)
In-Reply-To: <72C90407-C5D3-4672-9966-AF7F892AB9CA@yahoo.com>
References: <72C90407-C5D3-4672-9966-AF7F892AB9CA@yahoo.com>
Message-ID:

I don't know if what I miss most is the same thing the OP is asking for,
but the idea of re-using a value retrieved in an expression in the same
expression, without the need to assign it to a temporary variable.

Like in:

    value = expensive_function(b) if expensive_function(b) else default_value

(Of course this is a trivial example, but nonetheless it would require an
extra "if" statement to avoid the double call.)

Anyway, the proposed syntax in the O.P. would not suffice for this case.

On 11 July 2013 18:41, Andrew Barnert wrote:
> On Jul 11, 2013, at 14:07, Corey Sarsfield wrote:
>
>> I came up with the idea after having some code on dicts that looked
>> like:
>>
>>     a[b][c] = foo(a[b][c])
>>
>> So in this case there are twice as many look-ups going on as there need
>> to be, even if a[b][c] were to be pulled out into x.
>>
>> If I were to do:
>>
>>     a[b][c] += 1
>>
>> Would it be doing the lookups twice behind the scenes?
>
> Effectively, the best it could possibly do is something like this:
>
>     tmp = a[b]
>     tmp.__setitem__('c', tmp.__getitem__('c').__iadd__(1))
>
> So yes, there are two lookups.
>
> But if a[b] is a dict... Who cares? The lookup is a hash--which is
> cached after the first one--plus indexing into an array.
>
>> On Thu, Jul 11, 2013 at 4:00 PM, R. Michael Weylandt wrote:
>>> On Thu, Jul 11, 2013 at 3:39 PM, Corey Sarsfield wrote:
>>>> I've always found +=, -= and the like to be handy, but I had hoped
>>>> like so many other things in python there would be a generic form of
>>>> this functionality.
>>>>
>>>> x += 5 could be expressed as x = ? + 5 perhaps.
>>>
>>> Can you flesh this out a bit further? Isn't x += 5 <--> x = x + 5
>>> already defined unless a class specifically does something funny with
>>> __iadd__?
>>> Cheers,
>>> Michael

From mertz at gnosis.cx  Fri Jul 12 02:07:30 2013
From: mertz at gnosis.cx (David Mertz)
Date: Thu, 11 Jul 2013 17:07:30 -0700
Subject: [Python-ideas] Reference variable in assignment: x = foo(?)
In-Reply-To:
References: <72C90407-C5D3-4672-9966-AF7F892AB9CA@yahoo.com>
Message-ID:

On Thu, Jul 11, 2013 at 4:57 PM, Joao S. O. Bueno wrote:

> I don't know if what I miss most is the same thing the OP is asking for,
> but the idea of re-using a value retrieved in an expression in the same
> expression, without the need to assign it to a temporary variable.
>
> Like in:
>
>     value = expensive_function(b) if expensive_function(b) else default_value
>
> (Of course this is a trivial example, but nonetheless it would require
> an extra "if" statement to avoid the double call.)

How about:

    value = expensive_function(b) or default_value

One call, exact same behavior as you request.  Available since Python 1.0.

--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From steve at pearwood.info  Fri Jul 12 02:17:41 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 12 Jul 2013 10:17:41 +1000
Subject: [Python-ideas] Vote value range
In-Reply-To: <7wa9lsoff5.fsf_-_@benfinney.id.au>
References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <20130711005842.13ea7ec1@sergey> <7wa9lsoff5.fsf_-_@benfinney.id.au>
Message-ID: <51DF4B25.8010709@pearwood.info>

On 12/07/13 09:17, Ben Finney wrote:
> Ron Adam writes:
>
>> Are you familiar with the informal voting system we use?  Basically,
>> take a look through the discussion and look for +1, -1, or things in
>> between or like those, and try to get a feel for how strongly we feel
>> as a group about the different suggestions.
>
> Remember to *discard* any vote outside the range -1.0...+1.0 since
> nobody gets more than that on any single issue.  Hyperbole is a
> ValueError!

+1000 on that!

But seriously, the given numbers are *strength of feeling*, not number of
votes.  That's why you can vote +0 or -0.5.

--
Steven

From steve at pearwood.info  Fri Jul 12 02:23:47 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 12 Jul 2013 10:23:47 +1000
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info> <51DE046C.4070301@pearwood.info>
Message-ID: <51DF4C93.3090206@pearwood.info>

On 11/07/13 23:20, Ron Adam wrote:

> How do you feel about adding the ability of sum to sum vectors or lists
> of values to each other?
>
>     sum([[x1, y1], [x2, y2], ...]) ---> [x1+x2, y1+y2]

What I think is that it is not backward compatible and would completely
break any code relying on sum's current behaviour.  We're in the middle of
a long discussion arguing that a *tiny*, *subtle* shift in behaviour of
sum is sufficient to disqualify the change, and you're suggesting
something that completely changes the semantics from list concatenation to
element-by-element addition.  No offence, Ron, but have you been reading
the rest of the thread?

If anyone wants element-by-element addition, they can either define a
class that does so using the + operator (say, numpy arrays) or they can
define their own function.  Or try to revive PEP 225:

http://www.python.org/dev/peps/pep-0225/

--
Steven

From steve at pearwood.info  Fri Jul 12 02:52:56 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 12 Jul 2013 10:52:56 +1000
Subject: [Python-ideas] Fast sum summary [was Re: Fast sum() for non-numbers - why so much worries?]
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info>
Message-ID: <51DF5368.6020505@pearwood.info>

This started out as a response to Ronald, but turned into a summary and
pseudo-code implementation, so I've changed the subject line.

On 11/07/13 21:15, Ronald Oussoren wrote: [...]
> That, and Paul's example about using += for something else than
> addition, pretty much kills the idea of using += in the implementation
> of sum() as that would break too much code.

Not quite :-)

After spending so much time explaining why I don't think sum *should*
optimize the list case, I'm now going to suggest a way which (I think) sum
*could* optimize the list case without changing behaviour.

[puts on Devil's Advocate cap]

What we have to give up is Sergey's hope to make sum fast for
"everything".  I don't think that is possible, but even if it is, we
should not let the Perfect be the enemy of the Good.
Since we cannot *guarantee* that sum is fast for all classes, we should not promise what we can't guarantee. Better to deliver more than you promise, than to make grandiose promises of "fast for everything" that you cannot live up to. Arguments in favour of this suggestion: - The simple use-case of summing a list of lists will speed up. - The behaviour of sum doesn't change (except for speed): * sum still returns a new list, it doesn't modify in place; * officially, sum still relies on __add__ only, not __iadd__. Arguments against: - Somebody has to write this code, and debug it, and maintain it. It won't be me. - Optimized code is complex code, and perhaps this extra complication is too much complication. After all, the Zen says "Simple is better than complex, complex is better than complicated." - If sum of lists is fast, it will lull people into a false sense of security. sum remains an attractive nuisance, since you cannot rely on it being fast. As soon as you add one list subclass, it falls back to the slow code again. - But on the other hand, that's not much worse than the situation now. People are already lulled into a false sense of security, because "Everything is fast for small enough N". Further questions: 1) Can we be less conservative about subclasses? E.g. if a subclass does not override __add__ or __iadd__, can we safely use the fast path, and only fall back on the slow path for those that override? - In favour: that would make sum more useful, rather than less. - Against: lots more complications... 2) Sergey wants to do something similar for tuples, using a temporary list accumulator. I've raised the possibility that if the accumulated list is huge enough, you might get a MemoryError trying to copy it back to a tuple. Does anyone care about this hypothetical case? If not, then we could extend the optimization to tuples (at least those that don't override __add__). 3) And the critical question: with this (theoretical) patch in place, do the test suites still pass? I asked Sergey this same question about his patch some days ago, but haven't seen an answer. -- Steven From python at mrabarnett.plus.com Fri Jul 12 03:12:06 2013 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 12 Jul 2013 02:12:06 +0100 Subject: [Python-ideas] Fast sum summary [was Re: Fast sum() for non-numbers - why so much worries?] In-Reply-To: <51DF5368.6020505@pearwood.info> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> <51DF5368.6020505@pearwood.info> Message-ID: <51DF57E6.8090206@mrabarnett.plus.com> On 12/07/2013 01:52, Steven D'Aprano wrote: > This started out as a response to Ronald, but turned into a summary > and pseudo-code implementation, so I've changed the subject line. > > > On 11/07/13 21:15, Ronald Oussoren wrote: [...] >> That, and Paul's example about using += for something else than >> addition, pretty much kills the idea of using += in the >> implementation of sum() as that would break to much code. > > Not quite :-) > > After spending so much time explaining why I don't think sum *should* > optimize the list case, I'm now going to suggest a way which (I > think) sum *could* optimize the list case without changing > behaviour. > > [puts on Devil's Advocate cap] > > What we have to give up is Sergey's hope to make sum fast for > "everything". I don't think that is possible, but even if it is, we > should not let the Perfect be the enemy of the Good. 
> Can we make sum fast for lists, without changing behaviour?  I think we
> can.
>
> We know that list.__iadd__ is a fast, in-place version of __add__.  We
> can't make that assumption about every class, not even list subclasses,
> but we don't have to speed up every class.  Let's not be greedy:
> incremental improvements are better than no improvements.
>
> sum already has special cases for ints and floats and strings (the
> string case is to just prohibit the use of sum).  We could add a special
> case for lists: if the start parameter is an actual built-in list, not a
> subclass (because subclasses might override __iadd__), then we start
> with an accumulator list, and call __iadd__ (or extend) on that list to
> concatenate on it.  As soon as we find an item which is not a built-in
> list, we drop out of the fast code and fall back to the normal non-fast
> path.  This doesn't make it "fast for everything", but it's still an
> optimization.
>
[snip]

While you have your cap on: if you're going to special-case lists, then
why not strings too (just passing them on to "".join())?

From joshua at landau.ws  Fri Jul 12 03:09:21 2013
From: joshua at landau.ws (Joshua Landau)
Date: Fri, 12 Jul 2013 02:09:21 +0100
Subject: [Python-ideas] Fast sum summary [was Re: Fast sum() for non-numbers - why so much worries?]
In-Reply-To: <51DF5368.6020505@pearwood.info>
References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> <51DF5368.6020505@pearwood.info>
Message-ID:

On 12 July 2013 01:52, Steven D'Aprano wrote:
> After spending so much time explaining why I don't think sum *should*
> optimize the list case, I'm now going to suggest a way which (I think)
> sum *could* optimize the list case without changing behaviour.
>
> Since we cannot *guarantee* that sum is fast for all classes, we should
> not promise what we can't guarantee.  Better to deliver more than you
> promise, than to make grandiose promises of "fast for everything" that
> you cannot live up to.

That is true.  However, I don't believe that a sharp cliff where there
should be a gradient is the right approach.

> Arguments in favour of this suggestion:
>
> Arguments against:
>
> - If sum of lists is fast, it will lull people into a false sense of
> security.  sum remains an attractive nuisance, since you cannot rely on
> it being fast.  As soon as you add one list subclass, it falls back to
> the slow code again.

The subclass thing matters a lot to me.  When the discussion was about
"+=", I was for it -- a subclass that keeps behaviour consistent should
not be treated like a second-class citizen, and so we should do everything
we can to prevent people from using sum there.  A three-line solution is
miles better than a half-line one which bombs as soon as you make tiny
should-be-irrelevant changes.

> - But on the other hand, that's not much worse than the situation now.
> People are already lulled into a false sense of security, because
> "Everything is fast for small enough N".

I don't agree that that's the same thing.  That's being blindly
uninterested in asymptotic performance, whereas this change to sum
actively tricks you into thinking that your algorithm has better
asymptotic performance than it does.

> Further questions:
>
> 1) Can we be less conservative about subclasses?  E.g. if a subclass
> does not override __add__ or __iadd__, can we safely use the fast path,
> and only fall back on the slow path for those that override?
I'm not sure how __add__ and __iadd__ are implemented -- do they call
other methods on the class?

> 2) Sergey wants to do something similar for tuples, using a temporary
> list accumulator.  I've raised the possibility that if the accumulated
> list is huge enough, you might get a MemoryError trying to copy it back
> to a tuple.  Does anyone care about this hypothetical case?  If not,
> then we could extend the optimization to tuples (at least those that
> don't override __add__).

Personally I do not care much; Python makes no guarantees about memory
usage.  Additionally, the alternative involves a lot of memory holding
partially-added tuples, which will be at worst no better than the
list-accumulation method.

> 3) And the critical question: with this (theoretical) patch in place, do
> the test suites still pass?  I asked Sergey this same question about his
> patch some days ago, but haven't seen an answer.

From sergemp at mail.ru  Fri Jul 12 03:34:19 2013
From: sergemp at mail.ru (Sergey)
Date: Fri, 12 Jul 2013 04:34:19 +0300
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To:
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey>
Message-ID: <20130712043419.1f5c59e5@sergey>

On Jul 9, 2013 Andrew Barnert wrote:

>> 1. Make sum faster for everything BUT tuples and write in a manual:
>> ...
>> 2. Implement just one (!) special case for the only type in python
>> needing it and write:
>> ...
>>
>> I chose #2. Tuple is one of the most frequently used types in python,
>> and it's the only type that needs such a special case.
>>
>> Until someone writes a better solution: Practicality beats purity.
>> That was my motivation.
>
> #3 is for the docs to say what they currently say--sum is fast for
> numbers--but change the "not strings" to "not sequences", and maybe
> add a note saying _how_ to concatenate sequences (usually join for
> strings and chain for everything else).

Which effectively means: do nothing, and agree that a slow sum is the
best python can have, even though we CAN make it faster, for all cases or
at least for most of them.  We could at least start discussing options
instead of repeating excuses.

>> Theoretically it's possible to rewrite a tuple type to internally use
>> list type for storage, and additionally have a `localcount` variable
>> saying how many items from that list belong to this tuple.  Then
>> __add__ for such tuple would just create a new tuple with exactly the
>> same internal list (with incremented ref.count) extended.  This way,
>> technically, __add__ would modify all the tuples around, but due to
>> the internal `localcount` variable you won't notice that.
>
> I was going to point out the side effects of such a change, but
> someone beat me to it.

Yes, as a side-effect it would sometimes use less memory (and it could
also use more memory, however I can't think of any real-world cases where
that could be a problem).  But the main goal was met: it would make tuple
O(N) summable.  So:

>> Would you like such a patch instead?  Would you want to write it? ;)
>> It's just this patch only optimizes add, which is ONLY needed for
>> many sequential adds, i.e.
>> for sum(), so I thought that it would be MUCH easier to add it to sum
>> instead.
>
> And it's even easier to add neither.

It was even easier to not create python at all, then we wouldn't have to
spend our time here discussing it.  It's always easy to do nothing.  But
sometimes it takes so much time to do something good...

>> Which is just one type: tuple.  There are no other summable standard
>> types in python having O(N) __add__, are there?
>
> Does nobody ever use types from the stdlib, third-party libs, or
> their own code in your world? Builtin types are not generally
> magical. A function that works well for all builtin types, but not
> types in the stdlib, is a recipe for confusion.

What types from the stdlib does it not work for?

>>> So, if you're suggesting that sum can be fast for anything
>>> reasonable, you're just wrong.
>>
>> I suggested two ways how to do that.  First, one can use the approach
>> above, i.e. use a mutable type internally.  Second, for internal
>> cpython types we have a simpler option to implement the optimization
>> directly in sum().  And there may be many others, specific to the types
>> in question.
>
> Using a mutable type internally will almost always have side
> effects, or at least complicate the implementation.

Yes, optimization may or may not complicate an implementation.  Nothing
new here.  For example "optimizing" tuple most probably won't complicate
the implementation, it would just make it different, but faster to
__add__.

> What you're suggesting is that theoretically, for some different
> language that placed a very strange and sometimes hard to meet
> requirement on all types, sum could be fast for all types. That
> doesn't mean sum can be fast for all Python types, because Python
> doesn't, and shouldn't, have such a requirement.

No.  What I suggested is that sum() could be faster for SOME types.  And
I provided a patch for that.  But you (or someone else) said that I can't
make sum fast for other types, for example for tuples.  So I provided a
patch doing that.  Then you said that I can't do that for tuples without
patching sum.  And I explained how it could be done too.  I never put any
requirements.  It's just that you somehow placed a strange requirement on
sum() that it must always remain slow...

> And again, what would be the benefit?

Hm.  Benefits of O(N)-summable builtin types?
* No more surprises "Oh, sum() is O(N**2).  Why?"
* People can use whatever builtin functions they want with any builtin
  types they want in any obvious-to-them way, and nobody would tell them
  that it will be too slow.  A little slower, maybe, but not too much.
* Programs become faster.
* Code becomes simpler and easier to read.
* Python becomes a better language.
* More people start using python.
* Everybody's happy, world peace, etc.

>> I don't remember saying that every collection type in the world is
>> O(N) summable, but ok.  Would you agree that all summable built-in
>> collection types of python become O(N) summable with "my design"?
>
> Yes, but if it's impossible to add a new collection type that
> works like tuple, that makes python worse, not better.

Why would it be impossible?  In the worst case it would be hard (but not
impossible) to add a new collection type that is as fast as tuple, yes,
so what?  It is already hard to do that, nothing new there.

>> I.e. they were not O(N) summable before my patch, but they are O(N)
>> after it.  Then why don't you like the patch?
>>
>> Because somewhere in the world there could theoretically exist some
>> other types that are still not O(N) summable?
> > No, because all over the world there already actually exist such > types, and because it's trivial--and useful--to create a new one. Could you show me some code where a lot of those custom summable sequences are added together? Or maybe about someone, complaining about sum being slow for them? No examples? Then, this is not a problem, right? ;) >> Maybe, then we (or >> their authors) will deal with them later (and I tried to show you >> the options for your single linked list example). > > And you only came up with options that destroy vital features of > the type, making it useless for most people who would want to use it. I did not. Or are you saying that STL developers also destroyed vital features of your the because they have not implemented std::list the same way? Have python destroyed those vital features with it's list? Your type is not summable. You asked for the type that is summable. I suggested one to you, you did not liked it. That's it, nothing is destroyed. > But, more importantly, it is very simple to design and implement > new collection types in Python today. Adding a new requirement that's > hard enough to reach--one that you haven't been able to pull it off > for the first example you were given--implies that it would no longer > be easy. What requirements are you talking about? > Then you misunderstood it. Tuples were offered as one example, out > of many, that won't be fast. Solving that one example by adding a > special case in the C code doesn't help the general problem That's why I explained how this can also be done without writing a special case in sum(). It's just if you have a goal and two ways to reach it, one of them is extremely complicated, and another one is easy, other than that they're equal, which one will you choose? >>> But I agree with the sentiment. There is an efficient way to >>> concatenate a lot of cons lists (basically, just fold backward >>> instead of forward), but sum is not it. >> >> Hm... If you implement fast __radd__ instead of __add__ sum would >> use that, wouldn't it? Is that the easy way you were looking for? > > First, how do you propose that sum find out whether adding or > radding is faster for a given type? > > More importantly: that wouldn't actually do anything in this case. > The key to the optimization is doing the sequence of adds in reverse > order, not flipping each add. Yea, I agree, it won't solve the problem right away, since elements would still be radded left-to-right. Well, you still have at least two options for your cons-lists: either use a `tail` attribute of your list or as a some global variable. I.e. list_of_cons_lists[0].tail = find_tail(list_of_cons_lists[0]) sum(list_of_cons_lists) and use `tail` in your __add__ implementation. It's that simple. Or, since you're writing a custom type anyway, and you have to specify start class for sum() you can use it to hold a tail for you. Like: class ConsListsSum: tail = None def __add__(self, other): if self.tail is None: self.tail = other else: self.tail.next = other while self.tail.next is not None: self.tail = self.tail.next return self sum(list_of_cons_lists, ConsListsSum()) In that case to use sum you won't need __add__ in your list at all. >> Then, what way you suggest to be preferred? For example how would you >> do that in py4k? I guess you would prefer sum() to reject lists and >> tuples as it now rejects strings and use some other function for them >> instead? Or what? What is YOUR preferred way? > > I've already answered this, but I'll repeat it. 
> > sum is not the obvious way to concatenate sequences today, and my > preferred way is for that to stay true. So, I'm: [...] You're saying what IS NOT your preferred way. But I'm asking what IS your preferred way. Do you prefer to have a slow sum in python and people asking why it's slow forever? Do you see that as a best possible case for python? >> We have 'list' and 'listiterator', 'tuple' and 'tupleiterator', 'set' >> and 'setiterator'. Nothing unusual here. And no issues about them. > > But they aren't the same kind of thing at all. I don't want to > explain the differences between what C++ calls iterators and what > Python calls iterators unless it's necessary. But briefly, a > std::list::iterator is a mutable reference to a node. And std::list::const_iterator is not. So? > Exposing that type means--as I've already said twice--that you end > up with exactly the same problems you have exposing the node > directly. No, I won't, unless I make it mutable too. And I don't have to. >> Not all of them, but some. I.e. if you used your cons-lists as queue >> or stack, then deque is a good replacement. > > Well, yes, but a dynamic array like Python's list is also a > perfectly good stack. So what? Nothing. It's just you said: >>>> If you make the lists and nodes separate types, and >>>> the nodes private,?you have to create yet a third type, >>>>?like the list::erator type in C++ So I answered that I don't always have to do that. I may not need iterator for some of cons-list use cases. >> Really, I don't understand that point. Are you saying, that sum() >> must remain slow for FREQUENTLY USED standard types just because >> there MAY BE some other types for which it would still be slow? > > You're twisting the emphasis drastically, but basically yes. And that's the only reason? What if I solve that too? E.g. what if there would be a common way exposed to all the types in a world? For example imaging a special case (or is it "general case" now) like this: if hasattr(type(start), "__init_concatenable_sequence_from_iterable__"): optimize_for = type(start) l_result = list() l_result.extend(start) try: while True: item = next(it) if type(item) != optimize_for: start = optimize_for.__init_concatenable_sequence_from_iterable__(l_result) start = start + item break l_result.extend(item) except StopIteration: return optimize_for.__init_concatenable_sequence_from_iterable__(l_result) In that case every type would be able to implement an optional __init_concatenable_sequence_from_iterable__ method to benefit from optimized sum(). If they won't do that sum() would use general code. And of course it would be trivial to implement it for e.g. tuples. Is that what you want? > Today, sum is not the best way to concatenate sequences. Making it > work better for some sequences but not others would mean it's still > not the best way to concatenate sequences, but it would _appear_ to > be. That's the very definition of an attractive nuisance. Today python is not the best language in the world, because it still has bugs. Fixing some bugs but not others would mean that it's still not the best language in the world, but it would _appear_ to be. So, let's never fix any bugs? Don't you think that your logic is flawed? ;) >>> Using a global variable (or a class attribute, which is the >>> same thing) means that sum isn't reentrant, or thread-safe, or >>> generator-safe. >> >> Is it now? No? Then what changes? > > Yes, it is now. So that's what changes. sum() uses Py_DECREF. Py_DECREF is not thread-safe. 
It means that sum() is not thread safe too. Where am I wrong? > Again, this is something general and basic--operations that use > global variables are not reentrant--and I can't tell whether you're > being deliberately obtuse or whether you really don't understand that. Well, if you don't want to use a global variable ?? don't use it. :) From joshua at landau.ws Fri Jul 12 04:07:31 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 03:07:31 +0100 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130712043419.1f5c59e5@sergey> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey> <20130712043419.1f5c59e5@sergey> Message-ID: On 12 July 2013 02:34, Sergey wrote: > On Jul 9, 2013 Andrew Barnert wrote: > > but we could at least start discussing options, instead of > repeating excuses. This is rude, and I'd rather you try to avoid being rude. >>> Theoretically it's possible to rewrite a tuple type to internally use >>> list type for storage, and additionally have a `localcount` variable >>> saying how many items from that list belong to this tuple. Then >>> __add__ for such tuple would just create a new tuple with exactly >>> same internal list (with incremented ref.count) extended. This way, >>> technically, __add__ would modify all the tuples around, but due to >>> internal `localcount` variable you won't notice that. >> >> I was going to point out the side effects of such a change, but >> someone beat me to it. > > Yes, as a side-effect it would sometimes use less memory (and it could > also use more memory, however I can't think of any real-world cases > where that could be a problem). Is this similar to the shared-memory thing for dictionaries? If so, it might be a good proposal in and of itself. >>> Which is just one type ? tuple. There's no other summable standard >>> types in python having O(N) __add__, is it? >> >> Does nobody ever use types from the stdlib, third-party libs, or >> their own code in your world? Builtin types are not generally >> magical. A function that works well for all builtin types, but not >> types in the stdlib, is a recipe for confusion. > > What types from stdlib it does not work for? Shall I list some stdlib types; you can tell me what it's fast for: # do this later >>>> So, if you're suggesting that sum can be fast for anything >>>> reasonable, you're just wrong. >>> >>> I suggested two ways how to do that. First, one can use the approach >>> above, i.e. use mutable type internally. Second, for internal cpython >>> types we have a simpler option to implement optimization directly >>> in sum(). And there may be many others, specific to the types in >>> question. >> >> Using a mutable type internally will almost always have side >> effects, or at least complicate the implementation. > > Yes, optimization may or may not complicate implementation. Nothing new > here. For example "optimizing" tuple most probably won't complicate > implementation, it would just make it different, but faster to __add__. But you also make it complicated for everyone who subclasses, for example. It's not good to make lives harder for this. >> And again, what would be the benefit? 
> > Hm. Benefits of O(N)-summable builtin types? > * No more surprises "Oh, sum() is O(N**2) Why?" Only you haven't removed that, as when people use a list subclass instead it comes back -- in a much more devastating fashion. Then they have to rewrite their code. > * People can use whatever builtin functions they want with any built-in > types they want in any obvious to them way and nobody would say them > that it will be too slow. A little slower, maybe, but not too much. No they can't. You're conflating sum() with *all* builtins. > * Programs become faster Some. But not many. > * Code becomes simpler and easier to read Hardly. If "my_type(chain.from_iterable(iterable))" is the hardest part of code for you to read, you're doing it wrong. >>> Maybe, then we (or >>> their authors) will deal with them later (and I tried to show you >>> the options for your single linked list example). >> >> And you only came up with options that destroy vital features of >> the type, making it useless for most people who would want to use it. > > I did not. Or are you saying that STL developers also destroyed vital > features of your the because they have not implemented std::list the > same way? Have python destroyed those vital features with it's list? > Your type is not summable. You asked for the type that is summable. > I suggested one to you, you did not liked it. That's it, nothing is > destroyed. I think you need to re-read those posts. >> But, more importantly, it is very simple to design and implement >> new collection types in Python today. Adding a new requirement that's >> hard enough to reach--one that you haven't been able to pull it off >> for the first example you were given--implies that it would no longer >> be easy. > > What requirements are you talking about? Surely he means that extra flab you have to write to make it fast-summable. Otherwise other people's code that uses sum() because you said it was the "one true way" suddenly becomes painfully slow. Or they could have written the current method and it would've worked immediately. >> Then you misunderstood it. Tuples were offered as one example, out >> of many, that won't be fast. Solving that one example by adding a >> special case in the C code doesn't help the general problem > > That's why I explained how this can also be done without writing > a special case in sum(). It's just if you have a goal and two ways > to reach it, one of them is extremely complicated, and another one > is easy, other than that they're equal, which one will you choose? No matter which way you write it, it doesn't solve the problem. > In that case to use sum you won't need __add__ in your list at all. > >>> Then, what way you suggest to be preferred? For example how would you >>> do that in py4k? I guess you would prefer sum() to reject lists and >>> tuples as it now rejects strings and use some other function for them >>> instead? Or what? What is YOUR preferred way? >> >> I've already answered this, but I'll repeat it. >> >> sum is not the obvious way to concatenate sequences today, and my >> preferred way is for that to stay true. So, I'm: [...] > > You're saying what IS NOT your preferred way. But I'm asking what IS > your preferred way. Do you prefer to have a slow sum in python and > people asking why it's slow forever? Do you see that as a best > possible case for python? YES. Godamnit YES. 100% true-to-form YES. Now stop asking us. >>> Really, I don't understand that point. 
Are you saying that sum()
>>> must remain slow for FREQUENTLY USED standard types just because
>>> there MAY BE some other types for which it would still be slow?
>>
>> You're twisting the emphasis drastically, but basically yes.
>
> And that's the only reason? What if I solve that too? E.g. what if
> there were a common way exposed to all the types in the world?
> For example, imagine a special case (or is it a "general case" now)
...
> In that case every type would be able to implement an optional
> __init_concatenable_sequence_from_iterable__ method to benefit from
> an optimized sum(). If they don't do that, sum() would use the general
> code. And of course it would be trivial to implement it for e.g. tuples.
> Is that what you want?

So your solution to make sum faster is to make everything else harder?

>> Today, sum is not the best way to concatenate sequences. Making it
>> work better for some sequences but not others would mean it's still
>> not the best way to concatenate sequences, but it would _appear_ to
>> be. That's the very definition of an attractive nuisance.
>
> Today python is not the best language in the world, because it still
> has bugs. Fixing some bugs but not others would mean that it's still
> not the best language in the world, but it would _appear_ to be. So,
> let's never fix any bugs?
>
> Don't you think that your logic is flawed? ;)

Does that actually seem like a good counterargument? It's a massively flawed analogy. Bugs are not the same as something being not-the-standard. A good analogy would be if Python was bad for <some task>. If Python changed itself so it was good at <that task> for one thing, that would be an attractive nuisance. Because as soon as you try to use Python, you find it isn't good for all the things you need it for, and you've just implemented a useless program.

If code uses sum() it needs to work for *all types* that the code receives. Python is duck-typed -- so if you only want an indexable sequence then it needs to be fast for indexable sequences. There is no way to solve this without making people implement loads more code. When they could just use list(chain.from_iterable(...)) instead, sum looks like a really dumb idea.

From dth4h95 at gmail.com Fri Jul 12 04:01:57 2013
From: dth4h95 at gmail.com (Daniel Rode)
Date: Thu, 11 Jul 2013 20:01:57 -0600
Subject: [Python-ideas] Python Convert
Message-ID:

Since Python3, the python creators removed a lot of encodings from the str.encode() method. They did it because they weren't sure how to implement the feature in Python3. They wanted it to be better.

I have an idea, add a built in method called "convert". Usage example:

  convert(data, current_state, desired_state)
  convert(data, from, to)

Real world examples:

  dataBytes = b"hello"
  dataUTF8_Str = "Ɠahhhh hi all ̮"

  convert(dataUTF8_Str, encodings.UTF8, encodings.BYTES)
  Returns: b'\xc6\x93ahhhh hi all \xcc\xae'

  convert(dataBytes, encodings.BYTES, encodings.HEX)
  Returns: b'c693616868686820686920616c6c20ccae'

  convert(dataUTF8_Str, encodings.UTF8, encodings.ASCII)
  Returns: TypeError: can't convert utf8 character "\u0193" to ascii

Some other encodings:
  BASE64
  UTF16
  UTF32
  BINARY

Maybe even INT?

Feel free to add suggestions!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From joshua at landau.ws Fri Jul 12 04:28:11 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 03:28:11 +0100 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey> <20130712043419.1f5c59e5@sergey> Message-ID: On 12 July 2013 03:07, Joshua Landau wrote: > Shall I list some stdlib types; you can tell me what it's fast for: > > # do this later I knew one-day I was going to forget to go back and finish a post. OK, here is a list of all types in the stdlib with __add__ or __iadd__: bool bytearray bytes codecs.CodecInfo collections.Counter collections.UserList collections.UserString collections.abc.MutableSequence collections.abc.MutableSequence collections.deque collections.deque complex float functools.CacheInfo functools._HashedSeq inspect.ArgInfo inspect.ArgSpec inspect.Arguments inspect.Attribute inspect.ClosureVars inspect.FullArgSpec inspect.ModuleInfo inspect.Traceback inspect._ParameterKind int list os.terminal_size os.terminal_size posix.sched_param posix.sched_param posix.stat_result posix.stat_result posix.statvfs_result posix.statvfs_result posix.times_result posix.times_result posix.uname_result posix.uname_result posix.waitid_result posix.waitid_result signal.struct_siginfo str str tokenize.TokenInfo tuple weakcallableproxy weakcallableproxy weakproxy weakproxy You tell me whether you have 100% coverage of the stdlib things that "sum" is plausible for. From subbarker at gmail.com Fri Jul 12 04:51:58 2013 From: subbarker at gmail.com (Corey Sarsfield) Date: Thu, 11 Jul 2013 21:51:58 -0500 Subject: [Python-ideas] Reference variable in assignment: x = foo(?) In-Reply-To: References: <72C90407-C5D3-4672-9966-AF7F892AB9CA@yahoo.com> Message-ID: Actually, what I wanted was to be able to reference the variable being assigned to On Thu, Jul 11, 2013 at 7:07 PM, David Mertz wrote: > On Thu, Jul 11, 2013 at 4:57 PM, Joao S. O. Bueno wrote: > >> I don't know if what I miss most is the samething the OP is asking for - >> but the idea of re-using a value retrieved in an expression in the >> same expression - without the need >> to assign to a temporary variable. >> >> Like in: >> value = expensive_function(b) if expensive_function(b) else >> default_value >> >> (of course this is a trivial example - but nonetheless it would require an >> extra "if" statement to avoid the double call) >> > > How about: > > value = expensive_function(b) or default_value > > One call, exact same behavior as you request. Available since Python 1.0. > > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -- Corey Sarsfield -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rosuav at gmail.com Fri Jul 12 05:02:38 2013 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 12 Jul 2013 13:02:38 +1000 Subject: [Python-ideas] Reference variable in assignment: x = foo(?) In-Reply-To: References: <72C90407-C5D3-4672-9966-AF7F892AB9CA@yahoo.com> Message-ID: On Fri, Jul 12, 2013 at 12:51 PM, Corey Sarsfield wrote: > Actually, what I wanted was to be able to reference the variable being > assigned to I think I understand what you're trying to say, but it's not something that really exists in Python. With a classic assignment statement: foo = (expression) the previous value of foo is abandoned and a new value bound to that name. What you want is to reference that previous value (presumably with the same potential for NameError or UnboundLocalError if there is none). Effectively, elevate the magic of "foo = 5; foo += 10" to full feature. There HAVE been times when I've wanted something like this, but they're extremely rare. Also, the nature of Python makes the benefit somewhat less than it might be in, say, C++; it's not going to be possible to do the lookups once and keep track of the memory location, because Python simply doesn't work that way. Whether __setitem__ can take advantage of a previous __getitem__ is entirely up to the class. ChrisA From jsbueno at python.org.br Fri Jul 12 05:02:43 2013 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Fri, 12 Jul 2013 00:02:43 -0300 Subject: [Python-ideas] Reference variable in assignment: x = foo(?) In-Reply-To: References: <72C90407-C5D3-4672-9966-AF7F892AB9CA@yahoo.com> Message-ID: On 11 July 2013 21:07, David Mertz wrote: > On Thu, Jul 11, 2013 at 4:57 PM, Joao S. O. Bueno > wrote: > > ... >> Like in: >> value = expensive_function(b) if expensive_function(b) else >> default_value >> >> (of course this is a trivial example - but nonetheless it would require an >> extra "if" statement to avoid the double call) > > > How about: > > value = expensive_function(b) or default_value > > One call, exact same behavior as you request. Available since Python 1.0. Because, as a I said, that was a naive example - if the predicate was: value = expensive_function(b) if expensive_function(b) > threshold else default_value There is no way "or" would work - Anyway - just forget about it -let's see where this e-mail thread leads (I worked around this particular issue with a couple of helper simple functions to push the value to a list and return it back ) From ron3200 at gmail.com Fri Jul 12 05:43:05 2013 From: ron3200 at gmail.com (Ron Adam) Date: Thu, 11 Jul 2013 22:43:05 -0500 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <51DF4C93.3090206@pearwood.info> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info> <51DE046C.4070301@pearwood.info> <51DF4C93.3090206@pearwood.info> Message-ID: On 07/11/2013 07:23 PM, Steven D'Aprano wrote: > On 11/07/13 23:20, Ron Adam wrote: >> How do you feel about adding the ability of sum to sum vectors or lists >> of values to each other? 
>>
>> sum([[x1, y1], [x2, y2], ...]) ---> [x1+x2, y1+y2]
>
> What I think of that is that it is not backward compatible and would
> completely break any code relying on sum's current behaviour.

That's what I think too. ;-) Although it's possible to extend an API in backwards-compatible ways. Adding a new keyword would work. (Not suggesting that.)

> We're in the middle of a long discussion arguing that a *tiny*, *subtle*
> shift in behaviour of sum is sufficient to disqualify the change, and
> you're suggesting something that completely changes the semantics from list
> concatenation to element-by-element addition. No offence Ron, but have you
> been reading the rest of the thread?

No offence taken. And as you point out, we can't easily add element-by-element addition as long as it also does list concatenation.

> If anyone wants element-by-element addition, they can either define a class
> that does so using the + operator (say, numpy arrays) or define their
> own function.

I'd rather just use a function and not use the '+'.

> Or try to revive PEP 225:
>
> http://www.python.org/dev/peps/pep-0225/

Interesting. I can see why it didn't get in. PEP 225 tries to do too much.

I was looking in the operator module and noticed there are __concat__() and concat() methods. I don't think I've seen them anywhere else. Maybe we could just add a few new functions to the operator module? Each specialised to only numbers, strings, immutable, and mutable types. That might open the door for them being used in a wider scope while not changing too much in the near future.

I sort of wish we could add aliases to operator methods. (In a nice way.)

  @__add__
  def __add_number__(self, other):
      ...

  @__add__
  def __add_list__(self, other):
      ...

  @__add__
  def __concat_strings__(self, other):
      ...

Multi-methods for operators(?) Where __add__ and '+' would work with all of these, but __add_list__ would work only on lists. It's pretty much how it works now, but since they are all spelled the same, it can be confusing when you see these used from a subclass.

Cheers,
   Ron

From sergemp at mail.ru Fri Jul 12 05:45:51 2013
From: sergemp at mail.ru (Sergey)
Date: Fri, 12 Jul 2013 06:45:51 +0300
Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries?
In-Reply-To: <51DC36C2.8000509@pearwood.info>
References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info>
Message-ID: <20130712064551.15816dd0@sergey>

On Jul 10 2013, Steven D'Aprano wrote:

>>> No, please. Using sum() on lists is not more than a hack that
>>> seems to be a cool idea but isn't. Seriously - what's the sum of
>>> lists? Intuitively, it makes no sense at all to say sum(lists).
>>
>> It's the same as it is now. What else can you think about when you
>> see: [1, 2, 3] + [4, 5] ?
>
> Some of us think that using + for concatenation is an abuse of
> terminology, or at least an unfortunate choice of operator, and are
> wary of anything that encourages that terminology.
>
> Nevertheless, you are right, in Python 3 both + and sum of lists
> is well-defined. At the moment sum is defined in terms of __add__.
> You want to change it to be defined in terms of __iadd__. That is a
> semantic change that needs to be considered carefully, it is not just
> an optimization.

Yes, I understand that. On the other hand, is this documented anywhere? Does anything say that sum() actually uses __add__, not __iadd__ or something else? I'm not trying to say that we can change that freely. I'm just trying to find out how tough such a change could be.

> I have been very careful to say that I am only a little bit
> against this idea, -0 not -1. I am uncomfortable about changing the
> semantics to use __iadd__ instead of __add__, because I expect that
> this will change the behaviour of sum() for non-builtins.
> I worry about increased complexity making maintenance harder for
> no good reason. It's the "for no good reason" that concerns me:
> you could answer some of my objections if you showed:
>
> - real code written by people who sum() large (more than 100)
> numbers of lists;

That would be hard. You're asking me to find someone using a bad function in exactly the case where it's bad. :) Even if nobody uses a bad function, that does not mean it should stay bad.

> - real code with comments like "this is a work-around for sum()
> being slow on lists";

Even I probably wouldn't do that, I would just silently use another function.

> - bug reports or other public complaints by people (other than
> you) complaining that sum(lists) is slow;

This is easier. sum() is constantly suggested as an option for joining lists in the questions I could find, and often someone comes along later and says "be careful, it may be slow". So this kind of attempt is common. Examples [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]

> or similar.

That would prove that people do call sum on lists. I can't say they do, but they definitely try.

> Earlier in this discussion, you posted benchmarks for the patched
> sum using Python 2.7. Would you be willing to do it again for 3.3?

Well, I posted them for Python 2.7 because I expected that the patch would not introduce any changes, so it could be applied for 2.7 too. The patch itself works for both 2.7 and 3.3.

Python 3.3.2 + fastsum-special-tuplesandlists.patch [11]

Before patch:
  list compr: 14.5
  chain: 10.5
  chain.from_iterable: 7.44
  aug.assignment: 5.8
  regular extend: 9.34
  optimized extend: 5.59
  sum: infinite

After patch:
  list compr: 14.5
  chain: 10.5
  chain.from_iterable: 7.44
  aug.assignment: 5.79
  regular extend: 9.47
  optimized extend: 5.53
  sum: 2.58

I used the same tests as you [12] except for a minor bug fix: you did "l = []" only at the start, while I do it for every iteration. I hope this patch would not introduce any changes. :)

> And confirm that the Python test suite continues to pass?

Yes, they do. But they don't test many cases. And they definitely test nothing like:
http://bugs.python.org/issue18305#msg192919

--
[1] http://stackoverflow.com/questions/406121/flattening-a-shallow-list-in-python
[2] http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
[3] http://stackoverflow.com/questions/3021641/concatenation-of-many-lists-in-python
[4] http://stackoverflow.com/questions/7895449/merging-a-list-of-lists
[5] http://stackoverflow.com/questions/10424219/combining-lists-into-one
[6] http://stackoverflow.com/questions/11331908/how-to-use-reduce-with-list-of-lists
[7] http://stackoverflow.com/questions/17142101/concatenating-sublists-python
[8] http://stackoverflow.com/questions/716477/join-list-of-lists-in-python
    2009-04-04 CMS
      x = [["a","b"], ["c"]]
      result = sum(x, [])
    2010-09-04 habnabit
      O(n^2) complexity yaaaaaay.
[9] http://article.gmane.org/gmane.comp.python.general/658537
    The mildly surprising part of sum() is that it does add vs.
    add-in-place, which leads to O(N) vs. O(1) for the inner loop
    calls, for certain data structures, notably lists, even though
    none of the intermediate results get used by the caller.
    For lists, you could make a more efficient variant of sum()
    that clones the start value and does add-in-place.
[10] http://article.gmane.org/gmane.comp.python.general/441831
    > A fast implementation would probably allocate the output list
    > just once and then stream the values into place with a simple
    > index.
    That's what I hoped "sum" would do, but instead it barfs with a
    type error.
[11] http://bugs.python.org/file30897/fastsum-special-tuplesandlists.patch
[12] Two test numbers are before and after patch:
    $ ./python -mtimeit --setup="x=[[1,2,3]]*100000" \
        "[i for l in x for i in l]"
    100 loops, best of 3: 14.5 msec per loop
    100 loops, best of 3: 14.5 msec per loop

    $ ./python -mtimeit --setup="x=[[1,2,3]]*100000" \
        --setup="from itertools import chain" \
        "list(chain(*x))"
    100 loops, best of 3: 10.5 msec per loop
    100 loops, best of 3: 10.5 msec per loop

    $ ./python -mtimeit --setup="x=[[1,2,3]]*100000" \
        --setup="from itertools import chain" \
        "list(chain.from_iterable(x))"
    100 loops, best of 3: 7.44 msec per loop
    100 loops, best of 3: 7.44 msec per loop

    $ ./python -mtimeit --setup="x=[[1,2,3]]*100000" \
        "l = []" \
        "for i in x: l += i"
    100 loops, best of 3: 5.8 msec per loop
    100 loops, best of 3: 5.79 msec per loop

    $ ./python -mtimeit --setup="x=[[1,2,3]]*100000" \
        "l = []" \
        "for i in x: l.extend(i)"
    100 loops, best of 3: 9.34 msec per loop
    100 loops, best of 3: 9.47 msec per loop

    $ ./python -mtimeit --setup="x=[[1,2,3]]*100000" \
        "l = []" \
        "extend=l.extend" \
        "for i in x: extend(i)"
    100 loops, best of 3: 5.59 msec per loop
    100 loops, best of 3: 5.53 msec per loop

    $ ./python -mtimeit --setup="x=[[1,2,3]]*100000" \
        "sum(x,[])"
    infinite
    100 loops, best of 3: 2.58 msec per loop

From stephen at xemacs.org Fri Jul 12 05:52:12 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 12 Jul 2013 12:52:12 +0900
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <20130712043419.1f5c59e5@sergey>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey> <20130712043419.1f5c59e5@sergey>
Message-ID: <87ip0gtoyb.fsf@uwakimon.sk.tsukuba.ac.jp>

Sergey writes:

> Which effectively means do nothing and agree that a slow sum is the best
> for python.

Sum is not slow for numbers, which many posters believe is the only plausible usage, despite the accident that it can be used for other types.

> Even though we CAN make it faster.

Faster is not necessarily better. Several posters have claimed that "sum(list_of_lists)" is unreadable (for them). While Python is a "consenting adults" language (so "if you don't like it, don't use it" is a plausible argument), Python aspires to "universal" readability. So plausibly "sum(list_of_lists)" is a bad thing in itself. If there's another "obvious way to do it" (here, itertools.chain, which is both computationally efficient and lazy), then making sum() better only encourages use. There are a couple of similar optimizations that Guido himself admits he regrets.

Consider "sum(sum(list_of_lists_of_numbers))". For me, that's a WTF. I'd much rather "sum(chain(list_of_lists_of_numbers))".

I don't ask you to agree, let alone expect you to do so.
I would hope you can offer the same courtesy for those who disagree with you. Here I am only trying to explain why they might disagree with you, and why it may be difficult to change their minds. > For all cases, or for most of them, but we could at least start > discussing options, instead of repeating excuses. "Do nothing" is an option, and it's the one preferred by most posters at this point. If you don't like that, write a PEP, present it here, and then on python-dev, and get it approved. As has been pointed out, a PEP is needed, and the candidates for PEP czar probably are no longer listening, and none of them have argued against, so you do have a chance to convince that way. But as also has been pointed out, you haven't presented a new argument for several days, and your old ones haven't convinced. You need to either try something new, or get a new audience (by writing the PEP). Nobody else is going to do it for you. BTW, "make the TOOWTDI for sequences a built-in" is a new option recently proposed and now being discussed. (Ie, the subthread discussing that for "itertools.chain.from_iterable".) I kinda like "ichain" for the name, and I like that proposal. Regards, From abarnert at yahoo.com Fri Jul 12 05:54:27 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 11 Jul 2013 20:54:27 -0700 Subject: [Python-ideas] Python Convert In-Reply-To: References: Message-ID: <9BE2F04C-4148-43A0-BBF6-F63C29503C49@yahoo.com> On Jul 11, 2013, at 19:01, Daniel Rode wrote: > Since Python3, the python creators removed a lot of encodings from the str.encode() method. They did it because they weren't sure how to implement the feature in Python3. They wanted it to be better. Only a few encodings were actually removed, the ones that encoded, in 2.x terms, str to str, which means they'd have to encode either bytes to bytes or bytes to str in 3.x. For some of them it's not clear which type the result should be. But more generally, calling "encode" on an encoded bytes string instead of a str looks wrong. In most cases, there are other ways to do it--base64.encode(), binascii.hexlify(), etc. It would be nice if there was a convenient and consistent way to do all of them instead of having to hunt around the stdlib, and yours is a good attempt, but I don't think it works. > > I have an idea, add a built in method called "convert". > Usage example: > > convert(data, current_state, desired_state) > convert(data, from, to) > > > Real world examples: > > dataBytes = b"hello" > dataUTF8_Str = "?ahhhh hi all ?" This is a misleading name, because it's not UTF-8, it's just a str, which doesn't have an encoding. (Under the covers, of course, it's actually stored as ASCII, UCS2, or UTF-32...) > convert(dataUTF8_Str, encodings.UTF8, encodings.BYTES) > Returns: b'\xc6\x93ahhhh hi all \xcc\xae' That's misleading as well. It's not converting from UTF-8 to bytes, it's converting from str to bytes, encoding _to_, not _from_ UTF-8 to do so. Plus, we already have a way to write this: dataUTF8_Str.encode('UTF-8'). And it's a little weird to pick one of the encodings that wasn't changed from 2.x to 3.x as your first example of restoring the encodings that were lost. > convert(dataBytes, encodings.BYTES, encodings.HEX) > Returns: b'c693616868686820686920616c6c20ccae' Why would converting to hex give you a bytes object instead of a str? More to the point, if you _wanted_ a str, how would you get it? Also, why do you even have to specify BYTES here? 
The function can already tell that it's a bytes, so there's no extra information there. Really, it seems like any time BYTES is useful, it will also be insufficient. How can you convert a str to BYTES without also specifying an encoding? (Hopefully not by using sys.getdefaultencoding(), because that would just bring back the same sloppy bugs we had in Python 2.) Anyway, the only situation I can imagine where it's useful to provide both arguments is when you want to decode and immediately re-encode. In every other case, you're either decoding (in which case "to" is useless) or encoding (in which case "from is useless). > convert(dataUTF8_Str, encodings.UTF8, encodings.ASCII) > Returns: TypeError: can't convert utf8 character "\u0193" to ascii There is no 'utf8 character "\u0193"'. UTF-8 doesn't have characters, it has bytes, because it's an encoding. If you encode the character '\u0193' as UTF-8, you get b'\xc6\x93'. If you actually gave this a UTF-8 included bytes instead of a str, this would be an example of a call that makes use of both parameters. But it would be exactly the same as s.decode('UTF-8').encode('ascii'). Why do we need another way to write that? > Some other encodings: > BASE64 > UTF16 > UTF32 > BINARY What is "BINARY"? What happens when you convert that to another encoding, or vice-versa? How is it different from BYTES? > Maybe even INT? What does that do if I use it? > > Feel free to add suggestions! > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Jul 12 06:03:24 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 11 Jul 2013 21:03:24 -0700 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130712043419.1f5c59e5@sergey> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey> <20130712043419.1f5c59e5@sergey> Message-ID: On Jul 11, 2013, at 18:34, Sergey wrote: >> And again, what would be the benefit? > > Hm. Benefits of O(N)-summable builtin types? > * No more surprises "Oh, sum() is O(N**2) Why?" No, it would be _more_ surprises. "Oh, sum() is O(N**2). Why? The docs say it's fast, it actually is fast for tuples, my class is just like a tuple." This is exactly why people keep bringing up other types. I don't know what you're not getting here. If you want to argue that it's not a bad enough attractive nuisance to worry about, that might be a reasonable argument. But you've never made that argument; instead you just deny that the problem exists at all, going back and forth between claiming that we can make sum fast for all types and arguing that every type anyone brings up in an objection should be changed. 
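
To make that nuisance concrete, here is a minimal sketch (illustrative only -- it assumes the proposed list-only fast path existed; nothing like MyList is in the stdlib). A list subclass that overrides __add__ would silently miss any such special case and fall back to quadratic behaviour, even though it looks "just like a list" to its users:

  class MyList(list):
      def __add__(self, other):
          # Looks like plain concatenation, but a list-only fast path in
          # sum() could no longer prove this is safe to shortcut, so it
          # would fall back to repeated __add__ calls: O(N**2) overall.
          return MyList(list.__add__(self, other))

  plain = [[0]] * 10000           # would hit the hypothetical O(N) fast path
  subbed = [MyList([0])] * 10000  # would quietly take the O(N**2) path
  # sum(plain, []) vs sum(subbed, MyList()): same-looking code, very
  # different complexity -- the attractive nuisance described above.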
From sergemp at mail.ru Fri Jul 12 06:05:41 2013
From: sergemp at mail.ru (Sergey)
Date: Fri, 12 Jul 2013 07:05:41 +0300
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <51DD9E98.5030203@pearwood.info>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB728B.2040709@pearwood.info> <20130710191007.5f525fb3@sergey> <51DD9E98.5030203@pearwood.info>
Message-ID: <20130712070541.7aad943c@sergey>

On 11 Jul 2013 Steven D'Aprano wrote:

>>> The fact that sum(lists) has had quadratic performance since sum
>>> was first introduced in Python 2.3, and I've *never* seen anyone
>>> complain about it being slow, suggests very strongly that this is not
>>> a use-case that matters.
>>
>> Never seen? Are you sure? ;)
>>
>>> http://article.gmane.org/gmane.comp.python.general/658630
>>> From: Steven D'Aprano @ 2010-03-29
>>> In practical terms, does anyone actually ever use sum on more than a
>>> handful of lists? I don't believe this is more than a hypothetical
>>> problem.
>
> Yes, and I stand by what I wrote back then.

Hey, you saw at least two of us complaining: me and that guy! :)

> Of course, I understand that this is not the actual C code your
> patch contains. But I can see at least three problems with the above
> Python version, and I assume your C version will have the same flaws.

Thank you for reviewing the code. Even for just reviewing the "explanation".

> 1) You test start using isinstance(start, list), but it should be
> "type(start) is list". If start is a subclass of list that overrides
> __add__ (or __iadd__), you should call the overridden methods. But
> your code does not, it calls list.extend instead. (Same applies for
> tuples.)
>
> 2) You assume that summing a sequence must return the type of the
> start argument. But that is not correct. See example below.

The real C code was doing "CheckExact" for `start`, but subclass "Check" for `item`, because that's what __add__ for list and tuple checked.

> Here is an example of a multi-type sum:
> py> class A(list):
> ...     def __add__(self, other):
> ...         return type(self)(super().__add__(other))
> ...     def __radd__(self, other):
> ...         return type(self)(other) + self
> ...
> py> result = sum([[1], [2], A([3]), [4]], [])
> py> type(result)
> <class '__main__.A'>

And that was a good example. I had to dig deep in the sources to understand why:

  >>> type( [1] + A([2]) )
  <class '__main__.A'>

but:

  >>> type( [1].__add__(A([2])) )
  <class 'list'>

So I updated the patch [1]; now it does an exact type check and falls back to the general code otherwise. Now it's a little less general, but safer.

> 3) This can use twice as much memory as the current
> implementation. You build a temporary list containing the result,
> then you make a copy using the original type. If the result is very
> large, you might run out of memory trying to make the copy.

Basically, that's what str.join() is doing. You can try:

  ''.join('' for _ in xrange(100000000))

and watch your free memory being eaten. :) So if this is good enough for join, I assumed that a smarter version of it for sum() should also be fine.

> By the way, Sergey, I should say that even though I have been hard
> on your suggestion, I do thank you for spending the time on this and
> value your efforts.

Thank you. I appreciate your words.

--
[1] http://bugs.python.org/file30897/fastsum-special-tuplesandlists.patch

From dth4h95 at gmail.com Fri Jul 12 06:10:53 2013
From: dth4h95 at gmail.com (Daniel Rode)
Date: Thu, 11 Jul 2013 22:10:53 -0600
Subject: [Python-ideas] Fwd: Python Convert
In-Reply-To:
References: <9BE2F04C-4148-43A0-BBF6-F63C29503C49@yahoo.com>
Message-ID:

Wow, that's a lot to take in. I don't think things through very well sometimes. I am not trying to "bring back" Python2, I am trying to see if we can fix the problem that was always there and got worse in Python3.

I guess the bottom line for me is what you said:

> In most cases, there are other ways to do it--base64.encode(),
> binascii.hexlify(), etc. It would be nice if there was a convenient and
> consistent way to do all of them instead of having to hunt around the
> stdlib...

> What does that do if I use it?

It could be used to convert a string to an integer (if applicable).

So have you thought of a solution for this problem?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From haoyi.sg at gmail.com Fri Jul 12 06:21:23 2013
From: haoyi.sg at gmail.com (Haoyi Li)
Date: Fri, 12 Jul 2013 12:21:23 +0800
Subject: [Python-ideas] Reference variable in assignment: x = foo(?)
In-Reply-To:
References: <72C90407-C5D3-4672-9966-AF7F892AB9CA@yahoo.com>
Message-ID:

If you look up "Lenses" in the context of Haskell/Scala, you'll see they're something similar. It's not exactly the same, but the basic idea is to reify a gettable/settable variable slot into a first-class value. You can then pass around the object to pass around the get/set-ability, or create higher-order functions like transform that operate on the reified "slot":

  // pseudo-scala to increment something; a macro like this doesn't exist
  // but it wouldn't be too hard to write
  lens(var).transform(_ + 1)        // increment a local
  lens(var.member).transform(_ + 1) // increment a member
  lens(var(0)(1)).transform(_ + 1)  // increment an array; array accesses
                                    // in scala use round braces

  // could use operator overloading/implicits to make it more concise
  var ** (_ + 1) // increment a local

Generally these lenses are used for updating nested immutable data structures (it gets really verbose/unDRY normally), but it's something similar to what you want, so if you're looking for this sort of thing, that would be the place to start. None of this is going to end up in python any time soon, because it's all still pretty researchy and a solid implementation is still more "PhD" than "weekend hack".

--------------------------------------------------------------------------------------------------------------------------

Another possible solution for the

  value = expensive(b) if expensive(b) else default

problem, if you don't want a statement to assign to a temporary variable, is to use a `let` expression:

  // raw version
  value = (lambda res: res if res else default)(expensive(b))

  // with a helper function `let`
  value = let(expensive(b))(lambda res: res if res else default)

I'd get skewered at code review if I ever wrote this manually, but I've actually used this quite a lot in auto-generated code (i.e. macros) where I want to "assign" to a local value but am unable to create a statement. In fact, if you want to, you can forgo assignments entirely in your program and use nested `let` bindings for everything, since they're basically equivalent. Then you can start putting the braces outside the `let(...)` as in `(let ...)`, because you'll basically be writing Lisp.
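
For the curious, a minimal Python sketch of such a `let` helper -- the name and shape are illustrative, not an existing API:

  def let(value):
      """Reify a binding: let(x)(body) evaluates body(x) and returns it."""
      def bind(body):
          return body(value)
      return bind

  # usage, assuming expensive() and default are defined elsewhere:
  #   value = let(expensive(b))(lambda res: res if res else default)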
-Haoyi On Fri, Jul 12, 2013 at 11:02 AM, Joao S. O. Bueno wrote: > On 11 July 2013 21:07, David Mertz wrote: > > On Thu, Jul 11, 2013 at 4:57 PM, Joao S. O. Bueno > > > wrote: > > > ... > >> Like in: > >> value = expensive_function(b) if expensive_function(b) else > >> default_value > >> > >> (of course this is a trivial example - but nonetheless it would require > an > >> extra "if" statement to avoid the double call) > > > > > > How about: > > > > value = expensive_function(b) or default_value > > > > One call, exact same behavior as you request. Available since Python > 1.0. > > Because, as a I said, that was a naive example - if the predicate was: > value = expensive_function(b) if expensive_function(b) > threshold > else default_value > > There is no way "or" would work - > Anyway - just forget about it -let's see where this e-mail thread leads > > (I worked around this particular issue with a couple of helper simple > functions > to push the value to a list and return it back ) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Fri Jul 12 06:30:08 2013 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 12 Jul 2013 14:30:08 +1000 Subject: [Python-ideas] Fwd: Python Convert In-Reply-To: References: <9BE2F04C-4148-43A0-BBF6-F63C29503C49@yahoo.com> Message-ID: On Fri, Jul 12, 2013 at 2:10 PM, Daniel Rode wrote: > It could be used to convert a string to an integer (if applicable). How do you convert the string "1234" to integer? Is it: * 1234 (decimal digits)? * 825373492 (big-endian 32-bit, treating the characters as ASCII)? * 875770417 (little-endian, ditto)? * Something else? The first one is spelled int(s), the second and third can probably best be done with ctypes or struct, but there's certainly no single obvious way to "encode" one as the other. ChrisA From abarnert at yahoo.com Fri Jul 12 06:22:00 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 11 Jul 2013 21:22:00 -0700 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130712043419.1f5c59e5@sergey> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey> <20130712043419.1f5c59e5@sergey> Message-ID: On Jul 11, 2013, at 18:34, Sergey wrote: >> >> sum is not the obvious way to concatenate sequences today, and my >> preferred way is for that to stay true. So, I'm: [...] > > You're saying what IS NOT your preferred way. But I'm asking what IS > your preferred way. You just quoted it. My preferred way to handle this is what we already do: don't encourage people to misuse sum for concatenating sequences, encourage them to use something appropriate. That could mean making the wording in the docs even stronger, or even explicitly rejecting sequences in sum, but I don't think the problem is anywhere near serious enough to do either of those. That could also mean making the obvious way more obvious. In other words: move itertools.chain.from_iterable to builtins under a new name. 
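
For reference, the spelling that already works today -- only the name is long (a sketch; `flatten` is a hypothetical builtin name, not an existing one):

  from itertools import chain

  lists = [[1, 2], [3], [4, 5]]
  merged = list(chain.from_iterable(lists))  # [1, 2, 3, 4, 5], linear time
  # with a builtin alias this could read: merged = list(flatten(lists))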
> Do you prefer to have a slow sum in python and > people asking why it's slow forever? Do you see that as a best > possible case for python? Yes. Given that it is impossible to make sum fast for all collection types, and that it's the wrong function for concatenating collections in the first place, and that, even if it were fast, it would still be inferior to chain.from_iterable in the exact same way that 2.x map is inferior to 3.x map, I do. Also, It's less surprising this way, not more. Today, people only have to learn one thing: Don't use sum on collections. That's much easier than having to learn a complex mishmash like: Don't use sum on immutable collections, except for tuple, and also don't use it on some mutable collections, but it's hard to characterize exactly which, and also don't use it on things that are iterable but that you don't want to treat as sequences, and... One of the reasons I hate PHP is that all of its rules work that way--everything does what you expect in 60% of the cases, and does something baffling in the other 40%. Finally, I've ignored your requests for examples because in every case you've already been given examples and haven't dealt with any of them. If I want to concatenate cons lists, chain does it in linear time, your design does not; instead of answering that, you just keep arguing that you can sum a different kind of linked list in linear time. That doesn't even approach answering the objection. So what's the point of offering another one that you're just going to treat in the same way? Especially since I and others have already given you other examples and you just ignored them? From mertz at gnosis.cx Fri Jul 12 07:05:37 2013 From: mertz at gnosis.cx (David Mertz) Date: Thu, 11 Jul 2013 22:05:37 -0700 Subject: [Python-ideas] Reference variable in assignment: x = foo(?) In-Reply-To: References: <72C90407-C5D3-4672-9966-AF7F892AB9CA@yahoo.com> Message-ID: On Jul 11, 2013 8:02 PM, "Joao S. O. Bueno" wrote: > > > > value = expensive_function(b) or default_value > > > > One call, exact same behavior as you request. Available since Python 1.0. > > Because, as a I said, that was a naive example - if the predicate was: > value = expensive_function(b) if expensive_function(b) > threshold > else default_value > > There is no way "or" would work - > Anyway - just forget about it -let's see where this e-mail thread leads value = ([v for v in [expensive_function(b)] if v > threshold] or [default_value])[0] I admit it's not the most elegant, but it works generically. Or just use a temp variable. No shame in doing so. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sergemp at mail.ru Fri Jul 12 07:16:16 2013 From: sergemp at mail.ru (Sergey) Date: Fri, 12 Jul 2013 08:16:16 +0300 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: <3ACB64FF-7A58-4AD2-AD9F-3BCB0AE5707C@yahoo.com> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <20130711005842.13ea7ec1@sergey> <3ACB64FF-7A58-4AD2-AD9F-3BCB0AE5707C@yahoo.com> Message-ID: <20130712081616.174a3229@sergey> On Jul 10, 2013 Andrew Barnert wrote: > But you haven't found a workable solution for all of the other > types people have brought up--or even for a single one of them. So > that's a serious misrepresentation. Strings, tuples (2 options), lists (3 options) and your cons-lists (2 options). What other types have I missed? 
Most of those has nothing to do with my suggestion, it's just that you (or not just you) asked how one can optimize XXX for O(N) sum, so I found some ways. >> It's just instead of discussing what is the best way to fix a slowness, >> I'm spending most time trying to convince people that slowness should >> be fixed. >> ? sum is slow for lists, let's fix that! >> ? you shouldn't use sum... >> ? why can't I use sum? >> ? because it's slow > > But that's not why you shouldn't use sum. No, That's the MAIN reason! There's one more, about "+" (and sum) being confusing for concatenation, and I can understand that, however it's very weak as long "+" is used for joining sequences. But the main reason that most people, including you, are pushing is that sum() is slow. The whole our "you can't make it fast for" thread is the evidence. Even now: > Besides, being fast for list and tuple but slow for other > collection types would be an attractive nuisance. You're pushing "being slow" as a key argument. If due to some technical details sum() was fast from the very beginning then nobody would say that you shouldn't use it. Same as nobody is saying that you should use reduce() instead of sum() for numbers because sum() works only for __add__ while reduce works for every operator possible. People are free to use sum() to __add__ numbers, and use reduce() e.g. to __xor__ them. Nobody says "you shouldn't use sum() for numbers because there're operators that don't work with it". Nobody says that sum() is "attractive nuisance" compared to reduce(). For the same reason IF sum would be fast for list, but slow for set, people would use sum() where it's good and won't use it where it's bad. There's nothing unusual or wrong here. min() is good for list of tuples, right? max() is also good for list of tuples? Even any() is good for list of tuples (weird, but good). Then why sum() is bad for list of tuples? Because it's SLOW! So, yes, sum() being slow is the main reason. Everything else is just a "bonus", additional arguments that are supposed to convince your opponent even more. But speed is the key. That's why I'm saying so often: if this is the main reason then let's fix it. > Your only response to that has been to claim that it can be fast > for every possible collection type, but it can't. You haven't > gotten it to work. And there are good reasons to believe it's not > just hard, but impossible. Sorry, but can you please quote what response are you referencing? >> I haven't thought that somebody can truly believe that something should >> be slow, and will find one excuse after another instead of just fixing >> the slowness. > > Calling recv(1) over and over on a socket is slow. We could fix > that by adding an implicit buffer to all socket objects. Can you > believe that someone might object to that patch? Not sure what you mean, but sockets do have an internal buffer, its default size is configurable with `sysctl`. PS: maybe for 10 years people got used to the slow sum() and related arguments so much that they don't want to lose them? ;) ?? From steve at pearwood.info Fri Jul 12 07:16:57 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 12 Jul 2013 15:16:57 +1000 Subject: [Python-ideas] Python Convert In-Reply-To: References: Message-ID: <51DF9149.600@pearwood.info> On 12/07/13 12:01, Daniel Rode wrote: > Since Python3, the python creators removed a lot of encodings from the > str.encode() method. They did it because they weren't sure how to implement > the feature in Python3. 
> They wanted it to be better.

That's wrong. They didn't remove them; they are just inaccessible from the string API. And they didn't do it because they weren't sure how to implement the feature, but because the feature was broken. Strings had both an encode and a decode method, and people kept using the wrong one and getting weird results.

Python 3 has the right API: you *encode* strings to bytes, and only bytes, and you *decode* bytes to strings, and only strings.

However, the codec machinery is a lot more general than just str <-> bytes. Codecs can transform from bytes to bytes, or from strings to strings, or to other things, and you can still do so using the codecs module:

py> codecs.encode(b"Hello World", "hex_codec")
b'48656c6c6f20576f726c64'
py> codecs.encode("Hello World", "rot_13")
'Uryyb Jbeyq'

although the interface is a bit clunky. There's no way of telling ahead of time whether a codec expects bytes or strings. See also this open bug report:

http://bugs.python.org/issue7475

and this one, pointing out that there's no easy way to know what codecs are available:

http://bugs.python.org/issue17878

So there's a fair bit of improvement needed in the codec machinery.

-- Steven

From steve at pearwood.info Fri Jul 12 07:25:41 2013
From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 12 Jul 2013 15:25:41 +1000 Subject: [Python-ideas] Reference variable in assignment: x = foo(?) In-Reply-To: References: Message-ID: <51DF9355.7080604@pearwood.info>

On 12/07/13 07:07, Corey Sarsfield wrote:
> I came up with the idea after having some code on dicts that looked like:
>
> a[b][c] = foo(a[b][c])
>
> So in this case there are twice as many look-ups going on as there need to
> be, even if a[b][c] were to be pulled out into x.

How do you reason that? You need two sets of lookups however you do it: first you call __getitem__ with 'b' and 'c', then you call __setitem__. In the general case, you can't cache the reference 1) because that's not how references work in Python, and 2) even if it were, any of the __getitem__ or __setitem__ calls may have side-effects.

> If I were to do:
>
> a[b][c] += 1
>
> Would it be doing the lookups twice behind the scenes?

Easy enough to find out with a helper class:

class Test(list):
    def __getitem__(self, attr):
        print("Looking up item %s" % attr)
        return super(Test, self).__getitem__(attr)

    def __setitem__(self, attr, value):
        print("Setting item %s to %r" % (attr, value))
        return super(Test, self).__setitem__(attr, value)

instance = Test([1, 2, 3])
instance[1] = Test([4, 5, 6])
instance[1][2] += 100  # count how many lookups print before the set
print(instance)

-- Steven

From ncoghlan at gmail.com Fri Jul 12 08:01:07 2013
From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 12 Jul 2013 16:01:07 +1000 Subject: [Python-ideas] Rehabilating reduce (as "fold") Message-ID:

The strange contortions of the "fast sum for lists" discussions got me wondering about whether it was possible to rehabilitate reduce with a less error-prone API. It was banished to functools in 3.0 because it was so frequently used incorrectly, but now its disfavour seems to be causing people to propose ridiculous things.

The 2.x reduce is modelled on map and filter: it accepts the combinator as the first argument, and then the iterable, and finally an optional initial value. The most common error was failing to handle the empty iterable case sensibly by leaving out the initial value, so you got a TypeError instead of returning a result.
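For example (the exact error message wording varies a little between versions, but the shape of the failure doesn't):

    >>> from functools import reduce
    >>> import operator
    >>> reduce(operator.add, [])
    Traceback (most recent call last):
      ...
    TypeError: reduce() of empty sequence with no initial value
    >>> reduce(operator.add, [], 0)  # with an initial value it's fine
    0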
So, what if we instead added a new alternative API based on Haskell's "fold" [1] where the initial value is *mandatory*:

    def fold(op, start, iterable):
        ...

Efficiently merging a collection of iterables into a list would then just be:

    data = fold(operator.iadd, [], iterables)

I'd personally be in favour of the notion of also allowing strings as the first argument, so you could instead write:

    data = fold("+=", [], iterables)

This could also be introduced as an alternative API in functools.

(Independent of this idea, it would actually be nice if the operator module had a dictionary mapping from op symbols to names, like operator.by_symbol["+="] giving operator.iadd)

Cheers,
Nick.

[1] https://en.wikipedia.org/wiki/Fold_%28higher-order_function%29

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From robertc at robertcollins.net Fri Jul 12 08:24:04 2013
From: robertc at robertcollins.net (Robert Collins) Date: Fri, 12 Jul 2013 18:24:04 +1200 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: References: Message-ID:

On 12 July 2013 18:01, Nick Coghlan wrote:
> The strange contortions of the "fast sum for lists" discussions got me
> wondering about whether it was possible to rehabilitate reduce with a less
> error-prone API. [...]
> So, what if we instead added a new alternative API based on Haskell's "fold"
> [1] where the initial value is *mandatory*:

+1

> def fold(op, start, iterable):
>     ...

It would be nice to spell it fold(op, iterable, start), so that a trivial sed can migrate reduce-using code to fold. Or perhaps we could just fix reduce?

> I'd personally be in favour of the notion of also allowing strings as the
> first argument, so you could instead write:
>
> data = fold("+=", [], iterables)

This seems like an unusual thing in Python, but I can certainly see its convenience.

-Rob

--
Robert Collins
Distinguished Technologist
HP Cloud Services

From ronaldoussoren at mac.com Fri Jul 12 08:42:02 2013
From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Fri, 12 Jul 2013 08:42:02 +0200 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> <51DE1A4B.4070108@pearwood.info> <87txk1u0nq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID:

On 11 Jul, 2013, at 21:20, Terry Reedy wrote:
> On 7/11/2013 1:53 AM, Ronald Oussoren wrote:
>> On 11 Jul, 2013, at 7:27, Stephen J. Turnbull wrote:
>>> It does count; it's a language change. It is not a bug-fix in which
>>> the implementation is brought into line with the language definition.
>>
>> That doesn't mean that using += instead of + in sum isn't a valid change
>> to make for 3.4.
>
> Breaking code in the way this would do, would require a PEP and deprecation cycle. I do not anticipate approval for a general change.

I agree.
At the time I wrote this I didn't know that in numpy + and += have slightly different semantics w.r.t. coercion of array element types. That means that using += in sum() would change the behavior of calling sum on a sequence of numpy arrays, and given that we try hard to maintain backward compatibility, this particular change probably won't happen. More so because the documentation clearly indicates that sum() is intended to be used with numbers, and the change is meant to change the behavior for non-numbers (in particular lists).

> A specialized change such that sum(iterable_of_lists, []) would extend rather than replace [] might be done since the result would be equal to the current result, just faster, and since [] must be nearly always passed without aliases that depend on it not changing. Even that should have a deprecation warning.

I don't think that a deprecation warning would be needed, as long as start got copied before extending it.

> Tuples could be linearly summed in a list with .extend and then converted at the end. I don't believe that would be a semantic change at all.

It would make the implementation of sum more complicated. There is clear historical evidence that sum is not intended to be fast for everything that supports the + operator; calling sum on a sequence of strings raises an exception instead of trying to special-case this.

>> BTW. This thread has been rehashing the same arguments over and over again,
>> and it's pretty likely that most core devs have stopped following this thread
>
> Right, I just happened to pick this post because you are also a core dev.

I am, but that doesn't mean my opinion carries a lot more weight in this discussion ;-). The higher profile core devs are conspicuously absent from this discussion, which is why I wrote that someone needs to write a summary of the discussion. I'm not that someone though.

Ronald

From abarnert at yahoo.com Fri Jul 12 08:47:24 2013
From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 11 Jul 2013 23:47:24 -0700 (PDT) Subject: [Python-ideas] Fwd: Python Convert In-Reply-To: References: <9BE2F04C-4148-43A0-BBF6-F63C29503C49@yahoo.com> Message-ID: <1373611644.93189.YahooMailNeo@web184706.mail.ne1.yahoo.com>

From: Daniel Rode Sent: Thursday, July 11, 2013 9:10 PM

> I don't think things through very well sometimes.

That's exactly what this list is for. Except for trivial cases (which just get filed as bugs and fixed), nobody ever thinks through all the issues of a cool idea he just had. But a bunch of people together often can, and, if we're lucky, can even find solutions to them.

> I guess the bottom line for me is what you said:
>> In most cases, there are other ways to do it--base64.encode(), binascii.hexlify(), etc. It would be nice if there was a convenient and consistent way to do all of them instead of having to hunt around the stdlib...

Yeah, I understand the motivation, and would love a good solution.

> So have you thought of a solution for this problem?

I was going to say "If I had, I would have already written a proposal" -- but then I thought of a solution. Or maybe two.

---

First, how about this:

    b'616263'.a2b('hex') == b'abc'
    b'abc'.b2a('hex') == b'616263'

It works with four binary transfer encodings: 'quopri'/'qp', 'hex', 'b64'/'base64'/'base_64', and 'uu'. That avoids the confusion with encode/decode, and it implies the implementation pretty nicely, which is basically:

    def a2b(self, encoding):
        encoding = binascii.aliases.get(encoding, encoding)
        return getattr(binascii, 'a2b_' + encoding)(self)

    def b2a(self, encoding):
        encoding = binascii.aliases.get(encoding, encoding)
        return getattr(binascii, 'b2a_' + encoding)(self)

Where binascii.aliases is something like {'quopri': 'qp', 'b64': 'base64', 'base_64': 'base64'} -- copy the exact set of aliases from the 2.7 encodings.aliases.aliases dict.
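For comparison, the conversions that sketch delegates to already work today, just less discoverably:

    import binascii
    binascii.a2b_hex(b'616263')  # == b'abc'
    binascii.b2a_hex(b'abc')     # == b'616263'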
I think these four are all we need, because they're almost all we lost going from 2.x to 3.x. All of the encodings are still there. However, the ones that can't encode str->bytes and decode bytes->str can't be used with the encode and decode functions, and they've had their friendly aliases removed for safety. So, the real charsets and the text transfer encodings are fine; just the toy rot13, the binary transformations bz2 and gzip, and the binary transfer encodings hex, base64, quopri, and uu can no longer be used with encode/decode. That's it. I don't think anyone cares about rot13, and we can live without bz2 and gzip. So it's really just these four.

---

Alternatively, maybe we don't need _any_ language change. The "right" way to do these four encodings today is to use the binascii, base64, quopri, and uu modules, respectively. However, they have different APIs. (And most of the docs still refer to "strings" instead of bytes, which implies just how often people are finding and using these modules.) But there's a perfectly consistent API to base64, quopri, and uu sitting in the binascii module. The only problem is that the documentation says "Normally, you will not use these functions directly but use wrapper modules like uu, base64, or binhex instead."

For one thing, there is no wrapper module for hexlify. For another, some of these modules aren't actually wrappers around binascii. And as of 3.3, the "low level" methods are actually _more_ usable than the wrappers, because they can take ASCII-only str arguments. I suspect the reason for that note is that, years ago, you used encode if you wanted the trivial use case, and dug into the specific modules (which offer things like filesystem-safe base64, encoding a whole file at once, etc.) when you needed more. But in 3.x, there is no longer a way to get to the trivial use case.

So, just strike that line from the docs, maybe reference binascii in the codecs docs and the 2->3 transition guide, and we're done. Yes, binascii is a bit ugly, with abbreviated names for everything, hexlify sitting alongside the "low-level" methods, and a bunch of helper functions for the long-obsolete binhex module, but so what? Is "binascii.a2b_hex(b'616263')" really any worse than "b'616263'.a2b('hex')"?

---

Meanwhile, the left-over bytes-bytes encodings are still theoretically usable in some cases, but is anyone actually using them? Besides str.encode/bytes.decode, you also can't use them with the io module, or any of the higher-level stuff in codecs (which is mostly unnecessary now too, for that matter). So, maybe it's time to deprecate them and eventually remove them?

From abarnert at yahoo.com Fri Jul 12 09:43:34 2013
From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 12 Jul 2013 00:43:34 -0700 (PDT) Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries?
In-Reply-To: <20130712081616.174a3229@sergey> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <20130711005842.13ea7ec1@sergey> <3ACB64FF-7A58-4AD2-AD9F-3BCB0AE5707C@yahoo.com> <20130712081616.174a3229@sergey> Message-ID: <1373615014.30749.YahooMailNeo@web184704.mail.ne1.yahoo.com>

From: Sergey Sent: Thursday, July 11, 2013 10:16 PM

> On Jul 10, 2013 Andrew Barnert wrote:
>> But you haven't found a workable solution for all of the other
>> types people have brought up--or even for a single one of them. So
>> that's a serious misrepresentation.
>
> Strings, tuples (2 options), lists (3 options) and your cons-lists
> (2 options). What other types have I missed?

Again, you have not actually made cons lists work; you've made different types work (deques, C++-style double-linked lists, etc.), and argued that those are "better types" or something. And then there's the whole class of immutable collection types besides tuple. You've suggested that we can patch sum for each one, and that you could theoretically rewrite every immutable collection type in the world to be fast-summable, but neither of those is even remotely feasible as a solution.

Those aren't the only cases that have been brought up, but it hardly matters. If you can't answer any of the cases anyone brings up, why should anyone bother giving you new ones?

>>> It's just that instead of discussing what is the best way to fix a slowness,
>>> I'm spending most time trying to convince people that the slowness should
>>> be fixed.
>>>   - sum is slow for lists, let's fix that!
>>>   - you shouldn't use sum...
>>>   - why can't I use sum?
>>>   - because it's slow
>>
>> But that's not why you shouldn't use sum.
>
> No, that's the MAIN reason!

No, it really isn't. I and others have given multiple reasons. Let me try to put them all together in one place, roughly in order of importance:

1. If sum is the right way to concatenate sequences, it's ambiguous and confusing for types that are both addable and iterable, like numpy.array and other vectors. (The fact that you proposed a solution that would cause sum to flatten arrays is proof that this confusion is not just possible, but likely.)

2. The obvious way to concatenate sequences should work on iterators and other iterables; sum can't.

3. The word "sum" doesn't mean concatenating a bunch of sequences together. This isn't as much of a problem for +, because + isn't a word -- people read it as "add", "plus", "and", "then", etc. in different contexts, but they read "sum" as "sum".

4. Iterator interfaces are better in a variety of ways -- that's why 3.x map, zip, etc. return iterators rather than lists.

5. The right way to concatenate sequences should have consistent performance characteristics; it's impossible for sum to be O(N) for all sequence types.

So, that's 4 reasons that sum cannot be the right obvious way to concatenate sequences that are more important than performance. And even reason 5 is more about consistency than performance: if people learn that sum is fast, it had better be fast for their types; if it's not going to be fast for their types, we'd better not promise that it will be.

And again, we already have an alternative that has none of those problems. Chain is not confusing for vectors, it works on all iterables, it clearly says what it means and means what it says, it provides an iterator interface, and it's O(N) for all iterables.
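For concreteness, a minimal example of that alternative (the variable names are just illustrative):

    from itertools import chain

    seqs = [[1, 2], (3, 4), range(5, 7)]
    flat = list(chain.from_iterable(seqs))  # [1, 2, 3, 4, 5, 6]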
(I'm ignoring the chain-vs.-chain.from_iterable issue, which is being hashed out in another thread, because it's not relevant here.)

> There's one more, about "+" (and sum)
> being confusing for concatenation, and I can understand that, however
> it's very weak as long as "+" is used for joining sequences. But the
> main reason that most people, including you, are pushing is that
> sum() is slow. Our whole "you can't make it fast for" thread
> is the evidence.

No, the problem is that _you_ keep insisting that the only problem with sum is that it's not fast for all types; you ignore everyone who offers other reasons, so people engage you on the one thing you'll talk about.

>> Besides, being fast for list and tuple but slow for other
>> collection types would be an attractive nuisance.
>
> You're pushing "being slow" as a key argument.

No, I'm pushing being _inconsistent_ as a key argument, as I explained above, and not even as the most important one. Also, the very first word in that sentence is "Besides"; you ignored the real point before the aside -- and you had to go back at least 5 messages to find one you could misrepresent in that way.

> If due to some
> technical details sum() had been fast from the very beginning then nobody
> would say that you shouldn't use it.

Nobody? How about the person who created the sum function in the first place? You've already been given the quote. The only reason you can misuse sum for sequences at all is that he couldn't figure out a good way to prevent people from making that mistake.

> Same as nobody is saying that
> you should use reduce() instead of sum() for numbers because sum()
> works only for __add__ while reduce works for every operator possible.

Actually, those are very nearly the opposite. The reason to use sum instead of reduce for adding numbers is that sum is a specific, fit-for-purpose function that does exactly what you want to do and says exactly what it's doing, while reduce is an overly-general function that makes it unclear what you're trying to do. And that exact same reason means you should use chain instead of sum for concatenating sequences.

> min() is good for a list of tuples, right? max() is also good for a list
> of tuples? Even any() is good for a list of tuples (weird, but good).
> Then why is sum() bad for a list of tuples? Because it's SLOW!

No. Finding the smallest tuple in a list is an obviously sensible idea that should obviously be spelled min. Flattening a bunch of tuples is an obviously sensible idea that should obviously be spelled something like "chain" or "flatten" or "concatenate", not "sum".

>> Your only response to that has been to claim that it can be fast
>> for every possible collection type, but it can't. You haven't
>> gotten it to work. And there are good reasons to believe it's not
>> just hard, but impossible.
>
> Sorry, but can you please quote the response you are referencing?

Picking one of your emails at random:

> Despite we CAN make it faster. For all cases, or for most
> of them,

Here's another:

>> So, if you're suggesting that sum can be fast for anything
>> reasonable, you're just wrong.
>
> I suggested two ways how to do that. First, one can use the approach
> above, i.e. use mutable type internally. Second, for internal cpython
> types we have a simpler option to implement optimization directly
> in sum(). And there may be many others, specific to the types in
> question.

And so on.
About half your messages have some form or other of the assertion that sum can be fast for all sequence types. Half of them also have you claiming that you already solved some type (by replacing it with another type), and the other half have you denying you ever claimed such a thing. It's like talking to http://en.wikipedia.org/wiki/Nathan_Thurm: "I never said that. Who told you I said that? It's so funny you would think that."

> PS: maybe over 10 years people got so used to the slow sum() and the related
> arguments that they don't want to lose them? ;)

More Nathan Thurm: "I'm not being defensive. You're the one who's being defensive. Do you not want children to have toys? Did you not have toys growing up? It's so funny to me that anyone would want to deprive children of toys."

From joshua at landau.ws Fri Jul 12 09:52:16 2013
From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 08:52:16 +0100 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: References: Message-ID:

On 12 July 2013 07:01, Nick Coghlan wrote:
> The strange contortions of the "fast sum for lists" discussions got me
> wondering about whether it was possible to rehabilitate reduce with a less
> error-prone API. [...]
> (Independent of this idea, it would actually be nice if the operator module
> had a dictionary mapping from op symbols to names, like
> operator.by_symbol["+="] giving operator.iadd)

This sounds like a good idea to me (although by_symbol could well have a more catchy name) -- there are so many places where "lambda a, b: a + b" is just ugly. This could work itself into a lot of APIs.

What would be the equivalent for "operator.pos", "operator.neg" and "operator.{get|set|del}item"?

...But then I start suggesting extensions like having operator.* auto-curry: "operator.iadd(left=foo)(bar)" === "operator.iadd(foo, bar)" (currying should require separate keyword-only syntax, and those keywords always make a curried function). And uncurry -- we need that if we want this to work with map.

From solipsis at pitrou.net Fri Jul 12 09:49:56 2013
From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Jul 2013 09:49:56 +0200 Subject: [Python-ideas] Rehabilating reduce (as "fold") References: Message-ID: <20130712094956.57f167d6@fsol>

On Fri, 12 Jul 2013 16:01:07 +1000 Nick Coghlan wrote:
> The strange contortions of the "fast sum for lists" discussions got me
> wondering about whether it was possible to rehabilitate reduce with a less
> error-prone API.
> It was banished to functools in 3.0 because it was so
> frequently used incorrectly, but now its disfavour seems to be causing
> people to propose ridiculous things.

I would disagree with this interpretation. reduce() wasn't "banished" (what a strange qualification!) because its API was "error-prone", but because the whole concept isn't very useful - and, indeed, little used - in a language like Python.

> So, what if we instead added a new alternative API based on Haskell's
> "fold" [1] where the initial value is *mandatory*:
>
>     def fold(op, start, iterable):
>         ...

So fold(op, start, iterable) is the same as reduce(op, iterable, start)? This sounds silly and useless.

Regards
Antoine.

From abarnert at yahoo.com Fri Jul 12 10:03:52 2013
From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 12 Jul 2013 01:03:52 -0700 (PDT) Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: References: Message-ID: <1373616232.40551.YahooMailNeo@web184704.mail.ne1.yahoo.com>

From: Nick Coghlan Sent: Thursday, July 11, 2013 11:01 PM

> So, what if we instead added a new alternative API based on Haskell's "fold" [1] where the initial value is *mandatory*:
>
>     def fold(op, start, iterable):
>         ...

Note that Haskell, and many other functional languages, actually have both functions:

    def fold(op, start, iterable): ...
    def fold1(op, iterable): ...

And really, they're only separate functions because a language with strict types and automatic currying can't handle variable arguments.

Meanwhile, there are an awful lot of people who just don't like reduce/fold in any situation. The quote "Inside every reduce is a loop trying to get out" appears quite frequently, on this list and elsewhere. And I don't think it's because it's easy to get the fold/fold1 distinction wrong, but because they consider any use of reduce unreadable. I think the idea is that folding only makes immediate sense if you're thinking of your data structures recursively instead of iteratively, which you usually aren't in Python. But I'm probably not the best one to characterize the objection, since I don't share it. (Of course there are cases where reduce _is_ unreadable, and the only reason people use it is because in Haskell or OCaml or Scheme the explicit loop would be _more_ unreadable, even though that isn't even remotely true in Python... but there are also cases where it makes sense to me.)

One more thing: The name "fold" to me really implies there's going to be a "foldr" function as well, in a way that "reduce" doesn't. But I could probably get over that -- after all, right-folding isn't nearly as important for code with arrays or iterators as it is for recursive code with cons lists or lazy lists.

> I'd personally be in favour of the notion of also allowing strings as the first argument, so you could instead write:
>
>     data = fold("+=", [], iterables)

I like this idea, but only if it's added to other functions in the stdlib where it makes sense, and easy to add to new functions of your own.

> This could also be introduced as an alternative API in functools.
>
> (Independent of this idea, it would actually be nice if the operator module had a dictionary mapping from op symbols to names, like operator.by_symbol["+="] giving operator.iadd)

And that answers the "easy to add to new functions" bit! Except a helper function might be nice, something like operator.get_op (but with a better name):

    def get_op(func):
        if callable(func):
            return func
        else:
            return by_symbol[func]
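To make that concrete, a minimal self-contained sketch (by_symbol is hypothetical -- nothing like it exists in the operator module today -- and only a few symbols are shown):

    import operator
    from functools import reduce

    # Hypothetical symbol-to-function mapping, as proposed above.
    by_symbol = {'+': operator.add, '+=': operator.iadd,
                 '*': operator.mul, '*=': operator.imul}

    def fold(op, start, iterable):
        # Accept either a callable or a symbol string.
        func = op if callable(op) else by_symbol[op]
        return reduce(func, iterable, start)

    data = fold('+=', [], [[1, 2], [3]])  # [1, 2, 3]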
From p.f.moore at gmail.com Fri Jul 12 10:23:09 2013
From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 12 Jul 2013 09:23:09 +0100 Subject: [Python-ideas] Fast sum summary [was Re: Fast sum() for non-numbers - why so much worries?] In-Reply-To: <51DF57E6.8090206@mrabarnett.plus.com> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> <51DF5368.6020505@pearwood.info> <51DF57E6.8090206@mrabarnett.plus.com> Message-ID:

On 12 July 2013 02:12, MRAB wrote:
> While you have your cap on, if you're going to special-case lists, then
> why not strings too (just passing them on to "".join())?

And of course, that specific question was debated, and the decision taken to go with what we have now, when sum was first introduced. Someone who is arguing for this proposal needs to go back and research that decision, and confirm that the reasons discussed then no longer apply. I suspect many of them still do.

Paul

From joshua at landau.ws Fri Jul 12 10:44:36 2013
From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 09:44:36 +0100 Subject: [Python-ideas] Reference variable in assignment: x = foo(?) In-Reply-To: References: <72C90407-C5D3-4672-9966-AF7F892AB9CA@yahoo.com> Message-ID:

On 12 July 2013 05:21, Haoyi Li wrote:
> Another possible solution for the
>
>     value = expensive(b) if expensive(b) else default
>
> problem, if you don't want a statement to assign to a temporary variable, is
> to use a `let` expression
>
> // raw version
> value = (lambda res: res if res else default)(expensive(b))
>
> // with a helper function `let`
> value = let(expensive(b))(lambda res: res if res else default)

Ahem:

    (lambda res=expensive(b): res if res else default)()

From joshua at landau.ws Fri Jul 12 10:49:32 2013
From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 09:49:32 +0100 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: <1373616232.40551.YahooMailNeo@web184704.mail.ne1.yahoo.com> References: <1373616232.40551.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID:

On 12 July 2013 09:03, Andrew Barnert wrote:
>> I'd personally be in favour of the notion of also allowing strings as the first argument, so you could instead write:
>>
>>     data = fold("+=", [], iterables)
>
> I like this idea, but only if it's added to other functions in the stdlib where it makes sense, and easy to add to new functions of your own.

Is there a use-case for that last part? It strikes me as equivalent to messing with builtins, which is largely disliked.

From p.f.moore at gmail.com Fri Jul 12 12:11:17 2013
From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 12 Jul 2013 11:11:17 +0100 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: <1373616232.40551.YahooMailNeo@web184704.mail.ne1.yahoo.com> References: <1373616232.40551.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID:

On 12 July 2013 09:03, Andrew Barnert wrote:
> Meanwhile, there are an awful lot of people who just don't like
> reduce/fold in any situation. The quote "Inside every reduce is a loop
> trying to get out" appears quite frequently, on this list and elsewhere.

And yet we keep getting cases like the sum discussion which is a fold in essence, but people reject suggestions of "just use a loop".
So it doesn't look like the loop is trying very hard to get out :-)

Whether "inside every specialised function there is a fold trying to get out" is any more likely to gain traction, I don't know...

Paul

From haoyi.sg at gmail.com Fri Jul 12 12:36:35 2013
From: haoyi.sg at gmail.com (Haoyi Li) Date: Fri, 12 Jul 2013 18:36:35 +0800 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: References: <1373616232.40551.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID:

I'd be all for increasing usage of fold and reduce, and higher order combinators in general, but wasn't it always a somewhat philosophical issue that kept lambdas intentionally verbose/crippled and discouraged usage of Higher-Order-Functions when direct imperative code (i.e. loops) works? I've always felt reduce() being banished was but a small facet of this overall philosophy, and not so much because it was individually difficult to use.

> data = fold("+=", [], iterables)

Seems like a terrible hack to me =( it brings back memories of my PHP days where "first class functions" meant you passed in the function's name as a string which got concatted-around and eval-ed. We all laughed at how badly they designed the language to have it end up like that. Naturally, it is the path of least resistance, since it could be implemented with existing language features (i.e. `eval`, which can implement anything really) but it would leave a sour taste in my mouth every time I use it.

I would much prefer the somewhat-more-difficult route of modifying the parser to let `a += b` be an expression, and then you could write

    data = fold(lambda a, b: a += b, [], iterables)

or even groovy/scala/mathematica style

    data = fold(_ += _, [], iterables)

which is a lot further (implementation wise) from where we are now, and 2 characters more verbose, but it would be far more generally usable than a one-off "let's pass in operators as strings and concat/eval them" rule.

-Haoyi

On Fri, Jul 12, 2013 at 6:11 PM, Paul Moore wrote:
> On 12 July 2013 09:03, Andrew Barnert wrote:
>> Meanwhile, there are an awful lot of people who just don't like
>> reduce/fold in any situation. The quote "Inside every reduce is a loop
>> trying to get out" appears quite frequently, on this list and elsewhere.
>
> And yet we keep getting cases like the sum discussion which is a fold in
> essence, but people reject suggestions of "just use a loop". So it doesn't
> look like the loop is trying very hard to get out :-)
>
> Whether "inside every specialised function there is a fold trying to get
> out" is any more likely to gain traction, I don't know...
>
> Paul
From joshua at landau.ws Fri Jul 12 13:58:13 2013
From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 12:58:13 +0100 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: References: <1373616232.40551.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID:

On 12 July 2013 11:36, Haoyi Li wrote:
> I'd be all for increasing usage of fold and reduce, and higher order
> combinators in general [...]
> I would much prefer the somewhat-more-difficult route of modifying the
> parser to let `a += b` be an expression [...]

https://github.com/lihaoyi/macropy#quick-lambdas

From haoyi.sg at gmail.com Fri Jul 12 14:01:51 2013
From: haoyi.sg at gmail.com (Haoyi Li) Date: Fri, 12 Jul 2013 20:01:51 +0800 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: References: <1373616232.40551.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID:

> https://github.com/lihaoyi/macropy#quick-lambdas

Yeah, I'm the author of that =D My point wasn't so much "use my cool macroz!!!" as "passing operators as strings PHP-style makes me sad =(", with examples of how other languages do it that don't make me sad.

On Fri, Jul 12, 2013 at 7:58 PM, Joshua Landau wrote:
> On 12 July 2013 11:36, Haoyi Li wrote:
> > Seems like a terrible hack to me =( [...] Naturally, it is the path of
> > least resistance, since it could be implemented with existing language
> > features (i.e. `eval`, which can implement anything really) but it would
> > leave a sour taste in my mouth every time I use it. [...]
>
> https://github.com/lihaoyi/macropy#quick-lambdas
From szport at gmail.com Fri Jul 12 14:36:11 2013
From: szport at gmail.com (Zaur Shibzukhov) Date: Fri, 12 Jul 2013 16:36:11 +0400 Subject: [Python-ideas] float('∞')=float('inf') Message-ID:

Hello!

Is it a good idea to allow float('∞') to be float('inf') in python?

From joshua at landau.ws Fri Jul 12 14:54:32 2013
From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 13:54:32 +0100 Subject: [Python-ideas] float('∞')=float('inf') In-Reply-To: References: Message-ID:

On 12 July 2013 13:36, Zaur Shibzukhov wrote:
> Hello!
>
> Is it good idea to allow
> float('∞') to be float('inf') in python?

Why?

From _ at lvh.io Fri Jul 12 14:59:41 2013
From: _ at lvh.io (Laurens Van Houtven) Date: Fri, 12 Jul 2013 14:59:41 +0200 Subject: [Python-ideas] float('∞')=float('inf') In-Reply-To: References: Message-ID:

On Fri, Jul 12, 2013 at 2:54 PM, Joshua Landau wrote:
> On 12 July 2013 13:36, Zaur Shibzukhov wrote:
>> Is it good idea to allow
>> float('∞') to be float('inf') in python?
>
> Why?

Because it obviously means infinity -- much more so than "inf" does :)

cheers
lvh

From gerald.britton at gmail.com Fri Jul 12 15:09:36 2013
From: gerald.britton at gmail.com (Gerald Britton) Date: Fri, 12 Jul 2013 09:09:36 -0400 Subject: [Python-ideas] float('∞')=float('inf') Message-ID:

>> Is it good idea to allow
>> float('∞') to be float('inf') in python?
>
> Why?
>
> Because it obviously means infinity -- much more so than "inf" does :)

Do you have the infinity symbol on your keyboard? I don't! So, for me, should I ask for float('oo') ?? -1

--
Gerald Britton

From storchaka at gmail.com Fri Jul 12 15:12:09 2013
From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 12 Jul 2013 16:12:09 +0300 Subject: [Python-ideas] float('∞')=float('inf') In-Reply-To: References: Message-ID:

12.07.13 15:36, Zaur Shibzukhov wrote:
> Is it good idea to allow
> float('∞') to be float('inf') in python?

float('½') == 0.5?
float('3.(142857)') == 22/7?
int('Ⅶ') == 7?
int('²') == 2?

From _ at lvh.io Fri Jul 12 15:15:04 2013
From: _ at lvh.io (Laurens Van Houtven) Date: Fri, 12 Jul 2013 15:15:04 +0200 Subject: [Python-ideas] float('∞')=float('inf') In-Reply-To: References: Message-ID:

On Fri, Jul 12, 2013 at 3:09 PM, Gerald Britton wrote:
> Do you have the infinity symbol on your keyboard? I don't!
Why does what you have on your keyboard matter? Just because the example uses a string literal, doesn't mean that's the only use case. I can pass infinity symbols along in any text medium.

From gerald.britton at gmail.com Fri Jul 12 15:21:43 2013
From: gerald.britton at gmail.com (Gerald Britton) Date: Fri, 12 Jul 2013 09:21:43 -0400 Subject: [Python-ideas] float('∞')=float('inf') Message-ID:

>> Do you have the infinity symbol on your keyboard? I don't!
>
> Why does what you have on your keyboard matter? Just because the example
> uses a string literal, doesn't mean that's the only use case. I can pass
> infinity symbols along in any text medium.

Ummm...cause that's what I use when programming?

This is a truly silly idea. What next?

float('pi') = 3.14159...
float('e') = 2.71828...
float('phi') = 1.618...
etc.

Note that I don't have any of those symbols on my keyboard either. Now, if I were Greek...

--
Gerald Britton

From szport at gmail.com Fri Jul 12 15:37:51 2013
From: szport at gmail.com (Zaur Shibzukhov) Date: Fri, 12 Jul 2013 06:37:51 -0700 (PDT) Subject: [Python-ideas] float('∞')=float('inf') In-Reply-To: References: Message-ID:

Because infinity is a special case of numbers. The Unicode standard has a regular infinity symbol, and it's natural to represent infinity as ∞.

On Friday, 12 July 2013 17:12:09 UTC+4, Serhiy Storchaka wrote:
> 12.07.13 15:36, Zaur Shibzukhov wrote:
>> Is it good idea to allow
>> float('∞') to be float('inf') in python?
>
> float('½') == 0.5?
> float('3.(142857)') == 22/7?
> int('Ⅶ') == 7?
> int('²') == 2?

From _ at lvh.io Fri Jul 12 15:43:50 2013
From: _ at lvh.io (Laurens Van Houtven) Date: Fri, 12 Jul 2013 15:43:50 +0200 Subject: [Python-ideas] float('∞')=float('inf') In-Reply-To: References: Message-ID:

On Fri, Jul 12, 2013 at 3:21 PM, Gerald Britton wrote:
>>> Do you have the infinity symbol on your keyboard? I don't!
>>
>> Why does what you have on your keyboard matter? Just because the example
>> uses a string literal, doesn't mean that's the only use case. I can pass
>> infinity symbols along in any text medium.
>
> Ummm...cause that's what I use when programming?

My point is that this doesn't have to come from source code. It can come from any kind of user input, which is the more common use case for calling float or int in the first place. If you just wanted the number, you'd just type the literal. (Infinity, of course, is a little special, since it doesn't have a literal -- just float("inf")).
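For reference, float already accepts several spellings of infinity today, case-insensitively:

    >>> float("inf"), float("Infinity"), float("-INF")
    (inf, inf, -inf)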
From joshua at landau.ws Fri Jul 12 15:45:46 2013
From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 14:45:46 +0100 Subject: [Python-ideas] float('∞')=float('inf') In-Reply-To: References: Message-ID:

On 12 July 2013 14:09, Gerald Britton wrote:
>>> Is it good idea to allow
>>> float('∞') to be float('inf') in python?
>>
>> Why?
>>
>> Because it obviously means infinity -- much more so than "inf" does :)
>
> Do you have the infinity symbol on your keyboard? I don't! So, for
> me, should I ask for

Do you have any of:

[several hundred Unicode digit characters]

on your keyboard? Because they are all valid *as of now* inside the string you pass to float()!

From gerald.britton at gmail.com Fri Jul 12 15:48:36 2013
From: gerald.britton at gmail.com (Gerald Britton) Date: Fri, 12 Jul 2013 09:48:36 -0400 Subject: [Python-ideas] float('∞')=float('inf') In-Reply-To: References: Message-ID:

OK, so you need users with Greek keyboards, I suppose. I'm not sure the number of those that also use Python applications justifies adding this kind of sugar to the language.

On Fri, Jul 12, 2013 at 9:43 AM, Laurens Van Houtven <_ at lvh.io> wrote:
> On Fri, Jul 12, 2013 at 3:21 PM, Gerald Britton wrote:
>> Ummm...cause that's what I use when programming?
>
> My point is that this doesn't have to come from source code. It can come
> from any kind of user input, which is the more common use case for calling
> float or int in the first place. If you just wanted the number, you'd just
> type the literal. (Infinity, of course, is a little special, since it
> doesn't have a literal -- just float("inf")).

--
Gerald Britton

From grosser.meister.morti at gmx.net Fri Jul 12 15:50:02 2013
From: grosser.meister.morti at gmx.net (Mathias Panzenböck) Date: Fri, 12 Jul 2013 15:50:02 +0200 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: References: Message-ID: <51E0098A.3070004@gmx.net>

On 07/12/2013 08:01 AM, Nick Coghlan wrote:
> I'd personally be in favour of the notion of also allowing strings as the first argument, so you could instead write:
>
>     data = fold("+=", [], iterables)

I'd like to see a more Haskell like way to reference operators:

    data = fold((+=), [], iterables)

(+=) would just be a short syntax for operator.iadd without the need to explicitly import any module. It should generate the same byte code.

But I have the feeling that won't happen. :/

-panzi
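For comparison, a small sketch of today's spelling of the same thing -- the operator module already exposes the function, just behind an import and a name:

    import operator

    f = operator.iadd  # today's name for the proposed (+=)
    xs = [1, 2]
    f(xs, [3])         # in-place add: xs is now [1, 2, 3]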
From _ at lvh.io Fri Jul 12 15:55:25 2013
From: _ at lvh.io (Laurens Van Houtven) Date: Fri, 12 Jul 2013 15:55:25 +0200 Subject: [Python-ideas] float('∞')=float('inf') In-Reply-To: References: Message-ID:

On Fri, Jul 12, 2013 at 3:48 PM, Gerald Britton wrote:
> OK, so you need users with Greek keyboards, I suppose. I'm not sure
> the number of those that also use Python applications justifies adding
> this kind of sugar to the language.

I'm not sure why you're so focused on keyboards. Perhaps "user input" was a poor choice of words on my part: that may just as well come from a different computer :) Or maybe it's a parsed mathematical document. It doesn't necessarily literally have to be typed in by someone.

Joshua, elsewhere in this thread, already enumerated all the things float currently accepts. I hope you'll agree that they're far, far more exotic than an infinity sign.

From hagmueller at aim-online.com Fri Jul 12 15:31:17 2013
From: hagmueller at aim-online.com (Andreas Hagmüller) Date: Fri, 12 Jul 2013 15:31:17 +0200 Subject: [Python-ideas] float('∞')=float('inf') In-Reply-To: References: Message-ID: <51E00525.6020700@aim-online.com>

int ('100%') = 1?
float('1%') = 0.01?

-1

> 12.07.13 15:36, Zaur Shibzukhov wrote:
>> Is it good idea to allow
>> float('∞') to be float('inf') in python?
>
> float('½') == 0.5?
> float('3.(142857)') == 22/7?
> int('Ⅶ') == 7?
> int('²') == 2?

From joshua at landau.ws Fri Jul 12 16:02:06 2013
From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 15:02:06 +0100 Subject: [Python-ideas] float('∞')=float('inf') In-Reply-To: References: Message-ID:

On 12 July 2013 14:43, Laurens Van Houtven <_ at lvh.io> wrote:
> On Fri, Jul 12, 2013 at 3:21 PM, Gerald Britton wrote:
>>>> Do you have the infinity symbol on your keyboard? I don't!
>>>
>>> Why does what you have on your keyboard matter? Just because the example
>>> uses a string literal, doesn't mean that's the only use case. I can pass
>>> infinity symbols along in any text medium.
>>
>> Ummm...cause that's what I use when programming?
>
> My point is that this doesn't have to come from source code. It can come
> from any kind of user input, which is the more common use case for calling
> float or int in the first place. If you just wanted the number, you'd just
> type the literal. (Infinity, of course, is a little special, since it
> doesn't have a literal -- just float("inf")).
I'd try phrasing it as the same sort of thing as what caused the internationalisation aspect of what float and int can receive -- they now accept foreign numbers:

    >>> float("٢٣٤")
    234.0

Consider also that float accepts "infinity" as well as "inf", and any variant of capitalization. I think it's reasonable that unicode infinity is allowed. However, we don't take all forms of negative symbols or decimal points, so it's not like anything goes.

From sergemp at mail.ru Fri Jul 12 16:03:53 2013
From: sergemp at mail.ru (Sergey) Date: Fri, 12 Jul 2013 17:03:53 +0300 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <4081F575-3271-48CC-A2FF-AD2B406FD32F@mac.com> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB6BF6.9030608@pearwood.info> <20130709165449.6b124367@sergey> <20130711014738.7609f82a@sergey> <4081F575-3271-48CC-A2FF-AD2B406FD32F@mac.com> Message-ID: <20130712170353.4847de9e@sergey>

On Jul 11, 2013 Ronald Oussoren wrote:
>> It depends on implementation details, it's possible to keep it
>> backward compatible. BTW, what C API do you expect to break?
>
> If a tuple stores the values in a separate list the structure of a PyTupleObject
> changes. That structure is exposed to users of the C API, and hence changes
> shouldn't be made lightly.

That would effectively add one more field:

    void *_internal_list;

which is backward compatible. (On the other hand it may be a good idea to break API compatibility from time to time to make sure that it's not badly misused. :)

>> Yes, technically it's possible to implement tuple so that it would
>> realloc internal list to save some ram, but why? List does not do
>> that when you remove elements from it, why should tuple do that?
>
> Actually, list does resize when you remove items but only does so when
> the amount of free items gets too large (see list_resize in Objects/listobject.c)

Agree. It indeed tries to reallocate lists sometimes. I was fooled because I tried:

    a = [i for i in xrange(50000000)]
    while len(a) > 10:
        x = a.pop()

and it only released the memory when I closed the interpreter. (This brings up a weaker question of whether it makes sense to reallocate if memory is not released to the system.)

> Keeping memory blocks alive unnecessarily is bad because this can increase
> the amount of memory used by a script, without there being a clear reason for
> it when you inspect the python code. This has been one of the reasons for
> not making string slicing operations views on the entire string (that is, a method
> like aStr.partition could return objects that reference aStr for the character
> storage which could save memory but at the significant risk of keeping too much
> memory alive when aStr is discarded earlier than one of the return values)

Well, depending on the actual implementation this can be fixed. E.g. as long as we optimize things for the case of 2 objects using the same list, the list could store both sizes and reallocate if they both are much smaller than the allocated size. Or it can store all sizes and reallocate when all of them are small enough. Or it can use some lazy reallocation, just marking the list and allowing other live objects to reallocate when they next access this list. Lots of options. :)
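To make the sharing scheme concrete, here is a toy pure-Python sketch of the idea (an illustration only, not the proposed C implementation; the class name is made up):

    class SharedTuple:
        """Toy immutable view over a shared, growable backing list."""
        def __init__(self, items, length=None):
            self._items = items  # backing list, possibly shared
            self._len = len(items) if length is None else length

        def __len__(self):
            return self._len

        def __getitem__(self, i):
            if not 0 <= i < self._len:
                raise IndexError(i)
            return self._items[i]

        def __add__(self, other):
            if self._len == len(self._items):
                # We own the tail: extend the shared buffer in place.
                self._items.extend(other)
                return SharedTuple(self._items, self._len + len(other))
            # Someone already extended past our end: fall back to a copy.
            return SharedTuple(self._items[:self._len] + list(other))

    a = SharedTuple([1] * 1000)
    b = a + (2, 3)  # shares a's buffer; the 1000 items aren't copied
    c = b + (4, 5)  # still shared; only an a + ... would now need a copy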
>> On the other hand:
>>     a = (1,) * 1000
>>     b = a + (2,3)
>>     c = b + (4,5)
>> And you have 3 variables for the price of one. Lots of memory saved!
>
> How do you do that? As far as I know Python isn't a quantum system;
> item 1000 of the hidden storage list can't be both 2 and 4 at the same time.

c == (1, ..., 1, 2, 3, 4, 5). Nothing needs to be 2 and 4 at the same time. :)

>> Nothing incorrect here, of course __add__ should handle that, and
>> if it cannot reuse the list it would copy it. As it does now.
>
> And then you introduce unpredictable performance for tuple addition;
> currently you can reason about the performance of code that does
> tuple addition, and with this change you no longer can (it sometimes
> is fast, and sometimes it is slow).

I don't understand. Right now it's always slow and unpredictable (because it depends on the current heap state, the allocation map of system RAM, etc). You cannot predict anything right now. It is as unpredictable as adding an item to a list (which may or may not need to realloc), or as almost every other operation in python. It may become faster on average, but it will still be as unpredictable as it is now. E.g. you have no way to predict that the memory page you're trying to read was swapped out by the system and won't come up for the next few seconds because the disk is under a heavy load.

> Anyways, I'm still +0 on using += in sum, and -1 on trying to special case
> particular types.

This is not about special casing. What we are discussing now is some sort of copy-on-write optimisation for python. Copy-on-write is known to save memory, and it's widely used in many places (e.g. by the Linux kernel when managing application memory pages). Is it a bad thing to have such a general and well known optimisation technique in python?

This approach has other benefits. For example, lists and tuples could share the allocation code, which would not only make lists faster as well, but would allow instant conversion of lists to tuples and back.

What I'm afraid of is that despite making many things faster I may face the argument "this also makes sum() of lists faster, and this should not happen; sum() MUST be slow because this is how it must be". So after all the sum discussions I don't have much desire to even start working on an implementation.

From alexander.belopolsky at gmail.com Fri Jul 12 16:09:30 2013
From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 12 Jul 2013 10:09:30 -0400 Subject: [Python-ideas] float('∞')=float('inf') In-Reply-To: References: Message-ID: <4F6CC8CF-CC55-4E45-AD10-053FAFFC6950@gmail.com>
So, for >> me, should I ask for > Do you have any of: > [several hundred Unicode digit characters, lost in this archive's encoding] > on your keyboard because they are all valid *as of now* inside the > string you pass to float()? > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua at landau.ws Fri Jul 12 16:10:13 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 15:10:13 +0100 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: <51E00525.6020700@aim-online.com> References: <51E00525.6020700@aim-online.com> Message-ID: On 12 July 2013 14:31, Andreas Hagmüller wrote: > int ('100%') = 1? > float('1%') = 0.01? > -1 >> 12.07.13 15:36, Zaur Shibzukhov написав(ла): >>> Is it a good idea to allow >>> float('∞') to be float('inf') in python? >> >> float('½') == 0.5? >> float('3.(142857)') == 22/7? >> int('Ⅶ') == 7? >> int('Ⅱ') == 2? No-one has implied that int or float should parse mathematical *expressions*, but just that unicode infinity should be one more way to write "infinity". Consider that: int(1) == int(١) == int(۱) == int(१) == int(১) == int(๑) == int(１) == ... [and so on through several dozen more Unicode digit-one characters, lost in this archive's encoding] and that float("inf") == float("infinity") == float("INF") == float("INFINITY") == float("Inf") == float("Infinity") int and float are obviously meant to handle abstract inputs (not expressions) and unicode infinity is an extension of this. Your "analogies" are inapt. From joshua at landau.ws Fri Jul 12 16:14:16 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 15:14:16 +0100 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130712170353.4847de9e@sergey> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <51DB6BF6.9030608@pearwood.info> <20130709165449.6b124367@sergey> <20130711014738.7609f82a@sergey> <4081F575-3271-48CC-A2FF-AD2B406FD32F@mac.com> <20130712170353.4847de9e@sergey> Message-ID: On 12 July 2013 15:03, Sergey wrote: > What I'm afraid of is that despite making many things faster I may > face the argument "this also makes sum() of lists faster, and this > should not happen, sum() MUST be slow because this is how it must be". > So after all the sum discussions I don't have much desire to even > start working on an implementation. If this is all you have gotten from our arguments, then there was no point talking to you.
You seem to be methodically ignoring us and trivialising our arguments. From joshua at landau.ws Fri Jul 12 16:17:53 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 15:17:53 +0100 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: <51E0098A.3070004@gmx.net> References: <51E0098A.3070004@gmx.net> Message-ID: On 12 July 2013 14:50, Mathias Panzenböck wrote: > On 07/12/2013 08:01 AM, Nick Coghlan wrote: >> >> I'd personally be in favour of the notion of also allowing strings as the >> first argument, so you could instead write: >> >> data = fold("+=", [], iterables) >> > > I'd like to see a more Haskell like way to reference operators: > > data = fold((+=), [], iterables) > > (+=) would just be a short syntax for operator.iadd without the need to > explicitly import any module. It should > generate the same byte code. > > But I have the feeling that won't happen. :/ Damn straight! Do you realise how much of an attractive nuisance that would be for people constantly begging "I have (+) now so why do I have to write lambda for " and then Guido gets upset because he's covered this so many times before and no-one will just agree goddamnit? From grosser.meister.morti at gmx.net Fri Jul 12 16:37:57 2013 From: grosser.meister.morti at gmx.net (=?UTF-8?B?TWF0aGlhcyBQYW56ZW5iw7Zjaw==?=) Date: Fri, 12 Jul 2013 16:37:57 +0200 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: References: <51E0098A.3070004@gmx.net> Message-ID: <51E014C5.9030309@gmx.net> On 07/12/2013 04:17 PM, Joshua Landau wrote: > On 12 July 2013 14:50, Mathias Panzenböck wrote: >> On 07/12/2013 08:01 AM, Nick Coghlan wrote: >>> >>> I'd personally be in favour of the notion of also allowing strings as the >>> first argument, so you could instead write: >>> >>> data = fold("+=", [], iterables) >>> >> >> I'd like to see a more Haskell like way to reference operators: >> >> data = fold((+=), [], iterables) >> >> (+=) would just be a short syntax for operator.iadd without the need to >> explicitly import any module. It should >> generate the same byte code. >> >> But I have the feeling that won't happen. :/ > > Damn straight! Do you realise how much of an attractive nuisance that > would be for people constantly begging "I have (+) now so why do I > have to write lambda for " and then Guido gets upset > because he's covered this so many times before and no-one will just > agree goddamnit? > I get your point, but (+) wouldn't be a lambda. It would just be a shorthand for operator.add. So you could write (+)(a, b) instead of a + b. Well, thinking of that, maybe it's not such a good idea. From gerald.britton at gmail.com Fri Jul 12 16:43:18 2013 From: gerald.britton at gmail.com (Gerald Britton) Date: Fri, 12 Jul 2013 10:43:18 -0400 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: Man I don't know how you are doing this! I just tried: float('') and got ValueError: could not convert string to float: '' For that matter, I can't figure out how to type the greek letter for pi in gmail! Guess I have some things to learn. So, if Python doesn't recognize the symbol for pi, why should it recognize the one for infinity? On Fri, Jul 12, 2013 at 10:02 AM, Joshua Landau wrote: > On 12 July 2013 14:43, Laurens Van Houtven <_ at lvh.io> wrote: >> On Fri, Jul 12, 2013 at 3:21 PM, Gerald Britton >> wrote: >>> >>> >> Do you have the infinity symbol on your keyboard? I don't!.
>>> >> >>> >>> >Why does what you have on your keyboard matter? Just because the example >uses a string literal, doesn't mean that's the only use case. I can pass >infinity symbols along in any text medium. >>> >>> Ummm...cause that's what I use when programming? >> >> My point is that this doesn't have to come from source code. It can come >> from any kind of user input, which is the more common use case for calling >> float or int in the first place. If you just wanted the number, you'd just >> type the literal. (Infinity, of course, is a little special, since it >> doesn't have a literal -- just float("inf")). > I'd try phrasing it as the same sort of thing as what caused the > internationalisation aspect to what float and int can receive -- they > now accept foreign numbers: > >>>> float("٢٣٤") > 234.0 > > Consider also that float accepts "infinity" as well as "inf", and any > variant of capitalization. I think it's reasonable that unicode > infinity is allowed. > > However, we don't take all forms of negative symbols or decimal > points, so it's not like anything goes. -- Gerald Britton From _ at lvh.io Fri Jul 12 16:46:11 2013 From: _ at lvh.io (Laurens Van Houtven) Date: Fri, 12 Jul 2013 16:46:11 +0200 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: On Fri, Jul 12, 2013 at 4:43 PM, Gerald Britton wrote: > Man I don't know how you are doing this! I just tried: > > float('') and got > > ValueError: could not convert string to float: '' > > For that matter, I can't figure out how to type the greek letter for > pi in gmail! Guess I have some things to learn. > > So, if Python doesn't recognize the symbol for pi, why should it > recognize the one for infinity? > The example he posted is of digits, not of any particular symbol for a constant. The difference, obviously, is that you can't write infinity as a bunch of digits, whereas you can at least approximate pi with any number of digits. lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Fri Jul 12 16:52:47 2013 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 13 Jul 2013 00:52:47 +1000 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: On Sat, Jul 13, 2013 at 12:43 AM, Gerald Britton wrote: > Man I don't know how you are doing this! I just tried: > > float('') and got > > ValueError: could not convert string to float: '' > > For that matter, I can't figure out how to type the greek letter for > pi in gmail! Guess I have some things to learn. > > So, if Python doesn't recognize the symbol for pi, why should it > recognize the one for infinity? Considering that Python can't represent π in a float anyway, I wouldn't be too bothered. And what else? float('τ') for twice that value? Not really necessary imho. ChrisA From barry at python.org Fri Jul 12 16:54:49 2013 From: barry at python.org (Barry Warsaw) Date: Fri, 12 Jul 2013 10:54:49 -0400 Subject: [Python-ideas] Rehabilating reduce (as "fold") References: Message-ID: <20130712105449.6b9fa825@anarchist> On Jul 12, 2013, at 04:01 PM, Nick Coghlan wrote: >I'd personally be in favour of the notion of also allowing strings as the >first argument, so you could instead write: > > data = fold("+=", [], iterables) You had me until here...
>(Independent of this idea, it would actually be nice if the operator module >had a dictionary mapping from op symbols to names, like >operator.by_symbol["+="] giving operator.iadd) ...but this is a neat idea. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From joshua at landau.ws Fri Jul 12 17:02:57 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 16:02:57 +0100 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: On 12 July 2013 15:46, Laurens Van Houtven <_ at lvh.io> wrote: > On Fri, Jul 12, 2013 at 4:43 PM, Gerald Britton > wrote: >> >> Man I don't know how you are doing this! I just tried: >> >> float('') and got >> >> ValueError: could not convert string to float: '' >> >> For that matter, I can't figure out how to type the greek letter for >> pi in gmail! Guess I have some things to learn. >> >> So, if Python doesn't recognize the symbol for pi, why should it >> recognize the one for infinity? > > The example he posted is of digits, not of any particular symbol for a > constant. The difference, obviously, is that you can't write infinity as a > bunch of digits, whereas you can at least approximate pi with any number of > digits. Ahem: >>> float("1"*310) inf Just because. My personal reason for thinking that unicode infinity is reasonable whereas unicode pi/tau/phi/etc. is not, is simply that we *already* special-case infinity. We do not do so for other mathematical constants. Additionally, Pi only holds the value of half the circle constant by default -- other branches of mathematics use π for other things and some use it as a variable. They are rare, granted, but Pi is not as clear cut as, say, "9" or "infinity". From _ at lvh.io Fri Jul 12 17:13:50 2013 From: _ at lvh.io (Laurens Van Houtven) Date: Fri, 12 Jul 2013 17:13:50 +0200 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: I was speaking generally about numbers; it's certainly true that any floating point implementation with a limited mantissa and exponent has some upper limit as to which integer it can store exactly; and at some point above that it's going to have to either throw an error, give you a smaller number, or give you infinity ;) On Fri, Jul 12, 2013 at 5:02 PM, Joshua Landau wrote: > On 12 July 2013 15:46, Laurens Van Houtven <_ at lvh.io> wrote: > > On Fri, Jul 12, 2013 at 4:43 PM, Gerald Britton < > gerald.britton at gmail.com> > > wrote: > >> > >> Man I don't know how you are doing this! I just tried: > >> > >> float('') and got > >> > >> ValueError: could not convert string to float: '' > >> > >> For that matter, I can't figure out how to type the greek letter for > >> pi in gmail! Guess I have some things to learn. > >> > >> So, if Python doesn't recognize the symbol for pi, why should it > >> recognize the one for infinity? > > > > The example he posted is of digits, not of any particular symbol for a > > constant. The difference, obviously, is that you can't write infinity as > a > > bunch of digits, whereas you can at least approximate pi with any number > of > > digits. > > Ahem: > > >>> float("1"*310) > inf > > Just because. > > My personal reason for thinking that unicode infinity is reasonable > whereas unicode pi/tau/phi/etc. is not, is simply that we *already* > special-case infinity.
We do not do so for other mathematical > constants. Additionally, Pi only holds the value of half the circle > constant by default -- other branches of mathematics use π for other > things and some use it as a variable. They are rare, granted, but Pi > is not as clear cut as, say, "9" or "infinity". > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gerald.britton at gmail.com Fri Jul 12 17:14:54 2013 From: gerald.britton at gmail.com (Gerald Britton) Date: Fri, 12 Jul 2013 11:14:54 -0400 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: "Just because." so, maybe we should have the interpreter spit out ∞ instead? I get that we special case infinity. It's an IEEE thing. I can see the next request coming: the various constants represented by unicode characters. On Fri, Jul 12, 2013 at 11:02 AM, Joshua Landau wrote: > On 12 July 2013 15:46, Laurens Van Houtven <_ at lvh.io> wrote: >> On Fri, Jul 12, 2013 at 4:43 PM, Gerald Britton >> wrote: >>> >>> Man I don't know how you are doing this! I just tried: >>> >>> float('') and got >>> >>> ValueError: could not convert string to float: '' >>> >>> For that matter, I can't figure out how to type the greek letter for >>> pi in gmail! Guess I have some things to learn. >>> >>> So, if Python doesn't recognize the symbol for pi, why should it >>> recognize the one for infinity? >> >> >> The example he posted is of digits, not of any particular symbol for a >> constant. The difference, obviously, is that you can't write infinity as a >> bunch of digits, whereas you can at least approximate pi with any number of >> digits. > > Ahem: > >>>> float("1"*310) > inf > > Just because. > > My personal reason for thinking that unicode infinity is reasonable > whereas unicode pi/tau/phi/etc. is not, is simply that we *already* > special-case infinity. We do not do so for other mathematical > constants. Additionally, Pi only holds the value of half the circle > constant by default -- other branches of mathematics use π for other > things and some use it as a variable. They are rare, granted, but Pi > is not as clear cut as, say, "9" or "infinity". -- Gerald Britton From gerald.britton at gmail.com Fri Jul 12 17:16:21 2013 From: gerald.britton at gmail.com (Gerald Britton) Date: Fri, 12 Jul 2013 11:16:21 -0400 Subject: [Python-ideas] =?utf-8?b?IGZsb2F0KCfiiJ4nKT1mbG9hdCgnaW5mJyk=?= Message-ID: "Considering that Python can't represent π in a float anyway, I wouldn't be too bothered." >>> import math >>> type(math.pi) <class 'float'> -- Gerald Britton From _ at lvh.io Fri Jul 12 17:20:09 2013 From: _ at lvh.io (Laurens Van Houtven) Date: Fri, 12 Jul 2013 17:20:09 +0200 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: On Fri, Jul 12, 2013 at 5:14 PM, Gerald Britton wrote: > I get that we special case infinity. It's an IEEE thing. I can see > the next request coming: the various constants represented by unicode > characters. > This is just a slippery slope argument: what anyone suggests otherwise has nothing to do with *this particular issue*. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From storchaka at gmail.com Fri Jul 12 17:26:44 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 12 Jul 2013 18:26:44 +0300 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: <51E00525.6020700@aim-online.com> Message-ID: 12.07.13 17:10, Joshua Landau написав(ла): > int and float are obviously meant to handle abstract inputs (not > expressions) and unicode infinity is an extension of this. Your > "analogies" are inapt. Why you think ½ (this is only one symbol!) and 3.(142857) (this is a decimal notation of the 22/7 fraction) are expressions, but ∞ or even -1 are not? From rosuav at gmail.com Fri Jul 12 17:23:28 2013 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 13 Jul 2013 01:23:28 +1000 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: On Sat, Jul 13, 2013 at 1:16 AM, Gerald Britton wrote: > "Considering that Python can't represent π in a float anyway, I > > wouldn't be too bothered." > >>>> import math >>>> type(math.pi) > That's an approximation to pi, which is a standard floating-point value. It's simply 3.141592653589793, nothing more nor less. Infinity is a special floating-point value that actually represents the concept of infinity, not just some huge number. Hence, infinity is special, pi is not. IEEE floating point cannot represent pi, the square root of 2, or i, but it can represent infinity and nan, so there need to be ways to create those. ChrisA From joshua at landau.ws Fri Jul 12 17:29:40 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 16:29:40 +0100 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: On 12 July 2013 16:14, Gerald Britton wrote: >> "Just because." >> >> so, maybe we should have the interpreter spit out ∞ instead? I don't know whether this was a joke, but just as int("٥") spits out 5 and not ٥, there is no reason that float("inf") should spit out anything other than "inf". >> I get that we special case infinity. It's an IEEE thing. I can see >> the next request coming: the various constants represented by unicode >> characters. I don't see how one leads to the next. No-one thinks that that's a good idea. This is a *very* restricted change that fits with what we have already done. I don't get the hostility to it.
//Seriouslypedanticcomment From storchaka at gmail.com Fri Jul 12 17:42:02 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 12 Jul 2013 18:42:02 +0300 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: 12.07.13 17:52, Chris Angelico ???????(??): > On Sat, Jul 13, 2013 at 12:43 AM, Gerald Britton > wrote: >> So, if Python doesn't recognize the symbol for pi, why should it >> recognize the one for infinity? > > Considering that Python can't represent ? in a float anyway, I > wouldn't be too bothered. However Python can represent ? in a float. Shouldn't it recognize the symbol for ?? From joshua at landau.ws Fri Jul 12 17:45:41 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 16:45:41 +0100 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: On 12 July 2013 16:42, Serhiy Storchaka wrote: > 12.07.13 17:52, Chris Angelico ???????(??): >> >> On Sat, Jul 13, 2013 at 12:43 AM, Gerald Britton >> wrote: >>> >>> So, if Python doesn't recognize the symbol for pi, why should it >>> recognize the one for infinity? >> >> >> Considering that Python can't represent ? in a float anyway, I >> wouldn't be too bothered. > > > However Python can represent ? in a float. Shouldn't it recognize the symbol > for ?? No. Why would we special-case ?? We'd need a ton of code just to special-case ?, ?, ?, ?, ?, ?, ?, etc. We don't need or want to special-case any more values. From joshua at landau.ws Fri Jul 12 17:50:04 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 16:50:04 +0100 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: <51E00525.6020700@aim-online.com> Message-ID: On 12 July 2013 16:26, Serhiy Storchaka wrote: > 12.07.13 17:10, Joshua Landau ???????(??): > >> int and float are obviously meant to handle abstract inputs (not >> expressions) and unicode infinity is an extension of this. Your >> "analogies" are inapt. > > > Why you think ? (this is only one symbol!) and 3.(142857) (this is a decimal > notation of the 22/7 fraction) are expressions, but ? or even -1 are not? For the same reason that 0.5 and [0, 1, 2, 3, 4] are literals but 1/2 and range(5) are not. From rosuav at gmail.com Fri Jul 12 17:51:20 2013 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 13 Jul 2013 01:51:20 +1000 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: On Sat, Jul 13, 2013 at 1:42 AM, Serhiy Storchaka wrote: > 12.07.13 17:52, Chris Angelico ???????(??): >> >> On Sat, Jul 13, 2013 at 12:43 AM, Gerald Britton >> wrote: >>> >>> So, if Python doesn't recognize the symbol for pi, why should it >>> recognize the one for infinity? >> >> >> Considering that Python can't represent ? in a float anyway, I >> wouldn't be too bothered. > > > However Python can represent ? in a float. Shouldn't it recognize the symbol > for ?? That one would be more plausible, in the same way that many of the other Unicode digits are accepted. Not sure there's all that much of a use-case for it, though, and if it's going to complicate the code I wouldn't bother; for instance, it's fairly obvious that "3?" should be accepted, but what does "?3" mean? I'm -0 on it initially, but would change that to +0 if a suitable answer is found for that (even if it's "raise ValueError, same as float('1.1.1') does") that doesn't make the code horrendous. 
ChrisA From abarnert at yahoo.com Fri Jul 12 18:01:42 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 12 Jul 2013 09:01:42 -0700 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: References: <1373616232.40551.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID: Sent from a random iPhone On Jul 12, 2013, at 1:49, Joshua Landau wrote: > On 12 July 2013 09:03, Andrew Barnert wrote: >>> I'd personally be in favour of the notion of also allowing strings as the first argument, so you could instead write: >> >>> >>> data = fold("+=", [], iterables) >> >> >> I like this idea, but only if it's added to other functions in the stdlib where it makes sense, and easy to add to new functions of your own. > > Is there a use-case for that last part? It strikes me as equivalent to > messing with builtins, which is largely unliked. I think you missed the word "to" in "add to". I don't want to create functions that can be passed to fold as if they were operators, I want to create functions that can take operators the same way fold does. The former is akin to messing with builtins, which is bad; the latter is akin to using them. From joshua at landau.ws Fri Jul 12 18:03:37 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 17:03:37 +0100 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: References: <1373616232.40551.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID: On 12 July 2013 17:01, Andrew Barnert wrote: > On Jul 12, 2013, at 1:49, Joshua Landau wrote: >> On 12 July 2013 09:03, Andrew Barnert wrote: >>>> >>>> I'd personally be in favour of the notion of also allowing strings as the first argument, so you could instead write: >>>> >>>> data = fold("+=", [], iterables) >>> >>> I like this idea, but only if it's added to other functions in the stdlib where it makes sense, and easy to add to new functions of your own. >> >> Is there a use-case for that last part? It strikes me as equivalent to >> messing with builtins, which is largely unliked. > > I think you missed the word "to" in "add to". I don't want to create functions that can be passed to fold as if they were operators, I want to create functions that can take operators the same way fold does. > > The former is akin to messing with builtins, which is bad; the latter is akin to using them. Apologies; in that case I agree completely. From gerald.britton at gmail.com Fri Jul 12 18:06:47 2013 From: gerald.britton at gmail.com (Gerald Britton) Date: Fri, 12 Jul 2013 12:06:47 -0400 Subject: [Python-ideas] =?utf-8?b?RndkOiAgZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYn?= =?utf-8?q?=29?= In-Reply-To: References: Message-ID: This is a little off-topic. Can anyone tell me why we support numerals in other alphabets but apparently not Greek? On Fri, Jul 12, 2013 at 11:29 AM, Joshua Landau wrote: > On 12 July 2013 16:14, Gerald Britton wrote: >> "Just because." >> >> so, maybe we should have the interpreter spit out ? instead? > > I don't know whether this was a joke, but just as int("?") spits out 5 > and not ?, there is no reason that float("inf") should split out > anything other than "inf". > >> I get that we special case infinity. Its an IEEE thing. I can sure >> the next request coming: The various constants represented by unicode >> characters. > > I don't see how one leads to the next. None thinks that that's a good > idea. This is a *very* restricted change that fits with what we have > already done. > > I don't get the hostility to it. 
I do get the objections that this > isn't needed or that float() has a more restricted scope but this > overt dislike to this extent surprises me. This is *minor* extension > of the leniency there already is. I'm approximately neutral on the issue, > but I'm definitely not as negative as a lot of the reviews it's > getting. -- Gerald Britton -- Gerald Britton From kwpolska at gmail.com Fri Jul 12 18:08:19 2013 From: kwpolska at gmail.com (=?UTF-8?B?Q2hyaXMg4oCcS3dwb2xza2HigJ0gV2Fycmljaw==?=) Date: Fri, 12 Jul 2013 18:08:19 +0200 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: On Fri, Jul 12, 2013 at 5:51 PM, Chris Angelico wrote: > On Sat, Jul 13, 2013 at 1:42 AM, Serhiy Storchaka wrote: >> 12.07.13 17:52, Chris Angelico ???????(??): >>> >>> On Sat, Jul 13, 2013 at 12:43 AM, Gerald Britton >>> wrote: >>>> >>>> So, if Python doesn't recognize the symbol for pi, why should it >>>> recognize the one for infinity? >>> >>> >>> Considering that Python can't represent ? in a float anyway, I >>> wouldn't be too bothered. >> >> >> However Python can represent ? in a float. Shouldn't it recognize the symbol >> for ?? > > That one would be more plausible, in the same way that many of the > other Unicode digits are accepted. Not sure there's all that much of a > use-case for it, though, and if it's going to complicate the code I > wouldn't bother; for instance, it's fairly obvious that "3?" should be > accepted, but what does "?3" mean? I'm -0 on it initially, but would > change that to +0 if a suitable answer is found for that (even if it's > "raise ValueError, same as float('1.1.1') does") that doesn't make the > code horrendous. Umm, last time I checked, ?*? = 1.5. Which brings us to yet another problem: will we implement magic so that '10?' becomes 100? And '?25' becomes 5? -- Kwpolska | GPG KEY: 5EAAEA16 stop html mail | always bottom-post http://asciiribbon.org | http://caliburn.nl/topposting.html From alexander.belopolsky at gmail.com Fri Jul 12 18:09:56 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 12 Jul 2013 12:09:56 -0400 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: On Fri, Jul 12, 2013 at 11:45 AM, Joshua Landau wrote: > No. Why would we special-case ?? We'd need a ton of code just to > special-case ?, ?, ?, ?, ?, ?, ?, etc. We don't need or want to > special-case any more values. > You don't need to special-case every fraction: >>> import unicodedata >>> [unicodedata.numeric(c) for c in ['?', '?', '?', '?', '?', '?', '?']] [0.5, 0.75, 0.125, 0.25, 0.375, 0.625, 0.875] I would actually support the idea for float() to accept whatever unicodedata.numeric() accepts. -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua at landau.ws Fri Jul 12 18:09:27 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 17:09:27 +0100 Subject: [Python-ideas] =?utf-8?b?RndkOiBmbG9hdCgn4oieJyk9ZmxvYXQoJ2luZicp?= In-Reply-To: References: Message-ID: On 12 July 2013 17:06, Gerald Britton wrote: > This is a little off-topic. Can anyone tell me why we support numerals > in other alphabets but apparently not Greek? Greek *letters* are not *digits*. They are commonly associated with digits and other numbers, but are not themselves digits or numbers. Also, please don't top post. 
From joshua at landau.ws Fri Jul 12 18:15:22 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 17:15:22 +0100 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: On 12 July 2013 17:09, Alexander Belopolsky wrote: > > On Fri, Jul 12, 2013 at 11:45 AM, Joshua Landau wrote: >> >> No. Why would we special-case ?? We'd need a ton of code just to >> special-case ?, ?, ?, ?, ?, ?, ?, etc. We don't need or want to >> special-case any more values. > > You don't need to special-case every fraction: > >>>> import unicodedata >>>> [unicodedata.numeric(c) for c in ['?', '?', '?', '?', '?', '?', '?']] > [0.5, 0.75, 0.125, 0.25, 0.375, 0.625, 0.875] That's a good point. > I would actually support the idea for float() to accept whatever > unicodedata.numeric() accepts. It doesn't free us from tons of special cases. We already have people arguing over whether 3? is the traditional 3.5 or whether it's 1.5. We'll need tons of new parsing rules. I'm not convinced, in other words. Additionally, core devs have rejected multiple "negative signs" despite the fact that there is an alternate standard "negative sign". A change of this size is a bit further than reasonable. If it were as simple as just calling "unicodedata.numeric" rather than rewriting the whole float and int parser, I'd not be so hesitant. From rosuav at gmail.com Fri Jul 12 18:28:39 2013 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 13 Jul 2013 02:28:39 +1000 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: On Sat, Jul 13, 2013 at 2:08 AM, Chris ?Kwpolska? Warrick wrote: > On Fri, Jul 12, 2013 at 5:51 PM, Chris Angelico wrote: >> ... it's fairly obvious that "3?" should be >> accepted, but what does "?3" mean? I'm -0 on it initially, but would >> change that to +0 if a suitable answer is found for that (even if it's >> "raise ValueError, same as float('1.1.1') does") that doesn't make the >> code horrendous. > > Umm, last time I checked, ?*? = 1.5. Not sure what you mean here. Yes, 3 times one half equals one and one half, but the abuttal of "3?" meant 3.5 - at least, it did in my school days. If float("?") == 0.5, then float("3?") should be 3.5. Anything else would be treating it as an expression, not a floating-point value. ChrisA From abarnert at yahoo.com Fri Jul 12 18:39:18 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 12 Jul 2013 09:39:18 -0700 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: References: <1373616232.40551.YahooMailNeo@web184704.mail.ne1.yahoo.com> Message-ID: <2406C6A6-6C57-465A-BAC7-C70B011878E5@yahoo.com> On Jul 12, 2013, at 3:36, Haoyi Li wrote: > I would much prefer the somewhat-more-difficult route of modifying the parser to let `a += b` be an expression, and then you could write > > data = fold(lambda a, b: a += b, [], iterables) But that doesn't really _mean_ anything as an expression; it's pure side effect. Besides, if that returns the new a then it invites all the expression order problems from C and friends that Python has always escaped; if it returns None, the fold doesn't actually work. Meanwhile, the way you wrote it: > `a += b` Reminds me. We took away backticks for repr; what about using them for quoting operators? `+=` says "quoting" in the lisp sense more loudly than the PHP sense... At least to me. And this means fold wouldn't need to accept a string; it's getting a function. 
Of course it's the exact opposite of Haskell, where you use backticks to turn a function into an operator and parens to turn an operator into a function. > or even groovy/scala/mathematica style > > data = fold(_ += _, [], iterables) I semi-suggested this elsewhere, but without the magic of guessing whether two underscores meant the same arg twice or two different args (so you'd have to write _1 == _2 or similar). You can actually write functions this way today using an expression template library, without macros or anything: class Expr: def __init__(self, f=identity) self.f = f def __iadd__(self, other): return Expr(compose(iadd, self.f)) def __call__(self, *args): return self.f(*args) _1, _2 = Expr(), Expr() Actually, this silly proof of concept _would_ work with _ for both args, but only because it won't work for much else (even adding literals). Anyway, while you can do this, I'm not sure you should. Boost.Lambda had this kind of magic, and nobody proposed it for C++11 when real lambdas were added, because it's a clumsy fit for a language that wasn't designed for it from the start--you end up needing const, var, and ref functions to paper over the gaps where overloading doesn't quite get you there, and horrible workarounds for the operators that can't be overloaded, ... -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Fri Jul 12 19:12:13 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 12 Jul 2013 20:12:13 +0300 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: <51E00525.6020700@aim-online.com> Message-ID: 12.07.13 18:50, Joshua Landau ???????(??): > On 12 July 2013 16:26, Serhiy Storchaka wrote: >> 12.07.13 17:10, Joshua Landau ???????(??): >> >>> int and float are obviously meant to handle abstract inputs (not >>> expressions) and unicode infinity is an extension of this. Your >>> "analogies" are inapt. >> >> >> Why you think ? (this is only one symbol!) and 3.(142857) (this is a decimal >> notation of the 22/7 fraction) are expressions, but ? or even -1 are not? > > For the same reason that 0.5 and [0, > 1, 2, 3, 4] are literals but 1/2 and range(5) are not. ? is not a literal. From joshua at landau.ws Fri Jul 12 19:18:51 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 18:18:51 +0100 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: <51E00525.6020700@aim-online.com> Message-ID: On 12 July 2013 18:12, Serhiy Storchaka wrote: > 12.07.13 18:50, Joshua Landau ???????(??): > >> On 12 July 2013 16:26, Serhiy Storchaka wrote: >>> >>> 12.07.13 17:10, Joshua Landau ???????(??): >>> >>>> int and float are obviously meant to handle abstract inputs (not >>>> expressions) and unicode infinity is an extension of this. Your >>>> "analogies" are inapt. >>> >>> >>> >>> Why you think ? (this is only one symbol!) and 3.(142857) (this is a >>> decimal >>> notation of the 22/7 fraction) are expressions, but ? or even -1 are not? >> >> >> For the same reason that 0.5 and [0, >> 1, 2, 3, 4] are literals but 1/2 and range(5) are not. > > > ? is not a literal. So? float("[1, 2, 3, 4]") isn't valid -- I never claimed there was 1:1 mapping between literals and things that float should except. I said that float shouldn't parse expressions. 
From alexander.belopolsky at gmail.com Fri Jul 12 19:21:49 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 12 Jul 2013 13:21:49 -0400 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: On Fri, Jul 12, 2013 at 12:15 PM, Joshua Landau wrote: > > > I would actually support the idea for float() to accept whatever > > unicodedata.numeric() accepts. > > It doesn't free us from tons of special cases. We already have people > arguing over whether 3½ is the traditional 3.5 or whether it's 1.5. > We'll need tons of new parsing rules. I'm not convinced, in other > words. I should have explained my idea in more detail. I am not suggesting that float('3½') should work. I wrote: "float() to accept whatever unicodedata.numeric() accepts" and >>> unicodedata.numeric('3½') Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: need a single Unicode character as parameter Since python does not have a character type, I think it is acceptable for single-character strings to be special. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Fri Jul 12 19:27:03 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 12 Jul 2013 13:27:03 -0400 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: On Fri, Jul 12, 2013 at 12:15 PM, Joshua Landau wrote: > Additionally, core devs have rejected multiple "negative signs" > despite the fact that there is an alternate standard "negative sign". > Did they? http://bugs.python.org/issue10581#msg191014 -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Fri Jul 12 19:29:10 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 12 Jul 2013 20:29:10 +0300 Subject: [Python-ideas] Fwd: Python Convert In-Reply-To: References: <9BE2F04C-4148-43A0-BBF6-F63C29503C49@yahoo.com> Message-ID: 12.07.13 07:10, Daniel Rode написав(ла): >>> Maybe even INT? >> What does that do if I use it? > It could be used to convert a string to an integer (if applicable). It is a good idea for an April fool's joke: encodings 'int', 'float' and 'open' which convert a string or bytes object to an integer, float or io.FileIO. From joshua at landau.ws Fri Jul 12 19:29:27 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 18:29:27 +0100 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: On 12 July 2013 18:27, Alexander Belopolsky wrote: > > On Fri, Jul 12, 2013 at 12:15 PM, Joshua Landau wrote: >> >> Additionally, core devs have rejected multiple "negative signs" >> despite the fact that there is an alternate standard "negative sign". > > Did they? > > http://bugs.python.org/issue10581#msg191014 Ah, thanks -- I'm actually glad to be corrected here.
From storchaka at gmail.com Fri Jul 12 19:58:46 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 12 Jul 2013 20:58:46 +0300 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: <51E00525.6020700@aim-online.com> Message-ID: 12.07.13 20:18, Joshua Landau написав(ла): > On 12 July 2013 18:12, Serhiy Storchaka wrote: >> 12.07.13 18:50, Joshua Landau написав(ла): >>> On 12 July 2013 16:26, Serhiy Storchaka wrote: >>>> 12.07.13 17:10, Joshua Landau написав(ла): >>>>> int and float are obviously meant to handle abstract inputs (not >>>>> expressions) and unicode infinity is an extension of this. Your >>>>> "analogies" are inapt. >>>> >>>> Why you think ½ (this is only one symbol!) and 3.(142857) (this is a >>>> decimal >>>> notation of the 22/7 fraction) are expressions, but ∞ or even -1 are not? >>> >>> For the same reason that 0.5 and [0, >>> 1, 2, 3, 4] are literals but 1/2 and range(5) are not. >> >> ½ is not a literal. > > So? float("[1, 2, 3, 4]") isn't valid -- I never claimed there was a 1:1 > mapping between literals and things that float should accept. I said > that float shouldn't parse expressions. I agree. But how is it related to ½ and 3.(142857)? From joshua at landau.ws Fri Jul 12 20:07:07 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 19:07:07 +0100 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: On 12 July 2013 18:21, Alexander Belopolsky wrote: > > On Fri, Jul 12, 2013 at 12:15 PM, Joshua Landau wrote: >> >> > I would actually support the idea for float() to accept whatever >> > unicodedata.numeric() accepts. >> >> It doesn't free us from tons of special cases. We already have people >> arguing over whether 3½ is the traditional 3.5 or whether it's 1.5. >> We'll need tons of new parsing rules. I'm not convinced, in other >> words. > > I should have explained my idea in more detail. I am not suggesting that > float('3½') should work. I wrote: "float() to accept whatever > unicodedata.numeric() accepts" and > >>>> unicodedata.numeric('3½') > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > TypeError: need a single Unicode character as parameter > > Since python does not have a character type, I think it is acceptable for > single-character strings to be special.
For reference, you want to define these: [here the original mail tabulated several hundred single Unicode characters against their unicodedata.numeric() values -- the characters themselves were lost in this archive's encoding; the values ran from plain digits and vulgar fractions (1/8, 2/3, -1/2, 15/2, ...) up to 10000, 100000000 and 1000000000000]
Personally I'm not too sure that the more exotic of these really deserve to be parsed... -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Fri Jul 12 20:17:13 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 12 Jul 2013 21:17:13 +0300 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: 12.07.13 18:51, Chris Angelico написав(ла): > On Sat, Jul 13, 2013 at 1:42 AM, Serhiy Storchaka wrote: >> 12.07.13 17:52, Chris Angelico написав(ла): >>> On Sat, Jul 13, 2013 at 12:43 AM, Gerald Britton >>> wrote: >>>> So, if Python doesn't recognize the symbol for pi, why should it >>>> recognize the one for infinity? >>> >>> Considering that Python can't represent π in a float anyway, I >>> wouldn't be too bothered. >> >> However Python can represent ½ in a float. Shouldn't it recognize the symbol >> for ½? > > That one would be more plausible, in the same way that many of the > other Unicode digits are accepted. Not sure there's all that much of a > use-case for it, though, and if it's going to complicate the code I > wouldn't bother; for instance, it's fairly obvious that "3½" should be > accepted, but what does "½3" mean? I'm -0 on it initially, but would > change that to +0 if a suitable answer is found for that (even if it's > "raise ValueError, same as float('1.1.1') does") that doesn't make the > code horrendous. This will complicate the code no more than recognizing ∞ would. I don't propose accepting ½. I just noticed that accepting ∞ will open a wide gate for a lot of other cases. From brett at python.org Fri Jul 12 21:32:29 2013 From: brett at python.org (Brett Cannon) Date: Fri, 12 Jul 2013 15:32:29 -0400 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: <20130712105449.6b9fa825@anarchist> References: <20130712105449.6b9fa825@anarchist> Message-ID: On Fri, Jul 12, 2013 at 10:54 AM, Barry Warsaw wrote: > On Jul 12, 2013, at 04:01 PM, Nick Coghlan wrote: > > >I'd personally be in favour of the notion of also allowing strings as the > >first argument, so you could instead write: > > > > data = fold("+=", [], iterables) > > You had me until here... > > >(Independent of this idea, it would actually be nice if the operator > module > >had a dictionary mapping from op symbols to names, like > >operator.by_symbol["+="] giving operator.iadd) > > ...but this is a neat idea. > +1 from me as well. The table already exists in the docs ( http://docs.python.org/3.4/library/operator.html#module-operator), it just needs to be codified. Maybe operator.map['+='] or operator.from_syntax['+=']. Go really nuts and support ['.attribute'] or ['[42]'] to auto-generate attrgetter or itemgetter instances, but that's probably just asking for support problems. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ron3200 at gmail.com Fri Jul 12 21:49:55 2013 From: ron3200 at gmail.com (Ron Adam) Date: Fri, 12 Jul 2013 14:49:55 -0500 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: References: Message-ID: On 07/12/2013 01:01 AM, Nick Coghlan wrote: > Efficiently merging a collection of iterables into a list would then just be: > > data = fold(operator.iadd, [], iterables) > > I'd personally be in favour of the notion of also allowing strings as the > first argument, so you could instead write: > > data = fold("+=", [], iterables) > > This could also be introduced as an alternative API in functools. How about if start was first... and could take a class or an instance? data = fold(SomeClass, op, iterables) data = fold(instance, op, iterables) > (Independent of this idea, it would actually be nice if the operator module > had a dictionary mapping from op symbols to names, like > operator.by_symbol["+="] giving operator.iadd) I'm guessing that this is how that might be used? name = operator.symbols['+='] op_method = getattr(start, name) Looks good to me. +1 Cheers, Ron From alexander.belopolsky at gmail.com Fri Jul 12 21:53:50 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 12 Jul 2013 15:53:50 -0400 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: <20130712105449.6b9fa825@anarchist> References: <20130712105449.6b9fa825@anarchist> Message-ID: On Fri, Jul 12, 2013 at 10:54 AM, Barry Warsaw wrote: > >(Independent of this idea, it would actually be nice if the operator > module > >had a dictionary mapping from op symbols to names, like > >operator.by_symbol["+="] giving operator.iadd) > > ...but this is a neat idea. -1 This is neat, but I don't really see much use beyond implementing things like fold("+=", ..) that you've just rejected. -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Fri Jul 12 22:13:31 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 12 Jul 2013 23:13:31 +0300 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: References: <20130712105449.6b9fa825@anarchist> Message-ID: 12.07.13 22:53, Alexander Belopolsky написав(ла): > On Fri, Jul 12, 2013 at 10:54 AM, Barry Warsaw > > wrote: > >(Independent of this idea, it would actually be nice if the > operator module > >had a dictionary mapping from op symbols to names, like > >operator.by_symbol["+="] giving operator.iadd) > > ...but this is a neat idea. > > -1 > > This is neat, but I don't really see much use beyond implementing things > like fold("+=", ..) that you've just rejected. Concur with Alexander. Why do you want yet another set of alternative names for operators? What are the alternative names for operator.neg() and operator.sub()? If either of them is not "-", then what is the benefit? From zachary.ware+pyideas at gmail.com Fri Jul 12 22:48:41 2013 From: zachary.ware+pyideas at gmail.com (Zachary Ware) Date: Fri, 12 Jul 2013 15:48:41 -0500 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: References: <20130712105449.6b9fa825@anarchist> Message-ID: On Fri, Jul 12, 2013 at 2:32 PM, Brett Cannon wrote: > > On Fri, Jul 12, 2013 at 10:54 AM, Barry Warsaw wrote: >> >> On Jul 12, 2013, at 04:01 PM, Nick Coghlan wrote: >> >> >I'd personally be in favour of the notion of also allowing strings as the >> >first argument, so you could instead write: >> > >> > data = fold("+=", [], iterables) >> >> You had me until here...
>> >> >(Independent of this idea, it would actually be nice if the operator >> > module >> >had a dictionary mapping from op symbols to names, like >> >operator.by_symbol["+="] giving operator.iadd) >> >> ...but this is a neat idea. > > +1 from me as well. The table already exists in the docs > (http://docs.python.org/3.4/library/operator.html#module-operator), it just > needs to be codified. Maybe operator.map['+='] or > operator.from_syntax['+=']. Go really nuts and support ['.attribute'] or > ['[42]'] to auto-generate attrgetter or itemgetter instances, but that's > probably just asking for support problems. > I decided to take a stab at this idea and thus created issue18436[1]. -- Zach [1]http://bugs.python.org/issue18436 From sergemp at mail.ru Fri Jul 12 22:52:41 2013 From: sergemp at mail.ru (Sergey) Date: Fri, 12 Jul 2013 23:52:41 +0300 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <20130711005842.13ea7ec1@sergey> Message-ID: <20130712235241.0736951d@sergey> On Jul 11, 2013 Ron Adam wrote: >> It's just that instead of discussing what is the best way to fix a slowness, >> I'm spending most of my time trying to convince people that the slowness should >> be fixed. >> — sum is slow for lists, let's fix that! >> — you shouldn't use sum... >> — why can't I use sum? >> — because it's slow >> — then let's fix that! >> — you shouldn't use sum... >> I hadn't thought that somebody could truly believe that something should >> be slow, and would find one excuse after another instead of just fixing >> the slowness. > > My advice is to not try so hard to change an individual's mind. I'm not trying to change someone's mind (well, maybe I am, but that's just a side-effect). I'm trying to understand their mind. What I'm really trying to do is find a solution that takes into account as many opinions as possible. But I need to understand them to do that. I understood when Steven said "I am uncomfortable about changing the semantics to use __iadd__ instead of __add__". I was unsure about that too, but since it's not officially documented that sum() uses __add__, I was hoping that nobody was relying on it and nothing would break if this were changed. I understood Joshua when it appeared that such a change could break numpy-based code, and he said "We can't rush a semantic change for code that's in popular usage... it seems I've left for the dark side." (and I agree with him, that's why I suggested [1] recording that so others would not be tempted to do it again, so we're still on the same side about that patch :)). I even understood when Stefan said that using "+" and sum() for concatenation makes no (obvious) sense. IMO, it does not matter for our case, it's just a feature that you should be aware of. Using "+" to add lists is like using 2**20 (e.g. instead of 2^20) to get a power, it's neither good nor bad, it's just how it is named here. I mean, I do not agree with that point, but I understand it. But I cannot understand Andrew. From the very beginning it seemed that he was mainly concerned about speed, he was constantly asking me how to speed up different types (or rather he was insisting that I cannot speed them up). When I explicitly asked him whether he thinks that sum() must not be optimized JUST because of possible speed issues with other types he said "Yes". But in the next email he says that speed is not the main reason...
So I should either stop trying to understand him (and I don't want to, because discussing the problem with him have inspired me with new ideas) or I'm doomed to repeat the same questions over and over in different forms until I finally understand what he really wants. > Are you familiar with the informal voting system we use? Basically > take a look though the discussion and look for [...] Thank you for detailed explanation. Unfortunatelly I can't just account them once and forget about that, because I adapt my suggestion to these opinions. I.e. initially there was just one patch suggested and now there're three patches and two more ideas waiting to be discussed and, maybe, modified again. > So.. make the numbers case faster, but probably don't bother changing the > non numbers case. (It seems like this is the preferred view so far.) > > There might be some support for depreciating the non-numbers > case. I'm not sugesting that *you* do that btw... see below. :-) Sum is not just for numbers. It's a rather good choice to add many things, including timedeltas and different numpy types. That's why it was never restricted to work on numbers only. It only has string restriction (for historical reasons and it's more a note to newbies than a restriction because it can be easily tricked if needed). So we can't just deprecate non-numbers, well, we can but I don't think it's a good idea. -- [1] http://bugs.python.org/issue18305#msg192956 http://bugs.python.org/file30904/fastsum-iadd_warning.patch From zachary.ware+pyideas at gmail.com Fri Jul 12 22:53:40 2013 From: zachary.ware+pyideas at gmail.com (Zachary Ware) Date: Fri, 12 Jul 2013 15:53:40 -0500 Subject: [Python-ideas] Rehabilating reduce (as "fold") In-Reply-To: References: <20130712105449.6b9fa825@anarchist> Message-ID: On Fri, Jul 12, 2013 at 2:53 PM, Alexander Belopolsky wrote: > > On Fri, Jul 12, 2013 at 10:54 AM, Barry Warsaw wrote: >> >> >(Independent of this idea, it would actually be nice if the operator >> > module >> >had a dictionary mapping from op symbols to names, like >> >operator.by_symbol["+="] giving operator.iadd) >> >> ...but this is a neat idea. > > > -1 > > This is neat, but I don't really see much use beyond implementing things > like fold("+=", ..) that you've just rejected. I do somewhat agree that there may not be much place to use this in the standard library, but I think it could make use of the operator module a bit easier to read in some cases. For instance, it takes me a second thought to correctly parse `operator.irshift` as "right-shift in-place" instead of "IR shift" (which may or may not mean anything, but certainly doesn't in Python). On the other hand, `operator.get_op(">>=")` shows what the returned function is going to do. -- Zach From joshua at landau.ws Fri Jul 12 23:27:48 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 22:27:48 +0100 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: <51E00525.6020700@aim-online.com> Message-ID: On 12 July 2013 18:58, Serhiy Storchaka wrote: > 12.07.13 20:18, Joshua Landau ???????(??): > > On 12 July 2013 18:12, Serhiy Storchaka wrote: >> >>> 12.07.13 18:50, Joshua Landau ???????(??): >>> >>>> On 12 July 2013 16:26, Serhiy Storchaka wrote: >>>> >>>>> 12.07.13 17:10, Joshua Landau ???????(??): >>>>> >>>>>> int and float are obviously meant to handle abstract inputs (not >>>>>> expressions) and unicode infinity is an extension of this. Your >>>>>> "analogies" are inapt. 
>>>>>> >>>>> >>>>> Why you think ? (this is only one symbol!) and 3.(142857) (this is a >>>>> decimal >>>>> notation of the 22/7 fraction) are expressions, but ? or even -1 are >>>>> not? >>>>> >>>> >>>> For the same reason that 0.5 and [0, >>>> 1, 2, 3, 4] are literals but 1/2 and range(5) are not. >>>> >>> >>> ? is not a literal. >>> >> >> So? float("[1, 2, 3, 4]") isn't valid -- I never claimed there was 1:1 >> mapping between literals and things that float should except. I said >> that float shouldn't parse expressions. >> > > I agree. But how is it related to ? and 3.(142857)? ? === 1/2; thus is an expression 3.(142857) is more ambiguous, because there's not actually any mathematical operator in place. But it is too much parsing for no benefit, AFAICT; you would complicate something simple to solve almost no use-cases, and then when they are used it's harder for people to work out what is meant. The informal definition for "expression" with regards to int and float I'm using is basically the measure of how much more parsing code would need to be implemented. -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua at landau.ws Fri Jul 12 23:31:19 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 12 Jul 2013 22:31:19 +0100 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: On 12 July 2013 19:17, Serhiy Storchaka wrote: > This will complicate the code is not more than recognizing ?. I don't > propose accepting ?. I just noticed that the accepting ? will open a wide > gate for a lot of other cases. Whilst I disagreed, the usual hunger of Python-list to jump at bad ideas like piranhas means that you've effectively been proven right. This doesn't mean that ? is a bad thing to accept -- just that if we accept it we're going to have to be ready to push back against all the other people who want silly extensions. //justmyopinionlyyrs -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Fri Jul 12 23:33:46 2013 From: mertz at gnosis.cx (David Mertz) Date: Fri, 12 Jul 2013 14:33:46 -0700 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: <20130712235241.0736951d@sergey> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <20130711005842.13ea7ec1@sergey> <20130712235241.0736951d@sergey> Message-ID: I think I've followed every post in this long thread and the spinoffs. I don't want to try to address all the small points and sub-threads. Here's my overall take-away: * Using 'sum()' to concatenate sequences or iterators is unintuitive, inferior to other existing techniques, and any code that does so is already *slightly* "broken." (stylistically, not functionally necessarily). Given my perception in the above bullet, I would be +1 on deprecating this use altogether (i.e. raise DeprecationWarning if a good way could be found to distinguist "sequence" from "some other thing that implements .__add__() and is numerical enough"). Even if there's no practical way to make that distinction in a duck-typed language, the documentation should (continue to) say "sum() is the wrong way to concatenate sequences". I am -1 on modifying the semantics of sum() to use .__iadd__(). Even though we have TOOWTDI, in reality there are lots of Python constructs that one *can* do, but really just *shouldn't*. 
Using sum() on sequences feels about like ._getframe() hacks or slightly perverse generator comprehensions one can write (e.g. for side-effects). We're all adults, and we're not going to stop you (unless we do so with DeprecationWarning), but it's just not the *right* thing to do.

David...

-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

From joshua at landau.ws Fri Jul 12 23:40:13 2013
From: joshua at landau.ws (Joshua Landau)
Date: Fri, 12 Jul 2013 22:40:13 +0100
Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries?
In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <20130711005842.13ea7ec1@sergey> <20130712235241.0736951d@sergey>
Message-ID:

On 12 July 2013 22:33, David Mertz wrote:
> Even though we have TOOWTDI, in reality there are lots of Python
> constructs that one *can* do, but really just *shouldn't*. Using sum() on
> sequences feels about like ._getframe() hacks or slightly perverse
> generator comprehensions one can write (e.g. for side-effects). We're all
> adults, and we're not going to stop you (unless we do so with
> DeprecationWarning), but it's just not the *right* thing to do.

Really? As bad as a frame-hack? I'm not arguing, just surprised.

From ethan at stoneleaf.us Fri Jul 12 23:41:04 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 12 Jul 2013 14:41:04 -0700
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To: References: <51E00525.6020700@aim-online.com>
Message-ID: <51E077F0.9050806@stoneleaf.us>

On 07/12/2013 02:27 PM, Joshua Landau wrote:
> On 12 July 2013 18:58, Serhiy Storchaka <storchaka at gmail.com> wrote:
>> I agree. But how is it related to ½ and 3.(142857)?
>
> ½ === 1/2; thus is an expression

That's ridiculous. ½ is no more an expression than "0.5" is.

-- ~Ethan~

From storchaka at gmail.com Fri Jul 12 23:46:11 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sat, 13 Jul 2013 00:46:11 +0300
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To: References: <51E00525.6020700@aim-online.com>
Message-ID:

13.07.13 00:27, Joshua Landau wrote:
> On 12 July 2013 18:58, Serhiy Storchaka <storchaka at gmail.com> wrote:
>> I agree. But how is it related to ½ and 3.(142857)?
> ½ === 1/2; thus is an expression

0.5 === 5/10. Isn't it an expression?

> 3.(142857) is more ambiguous, because there's not actually any
> mathematical operator in place. But it is too much parsing for no
> benefit, AFAICT; you would complicate something simple to solve almost
> no use-cases, and then when they are used it's harder for people to work
> out what is meant.

AFAIK children learn 3.(142857) before ∞. I'm sure people use fractions and recurring decimals more often than infinity.

> The informal definition for "expression" with regards
> to int and float I'm using is basically the measure of how much more
> parsing code would need to be implemented.

½ requires no more parsing code than ∞.
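(To ground that exchange in what the interpreter actually does today, a quick sketch -- float() already accepts any Unicode decimal digits, while a character like ½ only exposes a numeric value through unicodedata, and ∞ exposes none at all; the exact error text may vary by version:)

    >>> import unicodedata
    >>> float('\u0661\u0662\u0663')    # ARABIC-INDIC digits one, two, three
    123.0
    >>> unicodedata.numeric('\u00bd')  # VULGAR FRACTION ONE HALF
    0.5
    >>> float('\u00bd')                # but float() itself rejects it
    Traceback (most recent call last):
      ...
    ValueError: could not convert string to float: '½'
    >>> unicodedata.numeric('\u221e')  # INFINITY has no numeric property
    Traceback (most recent call last):
      ...
    ValueError: not a numeric character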
From joshua at landau.ws Fri Jul 12 23:47:49 2013
From: joshua at landau.ws (Joshua Landau)
Date: Fri, 12 Jul 2013 22:47:49 +0100
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To: <51E077F0.9050806@stoneleaf.us> References: <51E00525.6020700@aim-online.com> <51E077F0.9050806@stoneleaf.us>
Message-ID:

On 12 July 2013 22:41, Ethan Furman wrote:
> On 07/12/2013 02:27 PM, Joshua Landau wrote:
>> On 12 July 2013 18:58, Serhiy Storchaka <storchaka at gmail.com> wrote:
>>> I agree. But how is it related to ½ and 3.(142857)?
>>
>> ½ === 1/2; thus is an expression
>
> That's ridiculous. ½ is no more an expression than "0.5" is.

When you are talking in context of float(...), I have to disagree.

From masklinn at masklinn.net Fri Jul 12 23:52:53 2013
From: masklinn at masklinn.net (Masklinn)
Date: Fri, 12 Jul 2013 21:52:53 +0000
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To: References: <51E00525.6020700@aim-online.com>
Message-ID: <8A13379B-F31C-415C-B8F9-EE31DC27892A@masklinn.net>

On 12 juil. 2013, at 21:46, Serhiy Storchaka wrote:
> 13.07.13 00:27, Joshua Landau wrote:
>> On 12 July 2013 18:58, Serhiy Storchaka <storchaka at gmail.com> wrote:
>>> I agree. But how is it related to ½ and 3.(142857)?
>> ½ === 1/2; thus is an expression
>
> 0.5 === 5/10. Isn't it an expression?

In the context of making a difference between literal values and other expressions in python, yes 5/10 is an expression, no 0.5 is not one.

From joshua at landau.ws Fri Jul 12 23:55:28 2013
From: joshua at landau.ws (Joshua Landau)
Date: Fri, 12 Jul 2013 22:55:28 +0100
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To: References: <51E00525.6020700@aim-online.com>
Message-ID:

On 12 July 2013 22:46, Serhiy Storchaka wrote:
> 13.07.13 00:27, Joshua Landau wrote:
>> On 12 July 2013 18:58, Serhiy Storchaka <storchaka at gmail.com> wrote:
>>> I agree. But how is it related to ½ and 3.(142857)?
>> ½ === 1/2; thus is an expression
>
> 0.5 === 5/10. Isn't it an expression?

No. That's like saying "1 === 2/2". There is a much more obvious equivalence between two ways of writing "1/2" than between two ways of displaying the result of "1/2".

>> 3.(142857) is more ambiguous, because there's not actually any
>> mathematical operator in place. But it is too much parsing for no
>> benefit, AFAICT; you would complicate something simple to solve almost
>> no use-cases, and then when they are used it's harder for people to work
>> out what is meant.
>
> AFAIK children learn 3.(142857) before ∞. I'm sure people use fractions
> and recurring decimals more often than infinity.

In my experience (I'll take a good wager I'm younger than you) people learn first about infinity, then are taught recurrence using the floating-dot syntax. The bracket form for recurrence was not taught once during high-school for me, and although "infinity" was hardly covered either it's not niche knowledge.

Plus, why on earth would you use recurrence for floats? Give me a use case. There's a good reason for float infinity.

Note that I'm British.

>> The informal definition for "expression" with regards
>> to int and float I'm using is basically the measure of how much more
>> parsing code would need to be implemented.
>
> ½ requires no more parsing code than ∞.

Au contraire, if you accept ½ you are bound by law to accept all of the other fractions -- that's much more code than just allowing ∞.
-------------- next part -------------- An HTML attachment was scrubbed... URL: From sergemp at mail.ru Fri Jul 12 23:57:18 2013 From: sergemp at mail.ru (Sergey) Date: Sat, 13 Jul 2013 00:57:18 +0300 Subject: [Python-ideas] Fast sum summary [was Re: Fast sum() for non-numbers - why so much worries?] In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> <51DF5368.6020505@pearwood.info> <51DF57E6.8090206@mrabarnett.plus.com> Message-ID: <20130713005718.78d01516@sergey> On Jul 12, 2013 Paul Moore wrote: > On 12 July 2013 02:12, MRAB wrote: > >> While you have your cap on, if you're going to special-case lists, then >> why not strings too (just passing them on to "".join())? > > And of course, that specific question was debated, and the decision taken > to go with what we have now, when sum was first introduced. > > Someone who is arguing for this proposal needs to go back and research that > decision, and confirm that the reasons discussed then no longer apply. I > suspect many of them still do. That's what Alex Martelli, author of sum(), initially did [1]: > for the simple reason that I special-case this -- when the first > item is a PyBaseString_Type, I delegate to ''.join So you can, kind of, say that sum was DESIGNED to have special cases from the very beginning. The problem appeared for mixed lists like: sum(["str1", "str2", SomeClass, "str3"]) Or: def myit(): yield "str1" yield "str2" yield SomeClass yield "str3" sum(myit()) ''.join() can't handle such cases despite SomeClass could be addable to string. So the problem is that you can't just silently delegate the argument to ''.join() because join() first exhausts entire sequence into its own temporary list, and then fails. So the sum code had to be somewhat like that: read entire sequence into a temporary list if argument is string: try ''.join() if join succeeded: prepend first element to result and return it if argument is unicode: try u''.join() if join succeeded: prepend first element to result and return it general code here That looked unnecessarily complex. To avoid the complications it was suggested to block strings a little, and point newbies in a better direction. Yes, it was known that sum is slow for some types from the beginning, but there were no attempts to make it faster for other types, I can't find any suggestions. So the sum() came in that way: slow for lists, blocked for strings. I'm not the first one to bother about sum() being unexpectedly slow [2], and I'm not the first one trying to find a solution [3]. But it looks like I'm the first to go as far as writing patches. Maybe, just maybe, if someone came up with my ideas 10 years ago, it was accepted from the beginning, and we won't be discussing it now. -- [1] http://mail.python.org/pipermail/python-dev/2003-April/034767.html [2] http://article.gmane.org/gmane.comp.python.general/441831 Paul Rubin @ 2006-01-12 > A fast implementation would probably allocate the output list just > once and then stream the values into place with a simple index. That's what I hoped "sum" would do, but instead it barfs with a type error. [3] http://article.gmane.org/gmane.comp.python.general/658610 Alf P. Steinbach @ 2010-03-28 > From a more practical point of view, the sum efficiency could be improved by > doing the first addition using '+' and the rest using '+=', without changing > the behavior. 
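(A rough Python rendering of that early delegation flow -- a sketch reconstructing the pseudocode above, not the actual C implementation:)

    def sum_with_join(iterable, start):
        # Hypothetical sketch of the join-delegation logic described above.
        items = list(iterable)              # exhaust the sequence once
        if isinstance(start, str):
            try:
                # fast path: succeeds only when every item is a string
                return start + ''.join(items)
            except TypeError:
                pass                        # mixed contents: fall back
        result = start
        for item in items:                  # the general addition loop
            result = result + item
        return result

The fallback loop still handles the "SomeClass addable to string" case that ''.join() cannot, which is exactly the complication described above.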
From storchaka at gmail.com Fri Jul 12 23:58:24 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 13 Jul 2013 00:58:24 +0300 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: <8A13379B-F31C-415C-B8F9-EE31DC27892A@masklinn.net> References: <51E00525.6020700@aim-online.com> <8A13379B-F31C-415C-B8F9-EE31DC27892A@masklinn.net> Message-ID: 13.07.13 00:52, Masklinn ???????(??): > On 12 juil. 2013, at 21:46, Serhiy Storchaka wrote: >> 13.07.13 00:27, Joshua Landau ???????(??): >>> On 12 July 2013 18:58, Serhiy Storchaka >> > wrote: >>> I agree. But how is it related to ? and 3.(142857)? >>> ? === 1/2; thus is an expression >> >> 0.5 === 5/10. Isn't it an expression? > > In the context of making a difference between literal values and other expressions in python, yes 5/10 is an expression, no 0.5 is not one. In this context both ? and ? are not literal values and not expressions. From mertz at gnosis.cx Sat Jul 13 00:01:50 2013 From: mertz at gnosis.cx (David Mertz) Date: Fri, 12 Jul 2013 15:01:50 -0700 Subject: [Python-ideas] Fast sum() for non-numbers - why so much worries? In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <20130711005842.13ea7ec1@sergey> <20130712235241.0736951d@sergey> Message-ID: On Fri, Jul 12, 2013 at 2:40 PM, Joshua Landau wrote: > On 12 July 2013 22:33, David Mertz wrote: > >> Even though we have TOOWTDI, in reality there are lots of Python >> constructs that one *can* do, but really just *shouldn't*. Using sum() on >> sequences feels about like ._getframe() hacks or slightly perverse >> generator comprehensions on can write (e.g. for side-effects). We're all >> adults, and we're not going to stop you (unless we do so with >> DeprecationWarning), but it's just not the *right* thing to do. >> > > Really? As bad as a frame-hack? > I'm not arguing, just surprised. > I think so. I'm not trying to hang any big moral on the exact ranking of "bad habits" in Python. But ._getframe() has the virtue that it sometimes lets you do something that you simply *cannot* do using non-hacky code. These things should be nicely hidden away in modules where average users don't have to think about the magic, but for their arcane maintainers, they can be useful. Maybe throw in some metaclass programming hacks in the same category here. In contrast, there is simply *nothing* one can do by sum()'ing sequences that you can't do with other (more) straightforward code, either itertools.chain() or a small explicit loop. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... 
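(For reference, the more straightforward alternatives named above might look like this:)

    import itertools

    lists = [[1, 2], [3, 4], [5]]

    # Flatten with itertools.chain -- linear time, no quadratic re-copying:
    flat = list(itertools.chain.from_iterable(lists))

    # Or the small explicit loop:
    flat2 = []
    for sub in lists:
        flat2.extend(sub)

    assert flat == flat2 == [1, 2, 3, 4, 5]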
URL: From python at mrabarnett.plus.com Sat Jul 13 00:12:11 2013 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 12 Jul 2013 23:12:11 +0100 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: <51E07F3B.5000405@mrabarnett.plus.com> On 12/07/2013 19:17, Serhiy Storchaka wrote: > 12.07.13 18:51, Chris Angelico ???????(??): >> On Sat, Jul 13, 2013 at 1:42 AM, Serhiy Storchaka wrote: >>> 12.07.13 17:52, Chris Angelico ???????(??): >>>> On Sat, Jul 13, 2013 at 12:43 AM, Gerald Britton >>>> wrote: >>>>> So, if Python doesn't recognize the symbol for pi, why should it >>>>> recognize the one for infinity? >>>> >>>> Considering that Python can't represent ? in a float anyway, I >>>> wouldn't be too bothered. >>> >>> However Python can represent ? in a float. Shouldn't it recognize the symbol >>> for ?? >> >> That one would be more plausible, in the same way that many of the >> other Unicode digits are accepted. Not sure there's all that much of a >> use-case for it, though, and if it's going to complicate the code I >> wouldn't bother; for instance, it's fairly obvious that "3?" should be >> accepted, but what does "?3" mean? I'm -0 on it initially, but would >> change that to +0 if a suitable answer is found for that (even if it's >> "raise ValueError, same as float('1.1.1') does") that doesn't make the >> code horrendous. > > This will complicate the code is not more than recognizing ?. I don't > propose accepting ?. I just noticed that the accepting ? will open a > wide gate for a lot of other cases. > In other words, it would be the infinite end of the wedge. :-) From alexander.belopolsky at gmail.com Sat Jul 13 00:20:25 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 12 Jul 2013 18:20:25 -0400 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: <51E00525.6020700@aim-online.com> Message-ID: On Fri, Jul 12, 2013 at 5:55 PM, Joshua Landau wrote: > Au contraire, if you accept ? you are bound by law to accept all of the > other fractions -- that's much more code than just allowing ?. Let's see: def float(x): if x == '\u221e': return builtins.float('inf') return builtins.float(x) def float(x): if len(x) == 1: return unicodedata.numeric(x) return builtins.float(x) Where is "much more code "? And if you accept '?', aren't you "bound by law to accept '+?' and '-?' as well? -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Sat Jul 13 00:36:32 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 13 Jul 2013 01:36:32 +0300 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: <51E07F3B.5000405@mrabarnett.plus.com> References: <51E07F3B.5000405@mrabarnett.plus.com> Message-ID: 13.07.13 01:12, MRAB ???????(??): > On 12/07/2013 19:17, Serhiy Storchaka wrote: >> This will complicate the code is not more than recognizing ?. I don't >> propose accepting ?. I just noticed that the accepting ? will open a >> wide gate for a lot of other cases. >> > In other words, it would be the infinite end of the wedge. :-) To reach the finish line, we must first run ? of the way. Oh, we must first run ? of the way. Oh? From python at mrabarnett.plus.com Sat Jul 13 01:11:24 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 13 Jul 2013 00:11:24 +0100 Subject: [Python-ideas] Fast sum summary [was Re: Fast sum() for non-numbers - why so much worries?] 
In-Reply-To: <20130713005718.78d01516@sergey> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> <51DF5368.6020505@pearwood.info> <51DF57E6.8090206@mrabarnett.plus.com> <20130713005718.78d01516@sergey> Message-ID: <51E08D1C.20502@mrabarnett.plus.com> On 12/07/2013 22:57, Sergey wrote: > On Jul 12, 2013 Paul Moore wrote: > >> On 12 July 2013 02:12, MRAB wrote: >> >>> While you have your cap on, if you're going to special-case lists, then >>> why not strings too (just passing them on to "".join())? >> >> And of course, that specific question was debated, and the decision taken >> to go with what we have now, when sum was first introduced. >> >> Someone who is arguing for this proposal needs to go back and research that >> decision, and confirm that the reasons discussed then no longer apply. I >> suspect many of them still do. > > That's what Alex Martelli, author of sum(), initially did [1]: >> for the simple reason that I special-case this -- when the first >> item is a PyBaseString_Type, I delegate to ''.join > > So you can, kind of, say that sum was DESIGNED to have special cases > from the very beginning. > > The problem appeared for mixed lists like: > sum(["str1", "str2", SomeClass, "str3"]) > > Or: > def myit(): > yield "str1" > yield "str2" > yield SomeClass > yield "str3" > > sum(myit()) > > ''.join() can't handle such cases despite SomeClass could be addable > to string. > > So the problem is that you can't just silently delegate the argument > to ''.join() because join() first exhausts entire sequence into its > own temporary list, and then fails. So the sum code had to be somewhat > like that: > read entire sequence into a temporary list > if argument is string: try ''.join() > if join succeeded: prepend first element to result and return it > if argument is unicode: try u''.join() > if join succeeded: prepend first element to result and return it > general code here > > That looked unnecessarily complex. To avoid the complications it was > suggested to block strings a little, and point newbies in a better > direction. > > Yes, it was known that sum is slow for some types from the beginning, > but there were no attempts to make it faster for other types, I can't > find any suggestions. So the sum() came in that way: slow for lists, > blocked for strings. > > I'm not the first one to bother about sum() being unexpectedly slow > [2], and I'm not the first one trying to find a solution [3]. But it > looks like I'm the first to go as far as writing patches. > > Maybe, just maybe, if someone came up with my ideas 10 years ago, it > was accepted from the beginning, and we won't be discussing it now. > What if it did this: 1. Get the first item, or the start value if provided (i.e. not None). 2. Ask the item for an 'accumulator' (item.__accum__()). 3. Add the first item to the accumulator. 4. Add the remaining items to the accumulator. 5. Ask the accumulator for the result. If there's no accumulator available (the "__accum__" method isn't implemented), then either fall back to the current behaviour or raise a TypeError like it currently does for strings. 
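(A minimal sketch of what that protocol could look like; the __accum__ hook and the add/result accumulator interface are hypothetical names, following the five steps above:)

    def accum_sum(iterable, start=None):
        it = iter(iterable)
        first = next(it) if start is None else start   # step 1
        try:
            acc = first.__accum__()                    # step 2
        except AttributeError:
            raise TypeError('no accumulator available for %r' % type(first))
        acc.add(first)                                 # step 3
        for item in it:                                # step 4
            acc.add(item)
        return acc.result()                            # step 5

    class ListAccumulator:
        "Example accumulator: concatenates sequences via extend()."
        def __init__(self):
            self._data = []
        def add(self, item):
            self._data.extend(item)
        def result(self):
            return self._data

    class AccumList(list):
        "A list that knows how to hand out an accumulator."
        def __accum__(self):
            return ListAccumulator()

    # accum_sum([AccumList([1, 2]), [3], [4, 5]]) -> [1, 2, 3, 4, 5]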
From joshua at landau.ws Sat Jul 13 02:12:48 2013 From: joshua at landau.ws (Joshua Landau) Date: Sat, 13 Jul 2013 01:12:48 +0100 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: <51E00525.6020700@aim-online.com> Message-ID: On 12 July 2013 23:20, Alexander Belopolsky wrote: > > On Fri, Jul 12, 2013 at 5:55 PM, Joshua Landau wrote: > >> Au contraire, if you accept ? you are bound by law to accept all of the >> other fractions -- that's much more code than just allowing ?. > > > Let's see: > > def float(x): > if x == '\u221e': > return builtins.float('inf') > return builtins.float(x) > > def float(x): > if len(x) == 1: > return unicodedata.numeric(x) > return builtins.float(x) > > Where is "much more code "? > Sorry, I didn't equate you on this thread with you on the other, where you said you wanted to special-case characters, I thought you were on about solely fractions. I don't like that idea, because then you could have stuff like ?, ?, ? and so on, leading people to assume that "??" or somesuch is valid. Additionally, accepting "?" is confusing, "?" is ridiculous and ? is just silly. And why would int("?") equal 50? Additionally, your change would affect both int and float, and require an extra check to see whether the return is an integer for int. I see no advantage in a blatant explosion of acceptable characters, and several disadvantages from having characters special-cased. And if you accept '?', aren't you "bound by law to accept '+?' and '-?' as > well? > Yes. On the basis that we except "-inf", there's really no question. The "-" is undoubtedly dealt with semi-separately, though, so it shouldn't change the amount we have to write. -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua.landau.ws at gmail.com Sat Jul 13 02:25:05 2013 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sat, 13 Jul 2013 01:25:05 +0100 Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations" In-Reply-To: References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl> Message-ID: A blessing from the Gods has resulted in http://www.python.org/dev/peps/pep-0448/! See what you think; it's not too changed from before but it's mighty pretty now. Still up for discussion are the specifics of function call syntax, the full details of which should already be in the PEP. If you come up with a better suggestion or want argue for one of the choices, go ahead. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vernondcole at gmail.com Sat Jul 13 08:32:59 2013 From: vernondcole at gmail.com (Vernon D. Cole) Date: Sat, 13 Jul 2013 00:32:59 -0600 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= Message-ID: For the benefit of those who read this in ASCII, I will include Unicode translations in the following. I prefer code which is readable in ASCII (as PEP-8 suggests) which is one reason that I a little bit dislike the proposal. I had to go to the archives to even read the subject line. Nevertheless, I think that, in the Unicode world, the proposal is sound. The question was asked earlier why the Python int() and float() functions do not allow Greek numbers, when they do allow numbers from many other language character sets. The answer is in the documentation for int(): > The numeric literals accepted include the digits 0 to 9 or any Unicode > equivalent (code points with the Nd property). 
> The "Nd" characters are decimal digits of systems which use positional notation (i.e. Arabic numbers). The Greeks used decimal numbers, but used different symbols for one, ten, hundred, thousand, (etc.) and added them together, much like the system of Roman numbers we are familiar with. The int() parser expects Arabic formatted numbers, so it will not correctly interpret other systems of notation. In order to read such numbers, you need to use a parser which was built for them. PEP 313 suggested that a parser for Roman formatted numbers be included in Python, and it was rejected. Several algorithms for reading Roman numbers encoded using ASCII values ['i','v','x','L', (etc.)] have been published. The one I wrote goes a bit further -- it also tries to read the value of unicodedata.numeric() for each character of its input string, and sums them (sort of). It would, therefore convert all of the Greek and other characters mentioned in this thread and return a value for them. If a Greek author followed Roman formatting rules it would return a _correct_ value, too. If, on the other hand, he put a smaller valued digit on the left side of a larger digit, he would probably not appreciate the resulting subtraction. > >>> import romanclass as Roman > >>> g2 = '\U0001015c' > >>> unicodedata.name(g2) > 'GREEK ACROPHONIC THESPIAN TWO' > >>> g5000 = '\U00010172' > >>> unicodedata.name(g5000) > 'GREEK ACROPHONIC THESPIAN FIVE THOUSAND' > >>> g5002 = g5000 + g2 # string concatenation (not addition) > >>> g5002 > '\U00010172\U0001015c' > >>> Roman.Roman(g5002) > Roman(5002) > >>> print(Roman.Roman(g5002)) > ?II > >>> # but -- since Roman math subtracts values on the left... > >>> print(Roman.Roman(g2 + g5000)) > M?CMXCVIII > This is all an unimportant side effect of my attempt to support actual Unicode Roman numbers: > >>> u'\u2167' > '?' > >>> eight = Roman.Roman(u'\u2167') > >>> print(eight + 10) # NOTE: mathematical addition > XVIII > This all assumes that we are talking about Acrophonic (or Herodian or Attic) numerals. The Greeks also used Alphabetic (also called Milesian, Alexandrian, or Ionic) numerals. In that system, the value of pi ('\u03c0') is 80 (and has nothing to do with the circumference of a circle.) That usage, however, is not recognized by Unicode: > >>> '\u03c0' > '?' > >>> pi = '\u03c0' > >>> unicodedata.name(pi) > 'GREEK SMALL LETTER PI' > >>> unicodedata.numeric(pi) > Traceback (most recent call last): > File "", line 1, in > unicodedata.numeric(pi) > ValueError: not a numeric character > >>> [ as a complete side note: Greeks pronounce the name of that letter as "pea" not "pie".] That agrees with Unicode's non-recognition of the numeric value of ASCII letters used in Roman numerals: > >>> unicodedata.numeric('X') > Traceback (most recent call last): > File "", line 1, in > unicodedata.numeric('X') > ValueError: not a numeric character > >>> > Any numeric usage requires a definition of how the string is to be parsed: > >>> Roman.Roman('X') > Roman(10) > >>> float(Roman.Roman('X')) > 10.0 > >>> > So, forget all of this noise about all of the other possible things that could be done with extended definitions of float(). Any of those would require another definition, and another PEP. This proposal is for only one thing -- to make the following happen: >>> inf = '\u221e' >>> float(inf) inf >>> Mark me as +0 -- Vernon Cole -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mbuttu at oa-cagliari.inaf.it Sat Jul 13 10:45:37 2013 From: mbuttu at oa-cagliari.inaf.it (Marco Buttu) Date: Sat, 13 Jul 2013 10:45:37 +0200 Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: <51D1805B.20309@pearwood.info> References: <51D0D6C3.2020502@pearwood.info> <346B812F-9987-4259-9550-DC752CF48D4A@yahoo.com> <51D0E945.90609@pearwood.info> <2ae7fe65-db62-4178-bf72-da06a787cae3@googlegroups.com> <51D1805B.20309@pearwood.info> Message-ID: <51E113B1.60000@oa-cagliari.inaf.it> On 07/01/2013 03:12 PM, Steven D'Aprano wrote: >> "dedent" is a weird word, maybe "unindent" would be better? > > The de- prefix is a standard English prefix meaning removal, negation > or reversal: > > http://dictionary.reference.com/browse/de- > > > Neologism or not, I think that dedent is sufficiently understandable > and widespread that there's no need to deprecate it in favour of > "outdent". It's being used in the F# and Ruby communities, as well as > Python: > > http://en.wiktionary.org/wiki/dedent > > https://github.com/cespare/ruby-dedent Hi all, this is my first message here :) I use Python since almost 7 years, and I discovered textwrap.dedent() right now, by following this thread. So in my case if yesterday I had seen there was a method str.dedent(), I should have looked at the documentation to know its meaning, but if I had seen str.unindent(), I shouldn't have looked at the doc at all, because the meaning appears to me very clear. Maybe this comes either from my poor English or from my lacks in Python (or both...), but maybe not, because looking for "unindent indentation" , "dedent indentation" or "deindent indentation" on Google, we can see people use the word "unindent" and not dedent or deindent. + 1 for unindent() http://en.wiktionary.org/wiki/unindent -- Marco Buttu INAF Osservatorio Astronomico di Cagliari Loc. Poggio dei Pini, Strada 54 - 09012 Capoterra (CA) - Italy Phone: +39 070 71180255 Email: mbuttu at oa-cagliari.inaf.it From p.f.moore at gmail.com Sat Jul 13 11:11:07 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 13 Jul 2013 10:11:07 +0100 Subject: [Python-ideas] Fast sum summary [was Re: Fast sum() for non-numbers - why so much worries?] In-Reply-To: <20130713005718.78d01516@sergey> References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> <51DF5368.6020505@pearwood.info> <51DF57E6.8090206@mrabarnett.plus.com> <20130713005718.78d01516@sergey> Message-ID: On 12 July 2013 22:57, Sergey wrote: > That's what Alex Martelli, author of sum(), initially did [1]: > > for the simple reason that I special-case this -- when the first > > item is a PyBaseString_Type, I delegate to ''.join > > So you can, kind of, say that sum was DESIGNED to have special cases > from the very beginning. > Thanks for the reference. That's the *original* implementation. So why does the current sum() not do this? You need to locate the justification for the removal of this special case, and explain why that reason no longer applies. My recollection (yes, I was round for those original discussions!) is that the key point is that "Guido said no". If I'm right, have you persuaded Guido to change his mind? To my knowledge he's not commented in this thread. Paul -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From fuzzyman at gmail.com Sat Jul 13 11:40:44 2013
From: fuzzyman at gmail.com (Michael Foord)
Date: Sat, 13 Jul 2013 12:40:44 +0300
Subject: [Python-ideas] Idea for new multi-line triple quote literal
In-Reply-To: <51E113B1.60000@oa-cagliari.inaf.it> References: <51D0D6C3.2020502@pearwood.info> <346B812F-9987-4259-9550-DC752CF48D4A@yahoo.com> <51D0E945.90609@pearwood.info> <2ae7fe65-db62-4178-bf72-da06a787cae3@googlegroups.com> <51D1805B.20309@pearwood.info> <51E113B1.60000@oa-cagliari.inaf.it>
Message-ID:

On 13 July 2013 11:45, Marco Buttu <mbuttu at oa-cagliari.inaf.it> wrote:
> On 07/01/2013 03:12 PM, Steven D'Aprano wrote:
>>> "dedent" is a weird word, maybe "unindent" would be better?
>>
>> The de- prefix is a standard English prefix meaning removal, negation or
>> reversal:
>>
>> http://dictionary.reference.com/browse/de-
>>
>> Neologism or not, I think that dedent is sufficiently understandable and
>> widespread that there's no need to deprecate it in favour of "outdent".
>> It's being used in the F# and Ruby communities, as well as Python:
>>
>> http://en.wiktionary.org/wiki/dedent
>>
>> https://github.com/cespare/ruby-dedent
>
> Hi all, this is my first message here :)
> I use Python since almost 7 years, and I discovered textwrap.dedent()
> right now, by following this thread. So in my case if yesterday I had
> seen there was a method str.dedent(), I should have looked at the
> documentation to know its meaning, but if I had seen str.unindent(), I
> shouldn't have looked at the doc at all, because the meaning appears to
> me very clear. Maybe this comes either from my poor English or from my
> lacks in Python (or both...), but maybe not, because looking for
> "unindent indentation", "dedent indentation" or "deindent indentation"
> on Google, we can see people use the word "unindent" and not dedent or
> deindent.
>
> + 1 for unindent()
>
> http://en.wiktionary.org/wiki/unindent

Interestingly dedent has an entry on wiktionary, with a note that its primary use is Python:

http://en.wiktionary.org/wiki/dedent

So whilst the meaning is obvious to native speakers, it can hardly be said to be a term in common usage. (The wiktionary entry suggests it is a synonym for "outdent".)

Michael

> --
> Marco Buttu
>
> INAF Osservatorio Astronomico di Cagliari
> Loc. Poggio dei Pini, Strada 54 - 09012 Capoterra (CA) - Italy
> Phone: +39 070 71180255
> Email: mbuttu at oa-cagliari.inaf.it
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html
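(For readers meeting it here for the first time, whatever it ends up being called -- textwrap.dedent() strips the longest common leading whitespace from all lines:)

    >>> import textwrap
    >>> print(textwrap.dedent('''\
    ...     first line
    ...         indented relative to the first
    ...     '''))
    first line
        indented relative to the first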
literal in python to refer to infinity. But this can only speak if this proposal makes sense. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mbuttu at oa-cagliari.inaf.it Sat Jul 13 12:44:51 2013 From: mbuttu at oa-cagliari.inaf.it (Marco Buttu) Date: Sat, 13 Jul 2013 12:44:51 +0200 Subject: [Python-ideas] Idea for new multi-line triple quote literal In-Reply-To: References: <51D0D6C3.2020502@pearwood.info> <346B812F-9987-4259-9550-DC752CF48D4A@yahoo.com> <51D0E945.90609@pearwood.info> <2ae7fe65-db62-4178-bf72-da06a787cae3@googlegroups.com> <51D1805B.20309@pearwood.info> <51E113B1.60000@oa-cagliari.inaf.it> Message-ID: <51E12FA3.4010203@oa-cagliari.inaf.it> On 07/13/2013 11:40 AM, Michael Foord wrote: > > I use Python since almost 7 years, and I discovered > textwrap.dedent() right now, by following this thread. > So in my case if yesterday I had seen there was a method > str.dedent(), I should have looked at the > documentation to know its meaning, but if I had seen > str.unindent(), I shouldn't have looked at the doc at all, > because the meaning appears to me very clear. > Maybe this comes either from my poor English or from my lacks in > Python (or both...), but maybe not, because > looking for "unindent indentation" , "dedent indentation" or > "deindent indentation" on Google, we can see people > use the word "unindent" and not dedent or deindent. > > + 1 for unindent() > > http://en.wiktionary.org/wiki/unindent > > > Interestingly dedent has an entry on wiktionary, with a note that its > primary use is Python: > > http://en.wiktionary.org/wiki/dedent > > So whilst the meaning is obvious to native speakers, it can hardly be > said to be a term in common usage. (The wiktionary entry suggests it > is a synonym for "outdent".) > > Michael And by the way, Python itself prefers to use unindent: >>> def foo(): ... """foo""" ... pass File "", line 3 pass ^ IndentationError: unindent does not match any outer indentation level -- Marco Buttu INAF Osservatorio Astronomico di Cagliari Loc. Poggio dei Pini, Strada 54 - 09012 Capoterra (CA) - Italy Phone: +39 070 71180255 Email: mbuttu at oa-cagliari.inaf.it -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Jul 13 12:45:53 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 13 Jul 2013 20:45:53 +1000 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: <51E12FE1.40508@pearwood.info> On 12/07/13 23:09, Gerald Britton wrote: > Do you have the infinity symbol on your keyboard? I don't! Oh, you probably do. On Windows: Hold down ALT and type 236 on the numeric keypad. On Mac: Option-5 On Linux: Don't ask. It is possible, but only three men have ever understood how. One of them is now dead. The second is a German professor who has gone mad. And the third was me, but I have forgotten. (Apologies to Lord Palmertson.) Lack of keyboard support for non-ASCII characters is only a weak argument against. Firstly, nobody will be forced to type such non-ASCII characters. Alternatives include copy and paste, Character Map applications, or simply not using them at all. Secondly, if there is need for entering such characters, I am sure that people will develop ways to do so. After all, people manage to enter Japanese and Chinese, and program in APL. I do not believe that ? is an important enough value for most developers that Python should support it natively. 
If Python were *my* language, I would quite likely include ? as a built-in literal for infinity (but which one? float or Decimal?) but I recognise that is my personal quirk, a little like spelling "not equal" as != instead of <> as the FLUFL intended. [Aside: there's precedent for a programming language to understand ? directly. It has been many years since I last used it, but my recollection of Apple's HyperTalk was that ? was recognised as infinity. I may be confabulating this though.] If Python were more heavily oriented towards mathematics, then it would be arguable that ? should be a literal, or at least the '?' be understood by float. But it's not, so although I personally would use float('?') or even a literal ?, I don't think it is useful enough to the average Python programmer to justify the added complexity to the language. As tiny as that added complexity is, the added benefit is even tinier. So although my personal preference is to say "F--- yeah!" to the idea, being responsible to the entire community, my vote goes to -1. Those who want to support ? can write code to support it. The same goes to suggestions that Python support Unicode numeric non-digits like Roman numerals, fractional forms, circled numbers, etc. If you need that, write a function. The benefit to the language is smaller than the increased complexity, so -1. -- Steven From steve at pearwood.info Sat Jul 13 12:48:09 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 13 Jul 2013 20:48:09 +1000 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: <51E13069.5040103@pearwood.info> On 13/07/13 16:32, Vernon D. Cole wrote: > For the benefit of those who read this in ASCII, I will include Unicode > translations in the following. I prefer code which is readable in ASCII [rant] It's 2013, not 1963. When oh when are we going to catch up with technology that I had on my Mac in 1984??? Oh, another thing... even in 1963, the ASCII standard was obsolete, since you cannot represent standard American characters in common use in 1963 such as ?. Ironically, you cannot even say "Copyright ? 1963 American Standards Association" in ASCII. I am aware of the reasons for the limitations on ASCII, but its time is long, long gone. It needs to die in peace. [/rant] Nice analysis of the Roman numerals issue though. Thanks for that. (P.S. are you aware that the practice of subtracting Roman numerals on the left was a medieval innovation, and not one that the ancient Romans themselves did?) -- Steven From flying-sheep at web.de Sat Jul 13 19:25:32 2013 From: flying-sheep at web.de (Philipp A.) Date: Sat, 13 Jul 2013 19:25:32 +0200 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: <51E07F3B.5000405@mrabarnett.plus.com> Message-ID: there is no such thing as a ?slippery slope?. ? is special. it?s a symbol for exactly one thing that happens to be the same one created using float('inf') in python. the same holds for the vulgar fractions, but they seem to be deprecated in unicode. ? is a greek letter commonly used in math to represent a number, but it?s not its one responsibility to represent that number. you could define ? to mean something else if you?re evil (i?ve even seen it done for some angle: let the angle between a and b be ?), but you can?t redefine ?. so don?t be silly everyone. the question is if float('?') should work and i say ?why the hell not? -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From nathan at cmu.edu Sat Jul 13 19:30:09 2013
From: nathan at cmu.edu (Nathan Schneider)
Date: Sat, 13 Jul 2013 13:30:09 -0400
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To: References: <51E07F3B.5000405@mrabarnett.plus.com>
Message-ID:

On Sat, Jul 13, 2013 at 1:25 PM, Philipp A. wrote:
> there is no such thing as a "slippery slope". ∞ is special. it's a symbol
> for exactly one thing that happens to be the same one created using
> float('inf') in python. the same holds for the vulgar fractions, but they
> seem to be deprecated in unicode.
>
> π is a greek letter commonly used in math to represent a number, but it's
> not its one responsibility to represent that number. you could define π to
> mean something else if you're evil (i've even seen it done for some angle:
> let the angle between a and b be π), but you can't redefine ∞.
>
> so don't be silly everyone. the question is if float('∞') should work and
> i say "why the hell not"

+1

From alexander.belopolsky at gmail.com Sat Jul 13 21:33:32 2013
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sat, 13 Jul 2013 15:33:32 -0400
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To: References: <51E07F3B.5000405@mrabarnett.plus.com>
Message-ID:

On Sat, Jul 13, 2013 at 1:25 PM, Philipp A. wrote:
> so don't be silly everyone. the question is if float('∞') should work and
> i say "why the hell not"

Here is why: float() remained mostly unchanged since unicode was added in Python 2.0. There were internal changes such as a better rounding algorithm and elimination of platform-dependencies when parsing special values (-0, inf, nan, etc.), but overall the design remained the same: replace unicode digits with ASCII equivalents and pass the result to a more or less equivalent of ISO C strtod.

Note that even accepting non-ASCII digits is not free from criticism. Python rejected the ISO C's wcstod approach and did not make float() parsing locale-dependent. This was a good decision, but resulted in float() accepting strings with a mix of scripts that don't represent valid numbers in any system. We recently agreed that float() and int() should be changed to reject mixed scripts, but the best way to do that is still being discussed.

If we decide that 3.4 is the release in which the way float() and int() parse unicode change, we should do it in a way that will last until Python 4.0 and hopefully beyond. This is what I've been trying to achieve in issue 10581, but when it comes to reviewing actual code rather than posting expletives on python-ideas the pool of volunteers shrinks.

From steve at pearwood.info Sun Jul 14 03:24:21 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 14 Jul 2013 11:24:21 +1000
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To: References: <51E07F3B.5000405@mrabarnett.plus.com>
Message-ID: <51E1FDC5.1030501@pearwood.info>

On 14/07/13 03:25, Philipp A. wrote:
> so don't be silly everyone. the question is if float('∞') should work and
> i say "why the hell not"

Firstly, let me say that I personally love the idea of float('∞') working. Or even having literal ∞ recognised as float('inf') (or perhaps that should be Decimal('inf')?).

But I'm still voting -1 on this proposal.
If the best argument in favour is "why the hell not?", then whatever benefit we might gain is truly tiny. So tiny that the benefit is probably smaller than the cost. And yes, there is a cost, there is always a cost. There are one-off costs:

- somebody has to program this feature;
- write tests for it;
- write documentation;

and on-going costs:

- that's one more thing for every user to learn;
- programmers will have to take this feature into account whenever they use float.

Now, you might argue, and I will agree, that these costs are probably tiny costs. But the benefit is even tinier.

Cost: tiny, but real;
Benefit: "why the hell not?"
Overall benefit: negative.

Here is a simple implementation that supports ∞, +∞ and -∞.

    _float = float

    def float(arg):
        if isinstance(arg, str):
            arg = arg.replace('∞', 'inf')
        return _float(arg)

Hands up anyone who already uses this, or something like it, in their code? Anyone? Given how trivial it is to build this functionality if you need it, if you haven't already done so, chances are that you don't need it, even if you think you want it.

-- Steven

From joshua at landau.ws Sun Jul 14 03:39:36 2013
From: joshua at landau.ws (Joshua Landau)
Date: Sun, 14 Jul 2013 02:39:36 +0100
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To: <51E1FDC5.1030501@pearwood.info> References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info>
Message-ID:

On 14 July 2013 02:24, Steven D'Aprano wrote:
> On 14/07/13 03:25, Philipp A. wrote:
>> so don't be silly everyone. the question is if float('∞') should work and
>> i say "why the hell not"
>
> Firstly, let me say that I personally love the idea of float('∞') working.
> Or even having literal ∞ recognised as float('inf') (or perhaps that should
> be Decimal('inf')?).
>
> But I'm still voting -1 on this proposal. If the best argument in favour is
> "why the hell not?", then whatever benefit we might gain is truly tiny. So
> tiny that the benefit is probably smaller than the cost. And yes, there is a
> cost, there is always a cost. There are one-off costs:

That wasn't the best reason. The best reason was given by the OP, who said that it was for data input. If you receive data that uses ∞, then it's useful.

> - somebody has to program this feature;
> - write tests for it;
> - write documentation;
>
> and on-going costs:
>
> - that's one more thing for every user to learn;

Doesn't apply here.

> - programmers will have to take this feature into account whenever they use
> float.

This isn't true -- most uses of float(...) don't care about the internationalisation aspect either. Only a minority of cases will need to account for this.

> Now, you might argue, and I will agree, that these costs are probably tiny
> costs. But the benefit is even tinier.
>
> Cost: tiny, but real;
> Benefit: "why the hell not?"
> Overall benefit: negative.

With these changes:

Cost: People keep giving bad criticisms
Benefit: Tiny
Overall benefit: Still can't decide

> Here is a simple implementation that supports ∞, +∞ and -∞.
>
>     _float = float
>
>     def float(arg):
>         if isinstance(arg, str):
>             arg = arg.replace('∞', 'inf')
>         return _float(arg)
>
> Hands up anyone who already uses this, or something like it, in their code?
> Anyone? Given how trivial it is to build this functionality if you need it,
> if you haven't already done so, chances are that you don't need it, even if
> you think you want it.

I'd hope not, because that's broken code. All the more reason to accept the proposal.
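(For comparison, a variant that accepts only the exact spellings ∞, +∞ and -∞ -- again just a sketch -- sidesteps that objection, since it no longer blesses arbitrary substitutions inside longer strings:)

    _float = float

    def float(arg):
        if isinstance(arg, str):
            s = arg.strip()
            if s in ('\u221e', '+\u221e', '-\u221e'):
                # only the bare symbol, optionally signed
                return _float(s.replace('\u221e', 'inf'))
        return _float(arg)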
From rosuav at gmail.com Sun Jul 14 03:53:29 2013 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 14 Jul 2013 11:53:29 +1000 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info> Message-ID: On Sun, Jul 14, 2013 at 11:39 AM, Joshua Landau wrote: > On 14 July 2013 02:24, Steven D'Aprano wrote: >> and on-going costs: >> >> - that's one more thing for every user to learn; > > Doesn't apply here. Yes, it does; what happens to someone who reads someone else's Python code? To write code, you need to understand one way of spelling something. To read code, you need to understand _every_ way of spelling that thing. That may not be a particularly great cost in this case, but it is a cost. ChrisA From alexander.belopolsky at gmail.com Sun Jul 14 04:04:04 2013 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 13 Jul 2013 22:04:04 -0400 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info> Message-ID: On Sat, Jul 13, 2013 at 9:39 PM, Joshua Landau wrote: > The best reason was given by the OP, who > said that it was for data input. If you receive data that uses ?, then > it's useful. > Are you serious? Why would anyone use non-ASCII text format for numerical data? Note that >>> len('?'.encode()) 3 so using '?' does not even save space compared to 'inf'. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Jul 14 04:04:17 2013 From: guido at python.org (Guido van Rossum) Date: Sat, 13 Jul 2013 19:04:17 -0700 Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations" In-Reply-To: References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl> Message-ID: On Fri, Jul 12, 2013 at 5:25 PM, Joshua Landau wrote: > A blessing from the Gods has resulted in > http://www.python.org/dev/peps/pep-0448/! See what you think; it's not too > changed from before but it's mighty pretty now. > > Still up for discussion are the specifics of function call syntax, the full > details of which should already be in the PEP. If you come up with a better > suggestion or want argue for one of the choices, go ahead. I like it. I note that we now end up with new ways for concatenating sequences (e.g. [*a, *b]) and also for merging dicts (e.g. {**a, **b}). I think it would be good to prepare an implementation in time for inclusion in Python 3.4a1 to avoid the same issue with this we had before -- I could imagine that there might be some implementation problems and I don't want to accept an unimplementable PEP. Also it would be good to know that code not using the new syntax won't run any slower (especially for function calls this is very important). Regarding the decision about the allowable syntax for argument lists, I prefer to keep the existing restriction (making *args after a keyword argument basically an exception) since, as you point out, placing regular positional arguments after regular keyword arguments looks plain silly. 
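(For readers unfamiliar with the PEP, the two new spellings mentioned would look like this under the proposal -- a sketch, since at this point the syntax is only proposed:)

    a, b = [1, 2], [3, 4]
    [*a, *b]            # [1, 2, 3, 4] -- concatenation by unpacking

    d1, d2 = {'x': 1}, {'y': 2}
    {**d1, **d2}        # {'x': 1, 'y': 2} -- dict merge by unpacking

    def f(*args):
        return args
    f(*a, *b)           # (1, 2, 3, 4) -- multiple unpackings in one call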
From steve at pearwood.info Sun Jul 14 04:07:04 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 14 Jul 2013 12:07:04 +1000
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To:
References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info>
Message-ID: <51E207C8.9060602@pearwood.info>

On 14/07/13 11:39, Joshua Landau wrote:
> On 14 July 2013 02:24, Steven D'Aprano wrote:
>> On 14/07/13 03:25, Philipp A. wrote:
>>> so don't be silly everyone. the question is if float('∞') should work and i say "why the hell not"
>>
>> Firstly, let me say that I personally love the idea of float('∞') working. Or even having literal ∞ recognised as float('inf') (or perhaps that should be Decimal('inf')?).
>>
>> But I'm still voting -1 on this proposal. If the best argument in favour is "why the hell not?", then whatever benefit we might gain is truly tiny. So tiny that the benefit is probably smaller than the cost. And yes, there is a cost, there is always a cost. There are one-off costs:
>
> That wasn't the best reason. The best reason was given by the OP, who said that it was for data input. If you receive data that uses ∞, then it's useful.

Only if you are expecting to get float('inf') as the answer. Just because IEEE 754 floating point supports INFs and NANs doesn't mean that any particular application needs or wants to support them.

My guess is that for every app that actively would benefit from this, there is another app that would actively have to counter-act this feature, and another 50 that simply don't care. For apps that actively do want to support INFs, doing a text transformation ∞ -> 'inf' before calling float is trivial.

>> - somebody has to program this feature;
>> - write tests for it;
>> - write documentation;
>>
>> and on-going costs:
>>
>> - that's one more thing for every user to learn;
>
> Doesn't apply here.

Of course it does. Do you think that people are born knowing that float('∞') returns an IEEE 754 floating point INF? It's a feature that needs to be learned.

>> - programmers will have to take this feature into account whenever they use float.
>
> This isn't true -- most uses of float(...) don't care about the internationalisation aspect either. Only a minority of cases will need to account for this.

Correct. And? Most users won't care. Of those that do care, some will be annoyed because previously they could rely on float('∞') raising an exception, and no longer can.

>> Here is a simple implementation that supports ∞, +∞ and -∞.
>>
>>     _float = float
>>
>>     def float(arg):
>>         if isinstance(arg, str):
>>             arg = arg.replace('∞', 'inf')
>>         return _float(arg)
>>
>> Hands up anyone who already uses this, or something like it, in their code? Anyone? Given how trivial it is to build this functionality if you need it, if you haven't already done so, chances are that you don't need it, even if you think you want it.
>
> I'd hope not, because that's broken code. All the more reason to accept the proposal.

How is it broken? True, it accepts '∞inity' as well as 'infinity', but that's a feature, not a bug.

-- Steven
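Note that float() already accepts the ASCII spellings 'inf' and 'nan', so an application that must reject non-finite values needs an explicit check either way; the proposal would only add one more accepted spelling. A sketch of such a guard, with a hypothetical helper name:

    import math

    def parse_finite(text):
        # Accept only finite numbers, whatever spellings float() allows.
        value = float(text)
        if math.isinf(value) or math.isnan(value):
            raise ValueError('finite number expected, got %r' % text)
        return value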
From alexander.belopolsky at gmail.com Sun Jul 14 04:17:46 2013
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sat, 13 Jul 2013 22:17:46 -0400
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To: <51E1FDC5.1030501@pearwood.info>
References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info>
Message-ID:

On Sat, Jul 13, 2013 at 9:24 PM, Steven D'Aprano wrote:
> Or even having literal ∞ recognised as float('inf')

BTW, if having literal ∞ was considered as a language feature [1], I would support it:

    if x == ∞: ...

would be an improvement over existing alternatives

    if x == float('inf'): ...

or

    if math.isinf(x): ...

But float('∞') still looks like line noise.

[1] .. for some language other than Python. In that language empty set would be spelled ∅ and string catenation operator would be ?. :-)

From joshua at landau.ws Sun Jul 14 06:35:41 2013
From: joshua at landau.ws (Joshua Landau)
Date: Sun, 14 Jul 2013 05:35:41 +0100
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To: <51E207C8.9060602@pearwood.info>
References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info> <51E207C8.9060602@pearwood.info>
Message-ID:

On 14 July 2013 03:07, Steven D'Aprano wrote:
> On 14/07/13 11:39, Joshua Landau wrote:
>> On 14 July 2013 02:24, Steven D'Aprano wrote:
>>> On 14/07/13 03:25, Philipp A. wrote:
>>>> so don't be silly everyone. the question is if float('∞') should work and i say "why the hell not"
>>>
>>> Firstly, let me say that I personally love the idea of float('∞') working. Or even having literal ∞ recognised as float('inf') (or perhaps that should be Decimal('inf')?).
>>>
>>> But I'm still voting -1 on this proposal. If the best argument in favour is "why the hell not?", then whatever benefit we might gain is truly tiny. So tiny that the benefit is probably smaller than the cost. And yes, there is a cost, there is always a cost. There are one-off costs:
>>
>> That wasn't the best reason. The best reason was given by the OP, who said that it was for data input. If you receive data that uses ∞, then it's useful.
>
> Only if you are expecting to get float('inf') as the answer. Just because IEEE 754 floating point supports INFs and NANs doesn't mean that any particular application needs or wants to support them.
>
> My guess is that for every app that actively would benefit from this, there is another app that would actively have to counter-act this feature, and another 50 that simply don't care. For apps that actively do want to support INFs, doing a text transformation ∞ -> 'inf' before calling float is trivial.

I think it's the best reason because "why not?" is a worse one. Hence it's the best reason. Also, name a single app where accepting unicode infinity is bad.

>>> - somebody has to program this feature;
>>> - write tests for it;
>>> - write documentation;
>>>
>>> and on-going costs:
>>>
>>> - that's one more thing for every user to learn;
>>
>> Doesn't apply here.
>
> Of course it does. Do you think that people are born knowing that float('∞') returns an IEEE 754 floating point INF? It's a feature that needs to be learned.

No it's not.
It needs to be learned if you use float on data that can return "∞", excluding data directly from humans (where you don't need to know about it much, as most people don't need to know about internationalisation). This is not a burden that most people will feel.

>>> - programmers will have to take this feature into account whenever they use float.
>>
>> This isn't true -- most uses of float(...) don't care about the internationalisation aspect either. Only a minority of cases will need to account for this.
>
> Correct. And? Most users won't care. Of those that do care, some will be annoyed because previously they could rely on float('∞') raising an exception, and no longer can.

Agreed. But that was a contradiction of the statement "programmers will have to take this feature into account whenever they use float", which it adequately contradicts.

>>> Here is a simple implementation that supports ∞, +∞ and -∞.
>>>
>>>     _float = float
>>>
>>>     def float(arg):
>>>         if isinstance(arg, str):
>>>             arg = arg.replace('∞', 'inf')
>>>         return _float(arg)
>>>
>>> Hands up anyone who already uses this, or something like it, in their code? Anyone? Given how trivial it is to build this functionality if you need it, if you haven't already done so, chances are that you don't need it, even if you think you want it.
>>
>> I'd hope not, because that's broken code. All the more reason to accept the proposal.
>
> How is it broken? True, it accepts '∞inity' as well as 'infinity', but that's a feature, not a bug.

First I will admit that originally I misread the code, so my tone was more pronounced than it should've been. I posted seconds before I realised, but decided that it wasn't worth changing. Secondly, I'm not convinced that accepting "∞inity" is a feature. Thirdly, even assuming it was, the code was written as if it were a direct stand-in for the proposal, which it is not, and thus the code is buggy. Of course, it would be as simple as replacing "∞" with "infinity" instead of "inf", AFAICT -- *assuming* that you don't mind that it "breaks" the returned exceptions.

From joshua at landau.ws Sun Jul 14 08:40:52 2013
From: joshua at landau.ws (Joshua Landau)
Date: Sun, 14 Jul 2013 07:40:52 +0100
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To:
References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info>
Message-ID:

On 14 July 2013 03:17, Alexander Belopolsky wrote:
> On Sat, Jul 13, 2013 at 9:24 PM, Steven D'Aprano wrote:
>> Or even having literal ∞ recognised as float('inf')
>
> BTW, if having literal ∞ was considered as a language feature [1], I would support it:
>
>     if x == ∞: ...
>
> would be an improvement over existing alternatives
>
>     if x == float('inf'): ...
>
> or
>
>     if math.isinf(x): ...
>
> But float('∞') still looks like line noise.

I disagree. Personally, code in which copy-paste is the best way to write single-character identifiers is a hassle. I know because I wrote:

    from itertools import count as ?, permutations as ?, starmap as ?
    [globals().setdefault("%c"%sum(?.encode()),?)for ?,? in vars(__builtins__).items()if ?!="vars"]
    sorted = lambda ?:?(? for ? in ?(?(lambda:?,?),?(0,?(?)**(len(?)*2-1)*len(?)))for ? in ?(?)if ?(()).__eq__((?(?(?(?(?).pop()).__rpow__,?(?(?(?,?(lambda:0,?)),()),1))),?),?))[::-1]

And that was a *nightmare* to edit.

From stephen at xemacs.org Sun Jul 14 09:20:01 2013
From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Sun, 14 Jul 2013 16:20:01 +0900 Subject: [Python-ideas] Python Convert In-Reply-To: <9BE2F04C-4148-43A0-BBF6-F63C29503C49@yahoo.com> References: <9BE2F04C-4148-43A0-BBF6-F63C29503C49@yahoo.com> Message-ID: <87fvvhtxpa.fsf@uwakimon.sk.tsukuba.ac.jp> > On Jul 11, 2013, at 19:01, Daniel Rode > wrote: >> Since Python3, the python creators removed a lot of encodings >> from the str.encode() method. They did it because they weren't sure >> how to implement the feature in Python3. No, they specifically decided not to implement codecs that are not directional (in the sense that they convert str to str or bytes to bytes or both). >> I have an idea, add a built in method called "convert". Only the name is new; the idea has been suggested several times. However, the API proposed has usually been symmetric and polymorphic (that is, either bytes-to-bytes or str-to-str). It's arguable (and I've argued it) that base encoding should be bytes-to-str, but pragmatically base encodings are used mostly for content transfer encodings in wire protocols, and in the relatively rare and comparatively low-throughput cases where they're displayed to people, there's no real cost to decoding from ASCII to Unicode (str), especially since PEP 393. Since special-case methods already exist and are well known (not to forget easily Googled), there's little benefit to merely providing an bunch of aliases and a registry. So we want to reserve this opportunity for an API that helps users to avoid double-encoding and things like that. Andrew Barnert writes: > it's just a str, which doesn't have an encoding. (Under the covers, > of course, it's actually stored as ASCII, UCS2, or UTF-32...) Actually, 8-bit str is stored as ISO-8859-1. From stephen at xemacs.org Sun Jul 14 09:41:27 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 14 Jul 2013 16:41:27 +0900 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: <87ehb1twpk.fsf@uwakimon.sk.tsukuba.ac.jp> Joshua Landau writes: > Do you have any of: > > ???????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????? > > on your keyboard because they are all valid *as of now* inside the > string you pass to float()? He has these: ????????? (and ?) because Japanese input is necessarily context-dependent. From stephen at xemacs.org Sun Jul 14 10:10:39 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 14 Jul 2013 17:10:39 +0900 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: Message-ID: <87d2qltvcw.fsf@uwakimon.sk.tsukuba.ac.jp> Chris Angelico writes: > That one would be more plausible, in the same way that many of the > other Unicode digits are accepted. The analogy doesn't hold. Unicode *digit* and Unicode *numeric* are separate properties. Digits are intended to form numerals according to a positional rule, so parsing a string of digits in logical order always means the same thing, regardless of the character set (or Unicode block, if you prefer). Numeric characters are characters that have a numeric interpretation. 
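The property split Stephen describes is easy to see from Python 3, where unicodedata exposes both properties. This is an illustration only; the code points named in the comments are standard Unicode:

    import unicodedata

    print(int('\u0663'))                  # 3 -- ARABIC-INDIC DIGIT THREE is a *digit*
    print(unicodedata.numeric('\u00bd'))  # 0.5 -- VULGAR FRACTION ONE HALF is only *numeric*
    try:
        float('\u00bd')                   # not a digit, so the parser refuses it
    except ValueError:
        print('float() rejects it')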
So in Japanese "1x1" can mean 11, 101, 1001, 10001, 100000001, and a few others depending on the numeric character used for x (which is the multiplier for the "1" on its left), or it might be a parse error (conventions for writing checks often use powers of 10000 as separators rather than multipliers, so you're missing three digits on the right side). It's possible the same conventions apply to Chinese. Anyway, in Japanese many numeric characters make no sense in positional notation, and require localized parsing methods. Personally, I think it was a mistake to allow non-ASCII digits to be parsed directly by int() and float(). Not even language nationalists like the French, Russians, and Japanese publish statistics using non-ASCII digits. OTOH, people who need to read numbers out of text or whatever probably should be using localization facilities anyway (there are a few cases of "confusables" among the digits where digits whose glyphs are similar have different values as digits). But that ship has sailed, apparently. From storchaka at gmail.com Sun Jul 14 10:12:09 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 14 Jul 2013 11:12:09 +0300 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info> Message-ID: 14.07.13 05:17, Alexander Belopolsky ???????(??): > [1] .. for some language other than Python. In that language empty set > would be spelled ? and string catenation operator would be ?. :-) And "not equal" would be spelled ?. From storchaka at gmail.com Sun Jul 14 10:40:09 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 14 Jul 2013 11:40:09 +0300 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: <51E00525.6020700@aim-online.com> Message-ID: 13.07.13 00:55, Joshua Landau ???????(??): > On 12 July 2013 22:46, Serhiy Storchaka > wrote: > > 13.07.13 00:27, Joshua Landau ???????(??): > > On 12 July 2013 18:58, Serhiy Storchaka > > >> wrote: > I agree. But how is it related to ? and 3.(142857)? > ? === 1/2; thus is an expression > > > 0.5 === 5/10. Isn't it an expression? > > > No. That's like saying "1 === 2/2". There is a much more obvious > equivalence between two ways of writing "1/2" than between two ways of > displaying the result of "1/2". 0.5 is 5/10 by definition. The result of 1/2 is a fraction ?. > 3.(142857) is more ambiguous, because there's not actually any > mathematical operator in place. But it is too much parsing for no > benefit, AFAICT; you would complicate something simple to solve > almost > no use-cases, and then when they are used it's harder for people > to work > out what is meant. > > > AFAIK children teach 3.(142857) before ?. I'm sure people use > fractions and recurring decimals more often than infinity. > > > In my experience (I'll take a good wager I'm younger than you) people > learn first about infinity, then are taught recurrence using the > floating-dot syntax. The bracket form for recurrence was not taught once > during high-school for me, and although "infinity" was hardly covered > either it's not niche knowledge. Well, maybe it's a cultural difference. I learned recurring decimals in primary school (if memory serves me). > Plus, why on earth would you use recurrence for floats? Give me a use > case. There's a good reason for float infinity. This is only a way to spell a general fraction in decimal. On other hand, ? is even not a real number. 
> Note that I'm British.

>>> The informal definition for "expression" with regards to int and float I'm using is basically the measure of how much more parsing code would need to be implemented.
>>
>> ½ requires no more parsing code than ∞.
>
> Au contraire, if you accept ½ you are bound by law to accept all of the other fractions -- that's much more code than just allowing ∞.

If you accept ∞ you are bound by law to accept ½ and all of the other fractions -- and that's much more code than just allowing ½.

From joshua at landau.ws Sun Jul 14 10:56:59 2013
From: joshua at landau.ws (Joshua Landau)
Date: Sun, 14 Jul 2013 09:56:59 +0100
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To:
References: <51E00525.6020700@aim-online.com>
Message-ID:

On 14 July 2013 09:40, Serhiy Storchaka wrote:
> 13.07.13 00:55, Joshua Landau wrote:
>> On 12 July 2013 22:46, Serhiy Storchaka wrote:
>>> 13.07.13 00:27, Joshua Landau wrote:
>>>> ½ === 1/2; thus is an expression
>>> 0.5 === 5/10. Isn't it an expression?
>> No. That's like saying "1 === 2/2". There is a much more obvious equivalence between two ways of writing "1/2" than between two ways of displaying the result of "1/2".
>
> 0.5 is 5/10 by definition. The result of 1/2 is a fraction ½.

I don't understand. What are you trying to say?

>> Plus, why on earth would you use recurrence for floats? Give me a use case. There's a good reason for float infinity.
>
> This is only a way to spell a general fraction in decimal. On the other hand, ∞ is even not a real number.

That's not a use-case.

>>>> The informal definition for "expression" with regards to int and float I'm using is basically the measure of how much more parsing code would need to be implemented.
>>> ½ requires no more parsing code than ∞.
>> Au contraire, if you accept ½ you are bound by law to accept all of the other fractions -- that's much more code than just allowing ∞.
>
> If you accept ∞ you are bound by law to accept ½ and all of the other fractions -- and that's much more code than just allowing ½.

I was afraid that people would go and take this too literally. But either way, if you accept ½ and reject ⅓, you have made a really bad design decision. If you accept ∞ and reject ½, the atrocity of that decision is much less. I would say it's a good choice, you may say it is bad. But if you say those are equivalently bad decisions you're simply wrong and there's not much more I can say.

From steve at pearwood.info Sun Jul 14 11:17:18 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 14 Jul 2013 19:17:18 +1000
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To:
References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info>
Message-ID: <51E26C9E.4080505@pearwood.info>

On 14/07/13 16:40, Joshua Landau wrote:
> Personally, code in which copy-paste is the best way to write single-character identifiers is a hassle. I know because I wrote:

Copy-pasting is never the best way to write single-character identifiers, however it may be the least-worst way when dealing with unfamiliar characters or those not supported by your input system. It works, even in Notepad. But nobody suggests Notepad is the best way to edit code. Professionals use a more powerful editor with lots of powerful shortcuts.
Professionals who need to support non-ASCII characters should likewise choose an editor that provides them with powerful character entry methods that are faster than copy-and-pasting. Looking at the example you give below:

> from itertools import count as ?, permutations as ?, starmap as ?
> [globals().setdefault("%c"%sum(?.encode()),?)for ?,? in vars(__builtins__).items()if ?!="vars"]
> sorted = lambda ?:?(? for ? in ?(?(lambda:?,?),?(0,?(?)**(len(?)*2-1)*len(?)))for ? in ?(?)if ?(()).__eq__((?(?(?(?(?).pop()).__rpow__,?(?(?(?,?(lambda:0,?)),()),1))),?),?))[::-1]
>
> And that was a *nightmare* to edit.

Of course it is, because you have just arbitrarily chosen identifiers that don't mean anything, and obfuscated your algorithm as well. I could generate obfuscated ASCII-only code just as horrible to edit using equally awful identifiers like O00OOllII1, O0O0OlIlIl and so forth. But to a mathematician, identifiers like ? ? and ? are no more obfuscated than len, encode or count.

Back in the 1980s, I used a Mac which made entering non-ASCII characters a dream, at least for the limited 8-bit charset that Macs supported. Making allowances for that, the above could be simple, if you know the key sequences to get the symbols you want, or if your editor provides an input interface that you are happy to use.

We have developers here who have seemingly memorized seemingly vast numbers of Emacs and Vim key sequences to perform the most obscure functions, and yet are apparently utterly terrified of the idea that some time in the future they may have to memorize a key sequence like option-p to get π, or option-u o to get ö. (See, I still remember them, 15+ years since I last used a Mac extensively. Well-thought out mnemonic key sequences for the win.)

Editor support for non-ASCII characters ranges from mediocre to absolutely atrocious, depending on the characters and the editor. I don't deny this. A softly, softly approach to non-ASCII identifiers is still wise. I'm still unconvinced that Python 3.4 should accept ∞ in the language, and I am probably one of the minority who would actually make use of such a feature. But let's please put aside the concept that writing code in anything other than a subset of American English characters is by definition an insane thing to do.

-- Steven

From joshua at landau.ws Sun Jul 14 11:46:57 2013
From: joshua at landau.ws (Joshua Landau)
Date: Sun, 14 Jul 2013 10:46:57 +0100
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To: <51E26C9E.4080505@pearwood.info>
References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info> <51E26C9E.4080505@pearwood.info>
Message-ID:

On 14 July 2013 10:17, Steven D'Aprano wrote:
> On 14/07/13 16:40, Joshua Landau wrote:
>> Personally, code in which copy-paste is the best way to write single-character identifiers is a hassle. I know because I wrote:
>
> Copy-pasting is never the best way to write single-character identifiers, however it may be the least-worst way when dealing with unfamiliar characters or those not supported by your input system. It works, even in Notepad. But nobody suggests Notepad is the best way to edit code. Professionals use a more powerful editor with lots of powerful shortcuts. Professionals who need to support non-ASCII characters should likewise choose an editor that provides them with powerful character entry methods that are faster than copy-and-pasting.

How do you input unicode then?
I don't count typing escape sequences as better, because I don't want to memorise pseudorandom numbers. That said, you have inspired me to find a new plugin for my editor which makes unicode input much easier (\forall for example), so thanks.

> We have developers here who have seemingly memorized seemingly vast numbers of Emacs and Vim key sequences to perform the most obscure functions, and yet are apparently utterly terrified of the idea that some time in the future they may have to memorize a key sequence like option-p to get π, or option-u o to get ö. (See, I still remember them, 15+ years since I last used a Mac extensively. Well-thought out mnemonic key sequences for the win.)

The ones I added manually to that were actually single-key shortcuts (ALT-GR ); the others were much harder in that they weren't. There's no simple way to usefully fit so many thousands of possibilities into such a small keyboard, in my opinion, without reverting to character codes.

> Editor support for non-ASCII characters ranges from mediocre to absolutely atrocious, depending on the characters and the editor. I don't deny this. A softly, softly approach to non-ASCII identifiers is still wise. I'm still unconvinced that Python 3.4 should accept ∞ in the language, and I am probably one of the minority who would actually make use of such a feature. But let's please put aside the concept that writing code in anything other than a subset of American English characters is by definition an insane thing to do.

I didn't say it was insane. But I do dislike it. Code should be quick for the majority of people (of the same language) to edit; unicode symbols ruin that barrier. This includes Notepad users.

Should I be French, "?" would be completely acceptable -- if you cannot type that as a French human, you're doing something wrong. But expecting typical read-writers of your program to have better access to unicode infinity (which I wrote like that because copy-paste is a hassle) than copy-paste is presumptuous and silly. There is a *lot* of unicode, and it is hard to get at its reaches.

From storchaka at gmail.com Sun Jul 14 12:11:21 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sun, 14 Jul 2013 13:11:21 +0300
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To:
References: <51E00525.6020700@aim-online.com>
Message-ID:

14.07.13 11:56, Joshua Landau wrote:
> On 14 July 2013 09:40, Serhiy Storchaka wrote:
>> 13.07.13 00:55, Joshua Landau wrote:
>>> On 12 July 2013 22:46, Serhiy Storchaka wrote:
>>>> 13.07.13 00:27, Joshua Landau wrote:
>>>>> ½ === 1/2; thus is an expression
>>>> 0.5 === 5/10. Isn't it an expression?
>>> No. That's like saying "1 === 2/2". There is a much more obvious equivalence between two ways of writing "1/2" than between two ways of displaying the result of "1/2".
>> 0.5 is 5/10 by definition. The result of 1/2 is a fraction ½.
> I don't understand. What are you trying to say?

0.5 is a spelling of 5⁄10 which is a result of the expression 5/10. ½ is a spelling of 1⁄2 which is a result of the expression 1/2. I don't understand why you think 1⁄2 is an expression while 5⁄10 is not.

>>> Plus, why on earth would you use recurrence for floats? Give me a use case. There's a good reason for float infinity.
>> This is only a way to spell a general fraction in decimal. On the other hand, ∞ is even not a real number.
> That's not a use-case.

∞ is not a use-case.

>> If you accept ∞ you are bound by law to accept ½ and all of the other fractions -- and that's much more code than just allowing ½.
> I was afraid that people would go and take this too literally. But either way, if you accept ½ and reject ⅓, you have made a really bad design decision. If you accept ∞ and reject ½, the atrocity of that decision is much less. I would say it's a good choice, you may say it is bad. But if you say those are equivalently bad decisions you're simply wrong and there's not much more I can say.

The difference between these two bad choices is far less than the difference between good and bad. Why should we choose between two bad designs?

From flying-sheep at web.de Sun Jul 14 13:07:13 2013
From: flying-sheep at web.de (Philipp A.)
Date: Sun, 14 Jul 2013 13:07:13 +0200
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To:
References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info> <51E26C9E.4080505@pearwood.info>
Message-ID:

2013/7/14 Joshua Landau
> I didn't say it was insane. But I do dislike it. Code should be quick for the majority of people (of the same language) to edit; unicode symbols ruin that barrier.

when applied to the only way to do something: yes. in this case: not in the slightest.

float('inf') still works, so if your keyboard layout/compose key doesn't support ∞, you just use the ascii variant. comparing it to naming a variable ? or ? or something isn't valid at all.

From flying-sheep at web.de Sun Jul 14 13:08:58 2013
From: flying-sheep at web.de (Philipp A.)
Date: Sun, 14 Jul 2013 13:08:58 +0200
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To:
References: <51E00525.6020700@aim-online.com>
Message-ID:

2013/7/14 Serhiy Storchaka
> ∞ is not a use-case.

it is. OP has it in his/her data.

From joshua at landau.ws Sun Jul 14 13:18:03 2013
From: joshua at landau.ws (Joshua Landau)
Date: Sun, 14 Jul 2013 12:18:03 +0100
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To:
References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info> <51E26C9E.4080505@pearwood.info>
Message-ID:

On 14 July 2013 12:07, Philipp A. wrote:
> 2013/7/14 Joshua Landau
>> I didn't say it was insane. But I do dislike it. Code should be quick for the majority of people (of the same language) to edit; unicode symbols ruin that barrier.
>
> when applied to the only way to do something: yes. in this case: not in the slightest.
>
> float('inf') still works, so if your keyboard layout/compose key doesn't support ∞, you just use the ascii variant.

That would make the code inconsistent.

From ned at nedbatchelder.com Sun Jul 14 14:55:46 2013
From: ned at nedbatchelder.com (Ned Batchelder)
Date: Sun, 14 Jul 2013 08:55:46 -0400
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To:
References: <51E00525.6020700@aim-online.com>
Message-ID: <51E29FD2.7030601@nedbatchelder.com>

On 7/14/2013 7:08 AM, Philipp A. wrote:
> 2013/7/14 Serhiy Storchaka
>> ∞ is not a use-case.
>
> it is. OP has it in his/her data.

Looking back through the many emails about this so far, I didn't see where the OP explained why he wanted this to work. Zaur Shibzukhov never said he had it in his data, he said, "Is it good idea to allow float('∞') to be float('inf') in python?" and, "Because infinity is special case of numbers. Unicode standard have regular infinity symbol and it's natural to represent infinity as ∞."
and, "Because infinity is special case of numbers. Unicode standard have regular infinity symbol and it's natural to represent inifinity as ?." As near as I can tell, we don't have an actual use case yet. --Ned. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From szport at gmail.com Sun Jul 14 15:53:46 2013 From: szport at gmail.com (Zaur Shibzukhov) Date: Sun, 14 Jul 2013 06:53:46 -0700 (PDT) Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: <51E29FD2.7030601@nedbatchelder.com> References: <51E00525.6020700@aim-online.com> <51E29FD2.7030601@nedbatchelder.com> Message-ID: <4f5e849a-cad3-4708-9052-c5bc848f63a9@googlegroups.com> ???????????, 14 ???? 2013 ?., 16:55:46 UTC+4 ???????????? Ned Batchelder ???????: > On 7/14/2013 7:08 AM, Philipp A. wrote: > Looking back through the many emails about this so far, I didn't see where > the OP explained why he wanted this to work. Zaur Shibzukhov never said he > had it in his data, he said, "Is it good idea to allow float('?') to be > float('inf') in python?" and, "Because infinity is special case of numbers. > Unicode standard have regular infinity symbol and it's natural to represent > inifinity as ?." > > As near as I can tell, we don't have an actual use case yet. > I think that actual use cases are rather belongs to numeric area. For example, one could use infinity symbol when output infinity numerical results of calculations to the text file and another one input them. Usually float numbers in python world are converted from string using float(...). So any code that use float to convert from string could benefit.This is not a real use case though, but rather some scenario... > _______________________________________________ > Python-ideas mailing listPython... at python.org http://mail.python.org/mailman/listinfo/python-ideas > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sun Jul 14 16:06:03 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 14 Jul 2013 23:06:03 +0900 Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?= In-Reply-To: References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info> <51E26C9E.4080505@pearwood.info> Message-ID: <877ggttewk.fsf@uwakimon.sk.tsukuba.ac.jp> Joshua Landau writes: > How do you input unicode then? I don't. He probably doesn't, either. On Mac OS, there's a generic way to input characters you only know by sight: a clickable table. Kinda painful, but then, that works for functions in Excel so 99% of the computer-using world can handle it, I suppose. However, in general I don't input Unicode, I input characters I know. I bet Steven does too. On touchpads, I write them by hand (takes memorizing pseudo-random sequences, though, because Chinese characters are recognized dynamically by the shape of strokes, and order matters, not by the shape of the resulting glyph). ATK, I use phonetic input. For math characters, at least Emacs provides "LaTeX entry". Ie, if you type "\pi" you get the Greek letter, if you type "\int" you get an integral sign character. To get the leminscate, you type "\infty". If you know the Unicode name of the character you can use that for input in Python and Emacs. 
Emacs is pretty smart about completion: I don't use that feature but it would be easily arranged that completion work on the list of Unicode names, if it doesn't work already. (Of limited utility for Han characters though, because the Unicode code point is the significant part of the name for them.)

> > future they may have to memorize a key sequence like option-p to get π, or
> > option-u o to get ö. (See, I still remember them, 15+ years since I last
> > used a Mac extensively. Well-thought out mnemonic key sequences for the
> > win.)

Not for Han characters or Hangul or Hieroglyphics though. Phonetic input is the way to go for those. Siri actually works pretty good for spoken input, too.

From storchaka at gmail.com Sun Jul 14 16:49:13 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sun, 14 Jul 2013 17:49:13 +0300
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To: <4f5e849a-cad3-4708-9052-c5bc848f63a9@googlegroups.com>
References: <51E00525.6020700@aim-online.com> <51E29FD2.7030601@nedbatchelder.com> <4f5e849a-cad3-4708-9052-c5bc848f63a9@googlegroups.com>
Message-ID:

14.07.13 16:53, Zaur Shibzukhov wrote:
> I think the actual use cases rather belong to the numeric area. For example, one could use the infinity symbol when writing infinite numerical results of calculations to a text file, and another one could read them back. Usually float numbers in the python world are converted from string using float(...). So any code that uses float to convert from string could benefit. This is not a real use case though, but rather some scenario...

If one uses custom code to output an infinity float as '∞' (or '+∞', or '-∞', or '\\infty', or 'бесконечность', or '>9000'), then another one should use custom code to input them.

From sturla at molden.no Sun Jul 14 17:07:29 2013
From: sturla at molden.no (Sturla Molden)
Date: Sun, 14 Jul 2013 17:07:29 +0200
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To:
References:
Message-ID: <4E1B9B7C-D373-4A36-9D39-05BD852449AC@molden.no>

Why not include physical constants as well?

    float('Å') = 1E-10
    float('c') = 2.9979E8
    float('R') = 8.314

-1 (if I have a vote)

Sturla

On 12 July 2013 at 15:09, Gerald Britton wrote:
>> On 12 July 2013 13:36, Zaur Shibzukhov wrote:
>>> Hello!
>>>
>>> Is it good idea to allow float('∞') to be float('inf') in python?
>>
>> Why?
>
>> Because it obviously means infinity -- much more so than "inf" does :)
>
> Do you have the infinity symbol on your keyboard? I don't! So, for me, should I ask for
>
>     float('oo')
>
> ??
>
> -1
>
> --
> Gerald Britton

From szport at gmail.com Sun Jul 14 18:47:29 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Sun, 14 Jul 2013 09:47:29 -0700 (PDT)
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To: <4E1B9B7C-D373-4A36-9D39-05BD852449AC@molden.no>
References: <4E1B9B7C-D373-4A36-9D39-05BD852449AC@molden.no>
Message-ID:

On Sunday, 14 July 2013 at 19:07:29 UTC+4, Sturla Molden wrote:
> Why not include physical constants as well?
>
>     float('Å') = 1E-10
>     float('c') = 2.9979E8
>     float('R') = 8.314

Because 'Å' doesn't denote a constant but '∞' is a special value of the floating-point standard. The symbol ∞ naturally represents infinity, and as far as I know it's unique in the Unicode standard.
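Serhiy's symmetry point can be sketched as a pair of helpers: whoever chooses a custom output spelling also owns the matching input code. The helper names and the chosen format here are hypothetical:

    import math

    def dump_float(x):
        # Writer's convention; the reader must know it.
        if math.isinf(x):
            return '∞' if x > 0 else '-∞'
        return repr(x)

    def load_float(text):
        text = text.strip()
        if text in ('∞', '+∞'):
            return float('inf')
        if text == '-∞':
            return float('-inf')
        return float(text)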
From szport at gmail.com Sun Jul 14 18:58:06 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Sun, 14 Jul 2013 09:58:06 -0700 (PDT)
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To:
References: <51E00525.6020700@aim-online.com> <51E29FD2.7030601@nedbatchelder.com> <4f5e849a-cad3-4708-9052-c5bc848f63a9@googlegroups.com>
Message-ID: <9dacdb9e-ce63-4295-a526-aa9642362e2b@googlegroups.com>

On Sunday, 14 July 2013 at 18:49:13 UTC+4, Serhiy Storchaka wrote:
> 14.07.13 16:53, Zaur Shibzukhov wrote:
>> I think the actual use cases rather belong to the numeric area. For example, one could use the infinity symbol when writing infinite numerical results of calculations to a text file, and another one could read them back. Usually float numbers in the python world are converted from string using float(...). So any code that uses float to convert from string could benefit. This is not a real use case though, but rather some scenario...
>
> If one uses custom code to output an infinity float as '∞' (or '+∞', or '-∞', or '\\infty', or 'бесконечность', or '>9000'), then another one should use custom code to input them.

Yes, of course. But I mean only the case when the output is float('inf') or float('-inf') - the actual infinity values.

From sergemp at mail.ru Sun Jul 14 21:26:06 2013
From: sergemp at mail.ru (Sergey)
Date: Sun, 14 Jul 2013 22:26:06 +0300
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
Message-ID: <20130714222606.0f61f16e@sergey>

Hello, python-ideas. I don't want to call it a "summary" yet, because I hope there are other ideas out there that will be suggested later. So let's call it an "Intermediate Summary". I also shifted the emphasis in this summary a little, to show that a faster sum() may be just a side-effect of some of the ideas.

Introduction
============

sum() is a great function because it makes code simple. It was never restricted to numbers, since it's also useful for adding types like timedelta or different numpy types. Unfortunately sum() performance is INCONSISTENT: it's O(N) for numbers, but O(N*N) for many containers. As a result:

    sum( [123]*1000000, 0 )

is instant, but:

    sum( [[1,2,3]]*1000000, [] )

takes forever to complete.

It's worth noting that sum() is one of the most commonly suggested options for joining lists [1], despite the fact that usually someone also comes along and says that it may be slow. This means that people at least often try to use sum() for lists. That case was also explicitly mentioned in the comments of sum()'s source. So the problem is not hypothetical.

This thread is not the first time people have discussed how the problem could be solved [2], but it is probably the deepest and most detailed discussion of it so far.

Alternatives
============

During the thread many alternatives were suggested. Among them:

* sum() is not the obvious (for everyone) way to add lists, so people should not use it, as there are alternatives, i.e.
  instead of
      - sum(list_of_lists, [])
  one can use:
      - reduce(operator.iadd, list_of_lists, [])
      - list(itertools.chain.from_iterable(list_of_lists))
      - result = []
        for x in list_of_lists:
            result.extend(x)

* Alternatives should be more visible, so that people would spot them earlier. For example there was a suggestion to move chain (or chain.from_iterable) from itertools into builtins, optionally changing its name.

* Another alternative was to introduce "reduce" in builtins under a new name "fold", optionally with adjusted parameters to make them easier to understand.

* And one more alternative suggested was to do nothing. I can't understand it, so I'll just quote:
> You're saying what IS NOT your preferred way. But I'm asking what IS your preferred way. Do you prefer to have a slow sum in python and people asking why it's slow forever? Do you see that as a best possible case for python?
YES. Godamnit YES. 100% true-to-form YES.

But I believe that all of them are workarounds (better workarounds, maybe, but still workarounds) that do not solve the original problem. It's like if you had a similar bug [3] in httplib: you could answer that python is not the best tool to download 110Mb files, or you could suggest some custom http client instead, but that would not fix the bug in httplib.

I'm trying to address this particular issue of sum() inconsistency, being O(N) for numbers but O(N*N) for many containers, and trying to find options that could give sum a chance to be O(N) for most use cases.

Ideas
=====

0. Original patch.

The original patch [4] suggested using "+" for the first item and "+=" for everything else. That could make sum faster not only for lists but also for all types having __iadd__ faster than __add__. But a clever example from Oscar [5] showed that "+" can be different from "+=", e.g. due to coercion:

    >>> from numpy import array
    >>> a = array([1, 2, 3], dtype=int)
    >>> b = array([1.4, 2.5, 3.6], dtype=float)
    >>> a + b
    array([ 2.4,  4.5,  6.6])
    >>> a += b
    >>> a
    array([2, 4, 6])

I suggest (patch [6]) mentioning this example explicitly in the comments for sum(), so that future developers would not be tempted to follow this approach.

1. Fast custom types.

Is it fine to have types that are O(N) summable? This question looks weird, since sum() is already O(N) for some types and O(N*N) for others, but I was told several times things like:

> Today, sum is not the best way to concatenate sequences. Making it work better for some sequences but not others would mean it's still not the best way to concatenate sequences, but it would _appear_ to be. That's the very definition of an attractive nuisance.

So I want to clear this out: is it normal that sum() is fast for some types (e.g. numbers, timedeltas or FastTuples from [7]) and slow for others? Or must sum just BE slow, and is the very existence of faster containers an "attractive nuisance", so those should be expelled or slowed down too?

2. Fast built-in types.

If the answer is "Yes", i.e. there's nothing bad in some types being faster than others, and there's nothing bad in sum() being faster for some types, then what do you think about optimizing built-in types, like lists, tuples and strings? The optimization itself could have different benefits:
* lower memory usage
* instant creation of one instance from another
* faster add operation (i.e. "c = a + b")
* having it common to lists and tuples could give instant conversion of lists to tuples and back

But as a side-effect of such an optimization sum() will be O(N) for those types.
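For concreteness, the inconsistency Sergey describes is easy to measure with a quick check. This is illustrative only, not part of any patch; absolute times vary by machine, but the container case roughly quadruples when n doubles while the integer case grows linearly:

    import timeit

    for n in (1000, 2000, 4000):
        t = timeit.timeit('sum(data, [])',
                          setup='data = [[1, 2, 3]] * %d' % n,
                          number=1)
        print('lists  n=%5d  %.4fs' % (n, t))

    for n in (1000, 2000, 4000):
        t = timeit.timeit('sum(data, 0)',
                          setup='data = [123] * %d' % n,
                          number=1)
        print('ints   n=%5d  %.6fs' % (n, t))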
So the question is: if it's fine for some types to be faster than others, then is it fine if those "some types" are lists or tuples? I.e. what if lists and tuples were O(N) to sum, while some other types are not? To check whether this is possible I wrote [7] a very limited version of tuple using that approach, so that:

    from fasttuple import ft
    x = sum( [ft([1,2,3])] * 1000000, ft([]) )

takes a few seconds under a usual unpatched python.

3. Fast built-in types just for sum [8].

If it's not bad to have lists/tuples faster than other types, then how about just implementing a small special case for them? Advantage: the patch is much shorter [8] than an optimization for lists and tuples would be, and it would probably cover most (if not all) usages of sum() for containers. Disadvantage: this patch is limited to sum(), i.e. it gives the same performance for sum() as #2, but it does not optimize general "c = a + b" code as #2 would.

Side-note: sum() was designed with special cases in mind from the beginning [9]. The initial sum() patch had a special case for strings (''.join() was used for them) but due to the complexity of the code for mixed lists it was replaced by another special case rejecting strings. Currently sum has 5 special cases: three to reject strings/bytes and two optimizations for ints and floats.

4. If that would be considered a bad idea, i.e. if we decide that other custom types should have a fair chance to implement some optimization, then how about declaring a unified way to initialize a container type from an iterable? Something like an optional method __init_concatenable_sequence_from_iterable__. Most containers already have such a method as a default constructor, i.e. it should be trivial to implement it for any existing type. As a side-effect of this unification sum() could have a general optimization for such types, something like:

    if hasattr(type(start), "__init_concatenable_container_from_iterable__"):
        optimized_for = type(start)
        l_result = list()
        l_result.extend(start)
        try:
            while True:
                item = next(it)
                if type(item) is not optimized_for:
                    start = optimized_for.__init_concatenable_container_from_iterable__(l_result)
                    start = start + item
                    break
                l_result.extend(item)
        except StopIteration:
            return optimized_for.__init_concatenable_container_from_iterable__(l_result)

(I don't like that long __ name, it was just the first one I thought of. Better names are welcome. __iccfi__?)

5. An alternative unification idea was suggested by MRAB (a sketch follows after the postscripts below):
. Get the first item, or the start value
. Ask the item for an 'accumulator' (item.__accum__())
. Add the first item to the accumulator.
. Add the remaining items to the accumulator.
. Ask the accumulator for the result.
If there's no accumulator available (the "__accum__" method isn't implemented), then either fall back to the current behaviour or raise a TypeError like it currently does for strings.

I *guess* sum() was never discussed before so deeply because many people either hate sum() or believe that nothing can be done, so they don't even try to find any options. Or maybe they just prefer to use some easily available workaround and don't spend time trying to find other solutions.

PS: If you can, please explain what exactly you [don't] like in the ideas above, so I have a chance to modify the ideas according to your opinions.

PPS: I'm sorry if I missed some of the suggestions; please add them too.

PPPS: I'm also sorry not to answer some emails during the thread. I read all of them, but there were already too many messages in the thread, so I was trying to limit the number of emails I write. If I missed something important, please forward your email to me privately and I'll answer you either directly or on list, if you wish.
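As promised above, a minimal runnable sketch of MRAB's accumulator idea. All names here (__accum__, ListAccumulator, accum_sum, FastList) are hypothetical, not an existing or proposed API:

    class FastList(list):
        def __accum__(self):
            return ListAccumulator()

    class ListAccumulator:
        def __init__(self):
            self._items = []
        def add(self, value):
            self._items.extend(value)   # O(len(value)), no intermediate copies
        def result(self):
            return FastList(self._items)

    def accum_sum(iterable, start):
        if getattr(type(start), '__accum__', None) is None:
            raise TypeError('no accumulator available for %r' % type(start))
        acc = start.__accum__()
        acc.add(start)
        for item in iterable:
            acc.add(item)
        return acc.result()

    # accum_sum([FastList([1, 2])] * 3, FastList([])) -> [1, 2, 1, 2, 1, 2]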
--
[1] Some stackoverflow questions where sum() is suggested for lists:
http://stackoverflow.com/questions/406121/flattening-a-shallow-list-in-python
http://stackoverflow.com/questions/716477/join-list-of-lists-in-python
http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
http://stackoverflow.com/questions/3021641/concatenation-of-many-lists-in-python
http://stackoverflow.com/questions/7895449/merging-a-list-of-lists
http://stackoverflow.com/questions/10424219/combining-lists-into-one
http://stackoverflow.com/questions/11331908/how-to-use-reduce-with-list-of-lists
http://stackoverflow.com/questions/17142101/concatenating-sublists-python

[2] Some earlier attempts to discuss solutions to sum() slowness:
2006-01-12 http://article.gmane.org/gmane.comp.python.general/441831
> A fast implementation would probably allocate the output list just once and then stream the values into place with a simple index.
That's what I hoped "sum" would do, but instead it barfs with a type error.
2010-03-27 http://article.gmane.org/gmane.comp.python.general/658537
The mildly surprising part of sum() is that it does add vs. add-in-place, which leads to O(N) vs. O(1) for the inner loop calls, for certain data structures, notably lists, even though none of the intermediate results get used by the caller. For lists, you could make a more efficient variant of sum() that clones the start value and does add-in-place.

[3] http://bugs.python.org/issue6838
*** httplib's _read_chunked extremely slow for lots of chunks ***
As the comment in this method suggests, accumulating the value by repeated string concatenation is slow. Appending to a list speeds this up dramatically.

[4] http://bugs.python.org/file30705/fastsum.patch
Original patch using "+" for the first item and "+=" for the rest.

[5] http://bugs.python.org/issue18305#msg192873
Oscar Benjamin: This "optimisation" is a semantic change. It breaks backward compatibility in cases where a = a + b and a += b do not result in the name a having the same value. In particular this breaks backward compatibility for numpy users: [...]

[6] http://bugs.python.org/issue18305#msg192956
http://bugs.python.org/file30904/fastsum-iadd_warning.patch
Extended warning in sum() comments with more important numpy cases.

[7] http://bugs.python.org/issue18305#msg193048
http://bugs.python.org/file30917/fasttuple.py
fasttuple.py is a proof-of-concept implementation of tuple that reuses the same data storage when possible. Its possible usage looks similar to built-in tuples [...]. An interesting side-effect of this implementation is a faster __add__ operator:
    Adding 100000 of fasttuples took 0.23242688179 seconds
    Adding 100000 of built-in tuples took 25.2749021053 seconds

[8] http://bugs.python.org/issue18305#msg192919
http://bugs.python.org/file30897/fastsum-special-tuplesandlists.patch
Patch implementing a special case for tuples and lists. Should work for both Python 2.7 and Python 3.3, and should introduce no behavior change.
[9] http://mail.python.org/pipermail/python-dev/2003-April/034767.html
Alex Martelli (author of sum()):
> for the simple reason that I special-case this -- when the first item is a PyBaseString_Type, I delegate to ''.join

From p.f.moore at gmail.com Sun Jul 14 22:05:13 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 14 Jul 2013 21:05:13 +0100
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To: <20130714222606.0f61f16e@sergey>
References: <20130714222606.0f61f16e@sergey>
Message-ID:

On 14 July 2013 20:26, Sergey wrote:
> I'm trying to address this particular issue of sum() inconsistency, being O(N) for numbers but O(N*N) for many containers, and trying to find options that could give sum a chance to be O(N) for most use cases.

sum is currently simple. It just adds each element of the iterable it's called with using +. That's it. (OK, it also checks for strings and rejects them...)

    result = start
    for elem in iterable:
        result = result + elem

The performance complexity follows directly from that definition. It is no more surprising that numbers are O(N) and containers O(N*N) than it is that the loop I show above has that complexity. That is to say, it *is* surprising, if you don't know complexity calculations very well, but it's a fundamental and trivial result once you do. Changing the performance of sum() makes reasoning about its complexity *harder* because you can't refer back to that very basic equivalence above to inform your assumptions.

The suggestion of using += has merit because it keeps the equivalence simple, so that people can still reason about complexity in a straightforward manner. But there are backward compatibility concerns to review and assess. Performance is not the only consideration.

But this is the underlying situation around the performance debates, as I see it. (And people saying "don't change anything" are quite possibly doing so because they view understandable performance guarantees as more important than the "attractive nuisance" of something that is often fast, but it's hard to be sure exactly when.)

Paul

From mertz at gnosis.cx Sun Jul 14 22:42:48 2013
From: mertz at gnosis.cx (David Mertz)
Date: Sun, 14 Jul 2013 13:42:48 -0700
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To: <20130714222606.0f61f16e@sergey>
References: <20130714222606.0f61f16e@sergey>
Message-ID:

On Sun, Jul 14, 2013 at 12:26 PM, Sergey wrote:
> * sum() is not the obvious (for everyone) way to add lists, so people should not use it, as there are alternatives, i.e. instead of
>     - sum(list_of_lists, [])
> one can use:
>     - reduce(operator.iadd, list_of_lists, [])
>     - list(itertools.chain.from_iterable(list_of_lists))
>     - result = []
>       for x in list_of_lists:
>           result.extend(x)

It seems to me that in order to make sum() look more attractive, Sergey presents ugly versions of alternative ways to (efficiently) concatenate sequences. One can make these look much nicer, e.g. (assuming there is a 'from itertools import chain' at the very top of the file, which is the sensible place to put it).

    # If 'list_of_lists' really is as it is named, there is no need to treat it
    # as a generic iterable. Moreover, one doesn't usually need to make an
    # actual instantiated list from chain() for most purposes.
So:

    flat = chain(*list_of_lists)

    # If we do start with an iterable of lists, but know it isn't infinite, just use:
    flat = chain(*iter_of_lists)

If it is really needed, of course chain.from_iterable() can be used. Although the only time you'd want that is when the iterable is potentially infinite, and in that case you *definitely* don't want to make it back into a list either, just:

    inf_flat = chain.from_iterable(endless_lists)

Another approach in one of the links Sergey gave is nice too, and shorter and more elegant than any of his alternatives:

    flat = []
    map(flat.extend, list_of_lists)

Using map() for a side effect is slightly wrong, but this is short, readable, and obvious in purpose. On the other hand, as I've said before, when I read:

    flat = sum(list_of_lists, [])

It just looks WRONG! Yes, I know why it works, because of some quirks of Python internals. But it absolutely doesn't *read* like it should mean what it does or that it should necessarily even work at all. The word SUM is self-evidently and intuitively about *adding numbers* and *not* about "doing something that is technically supported because other things have an .__add__() method".

As various people have observed, if Python used some other operator for concatenation, we wouldn't be having this discussion at all. E.g. if we had:

    concat = [1, 2, 3] . [4, 5, 6]

Then we might have a method called .__concat__() on various collections. Conceptually that really is what Python is doing now. It's just that Guido made the very reasonable decision that the symbol "+" was something users could intuitively read as meaning concatenation when appropriate, but as addition in other cases.

I definitely don't prefer some other operator than '+' to concatenate sequences. However, I think possibly if I had a time machine I might go back and change the spelling of .__add__() to .__plus__(). That might more clearly indicate that we don't really mean "mathematical addition" but rather simply "what the plus sign does".

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

From mertz at gnosis.cx Sun Jul 14 22:57:40 2013
From: mertz at gnosis.cx (David Mertz)
Date: Sun, 14 Jul 2013 13:57:40 -0700
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To:
References: <20130714222606.0f61f16e@sergey>
Message-ID:

On Sun, Jul 14, 2013 at 1:42 PM, David Mertz wrote:
> Another approach in one of the links Sergey gave is nice too, and shorter and more elegant than any of his alternatives:
>
>     flat = []
>     map(flat.extend, list_of_lists)
>
> Using map() for a side effect is slightly wrong, but this is short, readable, and obvious in purpose.

Oh yeah. That's a Python 2.x thing. In Python 3, map() is lazy, so the calls (and their side effects) don't actually happen. So you'd need to do:

    flat = []
    set(map(flat.extend, list_of_lists))

That starts to border on ugly. One might also try:

    flat = []
    [flat.extend(l) for l in list_of_lists]

But I'm not thrilled by how that reads either. Using the chain() versions is just nicer. Moreover, if you insist on concrete collections out of it, you can take your pick (unlike sum()... although you can obviously wrap that answer in a constructor too):

    flat_tup = tuple(chain(*iter_of_lists))
    flat_set = set(chain(*list_of_lists))

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

From antony.lee at berkeley.edu Sun Jul 14 23:36:49 2013
From: antony.lee at berkeley.edu (Antony Lee)
Date: Sun, 14 Jul 2013 14:36:49 -0700
Subject: [Python-ideas] Fwd: Allow Enum members to refer to each other during execution of body
In-Reply-To:
References: <51DB5573.5070004@stoneleaf.us> <6fc3f1ea-e643-40c4-aa2c-6e0d42bd7b6e@googlegroups.com> <51DE0FD8.4050301@stoneleaf.us> <334b5e00-2f0b-4231-9b86-1e82105c5e28@googlegroups.com> <51DF2AF5.6050804@stoneleaf.us>
Message-ID:

Sorry for the duplicate; it seems that sending just to the googlegroups address doesn't cc to the python.org address. My previous email below.
Antony

2013/7/12 Antony Lee
> Is there any specific reason why you do not wish to change the behavior of Enum to this one (which does seem more logical to me)? The patch is fairly simple in its logic (compared to the rest of the implementation, at least...), and I could even change it to remove the requirement of defining __new__ before the members as long as there are no references to other members (because as long as there are no references to other members, I obviously don't need to actually create the members), thus making it fully compatible with the current version.
> Antony
>
> 2013/7/11 Ethan Furman
>> On 07/11/2013 02:07 PM, Antony Lee wrote:
>>> On Wednesday, July 10, 2013 6:52:24 PM UTC-7, stoneleaf wrote:
>>>> On 07/10/2013 03:47 PM, Antony Lee wrote:
>>>>> Forward references are now implemented (https://github.com/anntzer/enum).
>>>>
>>>> Do they work with a custom __new__ ? __init__ ?
>>>
>>> In the current version, they work with a custom __init__ (though of course, as long as the actual arguments that need to be passed to __init__ are provided, the pre-declared members are just "empty"). They do not work with a custom __new__ (not sure how I could make this work, given that at declaration time an "empty" member needs to be created but we don't know what arguments we need to pass to __new__...). As a side effect, however, the whole patch adds a new requirement: custom __new__s must be defined before the members themselves; otherwise they won't be called, for the same reason as above: if I don't know what __new__ is, I can't call it...
>>
>> Hmm. Well, at this point I can offer kudos for getting it this far, but that's about it. The use-case this addresses seems fairly rare, and is definitely not a typical enumeration, and can be solved fairly easily with some extra post-processing code on a per-enumeration basis.
>>
>> --
>> ~Ethan~

From steve at pearwood.info Mon Jul 15 01:43:27 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 15 Jul 2013 09:43:27 +1000
Subject: [Python-ideas] float('∞')=float('inf')
In-Reply-To:
References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info> <51E26C9E.4080505@pearwood.info>
Message-ID: <51E3379F.4080306@pearwood.info>

On 14/07/13 21:18, Joshua Landau wrote:
> On 14 July 2013 12:07, Philipp A. wrote:
>> float('inf') still works, so if your keyboard layout/compose key doesn't support ∞, you just use the ascii variant.
>
> That would make the code inconsistent.

Who cares if in one function you say float('∞') and in another function you say float('inf') and in a third you say float("inf") [note quotation marks] and in a fourth float("INF")? What possible difference does it make? If you really care, then refactor all of those calls out to a single constant declared in one place only:

    INF = float(random.choice([
        'inf', "inf", "INFINITY", '∞', '+∞', r"""∞""",
        '\N{INFINITY}', "\U0000221e", "\u221E",
        # etc.
    ]))

and now you can satisfy everybody's preferred way of writing ∞, no matter what. And no, I am not serious about calling random.choice. But I am serious about refactoring multiple calls to float(whatever) to a module level constant.

-- Steven

From sergemp at mail.ru Mon Jul 15 02:24:26 2013
From: sergemp at mail.ru (Sergey)
Date: Mon, 15 Jul 2013 03:24:26 +0300
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To:
References: <20130714222606.0f61f16e@sergey>
Message-ID: <20130715032426.2dad1afd@sergey>

On Jul 14, 2013 Paul Moore wrote:
> sum is currently simple.

Sum currently looks like:

    def sum(seq, start=0):
        it = iter(seq)
        if isinstance(start, str):
            raise TypeError("sum() can't sum strings [use ''.join(seq) instead]")
        if isinstance(start, bytes):
            raise TypeError("sum() can't sum bytes [use b''.join(seq) instead]")
        if isinstance(start, bytearray):
            raise TypeError("sum() can't sum bytearray [use b''.join(seq) instead]")

        # SPECIAL CASES
        if type(start) is int:
            # transliteration of the C fast path (PyLong_AsLongAndOverflow)
            i_result = int(start, overflow)
            if not overflow:
                try:
                    start = None
                    while start is None:
                        item = next(it)
                        if isinstance(item, int):
                            b = int(item, overflow)
                            x = i_result + b
                            if not overflow and ((x^i_result) >= 0 or (x^b) >= 0):
                                i_result = x
                                continue
                        start = i_result
                        start = start + item
                except StopIteration:
                    return i_result

        if type(start) is float:
            f_result = float(start)
            try:
                start = None
                while start is None:
                    item = next(it)
                    if isinstance(item, float):
                        f_result += float(item)
                        continue
                    if isinstance(item, int):
                        value = int(item, overflow)
                        if not overflow:
                            f_result += float(value)
                            continue
                    start = f_result
                    start = start + item
            except StopIteration:
                return f_result
        # END OF SPECIAL CASES

        try:
            while True:
                item = next(it)
                start = start + item
        except StopIteration:
            return start

> result = start
> for elem in iterable:
>     result = result + elem
>
> The performance complexity follows directly from that definition.

First, nothing says that sum is implemented like that.
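The observable contract both sides appeal to can be written down directly; whether CPython implements it this way internally is exactly what is being debated here. A sketch using the stdlib spelling:

    import functools
    import operator

    def sum_reference(iterable, start=0):
        # Repeated binary +, left to right: the documented behaviour of sum().
        return functools.reduce(operator.add, iterable, start)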
although you can obviously wrap that answer in a constructor too): flat_tup = tuple(chain(*iter_of_lists)) flat_set = set(chain(list_of_lists)) -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From antony.lee at berkeley.edu Sun Jul 14 23:36:49 2013 From: antony.lee at berkeley.edu (Antony Lee) Date: Sun, 14 Jul 2013 14:36:49 -0700 Subject: [Python-ideas] Fwd: Allow Enum members to refer to each other during execution of body In-Reply-To: References: <51DB5573.5070004@stoneleaf.us> <6fc3f1ea-e643-40c4-aa2c-6e0d42bd7b6e@googlegroups.com> <51DE0FD8.4050301@stoneleaf.us> <334b5e00-2f0b-4231-9b86-1e82105c5e28@googlegroups.com> <51DF2AF5.6050804@stoneleaf.us> Message-ID: Sorry for the duplicate, seems like sending just to the googlegroups address doesn't cc to the python.org address. My previous email below. Antony 2013/7/12 Antony Lee > Is there any specific reason why you do not wish to change the behavior of > Enum to this one (which does seem more logical to me)? The patch is fairly > simple in its logic (compared to the rest of the implementation, at > least...), and I could even change it to remove the requirement of defining > __new__ before the members as long as there are no references to other > members (because as long as there are no references to other members, I > obviously don't need to actually create the members), thus making it fully > compatible with the current version. > Antony > > > 2013/7/11 Ethan Furman > >> On 07/11/2013 02:07 PM, Antony Lee wrote: >> >>> On Wednesday, July 10, 2013 6:52:24 PM UTC-7, stoneleaf wrote: >>> >>>> On 07/10/2013 03:47 PM, Antony Lee wrote: >>>> >>>>> >>>>> Forward references are now implemented (https://github.com/anntzer/** >>>>> enum >>>> enum >). >>>>> >>>> >>>> Do they work with a custom __new__ ? __init__ ? >>>> >>> >>> In the current version, they work with a custom __init__ (though of >>> course, as long as the actual arguments that need to >>> be passed to __init__ are provided, the pre-declared members are just >>> "empty"). They do not work with a custom __new__ >>> (not sure how I could make this work, given that at declaration time an >>> "empty" member needs to be created but we don't >>> know what arguments we need to pass to __new__...). >>> As a side effect, however, the whole patch adds a new requirement: >>> custom __new__s must be defined before the members >>> themselves; otherwise they won't be called, for the same reason as >>> above: if I don't know what __new__ is, I can't call >>> it... >>> >> >> Hmm. Well, at this point I can offer kudos for getting it this far, but >> that's about it. The use-case this addresses seems fairly rare, and is >> definitely not a typical enumeration, and can be solved fairly easily with >> some extra post-processing code on a per-enumeration basis. >> >> >> -- >> ~Ethan~ >> ______________________________**_________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/**mailman/listinfo/python-ideas >> >> -- >> >> --- You received this message because you are subscribed to a topic in >> the Google Groups "python-ideas" group. 
>> To unsubscribe from this topic, visit https://groups.google.com/d/**topic/python-ideas/PC_**Ej19qj5w/unsubscribe .
>> To unsubscribe from this group and all its topics, send an email to python-ideas+unsubscribe@**googlegroups.com .
>> For more options, visit https://groups.google.com/**groups/opt_out .
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From steve at pearwood.info Mon Jul 15 01:43:27 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 15 Jul 2013 09:43:27 +1000
Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?=
In-Reply-To: References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info> <51E26C9E.4080505@pearwood.info>
Message-ID: <51E3379F.4080306@pearwood.info>

On 14/07/13 21:18, Joshua Landau wrote:
> On 14 July 2013 12:07, Philipp A. wrote:
>> float('inf') still works, so if your keyboard layout/compose key doesn't
>> support ∞, you just use the ascii variant.
>
> That would make the code inconsistent.

Who cares if in one function you say float('∞') and in another function you say float('inf') and in a third you say float("inf") [note quotation marks] and in a fourth float("INF")? What possible difference does it make? If you really care, then refactor all of those calls out to a single constant declared in one place only:

    INF = float(random.choice([
        'inf', "inf", "INFINITY", '∞', '+∞', r"""∞""",
        '\N{INFINITY}', "\U0000221e", "\u221E",
        # etc.
        ]))

and now you can satisfy everybody's preferred way of writing ∞, no matter what. And no, I am not serious about calling random.choice. But I am serious about refactoring multiple calls to float(whatever) to a module level constant.

-- Steven

From sergemp at mail.ru Mon Jul 15 02:24:26 2013
From: sergemp at mail.ru (Sergey)
Date: Mon, 15 Jul 2013 03:24:26 +0300
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To: References: <20130714222606.0f61f16e@sergey>
Message-ID: <20130715032426.2dad1afd@sergey>

On Jul 14, 2013 Paul Moore wrote:
> sum is currently simple.

Sum currently looks like:

    def sum(seq, start=0):
        it = iter(seq)
        if isinstance(start, str):
            raise TypeError(
                "sum() can't sum strings [use ''.join(seq) instead]")
        if isinstance(start, bytes):
            raise TypeError(
                "sum() can't sum bytes [use b''.join(seq) instead]")
        if isinstance(start, bytearray):
            raise TypeError(
                "sum() can't sum bytearray [use b''.join(seq) instead]")
        # SPECIAL CASES
        if type(start) is int:
            # pseudocode: int(x, overflow) stands for the C-level
            # conversion that also sets an overflow flag
            i_result = int(start, overflow)
            if not overflow:
                try:
                    start = None
                    while start is None:
                        item = next(it)
                        if isinstance(item, int):
                            b = int(item, overflow)
                            x = i_result + b
                            if not overflow and ((x^i_result) >= 0 or (x^b) >= 0):
                                i_result = x
                                continue
                        start = i_result
                        start = start + item
                except StopIteration:
                    return i_result
        if type(start) is float:
            f_result = float(start)
            try:
                start = None
                while start is None:
                    item = next(it)
                    if isinstance(item, float):
                        f_result += float(item)
                        continue
                    if isinstance(item, int):
                        value = int(item, overflow)
                        if not overflow:
                            f_result += float(value)
                            continue
                    start = f_result
                    start = start + item
            except StopIteration:
                return f_result
        # END OF SPECIAL CASES
        result = start
        try:
            while True:
                item = next(it)
                result = result + item
        except StopIteration:
            return result

> result = start
> for elem in iterable:
>     result = result + elem
>
> The performance complexity follows directly from that definition.

First, nothing says that sum is implemented like that.
And not everyone assumes it is:

Paul Rubin @ 2006-01-12
> A fast implementation would probably allocate the output list just
> once and then stream the values into place with a simple index.
That's what I hoped "sum" would do, but instead it barfs with a type error. So much for duck typing.

> It is no more surprising that numbers are O(N) and containers
> O(N*N) than it is that the loop I show above has that complexity.

Second, the line "result = result + elem" does not mean that performance should be O(N*N) for containers. FastTuple container is a simple O(N) proof of that concept (see #1 and #2 in "intermediate summary").

> That is to say, it *is* surprising, if you don't know complexity
> calculations very well, but it's a fundamental and trivial result
> once you do.

I do know. And I do understand that containers MAY be O(N*N) but do NOT have to be, as my fasttuple example shows. But many beginners may not know that, they may just reasonably expect that python is a good language, and it's a language with dynamic typing, so its creators are probably smart guys, and their functions work similarly fast for all the types. No, really, if you put aside all your deep python knowledge and take a fresh look at the line "result = result + item", what part of that line makes you think that you MUST walk through the items of "result"? I.e. nothing stops a "smart python interpreter" from spotting result on both sides and optimizing it to perform the operation in-place. That's how a beginner could think about it. Hey, even in this list of experienced people it wasn't obvious to many that inplace modification (i.e. "+=") could be so much different from "+" in commonly used code! Beginners may not even know about __[i]add__, still less would they expect them to be much different.

> Changing the performance of sum() makes reasoning about its complexity
> *harder* because you can't refer back to that very basic equivalence above
> to inform your assumptions.

Yes, I understand your reasons, you're trying to say that now you have a simple explanation of sum performance: "sum is O(N) for numbers and O(N*N) for containers". But you're wrong in both statements. First: sum is O(N) for many types, not just for numbers. E.g. it's O(N) for timedeltas, and for numpy arrays (yes, it's O(N) for numpy arrays, meaning that a sum of 1000 arrays is 10 times slower than a sum of 100 arrays). It's also not O(N*N) for all containers, and my fasttuple example is evidence of that [1]. So you can't refer to that reasoning, because it's already wrong! Hm... If my patch shows that a wrong explanation is wrong, does it add points to the patch or take them? ;)

> The suggestion of using += has merit because it keeps the equivalence
> simple, so that people can still reason about complexity in a
> straightforward manner. But there are backward compatibility concerns
> to review and assess.

Yeah, I liked it too, but Oscar's numpy example has killed my optimism. So my current favourite is #2, I wish it was easy to do. And from a practical point of view #3 should cover most use cases I could find and was easy to implement, so if "practicality beats purity" then #3 is the winner.

> Performance is not the only consideration. But this is the underlying
> situation around the performance debates, as I see it. (And people saying
> "don't change anything" are quite possibly doing so because they view
> understandable performance guarantees as more important than the
> "attractive nuisance" of something that is often fast, but it's hard
> to be sure exactly when.)

Well, if they say that because of that guarantee, it means they're wrong -- they're trying to keep something they don't have. I guess after 10 years of slow sum people got used to it and don't believe there can be anything good any more. That's why I'm here, I'm trying to find something good and bring the faith back to people. :) E.g. they probably don't assume that a faster sum could be a side-effect of tuple being faster. I.e. would you reject a patch making tuple much faster in many cases, just because one of those cases is sum()?

--
[1] http://bugs.python.org/file30917/fasttuple.py
Adding 100000 of fasttuples took 0.23242688179 seconds
Adding 100000 of built-in tuples took 25.2749021053 seconds

From ncoghlan at gmail.com Mon Jul 15 04:12:19 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 15 Jul 2013 12:12:19 +1000
Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations"
In-Reply-To: References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl>
Message-ID:

On 14 July 2013 12:04, Guido van Rossum wrote:
> On Fri, Jul 12, 2013 at 5:25 PM, Joshua Landau wrote:
> > A blessing from the Gods has resulted in
> > http://www.python.org/dev/peps/pep-0448/! See what you think; it's not too
> > changed from before but it's mighty pretty now.
> >
> > Still up for discussion are the specifics of function call syntax, the full
> > details of which should already be in the PEP. If you come up with a better
> > suggestion or want to argue for one of the choices, go ahead.
>
> I like it.

Finally read it myself - looks promising.

> I note that we now end up with new ways for concatenating
> sequences (e.g. [*a, *b]) and also for merging dicts (e.g. {**a,
> **b}). I think it would be good to prepare an implementation in time
> for inclusion in Python 3.4a1 to avoid the same issue with this we had
> before -- I could imagine that there might be some implementation
> problems and I don't want to accept an unimplementable PEP. Also it
> would be good to know that code not using the new syntax won't run any
> slower (especially for function calls this is very important).

I believe we should be able to confine those changes to the bytecode generation, which would mean existing code would be unaffected.

> Regarding the decision about the allowable syntax for argument lists,
> I prefer to keep the existing restriction (making *args after a
> keyword argument basically an exception) since, as you point out,
> placing regular positional arguments after regular keyword arguments
> looks plain silly.

Agreed.

One interesting point I see is that the "*expr" syntax in comprehensions is getting close to a nested "yield from":

    >>> list((yield from x) for x in ([1], [2, 3], [4, 5, 6]))
    [1, None, 2, 3, None, 4, 5, 6, None]

The reason those "None" results show up is that this still emits the standard implied "yield value" for the comprehension, and the result of "(yield from x)" is None. So the translation for star unpacking in generator expressions would be along the lines of a straightforward replacement of the implied "yield" with an implied "yield from":

    # Expansion of existing generator expression
    g = (x for x in iterable)

    def _g(outermost_iterable):
        for x in outermost_iterable:
            yield x
    g = _g(iterable)

    # Flattening generator expression
    g = (*x for x in iterable)

    def _g(outermost_iterable):
        for x in outermost_iterable:
            yield from x
    g = _g(iterable)

The meaning for list and set comprehensions then follows from the generator expression semantics.
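For concreteness, that equivalence can be checked today with an explicit generator, since the starred form itself is only proposed syntax and does not parse in current Python. A minimal sketch (the helper name _flatten is illustrative, not from the thread):

    from itertools import chain

    iterable = ([1], [2, 3], [4, 5, 6])

    # proposed: g = (*x for x in iterable)
    def _flatten(outermost_iterable):
        for x in outermost_iterable:
            yield from x  # the implied "yield" replaced by "yield from"

    expected = [1, 2, 3, 4, 5, 6]
    assert list(_flatten(iterable)) == expected
    assert list(chain.from_iterable(iterable)) == expected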
Dictionary comprehensions would remain a unique snowflake, as they would be the only form which permitted the doublestar unpacking (they're already unique, since they rely on the embedded "k:v" notation to distinguish themselves from set comprehensions and set displays in general). As with existing doublestar unpacking in function calls, the semantics of what is acceptable would be driven by http://docs.python.org/3/c-api/dict.html#PyDict_Update (Note that those docs are currently inaccurate, as they imply it also accepts an iterable of key, value 2-tuples like dict.update, which is not the case: http://bugs.python.org/issue18456)

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ron3200 at gmail.com Mon Jul 15 07:46:24 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Mon, 15 Jul 2013 00:46:24 -0500
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To: <20130714222606.0f61f16e@sergey> References: <20130714222606.0f61f16e@sergey>
Message-ID:

On 07/14/2013 02:26 PM, Sergey wrote:
> * Sum is not obvious (for everyone) way to add lists, so people
> should not use it, as there're alternatives, i.e. instead of
> - sum(list_of_lists, [])
> one can use:
> - reduce(operator.iadd, list_of_lists, [])
> - list(itertools.chain.from_iterable(list_of_lists))
> - result = []
>   for x in list_of_lists:
>       result.extend(x)

I have nothing against increasing sum()'s speed. I think the real issue is having a very fast way to sum non-numbers. You could copy the code from sum() and make a function with a different name that specialises in non-number addition. That would not have any backwards compatibility issues.

Do you think you could use what you learned with sum to make chain, or a new fold function, faster? Both of those have some advantages over sum(), but aren't as fast. If you could make those faster, then that would be very nice. The advantages are: chain works with most iterables and uses much less memory in some cases, while a new fold function would do quite a lot more depending on the operator passed to it. It may be possible to speed up some common cases that use methods on builtin types.

Cheers, Ron

From joshua at landau.ws Mon Jul 15 08:44:14 2013
From: joshua at landau.ws (Joshua Landau)
Date: Mon, 15 Jul 2013 07:44:14 +0100
Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?=
In-Reply-To: <51E3379F.4080306@pearwood.info> References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info> <51E26C9E.4080505@pearwood.info> <51E3379F.4080306@pearwood.info>
Message-ID:

On 15 July 2013 00:43, Steven D'Aprano wrote:
> On 14/07/13 21:18, Joshua Landau wrote:
>> On 14 July 2013 12:07, Philipp A. wrote:
>>> float('inf') still works, so if your keyboard layout/compose key doesn't
>>> support ∞, you just use the ascii variant.
>>
>> That would make the code inconsistent.
>
> Who cares if in one function you say float('∞') and in another function you
> say float('inf') and in a third you say float("inf") [note quotation marks]
> and in a fourth float("INF")? What possible difference does it make?

No, I was saying using "∞" here and "float('inf')" there is inconsistent. If you really want a pretty global-ish constant, you should just write "Infinity = float('inf')"¹² and forget these troubling unicode urges.

I even tend to use "thing is Ellipses" instead of "thing is ..." because it reads better -- "thing == ∞" just goes back to that. Can you really prefer it that much to "thing == Infinity"?²

¹ You might wonder what I'm doing with capitalisation, but Ellipsis is capitalised and its class is lower-case, so I feel this is warranted.
² Or your name of choice

From mal at egenix.com Mon Jul 15 09:11:16 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 15 Jul 2013 09:11:16 +0200
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To: <20130714222606.0f61f16e@sergey> References: <20130714222606.0f61f16e@sergey>
Message-ID: <51E3A094.9020107@egenix.com>

I don't understand why people try to use sum() for anything other than a sequence of numbers. If you want to flatten a list, use a flatten function. Here's a performance comparison of a few possible implementations:

http://stackoverflow.com/questions/406121/flattening-a-shallow-list-in-python

Pick one and you're fine :-)

-- Marc-Andre Lemburg eGenix.com

Professional Python Services directly from the Source (#1, Jul 15 2013)
>>> Python Projects, Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2013-07-16: Python Meeting Duesseldorf ... tomorrow

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/

From joshua at landau.ws Mon Jul 15 09:16:05 2013
From: joshua at landau.ws (Joshua Landau)
Date: Mon, 15 Jul 2013 08:16:05 +0100
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To: References: <20130714222606.0f61f16e@sergey>
Message-ID:

On 14 July 2013 21:42, David Mertz wrote:
> On Sun, Jul 14, 2013 at 12:26 PM, Sergey wrote:
>> * Sum is not obvious (for everyone) way to add lists, so people
>> should not use it, as there're alternatives, i.e. instead of
>> - sum(list_of_lists, [])
>> one can use:
>> - reduce(operator.iadd, list_of_lists, [])
>> - list(itertools.chain.from_iterable(list_of_lists))
>> - result = []
>>   for x in list_of_lists:
>>       result.extend(x)
>
> It seems to me that in order to make sum() look more attractive, Sergey
> presents ugly versions of alternative ways to (efficiently) concatenate
> sequences.
>
> One can make these look much nicer, e.g. (assuming there is a 'from
> itertools import chain' at the very top of the file, which is the sensible
> place to put it).
>
> # If 'list_of_lists' really is as it is named, there is no need to treat it
> # as generic iterable. Moreover, one doesn't usually need to make an
> # actual instantiated list from chain() for most purposes. So:
> flat = chain(list_of_lists)

This does nothing more than iter(list)...

> # If we do start with an iterable of lists, but know it isn't infinite,
> just use:
> flat = chain(*iter_of_lists)

Just use chain.from_iterable(...) here too, it might be longer but it's more flexible and has almost no downsides. This just does redundant work for the sake of saving a few characters.

> If it is really needed, of course chain.from_iterable() can be used.
> Although the only time you'd want that is when the iterable is potentially
> infinite, and in that case you *definitely* don't want to make it back into
> a list either, just:
>
> inf_flat = chain.from_iterable(endless_lists)
>
> Another approach in one of the links Sergey gave is nice too, and shorter
> and more elegant than any of his alternatives:
>
> flat = []
> map(flat.extend, list_of_lists)

Gah! No like.

    flat = []
    for lst in list_of_lists:
        flat.extend(lst)

is no longer and also doesn't force you to "deque(maxlen=0).extend(...)" it.

> Using map() for a side effect is slightly wrong, but this is short,
> readable, and obvious in purpose.

I disagree somewhat.

> On the other hand, as I've said before, when I read:
>
> flat = sum(list_of_lists, [])
>
> It just looks WRONG!

I definitely don't think this is nearly as bad as map(flat.extend, list_of_lists); not only is this *defined* to work

> Yes, I know why it works, because of some quirks of
> Python internals.

You think that "[1, 2, 3] + [4, 5, 6] == [1, 2, 3, 4, 5, 6]" is a quirk of python's internals?

> But it absolutely doesn't *read* like it should mean what
> it does

You can look up the term "sum" -- it absolutely does.

> or that it should necessarily even work at all. The word SUM is
> self-evidently and intuitively about *adding numbers*

No it's not.

> and *not* about "doing
> something that is technically supported because other things have an
> .__add__() method".

Again, this is wrong.

https://en.wikipedia.org/wiki/Summation
> Besides numbers, other types of values can be added as well: vectors,
> matrices, polynomials and, in general, elements of any additive group (or
> even monoid).

Google's (aggregated) dictionary:
> The total amount resulting from the addition of two or more numbers,
> amounts, or items
> the final aggregate; "the sum of all our troubles did not equal the
> misery they suffered"
(a good example of where you *already know* you can sum things other than numbers)

So first tell me why it makes sense to sum "misery" but not "lists". How is "misery" more like a number than a "list"?

> As various people have observed, if Python used some other operator for
> concatenation, we wouldn't be having this discussion at all. E.g. if we
> had:
>
> concat = [1, 2, 3] . [4, 5, 6]
>
> Then we might have a method called .__concat__() on various collections.
> Conceptually that really is what Python is doing now. It's just that Guido
> made the very reasonable decision that the symbol "+" was something users
> could intuitively read as meaning concatenation when appropriate, but as
> addition in other cases.
>
> I definitely don't prefer some other operator than '+' to concatenate
> sequences. However, I think possibly if I had a time machine I might go
> back and change the spelling of .__add__() to .__plus__(). That might more
> clearly indicate that we don't really mean "mathematical addition" but
> rather simply "what the plus sign does".

I agree with none of this (except the start: if Python used some other operator we'd only be having *different* discussions).

From oscar.j.benjamin at gmail.com Mon Jul 15 12:26:43 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 15 Jul 2013 11:26:43 +0100
Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?=
In-Reply-To: References:
Message-ID:

On 12 July 2013 13:36, Zaur Shibzukhov wrote:
>
> Is it a good idea to allow
> float('∞') to be float('inf') in python?

I don't think so as my preference is for Python to stick to IEEE 754 as closely as possible.

Section 5.12.1 of IEEE 754-2008 says
'''
Conversion of an infinity in a supported format to an external character sequence shall produce a language defined one of 'inf' or 'infinity' or a sequence that is equivalent except for case (e.g., 'Infinity' or 'INF'), with a preceding minus sign if the input is negative. Whether the conversion produces a preceding plus sign if the input is positive is language-defined.

Conversion of external character sequences 'inf' and 'infinity' (regardless of case) with an optional preceding sign, to a supported floating-point format shall produce an infinity (with the same sign as the input).
'''

This does not seem to prohibit accepting other strings for infinity but it does explicitly define a set of textual representations for infinity. I don't think Python's float <-> str conversions should accept (or emit) any other strings for infinity.

Since I was looking at the standard I was also interested to see what it says about non-ascii decimal digits (as accepted by Python 3). Immediately above this in section 5.12 it says
'''
Issues of character codes (ASCII, Unicode, etc.) are not defined by this standard.
'''

My interpretation of that is that Python's current behaviour is conforming but that it would also be conforming if it didn't accept non-ascii decimal digits.

Oscar

From oscar.j.benjamin at gmail.com Mon Jul 15 12:40:45 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 15 Jul 2013 11:40:45 +0100
Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations"
In-Reply-To: References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl>
Message-ID:

On 13 July 2013 01:25, Joshua Landau wrote:
> A blessing from the Gods has resulted in
> http://www.python.org/dev/peps/pep-0448/! See what you think; it's not too
> changed from before but it's mighty pretty now.

I definitely like the general colour of this shed but would probably repaint one side of it: while unpacking can create tuples, sets, lists and dicts there's no way to create an iterator. I would like it if the unpacking syntax could somehow be used for iterators. For example:

    first_line = next(inputfile)
    # inspect first_line
    for line in chain([first_line], inputfile):
        # process line

could be rewritten as

    first_line = next(inputfile)
    for line in first_line, *inputfile:
        pass

without reading the whole file into memory. Using the tuple syntax is probably confusing but it would be great if there were some way to spell this and get an iterator instead of a concrete collection.

Also this may be outside the scope of this PEP but since unpacking is likely to be overhauled I'd like to put forward a previous suggestion by Greg Ewing that there be a way to unpack some items from an iterator without consuming the whole thing e.g.:

    a, ... = iterable

could be roughly equivalent to:

    try:
        a = next(iter(iterable))
    except StopIteration:
        raise ValueError('Need more than 0 items to unpack')

I currently write code like:

    def parsefile(inputfile):
        inputfile = iter(inputfile)
        try:
            first_line = next(inputfile)
        except StopIteration:
            raise ValueError('Empty file')
        # Inspect first_line
        for line in chain([first_line], inputfile):
            # Process line

But with the changes above I could do
    def parsefile(inputfile):
        inputfile = iter(inputfile)
        first_line, ... = inputfile
        # Inspect first_line
        for line in first_line, *inputfile:
            # Process line

Oscar

From joshua.landau.ws at gmail.com Mon Jul 15 13:08:14 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Mon, 15 Jul 2013 12:08:14 +0100
Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations"
In-Reply-To: References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl>
Message-ID:

On 15 July 2013 11:40, Oscar Benjamin wrote:
> On 13 July 2013 01:25, Joshua Landau wrote:
>> A blessing from the Gods has resulted in
>> http://www.python.org/dev/peps/pep-0448/! See what you think; it's not too
>> changed from before but it's mighty pretty now.
>
> I definitely like the general colour of this shed but would probably
> repaint one side of it: while unpacking can create tuples, sets, lists
> and dicts there's no way to create an iterator. I would like it if the
> unpacking syntax could somehow be used for iterators. For example:
>
> first_line = next(inputfile)
> # inspect first_line
> for line in chain([first_line], inputfile):
>     # process line
>
> could be rewritten as
>
> first_line = next(inputfile)
> for line in first_line, *inputfile:
>     pass
>
> without reading the whole file into memory.
>
> Using the tuple syntax is
> probably confusing but it would be great if there were some way to
> spell this and get an iterator instead of a concrete collection.

It's useful... I don't dislike it. It might even be a good fit to use tuple syntax for this:

1) We already use the tuple's comprehension syntax for iterators
2) If you think of "*" as yield from, it's not too different from changing a function to a generator

But you have the "unification" disadvantage, as well as the added complexity of implementation.

Side note: In fact, I'd much like it if there was an iterable "unpacking" method for functions, too, so "chain.from_iterable()" could use the same interface as "chain" (and str.format with str.format_map, etc.). I feel we already have a good deal of redundancy due to this.

> Also this may be outside the scope of this PEP but since unpacking is
> likely to be overhauled I'd like to put forward a previous suggestion
> by Greg Ewing that there be a way to unpack some items from an
> iterator without consuming the whole thing e.g.:
>
> a, ... = iterable

That's definitely outside of this PEP's scope ;). Also, I think you oversimplified your last version -- you still need a try-except AFAICT.

From oscar.j.benjamin at gmail.com Mon Jul 15 13:17:43 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 15 Jul 2013 12:17:43 +0100
Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations"
In-Reply-To: References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl>
Message-ID:

On 15 July 2013 12:08, Joshua Landau wrote:
> On 15 July 2013 11:40, Oscar Benjamin wrote:
>>
>> Also this may be outside the scope of this PEP but since unpacking is
>> likely to be overhauled I'd like to put forward a previous suggestion
>> by Greg Ewing that there be a way to unpack some items from an
>> iterator without consuming the whole thing e.g.:
>>
>> a, ... = iterable
>
> That's definitely outside of this PEP's scope ;). Also, I think you
> oversimplified your last version -- you still need a try-except
> AFAICT.

Where? The point is that next() raises StopIteration which is not an acceptable type of error. Leaking the StopIteration makes the function not "generator-safe" i.e. if you call it from a generator the StopIteration could terminate an outer loop.
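A minimal sketch of that leak (the helper name first is illustrative, not from the thread; under the generator semantics in force when this thread was written the stray StopIteration silently truncates the outer iteration, while Python 3.7+, after PEP 479, turns it into a RuntimeError):

    def first(iterable):
        # leaks StopIteration when the iterable is empty
        return next(iter(iterable))

    data = [[1, 2], [], [3, 4]]
    firsts = list(first(row) for row in data)
    # pre-PEP 479: firsts == [1] -- the empty row's StopIteration
    # silently ended the enclosing generator expression;
    # Python 3.7+: RuntimeError instead.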
That's why I have the try/except.

As long as

    a, ... = iterator

gives me a ValueError I'm happy to let the error propagate upwards.

Oscar

From joshua.landau.ws at gmail.com Mon Jul 15 13:19:21 2013
From: joshua.landau.ws at gmail.com (Joshua Landau)
Date: Mon, 15 Jul 2013 12:19:21 +0100
Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations"
In-Reply-To: References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl>
Message-ID:

On 15 July 2013 12:17, Oscar Benjamin wrote:
> On 15 July 2013 12:08, Joshua Landau wrote:
>> On 15 July 2013 11:40, Oscar Benjamin wrote:
>>>
>>> Also this may be outside the scope of this PEP but since unpacking is
>>> likely to be overhauled I'd like to put forward a previous suggestion
>>> by Greg Ewing that there be a way to unpack some items from an
>>> iterator without consuming the whole thing e.g.:
>>>
>>> a, ... = iterable
>>
>> That's definitely outside of this PEP's scope ;). Also, I think you
>> oversimplified your last version -- you still need a try-except
>> AFAICT.
>
> Where? The point is that next() raises StopIteration which is not an
> acceptable type of error. Leaking the StopIteration makes the function
> not "generator-safe" i.e. if you call it from a generator the
> StopIteration could terminate an outer loop. That's why I have the
> try/except.
>
> As long as
>
> a, ... = iterator
>
> gives me a ValueError I'm happy to let the error propagate upwards.

I misread the original, apologies.

From szport at gmail.com Mon Jul 15 14:05:20 2013
From: szport at gmail.com (Zaur Shibzukhov)
Date: Mon, 15 Jul 2013 05:05:20 -0700 (PDT)
Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?=
In-Reply-To: References:
Message-ID: <02a51f56-f72f-4b31-912d-c45b5e816b15@googlegroups.com>

On Monday, 15 July 2013 14:26:43 UTC+4, Oscar Benjamin wrote:
> On 12 July 2013 13:36, Zaur Shibzukhov wrote:
> >
> > Is it a good idea to allow
> > float('∞') to be float('inf') in python?
>
> I don't think so as my preference is for Python to stick to IEEE 754
> as closely as possible.
>
> Section 5.12.1 of IEEE 754-2008 says
> '''
> Conversion of an infinity in a supported format to an external
> character sequence shall produce a language defined one of 'inf' or
> 'infinity' or a sequence that is equivalent except for case (e.g.,
> 'Infinity' or 'INF'), with a preceding minus sign if the input is
> negative. Whether the conversion produces a preceding plus sign if the
> input is positive is language-defined.
>
> Conversion of external character sequences 'inf' and 'infinity'
> (regardless of case) with an optional preceding sign, to a supported
> floating-point format shall produce an infinity (with the same sign as
> the input).
> '''
>
> This does not seem to prohibit accepting other strings for infinity
> but it does explicitly define a set of textual representations for
> infinity. I don't think Python's float <-> str conversions should
> accept (or emit) any other strings for infinity.
>
> Since I was looking at the standard I was also interested to see what
> it says about non-ascii decimal digits (as accepted by Python 3).
> Immediately above this in section 5.12 it says
> '''
> Issues of character codes (ASCII, Unicode, etc.) are not defined by
> this standard.
> '''
>
> My interpretation of that is that Python's current behaviour is
> conforming but that it would also be conforming if it didn't accept
> non-ascii decimal digits.

We could agree with this if Python completely dropped support for non-ascii input. As for IEEE 754-2008, it does not fully define the presentation of infinity, because we can now have several strings that could be identified as "infinity" (inf, Inf, INF, infinity, Infinity, INFINITY). So if the IEEE committee decides to designate one single textual view of infinity, that could be ok.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From flying-sheep at web.de Mon Jul 15 14:17:02 2013
From: flying-sheep at web.de (Philipp A.)
Date: Mon, 15 Jul 2013 14:17:02 +0200
Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?=
In-Reply-To: References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info> <51E26C9E.4080505@pearwood.info> <51E3379F.4080306@pearwood.info>
Message-ID:

2013/7/15 Joshua Landau
> I even tend to use "thing is Ellipses" instead of "thing is ..."
> because it reads better -- "thing == ∞" just goes back to that.

At least "..." is correct, while "Ellips*e*s" isn't. it's "Ellips*i*s".

but as said: ∞ can be in data, and other than e.g. ?, it unambiguously and always means the same as "infinity". there is a reason why float accepts "inf(inity)" and "NaN", but no other prose. float() imho should accept all literals for values it can represent, and ∞ is just a symbol and therefore synonym for "infinity". if there would be a symbol for NaN, i'd propose to include it as well.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From joshua at landau.ws Mon Jul 15 15:20:52 2013
From: joshua at landau.ws (Joshua Landau)
Date: Mon, 15 Jul 2013 14:20:52 +0100
Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?=
In-Reply-To: References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info> <51E26C9E.4080505@pearwood.info> <51E3379F.4080306@pearwood.info>
Message-ID:

On 15 July 2013 13:17, Philipp A. wrote:
> 2013/7/15 Joshua Landau
>> I even tend to use "thing is Ellipses" instead of "thing is ..."
>> because it reads better -- "thing == ∞" just goes back to that.
>
> At least "..." is correct, while "Ellipses" isn't. it's "Ellipsis".

;). That's why I use autocomplete.

> but as said: ∞ can be in data, and other than e.g. ?, it unambiguously and
> always means the same as "infinity". there is a reason why float accepts
> "inf(inity)" and "NaN", but no other prose. float() imho should accept all
> literals for values it can represent, and ∞ is just a symbol and therefore
> synonym for "infinity". if there would be a symbol for NaN, i'd propose to
> include it as well.

From oscar.j.benjamin at gmail.com Mon Jul 15 15:25:42 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 15 Jul 2013 14:25:42 +0100
Subject: [Python-ideas] =?utf-8?b?ZmxvYXQoJ+KInicpPWZsb2F0KCdpbmYnKQ==?=
In-Reply-To: References: <51E07F3B.5000405@mrabarnett.plus.com> <51E1FDC5.1030501@pearwood.info> <51E26C9E.4080505@pearwood.info> <51E3379F.4080306@pearwood.info>
Message-ID:

On 15 July 2013 13:17, Philipp A. wrote:
> but as said: ∞ can be in data,

I have worked with a lot of numeric data and I have never encountered that symbol in any data.
Oscar From ethan at stoneleaf.us Mon Jul 15 17:44:40 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 15 Jul 2013 08:44:40 -0700 Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers In-Reply-To: <51E3A094.9020107@egenix.com> References: <20130714222606.0f61f16e@sergey> <51E3A094.9020107@egenix.com> Message-ID: <51E418E8.8080300@stoneleaf.us> On 07/15/2013 12:11 AM, M.-A. Lemburg wrote: > I don't understand why people try to use sum() for anything > other than a sequence of numbers. Because you can. ;) -- ~Ethan~ From mertz at gnosis.cx Mon Jul 15 18:42:13 2013 From: mertz at gnosis.cx (David Mertz) Date: Mon, 15 Jul 2013 09:42:13 -0700 Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers In-Reply-To: References: <20130714222606.0f61f16e@sergey> Message-ID: On Mon, Jul 15, 2013 at 12:16 AM, Joshua Landau wrote: > > flat = chain(list_of_lists) > > This does nothing more that iter(list)... > Right, I forgot a '*'. I don't think that changes the point that it's already very readable, intuitive, and efficient, without resorting to the counter-intuitive sum(). > > flat = [] > > map(flat.extend, list_of_lists) > > Gah! No like. > flat = [] > for lst in list_of_lists: > flat.extend(lst) > Well, a couple characters difference, but the explicit loop is fine also. > You think that "[1, 2, 3] + [4, 5, 6] == [1, 2, 3, 4, 5, 6]" is a > quirk of python's internals? > Basically yes. Except not quite. The "quirk" is more that "the plus sign stands for the .__add__() method, even when it is being used for very different meanings on different datatypes." And again, as I point out, it's not *necessary* that Python had chosen the "+" operator as its spelling of "concatenation" ... it's a good choice, but it does invite confusion with the very different use of "+" to mean "addition". https://en.wikipedia.org/wiki/Summation > > Besides numbers, other types of values can be added as well: vectors, > matrices, polynomials and, in general, elements of any additive group (or > even monoid). > Yep. And summing every one of those things means *addition* and never *concatenation*. There's a reason that article begins with "*Summation* is the operation of adding a sequenceof numbers" It's also notable that concatenation of sequences doesn't form an Abelian Group. Hell, concatenation of sequences isn't even commutative. Using sum() for a non-commutative operation verges on crazy. At the least, such a use is highly counter-intuitive. > Google's (aggregated) dictionary: > > The total amount resulting from the addition of two or more numbers, > amounts, or items > > the final aggregate; "the sum of all our troubles did not equal the > misery they suffered" (a good example of where you *already know* you can > sum things other than numbers*) > Summing troubles might resemble addition, metaphorically. It most certainly does not resemble concatenation. *Maybe* somewhere in the history of English usage you can find some oddball use where the meaning is vaguely similar to "concatenation." This is certainly not the common usage though. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Mon Jul 15 19:25:44 2013 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jul 2013 10:25:44 -0700 Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations" In-Reply-To: References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl> Message-ID: On Sun, Jul 14, 2013 at 7:12 PM, Nick Coghlan wrote: > One interesting point I see is that the "*expr" syntax in comprehensions is > getting close to a nested "yield from": [...] > # Flattening generator expression > g = (*x for x in iterable) > > def _g(outermost_iterable): > for x in outermost_iterable: > yield from x > > g = _g(iterable) > > The meaning for list and set comprehensions then follows from the generator > expression semantics. Interesting. I don't know if Joshua intended this translation, but given that it's special syntax anyway it does sound interesting. I'd like to see an implementation though -- to verify that it's not too hard to implement correct, and that the syntax is actually unambiguous. -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Jul 15 19:39:42 2013 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jul 2013 10:39:42 -0700 Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations" In-Reply-To: References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl> Message-ID: On Mon, Jul 15, 2013 at 3:40 AM, Oscar Benjamin wrote: > I definitely like the general colour of this shed but would probably > repaint one side of it: while unpacking can create tuples, sets, lists > and dicts there's no way to create an iterator. I would like it if the > unpacking syntax could somehow be used for iterators. For example: > > first_line = next(inputfile) > # inspect first_line > for line in chain([first_line], inputfile): > # process line > > could be rewritten as > > first_line = next(inputfile): > for line in first_line, *inputfile: > pass > > without reading the whole file into memory. Using the tuple syntax is > probably confusing but it would be great if there were some way to > spell this and get an iterator instead of a concrete collection. I think this is going down a slippery slope that could jeopardize the whole PEP, which is nice and non-controversial so far. The problem is that "tuples" (more precisely, things separated by commas) are already overloaded to the point where both the parser and most human readers are strained to the max to tell the different cases apart. For example, this definitely creates a tuple: a = 1, 2, 3 Now consider this: b = 2, 3 a = 1, *b Why would that not create the same tuple? In general, all other uses of *x and **xx create concrete objects (there is nothing "iterator-like" about an argument list). I think overloading these same operators to return iterators in some contexts would just cause too much confusion, and discontinuities in edge cases, making it harder to reason about the equivalency of different ways to write the same thing. (The use of *x in a generator expression is an exception -- a generator expression is *already* an iterator, so here there is no confusion.) > Also this may be outside the scope of this PEP but since unpacking is > likely to be overhauled I'd like to put forward a previous suggestion > by Greg Ewing that there be a way to unpack some items from an > iterator without consuming the whole thing e.g.: > > a, ... = iterable Definitely a different PEP. 
--
--Guido van Rossum (python.org/~guido)

From joshua at landau.ws Mon Jul 15 19:54:46 2013
From: joshua at landau.ws (Joshua Landau)
Date: Mon, 15 Jul 2013 18:54:46 +0100
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To: References: <20130714222606.0f61f16e@sergey>
Message-ID:

On 15 July 2013 17:42, David Mertz wrote:
> On Mon, Jul 15, 2013 at 12:16 AM, Joshua Landau wrote:
>>
>> > flat = chain(list_of_lists)
>>
>> This does nothing more than iter(list)...
>
> Right, I forgot a '*'. I don't think that changes the point that it's
> already very readable, intuitive, and efficient

I agree in that chain.from_iterable is currently TOOWTDI and I don't think that needs to change.

> , without resorting to the
> counter-intuitive sum().
>
>> You think that "[1, 2, 3] + [4, 5, 6] == [1, 2, 3, 4, 5, 6]" is a
>> quirk of python's internals?
>
> Basically yes. Except not quite. The "quirk" is more that "the plus sign
> stands for the .__add__() method, even when it is being used for very
> different meanings on different datatypes." And again, as I point out, it's
> not *necessary* that Python had chosen the "+" operator as its spelling of
> "concatenation" ... it's a good choice, but it does invite confusion with
> the very different use of "+" to mean "addition".

But it *is* addition as far as Python is concerned. I don't care much that Python's "+" isn't exclusively for any particular groups; heck -- I'd be happy if we could add a ton *more* things than we currently can. It's still addition; we're not planning Haskell here.

*Even if* it was a good idea to restrict "+" to commutative groups with constant-time addition (which it is not), the ship has sailed and addition in Python means what it does. [Hence sum, being "reduce(.__add__, iterable)" so to speak, makes a ton of sense on lists.]

>> https://en.wikipedia.org/wiki/Summation
>>
>> > Besides numbers, other types of values can be added as well: vectors,
>> > matrices, polynomials and, in general, elements of any additive group (or
>> > even monoid).
>
> Yep. And summing every one of those things means *addition* and never
> *concatenation*. There's a reason that article begins with "Summation is
> the operation of adding a sequence of numbers"

As far as Python is concerned, concatenation is a form of addition. Maybe not in mathematics, nor Haskell, nor C. But it is in Python, so lists in Python are additive groups.

> It's also notable that concatenation of sequences doesn't form an Abelian
> Group. Hell, concatenation of sequences isn't even commutative.

Notable? Yeah, fine. Important? No.

> Using
> sum() for a non-commutative operation verges on crazy. At the least, such a
> use is highly counter-intuitive.

Why? It's not to me. Sure, you need to know the order that it will operate, but you need to know that for "reduce" too and no-one says using "reduce" in non-commutative ways is insane.

>> Google's (aggregated) dictionary:
>> > The total amount resulting from the addition of two or more numbers,
>> > amounts, or items
>> > the final aggregate; "the sum of all our troubles did not equal the
>> > misery they suffered" (a good example of where you *already know* you can
>> > sum things other than numbers)
>
> Summing troubles might resemble addition, metaphorically. It most certainly
> does not resemble concatenation.

Yet another way in which we disagree.

> *Maybe* somewhere in the history of
> English usage you can find some oddball use where the meaning is vaguely
> similar to "concatenation."
This is certainly not the common usage though.

Oh, again we disagree.

From mertz at gnosis.cx Mon Jul 15 20:54:41 2013
From: mertz at gnosis.cx (David Mertz)
Date: Mon, 15 Jul 2013 11:54:41 -0700
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To: References: <20130714222606.0f61f16e@sergey>
Message-ID:

On Mon, Jul 15, 2013 at 10:54 AM, Joshua Landau wrote:
> But it *is* addition as far as Python is concerned.

Yeah, it is. I do know that. What concatenation is NOT is "addition as far as HUMANS are concerned".

Now I'll take your claim as true that 'sum(list_of_lists)' is somehow intuitive to you. So I can't say it is counter-intuitive to ALL humans. But it is counter-intuitive to MANY of us, and for us readers who think of "sum" as meaning addition in the mathematical sense, code that uses this is difficult to understand. At the least it requires a double take and an extra moment to think through what is meant (i.e. via understanding what Python does internally).

I will argue that sum()'ing sequences is counter-intuitive for MOST humans. It's certainly counter-intuitive to me, and I've written a Python book, taught Python, and taken graduate mathematics courses (those experiences may pull in opposite directions though). I'm also certain it would be counter-intuitive to programming learners. I imagine trying to explain it while teaching Python, and the only thing I could do is tell them to "ignore what you think sum means, and think about the internals of Python" ... that's really not a good situation to be in pedagogically.

> *Even if* it was a good idea to restrict "+" to commutative groups
> with constant-time addition (which it is not), the ship has sailed and
> addition in Python means what it does.

I'm not sure it has actually "sailed." It's very well possible to restrict sum()'ing lists or tuples in much the same way strings are excluded (even though strings have an .__add__() method). If special attention were taken to *not* work for cases where it shouldn't, we could remove this counter-intuitive behavior.

That said, even though I think it is *weird* to sum a non-commutative or non-associative operation, the runtime checks to figure out whether some custom type was such would be either outright impossible or needlessly time-consuming.

> As far as Python is concerned, concatenation is a form of addition.
> Maybe not in mathematics, nor Haskell, nor C. But it is in Python, so
> lists in Python are additive groups.

No, lists are not additive groups! They do satisfy closure, associativity, and an identity element of []. However, there's no invertibility on lists. So even without considering commutativity, lists fail as groups.

> Why? It's not to me. Sure, you need to know the order that it will
> operate, but you need to know that for "reduce" too and no-one says
> using "reduce" in non-commutative ways is insane.

Huh?! I have no expectation generically that a sequence must be reduce'd by a commutative (nor associative) operation. That's another big way that reduce is different from sum.

For example, if someone had a function "fastsum()" that took an initial pass to strike out inverse elements, that might be a reasonable approach. I'm sure it won't speed up adding ints or floats, but maybe someone has a complicated "numeric-like" type where "+" is expensive and recognition of inverse elements is cheap. Possibly this optimization would be a perfectly sensible approach to a special sum() function. (No, I don't think the generic sum() should be engineered to do this). This concept makes no sense whatsoever thinking about reduce(operator.add, ...) because reduce() itself doesn't make sense that way.

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From joshua at landau.ws Mon Jul 15 21:24:25 2013
From: joshua at landau.ws (Joshua Landau)
Date: Mon, 15 Jul 2013 20:24:25 +0100
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To: References: <20130714222606.0f61f16e@sergey>
Message-ID:

On 15 July 2013 19:54, David Mertz wrote:
> On Mon, Jul 15, 2013 at 10:54 AM, Joshua Landau wrote:
>> But it *is* addition as far as Python is concerned.
>
> Yeah, it is. I do know that.
>
> What concatenation is NOT is "addition as far as HUMANS are concerned".

But it is (sort of). I asked my brother (under 20, above 10, not sure how much more I should say on a mailing list), who is about as not-programmer-techy as any computer user could reasonably be. I asked him to add two lists. He *concatenated them*¹. I *didn't*, mind you, ask him to *do* it, but what it meant to him, pointing out that "that doesn't make much sense" is a completely valid response. I also asked about an extension to finding the "sum" of lists, and he did not find that confusing.

I hence have more evidence than you as of now :P.

> Now I'll take your claim as true that 'sum(list_of_lists)' is somehow
> intuitive to you. So I can't say it is counter-intuitive to ALL humans.
> But it is counter-intuitive to MANY of us, and for us readers who think of
> "sum" as meaning addition in the mathematical sense, code that uses this is
> difficult to understand. At the least it requires a double take and an
> extra moment to think through what is meant (i.e. via understanding what
> Python does internally).
>
> I will argue that sum()'ing sequences is counter-intuitive for MOST humans.

Nah. No-one can really extrapolate that far with this tiny data-set of conflicting beliefs, and hence I shall just ignore this point.

> It's certainly counter-intuitive to me, and I've written a Python book,
> taught Python, and taken graduate mathematics courses (those experiences may
> pull in opposite directions though).

It's not counter-intuitive to me and I *haven't* done those things².

> I'm also certain it would be
> counter-intuitive to programming learners. I imagine trying to explain it
> while teaching Python, and the only thing I could do is tell them to "ignore
> what you think sum means, and think about the internals of Python" ...
> that's really not a good situation to be in pedagogically.

Why don't you show it to a random new coder?³ Given a fair random sample, I'll be willing to put in a fairly large non-monetary internet-pride bet that I'll win this point.

>> *Even if* it was a good idea to restrict "+" to commutative groups
>> with constant-time addition (which it is not), the ship has sailed and
>> addition in Python means what it does.
>
> I'm not sure it has actually "sailed." It's very well possible to restrict
> sum()'ing lists or tuples in much the same way strings are excluded (even
> though strings have an .__add__() method). If special attention were taken
> to *not* work for cases where it shouldn't, we could remove this
> counter-intuitive behavior.

That has nothing (directly) to do with whether addition should be restricted to commutative groups.

> That said, even though I think it is *weird* to sum a non-commutative or
> non-associative operation, the runtime checks to figure out whether some
> custom type was such would be either outright impossible or needlessly
> time-consuming.

Agreed on that last part.

>> As far as Python is concerned, concatenation is a form of addition.
>> Maybe not in mathematics, nor Haskell, nor C. But it is in Python, so
>> lists in Python are additive groups.
>
> No, lists are not additive groups! They do satisfy closure, associativity,
> and an identity element of []. However, there's no invertibility on lists.
> So even without considering commutativity, lists fail as groups.

I wasn't saying group in your techy math-major way, but the general English form of the word. Fair 'nuff though, I sorta' asked for that. However, I don't get your last sentence there; what does commutativity change?

>> Why? It's not to me. Sure, you need to know the order that it will
>> operate, but you need to know that for "reduce" too and no-one says
>> using "reduce" in non-commutative ways is insane.
>
> Huh?! I have no expectation generically that a sequence must be reduce'd by
> a commutative (nor associative) operation. That's another big way that
> reduce is different from sum.
>
> For example, if someone had a function "fastsum()" that took an initial pass
> to strike out inverse elements, that might be a reasonable approach. I'm
> sure it won't speed up adding ints or floats, but maybe someone has a
> complicated "numeric-like" type where "+" is expensive and recognition of
> inverse elements is cheap. Possibly this optimization would be a perfectly
> sensible approach to a special sum() function. (No, I don't think the
> generic sum() should be engineered to do this). This concept makes no sense
> whatsoever thinking about reduce(operator.add, ...) because reduce() itself
> doesn't make sense that way.

But that only makes sense for types where you can find an inverse, and thus would be applicable only to those elements (with the API that allows you to find the inverse). You wouldn't say that because fsum takes only floats (not sure if true) that sum does not apply to integers, even though it's a *completely reasonable optimisation* (in the sense that accuracy is an optimisation).

To put it another way, there are specialised reduces you can write that are faster for specific types; you could think of "list(chain.from_iterable(x))" as "reduce(operator.add, x)" optimised for lists. That doesn't mean that "reduce(operator.add, x)" is only applicable to lists.

¹ Sort of. He combined them as you would sorted Counters, where duplicate items were doubled-up on, but otherwise order was preserved. I think that is reasonably close.
² Well, I've taught Python, but hardly to the extent you are claiming.
³ Not so new that they wouldn't get, say, "map(list, list_of_tuples)", though.
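As a concrete illustration of the quadratic-versus-linear behaviour being argued over, a small timeit sketch (timings are machine-dependent and purely illustrative):

    from itertools import chain
    from timeit import timeit

    data = [[i] for i in range(20000)]

    # builds and copies a new, ever-growing list on every step: O(N*N)
    print(timeit(lambda: sum(data, []), number=1))

    # single pass over all the elements: O(N)
    print(timeit(lambda: list(chain.from_iterable(data)), number=1))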
From oscar.j.benjamin at gmail.com Mon Jul 15 21:53:53 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 15 Jul 2013 20:53:53 +0100 Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations" In-Reply-To: References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl> Message-ID: On 15 July 2013 18:39, Guido van Rossum wrote: > On Mon, Jul 15, 2013 at 3:40 AM, Oscar Benjamin > wrote: >> I would like it if the >> unpacking syntax could somehow be used for iterators. For example: >> >> first_line = next(inputfile) >> # inspect first_line >> for line in chain([first_line], inputfile): >> # process line >> >> could be rewritten as >> >> first_line = next(inputfile): >> for line in first_line, *inputfile: >> pass >> >> without reading the whole file into memory. Using the tuple syntax is >> probably confusing but it would be great if there were some way to >> spell this and get an iterator instead of a concrete collection. > > I think this is going down a slippery slope that could jeopardize the > whole PEP, which is nice and non-controversial so far. The problem is > that "tuples" (more precisely, things separated by commas) are already > overloaded to the point where both the parser and most human readers > are strained to the max to tell the different cases apart. For > example, this definitely creates a tuple: > > a = 1, 2, 3 > > Now consider this: > > b = 2, 3 > a = 1, *b > > Why would that not create the same tuple? Yeah, this is what I mean that tuple syntax is confusing. I think, though, that it would be good if some way of creating iterators evolved from this. Consider that while list, set and dict comprehensions can create lists, sets and dicts. Generator expressions can create tuples, OrderedDicts, blists, deques and many more. They can also be used with folds like min, max, sum, any, all and many more. Creating an iterator provides a much more general tool then creating a particular concrete type. Oscar From oscar.j.benjamin at gmail.com Mon Jul 15 22:01:19 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 15 Jul 2013 21:01:19 +0100 Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations" In-Reply-To: References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl> Message-ID: On 15 July 2013 12:08, Joshua Landau wrote: > On 15 July 2013 11:40, Oscar Benjamin wrote: > > In fact, I'd much like it if there was an iterable "unpacking" method > for functions, too, so "chain.from_iterable()" could use the same > interface as "chain" (and str.format with str.format_map, etc.). I > feel we already have a good deal of redundancy due to this. I've also considered this before. I don't know what a good spelling would be but lets say that it uses *args* so that you have a function signature like: def chain(*iterables*): for iterable in iterables: yield from iterable And then if the function is called with for line in chain(first_line, *inputfile): # do stuff then iterables would be bound to a lazy generator that chains [first_line] and inputfile. 
Then you could create the unpacking iterator I wanted by just using chain e.g.: chain(prepend, *iterable, append) Oscar From guido at python.org Mon Jul 15 22:01:40 2013 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jul 2013 13:01:40 -0700 Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations" In-Reply-To: References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl> Message-ID: On Mon, Jul 15, 2013 at 12:53 PM, Oscar Benjamin wrote: > On 15 July 2013 18:39, Guido van Rossum wrote: >> On Mon, Jul 15, 2013 at 3:40 AM, Oscar Benjamin >> wrote: >>> I would like it if the >>> unpacking syntax could somehow be used for iterators. For example: >>> >>> first_line = next(inputfile) >>> # inspect first_line >>> for line in chain([first_line], inputfile): >>> # process line >>> >>> could be rewritten as >>> >>> first_line = next(inputfile): >>> for line in first_line, *inputfile: >>> pass >>> >>> without reading the whole file into memory. Using the tuple syntax is >>> probably confusing but it would be great if there were some way to >>> spell this and get an iterator instead of a concrete collection. >> >> I think this is going down a slippery slope that could jeopardize the >> whole PEP, which is nice and non-controversial so far. The problem is >> that "tuples" (more precisely, things separated by commas) are already >> overloaded to the point where both the parser and most human readers >> are strained to the max to tell the different cases apart. For >> example, this definitely creates a tuple: >> >> a = 1, 2, 3 >> >> Now consider this: >> >> b = 2, 3 >> a = 1, *b >> >> Why would that not create the same tuple? > > Yeah, this is what I mean that tuple syntax is confusing. I think, > though, that it would be good if some way of creating iterators > evolved from this. > > Consider that while list, set and dict comprehensions can create > lists, sets and dicts. Generator expressions can create tuples, > OrderedDicts, blists, deques and many more. They can also be used with > folds like min, max, sum, any, all and many more. Creating an iterator > provides a much more general tool then creating a particular concrete > type. But the point remains that I see no way to creatively reuse the *x notation to serve both purposes. So I recommend that you think of a different way to obtain your goal, proposing a different PEP, which can be discussed independently from PEP 448. (And I don't mean this in the sense of "go away, I don't want to listen to you". I do want to hear your ideas. I just think that it is better for PEP 448 to be more limited in scope.) Regarding the importance of more general/abstract tools, I have just started reading Seymour Papert's Mindstorms, and one of his early insights about learning programming seems to be that the learner's path goes from more concrete things to more abstract things. In this context it feels appropriate that Python's syntax has notations to create concrete objects such as lists, tuples, dicts but that the more general concepts like iterators must be created without much syntactic help (generator expressions notwithstanding). 
-- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Jul 15 22:06:08 2013 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jul 2013 13:06:08 -0700 Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations" In-Reply-To: References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl> Message-ID: On Mon, Jul 15, 2013 at 1:01 PM, Oscar Benjamin wrote: > On 15 July 2013 12:08, Joshua Landau wrote: >> On 15 July 2013 11:40, Oscar Benjamin wrote: >> >> In fact, I'd much like it if there was an iterable "unpacking" method >> for functions, too, so "chain.from_iterable()" could use the same >> interface as "chain" (and str.format with str.format_map, etc.). I >> feel we already have a good deal of redundancy due to this. > > I've also considered this before. I don't know what a good spelling > would be but lets say that it uses *args* so that you have a function > signature like: > > def chain(*iterables*): > for iterable in iterables: > yield from iterable > > And then if the function is called with > > for line in chain(first_line, *inputfile): > # do stuff > > then iterables would be bound to a lazy generator that chains > [first_line] and inputfile. Then you could create the unpacking > iterator I wanted by just using chain e.g.: > > chain(prepend, *iterable, append) But how could you do this without generating different code depending on how the function you are calling is declared? Python's compiler doesn't have access to that information. -- --Guido van Rossum (python.org/~guido) From oscar.j.benjamin at gmail.com Mon Jul 15 22:23:36 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 15 Jul 2013 21:23:36 +0100 Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations" In-Reply-To: References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl> Message-ID: On 15 July 2013 21:06, Guido van Rossum wrote: > On Mon, Jul 15, 2013 at 1:01 PM, Oscar Benjamin > wrote: >> On 15 July 2013 12:08, Joshua Landau wrote: >>> On 15 July 2013 11:40, Oscar Benjamin wrote: >>> >>> In fact, I'd much like it if there was an iterable "unpacking" method >>> for functions, too, so "chain.from_iterable()" could use the same >>> interface as "chain" (and str.format with str.format_map, etc.). I >>> feel we already have a good deal of redundancy due to this. >> >> I've also considered this before. I don't know what a good spelling >> would be but lets say that it uses *args* so that you have a function >> signature like: >> >> def chain(*iterables*): >> for iterable in iterables: >> yield from iterable >> >> And then if the function is called with >> >> for line in chain(first_line, *inputfile): >> # do stuff >> >> then iterables would be bound to a lazy generator that chains >> [first_line] and inputfile. Then you could create the unpacking >> iterator I wanted by just using chain e.g.: >> >> chain(prepend, *iterable, append) > > But how could you do this without generating different code depending > on how the function you are calling is declared? Python's compiler > doesn't have access to that information. Good point. 
Maybe you'd have to spell it that way at both ends: chain(prepend, *iterable*, append) Oscar From joshua at landau.ws Mon Jul 15 22:32:08 2013 From: joshua at landau.ws (Joshua Landau) Date: Mon, 15 Jul 2013 21:32:08 +0100 Subject: [Python-ideas] Iterable function calls and unpacking [was: PEP for issue2292, "Missing *-unpacking generalizations"] Message-ID: I've moved this to a different thread, as I agree with Guido that it's a different PEP. On 15 July 2013 21:06, Guido van Rossum wrote: > On Mon, Jul 15, 2013 at 1:01 PM, Oscar Benjamin > wrote: >> On 15 July 2013 12:08, Joshua Landau wrote: >>> On 15 July 2013 11:40, Oscar Benjamin wrote: >>> >>> In fact, I'd much like it if there was an iterable "unpacking" method >>> for functions, too, so "chain.from_iterable()" could use the same >>> interface as "chain" (and str.format with str.format_map, etc.). I >>> feel we already have a good deal of redundancy due to this. >> >> I've also considered this before. I don't know what a good spelling >> would be but lets say that it uses *args* so that you have a function >> signature like: >> >> def chain(*iterables*): >> for iterable in iterables: >> yield from iterable Personally some form of decorator would be simpler: @lazy_unpack() def chain(*iterables): ... (and it could theoretically work for mappings too, by just "bundling" them; useful for circumstances where the mapping is left untouched and just passed to the next function in line.) >> And then if the function is called with >> >> for line in chain(first_line, *inputfile): >> # do stuff >> >> then iterables would be bound to a lazy generator that chains >> [first_line] and inputfile. Then you could create the unpacking >> iterator I wanted by just using chain e.g.: >> >> chain(prepend, *iterable, append) > > But how could you do this without generating different code depending > on how the function you are calling is declared? Python's compiler > doesn't have access to that information. You could simply make the code such that if has an unpack inside the call it does a run-time check. Whilst this will be slower for the false-positives, the number of times *args is pass-through (and thus you save a redundant copy of the argument tuple) and *args is a simple loop-once construct makes it plausible that those losses would be outweighed. It doesn't even reduce efficiency that much, too, as the worst case scenario is immediately falling back after checking a single C-level attribute of the function, and the function doesn't need to be fetched again or anything suchlike. Then again, I'm guessing. You'd also need to add a call to exhaust the iterator at the end of every function utilising this (transparently, probably) to make this have no obvious externally-visible effects. There would still be a call-order change, but that's much more minor. From oscar.j.benjamin at gmail.com Mon Jul 15 22:43:46 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 15 Jul 2013 21:43:46 +0100 Subject: [Python-ideas] Iterable function calls and unpacking [was: PEP for issue2292, "Missing *-unpacking generalizations"] In-Reply-To: References: Message-ID: On 15 July 2013 21:32, Joshua Landau wrote: > I've moved this to a different thread, as I agree with Guido that it's > a different PEP. 
>
> On 15 July 2013 21:06, Guido van Rossum wrote:
>> On Mon, Jul 15, 2013 at 1:01 PM, Oscar Benjamin
>> wrote:
>>> On 15 July 2013 12:08, Joshua Landau wrote:
>>>> On 15 July 2013 11:40, Oscar Benjamin wrote:
>>>>
>>>> In fact, I'd much like it if there was an iterable "unpacking" method
>>>> for functions, too, so "chain.from_iterable()" could use the same
>>>> interface as "chain" (and str.format with str.format_map, etc.). I
>>>> feel we already have a good deal of redundancy due to this.
>>>
>>> I've also considered this before. I don't know what a good spelling
>>> would be but lets say that it uses *args* so that you have a function
>>> signature like:
>>>
>>> def chain(*iterables*):
>>>     for iterable in iterables:
>>>         yield from iterable
>
> Personally some form of decorator would be simpler:
>
> @lazy_unpack()
> def chain(*iterables):
>     ...

How would the above decorator work? It would need to exploit some new
capability since this requires unpacking everything:

from functools import wraps

def lazy_unpack(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        # The line above has already expanded *args
        return func(*args, **kwargs)
    return wrapper

@lazy_unpack
def chain(*iterables):
    ...

> (and it could theoretically work for mappings too, by just "bundling"
> them; useful for circumstances where the mapping is left untouched and
> just passed to the next function in line.)

I don't understand. Do you mean to use it somehow for **kwargs?

>>> And then if the function is called with
>>>
>>> for line in chain(first_line, *inputfile):
>>>     # do stuff
>>>
>>> then iterables would be bound to a lazy generator that chains
>>> [first_line] and inputfile. Then you could create the unpacking
>>> iterator I wanted by just using chain e.g.:
>>>
>>> chain(prepend, *iterable, append)
>>
>> But how could you do this without generating different code depending
>> on how the function you are calling is declared? Python's compiler
>> doesn't have access to that information.
>
> You could simply make the code such that if it has an unpack inside the
> call it does a run-time check. Whilst this will be slower for the
> false-positives, the number of times *args is pass-through (and thus
> you save a redundant copy of the argument tuple) and *args is a simple
> loop-once construct makes it plausible that those losses would be
> outweighed.

It probably would be better to have a specific syntax at the calling
site since you probably want to know when you look at
f(*infinite_iterator) whether or not infinite_iterator is going to be
expanded.

Oscar

From tjreedy at udel.edu  Mon Jul 15 22:45:01 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 15 Jul 2013 16:45:01 -0400
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To: 
References: <20130714222606.0f61f16e@sergey>
Message-ID: 

On 7/15/2013 2:54 PM, David Mertz wrote:

> What concatenation is NOT is "addition as far as HUMANS are concerned".

Have you really never added together two or more shopping lists? Have
you never lengthened a string (such as a kite string) or rope by adding
(concatenating) another piece? Have you never added two piles of papers
together by piling one on top of the other (and order matters here)? As
I posted before, what HUMANS do not do is 'concatenate' things.

Nor is addition by concatenation, as in the examples above, usually
considered summation. Summation usually implies condensation. Summing
multiple numbers produces one number. With a mixture of + and -
numbers, the sum may even be less in magnitude than the largest.
Summing up an hour meeting should produce a statement of, say, a minute or less. A concatenation of everything said is not a summation. Concatenation does not 'condense' or 'reduce', and that, I think, is why some do not see sum as applying to sequence joining. In Peano arithmetic, in math, addition of numbers (counts) amounts to concatenation of sequences of successor operators. -- Terry Jan Reedy From guido at python.org Mon Jul 15 22:50:27 2013 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Jul 2013 13:50:27 -0700 Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations" In-Reply-To: References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl> Message-ID: I am doubtful that syntax would be LR(1), as required. On Mon, Jul 15, 2013 at 1:23 PM, Oscar Benjamin wrote: > On 15 July 2013 21:06, Guido van Rossum wrote: > > On Mon, Jul 15, 2013 at 1:01 PM, Oscar Benjamin > > wrote: > >> On 15 July 2013 12:08, Joshua Landau > wrote: > >>> On 15 July 2013 11:40, Oscar Benjamin > wrote: > >>> > >>> In fact, I'd much like it if there was an iterable "unpacking" method > >>> for functions, too, so "chain.from_iterable()" could use the same > >>> interface as "chain" (and str.format with str.format_map, etc.). I > >>> feel we already have a good deal of redundancy due to this. > >> > >> I've also considered this before. I don't know what a good spelling > >> would be but lets say that it uses *args* so that you have a function > >> signature like: > >> > >> def chain(*iterables*): > >> for iterable in iterables: > >> yield from iterable > >> > >> And then if the function is called with > >> > >> for line in chain(first_line, *inputfile): > >> # do stuff > >> > >> then iterables would be bound to a lazy generator that chains > >> [first_line] and inputfile. Then you could create the unpacking > >> iterator I wanted by just using chain e.g.: > >> > >> chain(prepend, *iterable, append) > > > > But how could you do this without generating different code depending > > on how the function you are calling is declared? Python's compiler > > doesn't have access to that information. > > Good point. Maybe you'd have to spell it that way at both ends: > > chain(prepend, *iterable*, append) > > > Oscar > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua at landau.ws Mon Jul 15 23:23:28 2013 From: joshua at landau.ws (Joshua Landau) Date: Mon, 15 Jul 2013 22:23:28 +0100 Subject: [Python-ideas] Iterable function calls and unpacking [was: PEP for issue2292, "Missing *-unpacking generalizations"] In-Reply-To: References: Message-ID: On 15 July 2013 21:43, Oscar Benjamin wrote: > On 15 July 2013 21:32, Joshua Landau wrote: >> I've moved this to a different thread, as I agree with Guido that it's >> a different PEP. >> >> On 15 July 2013 21:06, Guido van Rossum wrote: >>> On Mon, Jul 15, 2013 at 1:01 PM, Oscar Benjamin >>> wrote: >>>> On 15 July 2013 12:08, Joshua Landau wrote: >>>>> On 15 July 2013 11:40, Oscar Benjamin wrote: >>>>> >>>>> In fact, I'd much like it if there was an iterable "unpacking" method >>>>> for functions, too, so "chain.from_iterable()" could use the same >>>>> interface as "chain" (and str.format with str.format_map, etc.). I >>>>> feel we already have a good deal of redundancy due to this. >>>> >>>> I've also considered this before. 
I don't know what a good spelling >>>> would be but lets say that it uses *args* so that you have a function >>>> signature like: >>>> >>>> def chain(*iterables*): >>>> for iterable in iterables: >>>> yield from iterable >> >> Personally some form of decorator would be simpler: >> >> @lazy_unpack() >> def chain(*iterables): >> ... > > How would the above decorator work? It would need to exploit some new > capability since this requires unpacking everything: Yeah, it would just set an attribute on the function that tells Python to special-case it. It's new functionality, just without new syntax. My way also makes it so you can change old-style unpackers into new-style iter-packers by doing "lazy_version = lazy_unpack(original)". >> (and it could theoretically work for mappings too, by just "bundling" >> them; useful for circumstances where the mapping is left untouched and >> just passed to the next function in line.) > > I don't understand. Do you mean to use it somehow for **kwargs? Yup. A "lazy_kwargs" version that lets you do nothing more than pass it along or convert to dict. In fact, for str.format you'd want a "frozen non-copy" that lets you access elements of the original dicts and kwargs without changing them too. Say you have: def foo(*args, **kwargs): return bar(*modified_args, **kwargs, possibly=more_keywords) there's little point in copying kwargs twice, is there? Same idea with: "{foo}".format(**very_many_things) >>>> And then if the function is called with >>>> >>>> for line in chain(first_line, *inputfile): >>>> # do stuff >>>> >>> But how could you do this without generating different code depending >>> on how the function you are calling is declared? Python's compiler >>> doesn't have access to that information. >> >> You could simply make the code such that if has an unpack inside the >> call it does a run-time check. Whilst this will be slower for the >> false-positives, the number of times *args is pass-through (and thus >> you save a redundant copy of the argument tuple) and *args is a simple >> loop-once construct makes it plausible that those losses would be >> outweighed. > > It probably would be better to have a specific syntax at the calling > site since you probably want to know when you look at > f(*infinite_iterator) whether or not infinite_iterator is going to be > expanded. True. "*args*" and "**kwargs**" are actually quite reasonable; *args* for chain.from_iterable and **kwargs** for collections.ChainMap. Some of the silly things you could do: (item for item in *iter_a*, *iter_b*) == itertools.chain(iter_a, iter_b) {default: foo, **mydict**}[default] === mydict.get({default: foo, **mydict**}) === mydict.get(default, foo) "{SPARKLE}{HOUSE}{TANABATA TREE}".format(**unicode**) === "{SPARKLE}{HOUSE}{TANABATA TREE}".format_map(unicode) print("Hey, look!", *lines*, sep="\n") === print("Hey, look!", "\n".join(lines), sep="\n") next((*iterable*, default)) == next(iterable, default)? Next PEP anyone? :D ? What's the point? Well, we wouldn't have *needed* the default argument if it was that easy in the first place. Same with dict.get. I also have changed my mind where I said: > You'd also need to add a call to exhaust the iterator at the end of > every function utilising this (transparently, probably) to make this > have no obvious externally-visible effects. There would still be a > call-order change, but that's much more minor. because I obviously wasn't thinking when I said it. 
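For readers skimming this subthread, a small sketch of the eager/lazy
distinction at issue, using only today's semantics (first_three is a
made-up helper; the starred-at-both-ends spelling is the proposal under
discussion, not valid syntax):

    from itertools import chain, count, islice

    def first_three(iterable):
        # made-up helper: pull just three items
        return list(islice(iterable, 3))

    naturals = count()  # an infinite iterator

    # Lazy today: chain never materialises its inputs.
    print(first_three(chain(['header'], naturals)))   # ['header', 0, 1]

    # Eager today: this would hang, because *naturals is expanded into
    # an argument tuple before first_three is ever entered.
    #   first_three(*naturals)      # don't run this
    # The proposed spelling would keep the expansion lazy:
    #   first_three(*naturals*)    # not valid syntax today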
From mertz at gnosis.cx  Mon Jul 15 23:36:17 2013
From: mertz at gnosis.cx (David Mertz)
Date: Mon, 15 Jul 2013 14:36:17 -0700
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To: 
References: <20130714222606.0f61f16e@sergey>
Message-ID: 

On Mon, Jul 15, 2013 at 12:24 PM, Joshua Landau wrote:

> But it is (sort of). I asked my brother (under 20, above 10, not sure
> how much more I should say on a mailing list), who is about as
> not-programmer-techy as any computer user could reasonably be. I asked
> him to add two lists. He *concatenated them*[1].

Here's my experimental contribution. I cut and drew on three pieces of
paper similar to the below ASCII art:

 ______________________
| 4  5  6  2  1

 ______________________
| 6  12  13  19  100

 ______________________
| 100  200  300

In particular, I put small integers on them, but for generality made
the numbers sometimes out of natural sort order. I also made the lists
of different lengths so that elementwise addition would pose a problem
(a subject *could* decide to fill in the additive identity zero for the
"missing" elements if she wanted to, but this would have to be a
decision). I also placed the papers deliberately so that the left edges
were not aligned (as pictured) so the notion of columns would not be
forced on an informant (but not prohibited either).

I found as a subject a "programming-naive" but well-educated subject in
her 40s (a friend, no kidnapping of strangers off the street). I asked
something worded close to the following:

"Can you sum these lists? An acceptable answer would be that the
question does not make sense. If it does make sense, what result do you
get?"

As a possible aid, I had a notepad placed nearby, in case some sort of
copying operation was felt relevant (but I just made sure the notepad
was on the table, I didn't say anything about whether it should or
should not be used).

Her answer was to write the additive sum of *each* slip of paper
(list). I.e. three numbers: 18, 150, 600.

In other words, she reads it as:

    sum([[4,5,6,2,1], [6,12,13,19,100], [100,200,300]]) == [18, 150, 600]

Well, this doesn't technically mean *anything* in Python since no
'start' value is given on the left. But essentially her intuition is
that it means:

    map(sum, [[4,5,6,2,1], [6,12,13,19,100], [100,200,300]])

Actually, that Python version is especially accurate, because what my
informant actually said was "Do you want me to actually make the
calculations?! That's what I'd do!" So much as with 3.x map, she didn't
actually consume the iterator until needed.

> [1] Sort of. He combined them as you would sorted Counters, where
> duplicate items were doubled-up on, but otherwise order was preserved.
> I think that is reasonably close.

--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons. Intellectual property is
to the 21st century what the slave trade was to the 16th.
From zuo at chopin.edu.pl  Mon Jul 15 23:34:10 2013
From: zuo at chopin.edu.pl (Jan Kaliszewski)
Date: Mon, 15 Jul 2013 23:34:10 +0200
Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations"
In-Reply-To: 
References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl>
Message-ID: 

15.07.2013 12:40, Oscar Benjamin wrote:

> first_line = next(inputfile)
> # inspect first_line
> for line in chain([first_line], inputfile):
>     # process line
>
> could be rewritten as
>
> first_line = next(inputfile)
> for line in first_line, *inputfile:
>     pass
>
> without reading the whole file into memory.

Please note, that with PEP 448 syntax you could express it by:

    first_line = next(inputfile)
    for line in (*it for it in ([first_line], inputfile)):
        ...

Even now, in Python 3.3, you can[*] write:

    first_line = next(inputfile)
    for line in [(yield from it) for it in [[first_line], inputfile]]:
        ...

Cheers,
*j

[*] Please note that it is `yield from` within a *list comprehension*,
not a generator expression... And that this list comprehension still
evaluates to a *generator*, not a list! (a [None, None] list is set as
StopIteration's value when the generator is exhausted)

An interesting fact (but understandable after a thought) is that while
a generator created with:

    [(yield from it) for it in [[1,2,3], 'abc']]

produces items: 1, 2, 3, 'a', 'b', 'c', a generator created with:

    ((yield from it) for it in [[1,2,3], 'abc'])

produces items: 1, 2, 3, None, 'a', 'b', 'c', None

I am not sure if ability to use it that way is only an implementation
artifact, but it works.

From joshua at landau.ws  Mon Jul 15 23:56:10 2013
From: joshua at landau.ws (Joshua Landau)
Date: Mon, 15 Jul 2013 22:56:10 +0100
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To: 
References: <20130714222606.0f61f16e@sergey>
Message-ID: 

On 15 July 2013 22:36, David Mertz wrote:
> On Mon, Jul 15, 2013 at 12:24 PM, Joshua Landau wrote:
>>
>> But it is (sort of). I asked my brother (under 20, above 10, not sure
>> how much more I should say on a mailing list), who is about as
>> not-programmer-techy as any computer user could reasonably be. I asked
>> him to add two lists. He *concatenated them*[1].
>
> Here's my experimental contribution.

Thank you, this was interesting (although I don't particularly agree
with your interpretation).

> I found as a subject a "programming-naive" but well-educated subject
> in her 40s (a friend, no kidnapping of strangers off the street). I
> asked something worded close to the following:
>
> "Can you sum these lists? An acceptable answer would be that the
> question does not make sense. If it does make sense, what result do
> you get?"

Unfortunately, that's not, in my opinion, an accurate way to phrase the
question. If I were to say "can you mow these lawns" or "can you catch
these criminals" or "can you run these tests", I would be asking for
something along the lines of "map(verb, noun)". That's what she
dutifully did, as you note.

You should have asked "can you sum this list [note the singular noun]
of lists?" or somesuch. Because that's odd linguistically, you can say
instead "can you sum these lists together", although that's obviously a
bit of a rigged statement. You're free to think of a better compromise.

Apologies for my fussiness.
> As a possible aid, I had a notepad placed nearby, in case some sort of > copying operation was felt relevant (but I just made sure the notepad was on > the table, I didn't say anything about whether it should or should not be > used). Copying your experimental choices, I actually think I rigged mine a bit by giving a non-numeric values inside my lists, making more imaginative choices much less likely. Kudos for the well-run experiment, too (ignoring my issue of phrasing). > Her answer was to write the additive sum of *each* slip of paper (list). > I.e. three numbers: 18, 150, 600. > > In other words, she reads it as: > > sum([[4,5,6,2,1], [6,12,13,19,100], [100,200,300]]) == [18, 150, 600] > > Well, this doesn't technically mean *anything* in Python since no 'start' > value is given on the left. But essentially her intuition is that it means: > > map(sum, [[4,5,6,2,1], [6,12,13,19,100], [100,200,300]]) > > Actually, that Python version is especially accurate, because what my > informant actually said was "Do you want me to actually make the > calculations?! That's what I'd do!" So much as with 3.x map, she didn't > actually consume the iterator until needed. :). This just proves that Python is a human. From mertz at gnosis.cx Tue Jul 16 00:09:01 2013 From: mertz at gnosis.cx (David Mertz) Date: Mon, 15 Jul 2013 15:09:01 -0700 Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers In-Reply-To: References: <20130714222606.0f61f16e@sergey> Message-ID: On Mon, Jul 15, 2013 at 2:56 PM, Joshua Landau wrote: > > "Can you sum these lists? An acceptable answer would be that the question > > does not make sense. If it does make sense, what result do you get?" > > You should have asked "can you sum this list [note the singular noun] > of lists?" or somesuch. Because that's odd linguistically, you can say > instead "can you sum these lists together", although that's obviously > a bit of a rigged statement. You're free to think of a better > compromise. > I agree that the result is likely to depend a lot on the nuance of how it is worded to a native speaker. I just asked my friend--who is now, however, no longer experimentally naive--what she would have said had I asked "Can you sum these lists together?" I feel like "Can you sum this list of lists?" would just sound perverse to a non-programmer, although I agree that if they thought about it slowly they'd be more likely to come up with concatenation (but I still think most wouldn't do so). Her answer was that she would have produced a single number that was the total of all the numbers in all the lists (or equivalently, the sum of the three map(sum, ...) items). I didn't actually lay out the papers again or make her perform the additions though :-). Of course, we're still talking about one more data point really. Although I have intuitions, there are millions or billions of non-programmers or aspiring programmers, and obviously answers would vary. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oscar.j.benjamin at gmail.com Tue Jul 16 00:25:46 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 15 Jul 2013 23:25:46 +0100 Subject: [Python-ideas] PEP for issue2292, "Missing *-unpacking generalizations" In-Reply-To: References: <52ac158ebee1a835988b81ec8001f4d1@chopin.edu.pl> Message-ID: On 15 July 2013 22:34, Jan Kaliszewski wrote: > 15.07.2013 12:40, Oscar Benjamin wrote: > >> first_line = next(inputfile) >> # inspect first_line >> for line in chain([first_line], inputfile): >> # process line >> >> could be rewritten as >> >> first_line = next(inputfile): >> for line in first_line, *inputfile: >> pass >> >> without reading the whole file into memory. > > Please note, that with PEP 448 syntax you could express it by: > > first_line = next(inputfile) > for line in (*it for it in ([first_line], inputfile)): > ... I had realised that but I probably prefer chain() to the above. Thinking about it now what really bothers me about writing that kind of code is the need to trap StopIteration around calls to next() (or to supply and check for a default value). The chain part isn't so bad. > Event now, in Python 3.3, you can[*] write: > > first_line = next(inputfile) > for line in [(yield from it) for it in [[first_line], inputfile]]: > ... > > [*] Please note that it is `yield from` within a *list comprehension*, > not a generator expression... And that this list cimprehension still > evaluates to a *generator*, not a list! (a [None, None] list is set > as StopIteration's value when the generator is exhausted) Where exactly is the above defined/discussed? I looked through PEP 380 (yield from) but I can't find any mention of comprehensions or generator expressions. I guess that it unrolls as def _func(): tmp = [] for it in [[first_line], inputfile]: tmp.append(yield from it) # Now _func is a generator function return tmp # becomes raise StopIteration(tmp) for line in _func(): ... but I hadn't considered the fact that using yield from in the expression would turn a list comprehension into a generator function according to the unrolling logic. Oscar From ron3200 at gmail.com Tue Jul 16 00:27:04 2013 From: ron3200 at gmail.com (Ron Adam) Date: Mon, 15 Jul 2013 17:27:04 -0500 Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers In-Reply-To: References: <20130714222606.0f61f16e@sergey> Message-ID: On 07/15/2013 04:36 PM, David Mertz wrote: > On Mon, Jul 15, 2013 at 12:24 PM, Joshua Landau > wrote: > > But it is (sort of). I asked my brother (under 20, above 10, not sure > how much more I should say on a mailing list), who is about as > not-programmer-techy as any computer user could reasonably be. I asked > him to add two lists. He *concatenated them*?. > > > Here's my experimental contribution. I cut and drew on three pieces of > paper similar to the below ASCII art: > > ______________________ > | 4 5 6 2 1 > > ______________________ > | 6 12 13 19 100 > > ______________________ > | 100 200 300 > > In particular, I put small integers on them, but for generality made the > numbers sometimes out of natural sort order. I also made the lists of > different lengths so that elementwise addition would pose a problem (a > subject *could* decide to fill in the additive identity zero for the > "missing" elements if she wanted to, but this would have to be a > decision). I also placed the papers deliberately so that the left edges > were not aligned (as pictured) so the notion of columns would not be forced > on an informant (but not prohibited either). 
> > I found as a subject a "programming-naive" but well-educated subject in her > 40s (a friend, no kidnapping of strangers off the street). I asked > something worded close to the following: > > "Can you sum these lists? An acceptable answer would be that the question > does not make sense. If it does make sense, what result do you get?" > > As a possible aid, I had a notepad placed nearby, in case some sort of > copying operation was felt relevant (but I just made sure the notepad was > on the table, I didn't say anything about whether it should or should not > be used). > > Her answer was to write the additive sum of *each* slip of paper (list). > I.e. three numbers: 18, 150, 600. Nice test. So what the discussion is trying to determine is, is it better to treat sequences as fundamentally different, than values, in python. Often times in human language, we need to give more information to get the desired point across. One way is to use a common simpler expression, with another simple hint. OR the other way is to use more specific language that doesn't require a hint. The 'hint' can be subtle, and is usually not included in examples we use when comparing human language to computer language. That makes the argument for a simpler term a bit stronger. So we have these two approaches... (1) In the case of sum(x, start), the hint would be the variable name of either x, or start. Without that hint, you need to scan forward or backwards in the source code to figure out what sum() will actually do. (2) The more specific approach would be to have two functions that don't need hints because their purpose is more limited. sum_values(x) # Not really needed as sum() is good at this already. sum_iters(x) # Fast sum of iters.. Because they are more limited in scope, they can be optimised to a greater deal, and made to work without a start value. Sergey's patch does increase the speed quite a bit, (over the other suggestions), and combining lists is fairly common, so I do thing it should be used in either a new function, or in sum(), depending on how the developers feel about weather or not it is better to create more separation between how sequences and values are treated. Although, if one of the other alternatives can be made as fast, then that would be good too. So ... + 1 Add specialised sum_iters(), (based on the sum() patch.) + .5 increase other options speed.. (not easy or even possible) + .25 Patch sum, and document it's use for sequences. [1] [1] Assuming this can be done with no backwards compatibility or new side effects. Lots of new tests should be added for this. By far, the easiest and least problematic choice is to add a new function. Cheers, Ron From joshua at landau.ws Tue Jul 16 00:40:01 2013 From: joshua at landau.ws (Joshua Landau) Date: Mon, 15 Jul 2013 23:40:01 +0100 Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers In-Reply-To: References: <20130714222606.0f61f16e@sergey> Message-ID: On 15 July 2013 23:27, Ron Adam wrote: > So ... > > + 1 Add specialised sum_iters(), (based on the sum() patch.) You mean chain.from_iterable? That's the fastest we're getting for iterables. Maybe sum_lists() could be faster, but then we're into "if you need that niche and you need it that fast, write it yourself" territory. > + .5 increase other options speed.. (not easy or even possible) > + .25 Patch sum, and document it's use for sequences. [1] > > [1] Assuming this can be done with no backwards compatibility or new side > effects. Lots of new tests should be added for this. 
But then it doesn't duck-type well → people should avoid using it →
the original change just becomes an attractive nuisance

chain.from_iterable doesn't have this problem.

> By far, the easiest and least problematic choice is to add a new
> function.

From ron3200 at gmail.com  Tue Jul 16 04:12:07 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Mon, 15 Jul 2013 21:12:07 -0500
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To: 
References: <20130714222606.0f61f16e@sergey>
Message-ID: 

On 07/15/2013 05:40 PM, Joshua Landau wrote:
> On 15 July 2013 23:27, Ron Adam wrote:
>> So ...
>>
>> + 1 Add specialised sum_iters(), (based on the sum() patch.)
>
> You mean chain.from_iterable?

No, I mean a new function written in C, which writes (appends) the
values directly into a new (or start) sequence. Chain from_iterable
builds a generator and yields the items out. That's not going to be as
fast, but it does use much less memory in many situations.

> That's the fastest we're getting for
> iterables. Maybe sum_lists() could be faster, but then we're into "if
> you need that niche and you need it that fast, write it yourself"
> territory.

If it was a python function written in python, this would be true, but
as a builtin C function, it could be faster. Common built-in types
could be optimised in sum_iters(), just as Sergey has done in the patch
for sum().

One of the main sticking points in the discussion is whether or not
sum() should be a recommended way of summing non-number types. Adding a
new function supports the (current) view that sum shouldn't be
recommended to sum non-number types. (Although it would still work for
backwards compatibility reasons.)
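For concreteness, a pure-Python sketch of the semantics such a
sum_iters()/concat() could have. The name and signature here are
illustrative only, not taken from Sergey's patch (the actual proposal
is C code that appends into the result directly, which this cannot
replicate for speed):

    from itertools import chain

    def concat(iterables, start=None):
        # illustrative only: one result list, extended in place,
        # so the cost is linear in the total number of elements
        result = list(start) if start is not None else []
        for it in iterables:
            result.extend(it)
        return result

    assert concat([[1, 2], (3, 4)]) == [1, 2, 3, 4]
    assert concat([[1, 2], [3]], start=[0]) == [0, 1, 2, 3]
    # equivalent result via the existing idiom:
    assert list(chain.from_iterable([[1, 2], (3, 4)])) == [1, 2, 3, 4]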
Of course there would be feedback cycles in most of these steps and others might put things in a different order, but it's pretty much follows the standard path most patches are done on the tracker. Lets help Sergey through this process and not be too quick to reject his ideas. Cheers, Ron From eliben at gmail.com Tue Jul 16 05:10:06 2013 From: eliben at gmail.com (Eli Bendersky) Date: Mon, 15 Jul 2013 20:10:06 -0700 Subject: [Python-ideas] regex module - proper implementation of alternation? Message-ID: Since the 'regex' module is a candidate for inclusion into the stdlib, I figured this would be a good place to ask. While discussing something related in pss (https://github.com/eliben/pss), Ben Hoyt brought to my attention that the implementation of alternation (foo|bar) in Python's default regex module (the SRE implementation) is very inefficient. And indeed, looking there it seems that | gets translated to an opcode that simply means going over all the alternatives in a loop trying to match each. This is not how a proper regex engine should implement alternation! A common advice given to Python programmers is to combine their regexes into a single one with |. This makes code faster, but it turns out that it's far from its full potential because the combination doesn't go full way to the DFA as it's supposed to. A question about 'regex' - is it implemented properly there? Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From sergemp at mail.ru Tue Jul 16 05:21:29 2013 From: sergemp at mail.ru (Sergey) Date: Tue, 16 Jul 2013 06:21:29 +0300 Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers In-Reply-To: References: <20130714222606.0f61f16e@sergey> Message-ID: <20130716062129.4af447db@sergey> On Jul 15, 2013 David Mertz wrote: > It seems to me that in order to make sum() look more attractive, > Sergey presents ugly versions of alternative ways to (efficiently) > concatenate sequences. No, I actually tried to choose the most popular and obvious ones. Of course there're more. E.g. I tested 6 [1]. Do you think that people often know *args notation or add infinite lists? ;) But anyway all of them are just workarounds. Better workarounds, maybe, but still workarounds, and their existence don't fix sum(). Same as existence of wget or curl does not fix the bug in httplib [2]. > flat = [] > map(flat.extend, list_of_lists) > Using map() for a side effect is slightly wrong, but this is short, > readable, and obvious in purpose. Ehm, you were joking when you called that obvious, right? ;) > Now I'll take your claim as true that 'sum(list_of_lists)' is > somehow intuitive to you. So I can't say it is counter-intuitive > to ALL humans. So what? Lot's of things are counter-intuitive in programming languages, including python. For example, don't you think that a = a + b and a += b should be same, and it's counter-intuitive to have them different. Yet you can have them different in python. > But it is counter-intuitive to MANY of us, and for us readers who > think of "sum" as meaning addition in the mathematical sense, code > that uses this is difficult to understand. If it makes no mathematical sense then don't apply math to it. :) In mathematical sense something like: x = x + 1 is completely counter-intuitive. There's no "x" for which that equation could be true. Yet, this line is obvious to every programmer. I mean, a function does not have to be intuitive for mathematicians in order to be a useful tool for programmers. 
:) And the point is: they already use it, so we already have a bug, I'm just searching for the best way to fix it. Having a broken tool on your shelves and saying "don't take that, it's broken" to everyone every time is not a solution. > I will argue that sum()'ing sequences is counter-intuitive for MOST > humans. It's certainly counter-intuitive to me, and I've written > a Python book, taught Python, and taken graduate mathematics > courses (those experiences may pull in opposite directions though). Be careful about things like "I've written books, taught Python..." because someone may catch you on some stupid typo (e.g. a missing * in chain() call) and ask what are you teaching if you don't know basics yourself. :) > I'm not sure it has actually "sailed." It's very well possible to restrict > sum()'ing lists or tuples in much the same way strings are excluded (even > though strings have an .__add__() method). If special attention were taken > to *not* work for cases where it shouldn't, we could remove this > counter-intuitive behavior. It's actually easier to fix it, than to restrict it. To fix it you just need to apply a patch [3]. You can do that even for Python2, since it introduces no behaviour change. But restricting it means removing a feature, it's a major change that breaks backward compatibility. ?? [1]http://mail.python.org/pipermail/python-ideas/2013-July/022065.html [2]http://bugs.python.org/issue6838 [3]http://bugs.python.org/file30897/fastsum-special-tuplesandlists.patch From mertz at gnosis.cx Tue Jul 16 05:22:23 2013 From: mertz at gnosis.cx (David Mertz) Date: Mon, 15 Jul 2013 20:22:23 -0700 Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers In-Reply-To: References: <20130714222606.0f61f16e@sergey> Message-ID: On Mon, Jul 15, 2013 at 7:12 PM, Ron Adam wrote: > > + 1 Add specialised sum_iters(), (based on the sum() > You mean chain.from_iterable? > > No, I mean a new function written in C, which writes (appends) the values > directly into a new (or start) sequence. Chain from_iterable builds a > generator and yields the items out. That's not going to be as fast, but it > does use much less memory in many situations. If a new function could *actually* be significantly faster than chain.from_iterable(), I think it would be reasonable to have. However, if writing something new as basically an alias for 'list(chain(...))' only gets us, say 10% speedup, I think nothing should be included. But PLEASE, don't call such a new function sum_iters(). The obviously correct name for such a thing is 'concat()'. This is what I've argued a bunch of times, but let's just call concatenation by its actual name rather than try to squint in just the right way to convince ourselves that "summation" is the same concept. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From joshua at landau.ws Tue Jul 16 05:35:57 2013 From: joshua at landau.ws (Joshua Landau) Date: Tue, 16 Jul 2013 04:35:57 +0100 Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers In-Reply-To: References: <20130714222606.0f61f16e@sergey> Message-ID: On 16 July 2013 04:22, David Mertz wrote: > On Mon, Jul 15, 2013 at 7:12 PM, Ron Adam wrote: >> >> > + 1 Add specialised sum_iters(), (based on the sum() >> >> You mean chain.from_iterable? >> >> No, I mean a new function written in C, which writes (appends) the values >> directly into a new (or start) sequence. Chain from_iterable builds a >> generator and yields the items out. That's not going to be as fast, but it >> does use much less memory in many situations. > > > If a new function could *actually* be significantly faster than > chain.from_iterable(), I think it would be reasonable to have. However, if > writing something new as basically an alias for 'list(chain(...))' only gets > us, say 10% speedup, I think nothing should be included. I'm not convinced. I have three reasons, and I have full faith in all three. 1) Ignoring speed, I don't believe there are *any* use-cases where concat(...) *isn't worse* than list(chain.from_iterable(...)), excluding fabricated ones that have never happened. 2) I'm not convinced this is a bottleneck for a significant number of people; chain is much faster than most constructs we have already, so it'd be off for the chaining to be the slowest part 3) This belongs on PyPi, not stdlib, as it is niche and we can already do the same thing with the same asymptotic performance. Blist gives *way* more speed advantages yet that's been rejected from stdlib so I disagree that this can cross the barrier. I realise blist was rejected for the additional reason of trying to replace our normal lists, but it could've gotten into stdlib so that doesn't discount the analogy. > But PLEASE, don't call such a new function sum_iters(). The obviously > correct name for such a thing is 'concat()'. This is what I've argued a > bunch of times, but let's just call concatenation by its actual name rather > than try to squint in just the right way to convince ourselves that > "summation" is the same concept. I agree, because a summation that only works on lists is really just concat after-all. From sergemp at mail.ru Tue Jul 16 05:58:13 2013 From: sergemp at mail.ru (Sergey) Date: Tue, 16 Jul 2013 06:58:13 +0300 Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers In-Reply-To: References: <20130714222606.0f61f16e@sergey> Message-ID: <20130716065813.7aa8eae2@sergey> On Jul 15, 2013 Ron Adam wrote: > You could copy the code from sum() and make a function with a different > name that specialises in non-number addition. That would not have any > backwards compatibility issues. That would just add one more workaround, but would not solve the problem. Sum is already O(N*N) for many containers, and adding more workarounds would not make it any faster. As for me using "+" e.g. to add strings looks rather obvious. Adding lists looks similar to adding strings. And sum() looks like a lot of "+"es. I mean I see nothing strange in using sum for list of lists. It as natural as using max() to find the "largest" word in a list of strings ? a nice and expected feature. But *if* in some distant python version we would have separate operations for numerical addition and sequence concatenation, then we might split our current sum() into two functions: sum() and concat(). 
But to do that we must first solve the problem for sum(). Or we'll have exactly same problem with concat() anyway. > Do you think you could use what you learned with sum to make chain, or a > new fold function faster? I'm not sure that chain or reduce/fold could benefit from them, since all suggestions are about containers and "+" operator, except, maybe, suggestion #2, since everybody are expected to benefit from it. > The advantages are.. > > Chain works with most iterables and uses much less memory in some cases. That's the main difference: sum returns a container, which is complete and ready to use, while chain returns some weird type. :) It does not use a container, so you can't have any container-specific optimization in it. I.e. imagine: x = [[1,2], [3,4], 5, 6] now try: for i in sum(x, []): print(i) and: for i in chain.from_iterable(x): print(i) in first case you'll get an error instantly, while in second case you'll have print called several times before you get an error. > A new fold function would do quite a lot more depending on the operator > passed to it. It may be possible to speed up some common cases that use > methods on builtin types. I'm not sure what would be a common case for it. It looks like a renamed reduce. What's a common case for reduce? From mertz at gnosis.cx Tue Jul 16 06:53:06 2013 From: mertz at gnosis.cx (David Mertz) Date: Mon, 15 Jul 2013 21:53:06 -0700 Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers In-Reply-To: <20130716062129.4af447db@sergey> References: <20130714222606.0f61f16e@sergey> <20130716062129.4af447db@sergey> Message-ID: On Mon, Jul 15, 2013 at 8:21 PM, Sergey wrote: > Do you think that people often know *args notation or add infinite lists? > ;) > I think people now know *args notation, yes (and yes, I make typos notwithstanding being a professional writer). As for infinite lists, I actually had in mind a fairly specific and ordinary example. I can imagine that one has one iterator the yields files, which themselves iterate over lines within files. Maybe not *infinite* quite, but one might have a lot of files (even a lot matching some spec) on a slow network drive, and each of those files might have a lot of lines. One plausible thing to do is keep searching until we find "the right line" ... but we don't want to actually open millions of files if we don't need to. E.g.: for line in chain.from_iterable(lines_in_many_files): got_it = check_the(line) if got_it: break -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sergemp at mail.ru Tue Jul 16 07:39:11 2013 From: sergemp at mail.ru (Sergey) Date: Tue, 16 Jul 2013 08:39:11 +0300 Subject: [Python-ideas] Fast sum summary [was Re: Fast sum() for non-numbers - why so much worries?] In-Reply-To: References: <20130702211209.6dbde663@sergey> <20130709123530.2afa1adf@sergey> <51DC36C2.8000509@pearwood.info> <51DD8BDB.6050101@pearwood.info> <51DF5368.6020505@pearwood.info> <51DF57E6.8090206@mrabarnett.plus.com> <20130713005718.78d01516@sergey> Message-ID: <20130716083911.370a9f82@sergey> On Jul 13, 2013 Paul Moore wrote: >> So you can, kind of, say that sum was DESIGNED to have special cases >> from the very beginning. 
>> > Thanks for the reference. That's the *original* implementation. So why does > the current sum() not do this? You need to locate the justification for the > removal of this special case As I already said, the problem was with mixed lists of strings and non-strings [1]. Later Guido suggested checking the second argument and raising TypeError for strings [2], and Alex Martelli answered [3] "I like this!!!" and implemented it. > and explain why that reason no longer applies. It still does, and it's still there - sum() still has a special-case check for strings (and bytes, and bytearrays) rejecting them. But my patch does not use that approach: it does not call an external join-like function, so it can correctly recover from the mixed-list case. Meaning, my patch does not have the bug of the initial sum(). -- [1] http://mail.python.org/pipermail/python-dev/2003-April/034854.html [2] http://mail.python.org/pipermail/python-dev/2003-April/034853.html [3] http://mail.python.org/pipermail/python-dev/2003-April/034855.html From sergemp at mail.ru Tue Jul 16 07:36:05 2013 From: sergemp at mail.ru (Sergey) Date: Tue, 16 Jul 2013 08:36:05 +0300 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey> <20130712043419.1f5c59e5@sergey> Message-ID: <20130716083605.16da9f9f@sergey> On Jul 11, 2013 Andrew Barnert wrote: >>> sum is not the obvious way to concatenate sequences today, and my >>> preferred way is for that to stay true. So, I'm: [...] >> >> You're saying what IS NOT your preferred way. But I'm asking what IS >> your preferred way. > > You just quoted it. My preferred way to handle this is what we > already do: don't encourage people to misuse sum for concatenating > sequences, encourage them to use something appropriate. I don't understand it. It makes no sense to me. Do you like having many broken tools? E.g. would you like it if someone added __mul__ to sets, but made it O(N*N*N) so that people would not use it too often? Anyway, I've got your point. You want sum() to be O(N) for numbers and some rare/custom containers, but want it to stay O(N*N) for the most popular container types, for reasons that are too hard for me to understand. >> Do you prefer to have a slow sum in python and people asking why >> it's slow forever? Do you see that as the best possible case for >> python? > Yes. Given that it is impossible to make sum fast for all > collection types Technically it may be possible, just as technically it's possible to have no wars on Earth. It requires great cooperation among many people, so it's not probable, but it is possible. So what? It's (kind of) impossible to make python fast for all programs, but that does not mean that python should not be fast for some programs. The same applies to sum(): even if it's impossible to make it fast for all collection types, it does not mean that it should not be fast for some of them, e.g. lists and tuples. After all, it's quite easy to make it fast for most (if not all) commonly used cases. > Also, It's less surprising this way, not more. Today, people only > have to learn one thing: Don't use sum on collections.
> That's much easier than having to learn a complex mishmash like: Don't use sum on immutable collections, except for tuple, and also don't use it on some mutable collections, but it's hard to characterize exactly which, and also don't use it on things that are iterable but that you don't want to treat as sequences, and... It's much easier to just learn: don't use sum(). And it's even easier to learn: use sum(), because then you don't have to learn that. ;) > Finally, I've ignored your requests for examples because in every > case you've already been given examples and haven't dealt with any > of them. Oh, really? You said I can't make sum fast for strings and tuples, so I did that and showed a patch. Then you said that I can't make sum fast for linked lists, so I suggested how to do that. You did not like my linked lists, so I explained how you can do that for your cons-lists. You did not like my explanation (and said some weird things about thread safety), so I showed you two code samples, both using your cons-lists and sum(). You also said that I can't make tuples' __add__ faster without patching sum(), so I explained how you can do that (you never answered whether you like such a patch and whether you would agree to write it). I even wrote a simple fasttuple proof of concept [1]. What examples haven't I dealt with? ;) > If I want to concatenate cons lists, chain does it in linear time > your design does not; Quite the opposite: my design would give you a list, while chain won't even give you a correct list, since the `next` elements of that "list" would not point to the correct locations. BTW, chain would not work for your list, because it's not iterable by default, is it? And even if you implement __iter__ for it, how are you going to handle modifications of your list while you iterate it? You don't have those problems with sum(). Hm, that basically marks chain() as an error-prone replacement for sum(), one that is sometimes harder to implement support for. > instead of answering that, you just keep arguing that you can sum > a different kind of linked list in linear time. That doesn't even > approach answering the objection. If the "objection" was "you can't make it fast for linked lists" then "you can" is the exact answer to that objection. :) -- [1] http://bugs.python.org/file30917/fasttuple.py From ron3200 at gmail.com Tue Jul 16 07:50:36 2013 From: ron3200 at gmail.com (Ron Adam) Date: Tue, 16 Jul 2013 00:50:36 -0500 Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers In-Reply-To: <20130716065813.7aa8eae2@sergey> References: <20130714222606.0f61f16e@sergey> <20130716065813.7aa8eae2@sergey> Message-ID: On 07/15/2013 10:58 PM, Sergey wrote: > On Jul 15, 2013 Ron Adam wrote: > >> You could copy the code from sum() and make a function with a different >> name that specialises in non-number addition. That would not have any >> backwards compatibility issues. > That would just add one more workaround, but would not solve the > problem. Sum is already O(N*N) for many containers, and adding > more workarounds would not make it any faster. Sorry for not being clearer. I meant to use your patch for sum as a basis to write a new function that is efficient for containers. > As for me, using "+" e.g. to add strings looks rather obvious. Adding > lists looks similar to adding strings. And sum() looks like a lot of > "+"es. I mean, I see nothing strange in using sum for a list of lists. > It is as natural as using max() to find the "largest" word in a list of > strings -
a nice and expected feature. But *if* in some distant > python version we would have separate operations for numerical > addition and sequence concatenation, then we might split our current > sum() into two functions: sum() and concat(). > > But to do that we must first solve the problem for sum(). > Or we'll have exactly the same problem with concat() anyway. Or.. you can solve the problems for concat(), and later, when a new major version of python is released, we might be able to make sum() use the same techniques. It can work both ways I think. I got the impression that you already know how, or several possible ways, to solve the problem with sum(), but are running up against some backward compatibility issues? I.e. can't use __iadd__ in place of __add__. And also some ideological issues about what others think sum() should or shouldn't do, and how much should or should not be special cased. A new function would allow you to write the function how you think it will work best. And there would not be any of those issues. As far as the meaning of the word "sum" goes and how it's used in English, it's not the most important issue to me. Get something to work, and demonstrate it works and is useful... then we can have a discussion about what to call it. Take a poll, and if it's still not decided, ask one of the core developers to choose from the top name candidates. That would work fine for me. What's important to me is that I have a way to write nice programs. Once I've used whatever function a few times, its name takes on the meaning of what it does. I just want something I can easily find if I need to look up the details for it. Cheers, Ron From ncoghlan at gmail.com Tue Jul 16 08:50:32 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 16 Jul 2013 16:50:32 +1000 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol Message-ID: I haven't been following the sum() threads fully, but something Ron suggested gave me an idea for a concatenation API and protocol. I think we may also be able to use a keyword-only argument to solve the old string.join vs str.join problem in a more intuitive way.

    def concat(start, iterable, *, interleave=None):
        try:
            build = start.__concat__
        except AttributeError:
            result = start
            if interleave is None:
                for x in iterable:
                    result += x
            else:
                for x in iterable:
                    result += interleave
                    result += x
        else:
            result = build(iterable, interleave=interleave)

If implementing this as a third party API you'd use a tool like functools.singledispatch (which has a backport available on PyPI) rather than defining a new protocol. Registering implementations for the immutable builtin types like str, bytes and tuple would then allow those to be handled efficiently, just as if they provided appropriate __concat__ implementations. A simple "use sum for numbers, concat for containers" approach is simpler and clearer than trying to coerce sum into being fast for both when its assumptions are thoroughly grounded in manipulating numbers rather than containers. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Tue Jul 16 08:51:48 2013 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Tue, 16 Jul 2013 15:51:48 +0900 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130716083605.16da9f9f@sergey> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey> <20130712043419.1f5c59e5@sergey> <20130716083605.16da9f9f@sergey> Message-ID: <87mwpnro8r.fsf@uwakimon.sk.tsukuba.ac.jp> Sergey writes: > I don't understand it. It makes no sense to me. Just accept that many people, *for several different reasons*, dislike your proposal. The technical objections are the least of those reasons. Please just write the PEP and ask for a pronouncement. If you don't feel confident in your PEP-writing skills, ask for help. (If you don't get any, you probably should take that as "the community says No".) > Do you like having many broken tools? And please stop this. sum() is not broken, any more than a screwdriver is broken just because it is rather inefficient when used to pound in nails. From steve at pearwood.info Tue Jul 16 09:22:38 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 16 Jul 2013 17:22:38 +1000 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130716083605.16da9f9f@sergey> References: <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey> <20130712043419.1f5c59e5@sergey> <20130716083605.16da9f9f@sergey> Message-ID: <20130716072237.GA31779@ando> On Tue, Jul 16, 2013 at 08:36:05AM +0300, Sergey wrote: > Anyway, I've got your point. You want sum() to be O(N) for numbers > and some rare/custom containers, but want it to stay O(N*N) for > the most popular container types for some reasons, that are too > hard for me to understand. Right now, sum() behaves in a certain way. There are certain things which people expect sum() to do. Some of those things are documented explicitly. Some of them are implied. Some of them have regression tests. Some of them don't. But regardless, we can tell how sum() behaves right now by running it and seeing what it does. Your suggested optimizations change that behaviour. It does not just speed sum() up, they lead to an actual semantic change. So we are not just arguing about speed, we are arguing about behaviour as well. You are worried about sum() being slow for people who call it with list arguments. That is a valid concern. Nobody here *wants* sum() to be slow. If it was a pure speed optimization, then we would all be 100% behind it. But it is not a pure speed optimization, it also changes behaviour, sometimes in subtle, hard to see ways. So there are three approaches we can take: - Do nothing. sum() continues to work exactly the same way as it currently works, even if that means sometimes it is slow. - Accept your patches. sum() changes its behaviour, which will break somebody's working code, but it will be fast, at least for some objects. - Accept a compromise position. We can make sum() faster for built-in lists, and maybe built-in tuples, while keeping the same behaviour. 
Everything else, including subclasses of list and tuple, keep the same behaviour, which may mean it remains slow. They are the only choices. You are concerned more about sum() being slow than you are about breaking code that today works. Some of us here disagree, and think that breaking code is worse than slow code, especially for something as uncommon as sum(list_of_lists). It's not that we want sum() to be slow. But if we have a choice between accepting your patch: # this code works now, but your patch will break it result = sum(list_of_objects) and rejecting it: # this code works now, but is slow, and will remain slow result = sum(list_of_lists) I believe that the decision is simple. Breaking code that works now for a mere optimization is unacceptable. But, a compromise patch that speeds up some code without breaking any code may be acceptable. > Same applies to sum(): even if it's impossible to make if fast for > all collection types, it does not mean that it should not be fast for > some of them, e.g. lists and tuples. That is change from your previous posts where you said you could make it fast for "everything". I am glad to see you have accepted this. -- Steven From steve at pearwood.info Tue Jul 16 09:31:29 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 16 Jul 2013 17:31:29 +1000 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130716083605.16da9f9f@sergey> References: <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey> <20130712043419.1f5c59e5@sergey> <20130716083605.16da9f9f@sergey> Message-ID: <20130716073129.GB31779@ando> On Tue, Jul 16, 2013 at 08:36:05AM +0300, Sergey wrote: > You also said that I can't make tuples __add__ faster without patching > sum() so explained how you can do that (you never answered whether > you like such a patch and whether you would agree to write it). > I even wrote a simple fasttuple proof of concept [1]. > [1] http://bugs.python.org/file30917/fasttuple.py I do not like that implementation, because it shares the underlying storage. This means that tuples which ought to be small will grow and grow and grow just because you have called __add__ on a different tuple. Using Python 2.7 and your implementation above: py> a = ft([]) # empty tuple py> len(a._store) 0 py> b = ft([1]) py> c = a + b py> d = ft([2]*10000) py> c = c + d py> len(a._store) 10001 So adding a big tuple to c changes the internal storage of a. -- Steven From ronaldoussoren at mac.com Tue Jul 16 09:58:15 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Tue, 16 Jul 2013 09:58:15 +0200 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: References: Message-ID: On 16 Jul, 2013, at 8:50, Nick Coghlan wrote: > I haven't been following the sum() threads fully, but something Ron > suggested gave me an idea for a concatenation API and protocol. I > think we may also be able to use a keyword-only argument to solve the > old string.join vs str.join problem in a more intuitive way. > > def concat(start, iterable, *, interleave=None): > [...] 
> > A simple "use sum for numbers, concat for containers" approach is > simpler and clearer than trying to coerce sum into being fast for both > when its assumptions are thoroughly grounded in manipulating numbers > rather than containers. I like the basic idea, using 'concat' to concatenate sequences and strings is clear than using sum and the __contact__ (or singledispatch) protocol has a nice way to make this fast enough for any type without trying to push knowlegde about all types in the implementation of concat. Ronald From oscar.j.benjamin at gmail.com Tue Jul 16 12:21:14 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 16 Jul 2013 11:21:14 +0100 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: References: Message-ID: On 16 July 2013 07:50, Nick Coghlan wrote: > I haven't been following the sum() threads fully, but something Ron > suggested gave me an idea for a concatenation API and protocol. I > think we may also be able to use a keyword-only argument to solve the > old string.join vs str.join problem in a more intuitive way. The sum() threads have highlighted one and only one problem which is that people are often using (or at least suggesting to use) sum() in order to concatenate sequences even though it has quadratic performance for this. The stdlib already has a solution for this: chain. No one in the sum threads has raised any issue with using chain (or chain.from_iterable) except to argue that it is not widely used. If people are using sum() to concatenate lists then this should be taken not as evidence that a new solution needs to be found but as evidence that chain is not sufficiently well-known. The obvious solution to that is not to implement a new protocol but to make the existing solution more well known i.e. move chain.from_iterable to builtins and rename it (the obvious choice being concat). > def concat(start, iterable, *, interleave=None): > try: > build = start.__concat__ > except AttributeError: > result = start > if interleave is None: > for x in iterable: > result += x > else: > for x in iterable: > result += interleave > result += x > else: > result = build(iterable, interleave=interleave) That doesn't seem like a very nice signature e.g.: concat(lines[0], lines[1:], interleave='\n') is not as good as '\n'.join(lines) It's worse with an iterator: it = iter(iterable) try: start = next(it) except StopIteration: result = '' else: result = concat(start, it, interleave=sep) Or have I misunderstood? > If implementing this as a third party API you'd use a tool like > functools.singledispatch (which has a backport available on PyPI) > rather than defining a new protocol. Registering implementations for > the immutable builtin types like str, bytes and tuple would then allow > those to be handled efficiently, just as if they provided appropriate > __concat__ implementations. Since they all expose the iterator protocol and can be built from iterators, chain already solves the problem for tuple, list and many more non-string type sequences in an easily extensible way. String type sequences have different constructor signatures so they use join methods instead. 
There's no point in special-casing chain (as happens in sum) to check for str/bytes/etc, since it clearly doesn't do what you wanted:

    >>> from itertools import chain
    >>> str(chain(['123', '456']))
    '<itertools.chain object at 0x...>'
    >>> bytes(chain(['123', '456']))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'str' object cannot be interpreted as an integer
    >>> bytearray(chain(['123', '456']))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: an integer is required

> A simple "use sum for numbers, concat for containers" approach is > simpler and clearer than trying to coerce sum into being fast for both > when its assumptions are thoroughly grounded in manipulating numbers > rather than containers. Use sum for numbers, join for strings, and chain for other sequences (even though the equivalent operation can be invoked with + or += in all cases). Oscar From ronaldoussoren at mac.com Tue Jul 16 12:37:28 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Tue, 16 Jul 2013 12:37:28 +0200 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: References: Message-ID: <7651BDB8-B71E-47F9-83A3-19DB265C1217@mac.com> On 16 Jul, 2013, at 12:21, Oscar Benjamin wrote: > On 16 July 2013 07:50, Nick Coghlan wrote: [...] > It's worse with an iterator: > > it = iter(iterable) > try: > start = next(it) > except StopIteration: > result = '' > else: > result = concat(start, it, interleave=sep) > > Or have I misunderstood? concat('', iterable, interleave=sep) should work.
Ronald From oscar.j.benjamin at gmail.com Tue Jul 16 13:06:15 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 16 Jul 2013 12:06:15 +0100 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: <7651BDB8-B71E-47F9-83A3-19DB265C1217@mac.com> References: <7651BDB8-B71E-47F9-83A3-19DB265C1217@mac.com> Message-ID: On 16 July 2013 11:37, Ronald Oussoren wrote: > > On 16 Jul, 2013, at 12:21, Oscar Benjamin wrote: > >> On 16 July 2013 07:50, Nick Coghlan wrote: >>> I haven't been following the sum() threads fully, but something Ron >>> suggested gave me an idea for a concatenation API and protocol. I >>> think we may also be able to use a keyword-only argument to solve the >>> old string.join vs str.join problem in a more intuitive way. >> >> The sum() threads have highlighted one and only one problem which is >> that people are often using (or at least suggesting to use) sum() in >> order to concatenate sequences even though it has quadratic >> performance for this. The stdlib already has a solution for this: >> chain. No one in the sum threads has raised any issue with using chain >> (or chain.from_iterable) except to argue that it is not widely used. >> >> If people are using sum() to concatenate lists then this should be >> taken not as evidence that a new solution needs to be found but as >> evidence that chain is not sufficiently well-known. The obvious >> solution to that is not to implement a new protocol but to make the >> existing solution more well known i.e. move chain.from_iterable to >> builtins and rename it (the obvious choice being concat). >> >>> def concat(start, iterable, *, interleave=None): >>> try: >>> build = start.__concat__ >>> except AttributeError: >>> result = start >>> if interleave is None: >>> for x in iterable: >>> result += x >>> else: >>> for x in iterable: >>> result += interleave >>> result += x >>> else: >>> result = build(iterable, interleave=interleave) >> >> That doesn't seem like a very nice signature e.g.: >> >> concat(lines[0], lines[1:], interleave='\n') >> >> is not as good as >> >> '\n'.join(lines) >> >> It's worse with an iterator: >> >> it = iter(iterable) >> try: >> start = next(it) >> except StopIteration: >> result = '' >> else: >> result = concat(start, it, interleave=sep) >> >> Or have I misunderstood? > > concat('', iterable, interleave=sep) should work. Not with the code as shown. The result would be prepended with sep. Oscar From joshua at landau.ws Tue Jul 16 14:28:41 2013 From: joshua at landau.ws (Joshua Landau) Date: Tue, 16 Jul 2013 13:28:41 +0100 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: References: Message-ID: On 16 July 2013 11:21, Oscar Benjamin wrote: > On 16 July 2013 07:50, Nick Coghlan wrote: > > If people are using sum() to concatenate lists then this should be > taken not as evidence that a new solution needs to be found but as > evidence that chain is not sufficiently well-known. The obvious > solution to that is not to implement a new protocol but to make the > existing solution more well known i.e. move chain.from_iterable to > builtins and rename it (the obvious choice being concat). You could wait for PEP 448, which will let you use [*sublist for sublist in list_to_be_flattened]. 
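(For reference, that would be roughly sugar for what can already be written today; both of these run right now:)

    from itertools import chain

    list_to_be_flattened = [[1, 2], [3], [4, 5]]

    flat_a = [x for sub in list_to_be_flattened for x in sub]
    flat_b = list(chain.from_iterable(list_to_be_flattened))
    assert flat_a == flat_b == [1, 2, 3, 4, 5]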
From ron3200 at gmail.com Tue Jul 16 14:59:42 2013 From: ron3200 at gmail.com (Ron Adam) Date: Tue, 16 Jul 2013 07:59:42 -0500 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: References: <7651BDB8-B71E-47F9-83A3-19DB265C1217@mac.com> Message-ID: On 07/16/2013 06:06 AM, Oscar Benjamin wrote: > On 16 July 2013 11:37, Ronald Oussoren wrote: >> On 16 Jul, 2013, at 12:21, Oscar Benjamin wrote: >>> On 16 July 2013 07:50, Nick Coghlan wrote: [...] >>> If people are using sum() to concatenate lists then this should be >>> taken not as evidence that a new solution needs to be found but as >>> evidence that chain is not sufficiently well known. The obvious >>> solution to that is not to implement a new protocol but to make the >>> existing solution better known, i.e. move chain.from_iterable to >>> builtins and rename it (the obvious choice being concat). Yes, currently chain is the best way to do this. And no, concat would not be a good name for a relocated chain unless it's also wrapped in a constructor to give an object instead of a generator. That isn't the idea being suggested here. [...] >>> That doesn't seem like a very nice signature, e.g.: >>> >>> concat(lines[0], lines[1:], interleave='\n') >>> >>> is not as good as >>> >>> '\n'.join(lines) That will still work, and concat wouldn't join lines like this, although I think a lot of people may complain about that. Concatenation, "concat", is associated fairly strongly with strings, so it would be a surprise if it didn't do strings with that name. But this is a nit-pick, and we may be able to come up with a better name that doesn't have that baggage. >>> It's worse with an iterator: [...] >> concat('', iterable, interleave=sep) should work. > Not with the code as shown. The result would be prepended with sep. It would be a TypeError. The part you are misunderstanding is that this all depends on whether or not a builtin version of this can be significantly faster than chain, and/or whether there are enough use cases where this will be beneficial. Ideas like this don't just get in automatically; they still need to be "worth it".
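Just to be explicit about the comparison a builtin would have to win convincingly, here is a rough harness (absolute numbers will vary by machine; only the ratio matters):

    import timeit
    from itertools import chain

    data = [[i] * 10 for i in range(10000)]

    def with_chain():
        return list(chain.from_iterable(data))

    def with_extend_loop():        # a pure-Python model of a C "concat"
        result = []
        for seq in data:
            result += seq          # extends in place, no repeated copying
        return result

    print(timeit.timeit(with_chain, number=100))
    print(timeit.timeit(with_extend_loop, number=100))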
Cheers, Ron From ncoghlan at gmail.com Tue Jul 16 15:01:13 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 16 Jul 2013 23:01:13 +1000 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: References: Message-ID: On 16 July 2013 22:28, Joshua Landau wrote: > On 16 July 2013 11:21, Oscar Benjamin wrote: >> On 16 July 2013 07:50, Nick Coghlan wrote: >> >> If people are using sum() to concatenate lists then this should be >> taken not as evidence that a new solution needs to be found but as >> evidence that chain is not sufficiently well-known. The obvious >> solution to that is not to implement a new protocol but to make the >> existing solution more well known i.e. move chain.from_iterable to >> builtins and rename it (the obvious choice being concat). > > You could wait for PEP 448, which will let you use [*sublist for > sublist in list_to_be_flattened]. Ah, true, I forgot about that. Too many interesting things going on for me to keep track of everything :) In effect, PEP 448 goes further than making chain a builtin: it gives it syntax! With PEP 448, the generator expression: (*itr for itr in iterables) would be equivalent to either of the current: itertools.chain(*iterables) itertools.chain.from_iterable(iterables) That's pretty cool. It also means I can go back to happily ignoring the sum threads :) Cheers, Nick. P.S. Something about this should probably be added to the rationale section of PEP 448 -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Tue Jul 16 15:02:39 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 16 Jul 2013 23:02:39 +1000 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: References: Message-ID: <51E5446F.5050505@pearwood.info> On 16/07/13 16:50, Nick Coghlan wrote: > I haven't been following the sum() threads fully, but something Ron > suggested gave me an idea for a concatenation API and protocol. I > think we may also be able to use a keyword-only argument to solve the > old string.join vs str.join problem in a more intuitive way. What is the string.join vs str.join problem? Are you referring to the fact that in Python 1.5, the string.join() function takes arguments in the opposite order to str.join() method? I'm not sure that's a problem, except in the sense that people has to unlearn one and learn the other. > def concat(start, iterable, *, interleave=None): > try: > build = start.__concat__ > except AttributeError: > result = start > if interleave is None: > for x in iterable: > result += x > else: > for x in iterable: > result += interleave > result += x > else: > result = build(iterable, interleave=interleave) I assume that you missed a "return result" at the end of the function. I don't understand the purpose of interleave: py> concat([99], [[1], [2], [3]], interleave=[100, 101]) [99, 100, 101, 1, 100, 101, 2, 100, 101, 3] I would expect interleave should be a zip-like function with this effect: interleave([1, 2, 3], [100, 101]) => [1, 100, 2, 101, 3] so I don't understand why I might want to use the interleave argument above. I also wonder why this potentially modifies start in place. I would expect that, like sum(), it should return a new object even if start is mutable. I dislike that start is a mandatory argument. I should be able to concatenate a bunch of (say) strings, or lists, without necessarily supplying a start value. E.g. 
I can do this with reduce: py> from functools import reduce py> from operator import add py> reduce(add, [[1], [2], [3]]) [1, 2, 3] That I can't do so with sum() is another reason why sum() is not a well-designed API for strings and lists. But apart from those criticisms, I like the general idea. > A simple "use sum for numbers, concat for containers" approach is > simpler and clearer than trying to coerce sum into being fast for both > when its assumptions are thoroughly grounded in manipulating numbers > rather than containers. +1 -- Steven From oscar.j.benjamin at gmail.com Tue Jul 16 16:14:10 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 16 Jul 2013 15:14:10 +0100 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: References: <7651BDB8-B71E-47F9-83A3-19DB265C1217@mac.com> Message-ID: On 16 July 2013 13:59, Ron Adam wrote: > On 07/16/2013 06:06 AM, Oscar Benjamin wrote: >> On 16 July 2013 11:37, Ronald Oussoren wrote: >>> On 16 Jul, 2013, at 12:21, Oscar Benjamin >>> wrote: >>>> On 16 July 2013 07:50, Nick Coghlan wrote: >>>>> >>>>> def concat(start, iterable, *, interleave=None): >>>>> try: >>>>> build = start.__concat__ >>>>> except AttributeError: >>>>> result = start >>>>> if interleave is None: >>>>> for x in iterable: >>>>> result += x >>>>> else: >>>>> for x in iterable: >>>>> result += interleave >>>>> result += x >>>>> else: >>>>> result = build(iterable, interleave=interleave) >>>> >>>> That doesn't seem like a very nice signature e.g.: >>>> It's worse with an iterator: >>>> >>>> it = iter(iterable) >>>> try: >>>> start = next(it) >>>> except StopIteration: >>>> result = '' >>>> else: >>>> result = concat(start, it, interleave=sep) >>>> >>>> Or have I misunderstood? >>> >>> concat('', iterable, interleave=sep) should work. >> >> Not with the code as shown. The result would be prepended with sep. > > It would be a TypeError. > > The part you are misunderstanding is this all depends on weather or not a > builtin version of this can be significantly faster than chain. And/or if > there is enough use cases where this will be beneficial. No, looking at it I think that the part I misunderstood was that Nick intended for the concat function to behave in a slightly different way than the example code which places the interleave value between start and iterable[0]. > Ideas like this don't just get in automatically, they still need to be > "worth it". True. I personally don't think that there is any problem with summing lists not because I think that it's a good thing to do but because it's already easy to fix any code that does that. If it is the case that moving chain to builtins would help people to understand better ways of writing code then that might be a good thing to do. For me it would give the slight convenience that something I often use would be available without an import (and with a better name). I really like Joshua's PEP but I will probably still prefer chain to an unpacking generator in most situations. 
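A typical case where I reach for it is a lazy stream that I never want materialised as one big list (a small sketch; `blocks` is just a stand-in source):

    from itertools import chain, islice

    blocks = (range(n) for n in range(1000))   # lazy iterable of iterables
    stream = chain.from_iterable(blocks)       # still lazy: nothing is built
    first = list(islice(stream, 10))           # consume only what is needed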
Oscar From oscar.j.benjamin at gmail.com Tue Jul 16 16:25:08 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 16 Jul 2013 15:25:08 +0100 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: References: Message-ID: On 16 July 2013 13:28, Joshua Landau wrote: > On 16 July 2013 11:21, Oscar Benjamin wrote: >> On 16 July 2013 07:50, Nick Coghlan wrote: >> >> If people are using sum() to concatenate lists then this should be >> taken not as evidence that a new solution needs to be found but as >> evidence that chain is not sufficiently well-known. The obvious >> solution to that is not to implement a new protocol but to make the >> existing solution more well known i.e. move chain.from_iterable to >> builtins and rename it (the obvious choice being concat). > > You could wait for PEP 448, which will let you use [*sublist for > sublist in list_to_be_flattened]. Well that does look good. How exactly does it unroll? Does the * translate as yield from but without the weird comprehension turning into a generator function behaviour? Oscar From masklinn at masklinn.net Tue Jul 16 16:48:01 2013 From: masklinn at masklinn.net (Masklinn) Date: Tue, 16 Jul 2013 14:48:01 +0000 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: References: Message-ID: <09325798-B4C7-4B00-BF1C-140854D638F1@masklinn.net> On 16 juil. 2013, at 14:25, Oscar Benjamin wrote: > On 16 July 2013 13:28, Joshua Landau wrote: >> On 16 July 2013 11:21, Oscar Benjamin wrote: >>> On 16 July 2013 07:50, Nick Coghlan wrote: >>> >>> If people are using sum() to concatenate lists then this should be >>> taken not as evidence that a new solution needs to be found but as >>> evidence that chain is not sufficiently well-known. The obvious >>> solution to that is not to implement a new protocol but to make the >>> existing solution more well known i.e. move chain.from_iterable to >>> builtins and rename it (the obvious choice being concat). >> >> You could wait for PEP 448, which will let you use [*sublist for >> sublist in list_to_be_flattened]. > > Well that does look good. How exactly does it unroll? Does the * > translate as yield from but without the weird comprehension turning > into a generator function behaviour? For a listcomp, surely it translates into a .extend of the accumulator? From oscar.j.benjamin at gmail.com Tue Jul 16 17:22:06 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 16 Jul 2013 16:22:06 +0100 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: <09325798-B4C7-4B00-BF1C-140854D638F1@masklinn.net> References: <09325798-B4C7-4B00-BF1C-140854D638F1@masklinn.net> Message-ID: On 16 July 2013 15:48, Masklinn wrote: >>> You could wait for PEP 448, which will let you use [*sublist for >>> sublist in list_to_be_flattened]. >> >> Well that does look good. How exactly does it unroll? Does the * >> translate as yield from but without the weird comprehension turning >> into a generator function behaviour? > > For a listcomp, surely it translates into a .extend of the accumulator? So [a for b in c] is for b in c: result.append(a) and [*a for b in c] is for b in c: result.extend(a) Set and dict comps presumably use .update. 
And the generator expression

    (*a for b in c)

becomes

    for b in c:
        for x in a:
            yield x

or is it actually (this is not equivalent):

    for b in c:
        yield from a

Currently, ((yield from a) for b in c) becomes:

    for b in c:
        yield (yield from a)

which is perhaps less useful (because of the additional yielded None values). Oscar From joshua at landau.ws Tue Jul 16 17:25:50 2013 From: joshua at landau.ws (Joshua Landau) Date: Tue, 16 Jul 2013 16:25:50 +0100 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: References: Message-ID: On 16 July 2013 14:01, Nick Coghlan wrote: > On 16 July 2013 22:28, Joshua Landau wrote: >> On 16 July 2013 11:21, Oscar Benjamin wrote: >>> On 16 July 2013 07:50, Nick Coghlan wrote: >>> >>> If people are using sum() to concatenate lists then this should be >>> taken not as evidence that a new solution needs to be found but as >>> evidence that chain is not sufficiently well-known. The obvious >>> solution to that is not to implement a new protocol but to make the >>> existing solution more well known i.e. move chain.from_iterable to >>> builtins and rename it (the obvious choice being concat). >> >> You could wait for PEP 448, which will let you use [*sublist for >> sublist in list_to_be_flattened]. > > Ah, true, I forgot about that. Too many interesting things going on > for me to keep track of everything :) ... > That's pretty cool. It also means I can go back to happily ignoring > the sum threads :) :D > P.S. Something about this should probably be added to the rationale > section of PEP 448 There is: "The addition of unpacking to comprehensions is a logical extension. Its usage will primarily be a neat replacement for [i for j in 2D_list for i in j], as the more readable [*l for l in 2D_list]. Other uses are possible, but expected to occur rarely." If you require more than that, I'll be happy to add something in. From joshua at landau.ws Tue Jul 16 17:31:48 2013 From: joshua at landau.ws (Joshua Landau) Date: Tue, 16 Jul 2013 16:31:48 +0100 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: References: <09325798-B4C7-4B00-BF1C-140854D638F1@masklinn.net> Message-ID: On 16 July 2013 16:22, Oscar Benjamin wrote: > On 16 July 2013 15:48, Masklinn wrote: >>>> You could wait for PEP 448, which will let you use [*sublist for >>>> sublist in list_to_be_flattened]. >>> >>> Well that does look good. How exactly does it unroll? Does the * >>> translate as yield from but without the weird comprehension turning >>> into a generator function behaviour? >> >> For a listcomp, surely it translates into a .extend of the accumulator? > So > [a for b in c] > is > for b in c: > result.append(a) > and > [*a for b in c] > is > for b in c: > result.extend(a) Correct in essence; I don't know how the implementation works. > Set and dict comps presumably use .update. And the generator expression > (*a for b in c) > becomes > for b in c: > for x in a: > yield x > or is it actually (this is not equivalent): > for b in c: > yield from a I imagine it would use "yield from", although it is not actually defined in the PEP. I see no reason to prefer the explicit loop. If this matters enough, I can add it to the PEP, but I'd need a consensus in order to dictate a specific methodology over others.
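(The choice is observable once send() is involved; a minimal demonstration with plain generators - everything here is valid today:)

    def inner():
        x = yield 1
        yield x

    def flatten_loop(its):          # explicit nested loop
        for it in its:
            for v in it:
                yield v

    def flatten_yf(its):            # delegation via yield from
        for it in its:
            yield from it

    g = flatten_yf([inner()])
    next(g)                         # -> 1
    print(g.send('hi'))             # -> 'hi': send() reaches inner()

    g = flatten_loop([inner()])
    next(g)                         # -> 1
    print(g.send('hi'))             # -> None: 'hi' is absorbed by the outer loop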
From eliben at gmail.com Tue Jul 16 17:41:45 2013 From: eliben at gmail.com (Eli Bendersky) Date: Tue, 16 Jul 2013 08:41:45 -0700 Subject: [Python-ideas] regex module - proper implementation of alternation? In-Reply-To: <51E56029.1000604@mrabarnett.plus.com> References: <51E56029.1000604@mrabarnett.plus.com> Message-ID: On Tue, Jul 16, 2013 at 8:00 AM, MRAB wrote: > On 16/07/2013 04:10, Eli Bendersky wrote: > >> Since the 'regex' module is a candidate for inclusion into the stdlib, I >> figured this would be a good place to ask. >> >> While discussing something related in pss >> (https://github.com/eliben/pss**), Ben Hoyt brought to my attention that >> the implementation of alternation (foo|bar) in Python's default regex >> module (the SRE implementation) is very inefficient. And indeed, looking >> there it seems that | gets translated to an opcode that simply means >> going over all the alternatives in a loop trying to match each. This is >> not how a proper regex engine should implement alternation! >> >> A common advice given to Python programmers is to combine their regexes >> into a single one with |. This makes code faster, but it turns out that >> it's far from its full potential because the combination doesn't go full >> way to the DFA as it's supposed to. >> >> A question about 'regex' - is it implemented properly there? >> >> There are 2 ways of implementing regex: DFA and NFA. > > DFA is faster, but those using NFA do so because the implementation > offers additional features that make DFA tricky or impossible, such as > backreferences. > > Of course, you could use DFA when it's possible, NFA when it isn't, at > the cost of yet more code. > > The regex module uses NFA, just like re. > > If you want to improve regex, making it use DFA when possible, well, > the source code is open, and your contributions are welcome. Good luck! > :-) > > You can use NFA without backtracking, though, by keeping track of the set of possible states. I believe (but am not 100% sure) this is the way re2 works, for example. In the particular case of alternations, such approach is vastly superior because the "possible states" set never grows large (assuming the alternatives are not mostly the same). Whereas with backtracking you always have to iterate over all of them. That said, I think you have answered my question - regex also uses a backtracking implementation of NFA and iterates in case of alternations. OK :-) Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Tue Jul 16 17:44:54 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 16 Jul 2013 16:44:54 +0100 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: References: <09325798-B4C7-4B00-BF1C-140854D638F1@masklinn.net> Message-ID: On 16 July 2013 16:31, Joshua Landau wrote: >> Set and dict comps presumably use .update. And the generator expression >> (*a for b in c) >> becomes >> for b in c: >> for x in a: >> yield x >> or is it actually (this is not equivalent): >> for b in c: >> yield from a > > I imagine it would use "yield from", although it is not actually > defined in the PEP. I see no reason to prefer the explicit loop. If > this matters enough, I can add it to the PEP, but I'd need a consensus > in order to dictate a specific methodology over others. There are other implications to using 'yield from' such as delegating to subgenerators for .send() et al. 
I think that it would be good to be able to delegate but the syntax might be seen as cryptic compared to an explicit 'yield from'. The explicit 'yield from' currently unrolls as the weird 'yield (yield from x)' though, which is less useful. Oscar From eliben at gmail.com Tue Jul 16 18:00:34 2013 From: eliben at gmail.com (Eli Bendersky) Date: Tue, 16 Jul 2013 09:00:34 -0700 Subject: [Python-ideas] regex module - proper implementation of alternation? In-Reply-To: <51E56C7B.3000806@mrabarnett.plus.com> References: <51E56029.1000604@mrabarnett.plus.com> <51E56C7B.3000806@mrabarnett.plus.com> Message-ID: On Tue, Jul 16, 2013 at 8:53 AM, MRAB wrote: > On 16/07/2013 16:41, Eli Bendersky wrote: > >> >> >> >> On Tue, Jul 16, 2013 at 8:00 AM, MRAB > >> >> wrote: >> >> On 16/07/2013 04:10, Eli Bendersky wrote: >> >> Since the 'regex' module is a candidate for inclusion into the >> stdlib, I >> figured this would be a good place to ask. >> >> While discussing something related in pss >> (https://github.com/eliben/**pss__), >> Ben Hoyt brought to my >> >> attention that >> the implementation of alternation (foo|bar) in Python's default >> regex >> module (the SRE implementation) is very inefficient. And indeed, >> looking >> there it seems that | gets translated to an opcode that simply >> means >> going over all the alternatives in a loop trying to match each. >> This is >> not how a proper regex engine should implement alternation! >> >> A common advice given to Python programmers is to combine their >> regexes >> into a single one with |. This makes code faster, but it turns >> out that >> it's far from its full potential because the combination doesn't >> go full >> way to the DFA as it's supposed to. >> >> A question about 'regex' - is it implemented properly there? >> >> There are 2 ways of implementing regex: DFA and NFA. >> >> DFA is faster, but those using NFA do so because the implementation >> offers additional features that make DFA tricky or impossible, such as >> backreferences. >> >> Of course, you could use DFA when it's possible, NFA when it isn't, at >> the cost of yet more code. >> >> The regex module uses NFA, just like re. >> >> If you want to improve regex, making it use DFA when possible, well, >> the source code is open, and your contributions are welcome. Good >> luck! >> :-) >> >> >> You can use NFA without backtracking, though, by keeping track of the >> set of possible states. I believe (but am not 100% sure) this is the way >> re2 works, for example. >> >> In the particular case of alternations, such approach is vastly superior >> because the "possible states" set never grows large (assuming the >> alternatives are not mostly the same). Whereas with backtracking you >> always have to iterate over all of them. >> >> That said, I think you have answered my question - regex also uses a >> backtracking implementation of NFA and iterates in case of alternations. >> OK :-) >> >> Have you tried timing them (re, re2, regex, and possibly others) to see > whether > it's a problem in practice? > I have not; this page - http://swtch.com/~rsc/regexp/regexp1.html - explains the problems with backtracking, but focuses on a different aspect (where backtracking leads to exponential behavior). The problem did, however, come up in practice in pss (see https://github.com/eliben/pss/issues/4); there was an attempt to make heavier use of regex alternations and the result ended up being much slower than one would expect. 
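(A sketch of the kind of measurement that shows it - the patterns here are made up and absolute numbers are machine-dependent; the point is that search time grows with the number of alternatives, where a DFA-style engine would stay roughly flat:)

    import re
    import timeit

    text = "word" * 5000    # every position forces the alternatives to be tried

    for n in (10, 100, 1000):
        pattern = re.compile("|".join("word%dz" % i for i in range(n)))
        t = timeit.timeit(lambda: pattern.search(text), number=10)
        print(n, t)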
I had this experience in a different place as well (writing regex-based lexers). There is a performance improvement gained from moving from explicit Python-level looping to alternations, but it's a small gain - something that makes sense for just moving a loop from Python into C, but not for doing something actually smart with the alternation (like looking at each incoming character only once). [P.S. please use reply-all in this thread] Eli From storchaka at gmail.com Tue Jul 16 19:34:16 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 16 Jul 2013 20:34:16 +0300 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: References: Message-ID: On 16.07.13 16:01, Nick Coghlan wrote: > In effect, PEP 448 goes further than making chain a builtin: it gives > it syntax! With PEP 448, the generator expression: > > (*itr for itr in iterables) > > would be equivalent to either of the current: > > itertools.chain(*iterables) > itertools.chain.from_iterable(iterables) This looks like an argument against PEP 448. Why do we need new syntax if we can do this with an existing function? From python at mrabarnett.plus.com Tue Jul 16 19:39:54 2013 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 16 Jul 2013 18:39:54 +0100 Subject: [Python-ideas] regex module - proper implementation of alternation? In-Reply-To: References: <51E56029.1000604@mrabarnett.plus.com> <51E56C7B.3000806@mrabarnett.plus.com> Message-ID: <51E5856A.6000303@mrabarnett.plus.com> On 16/07/2013 17:00, Eli Bendersky wrote: > On Tue, Jul 16, 2013 at 8:53 AM, MRAB wrote: >> On 16/07/2013 16:41, Eli Bendersky wrote: [...] >> The regex module uses NFA, just like re. >> If you want to improve regex, making it use DFA when possible, well, >> the source code is open, and your contributions are welcome. Good luck! >> :-) > You can use NFA without backtracking, though, by keeping track of the > set of possible states. I believe (but am not 100% sure) this is the way > re2 works, for example.
> > In the particular case of alternations, such approach is vastly > superior > because the "possible states" set never grows large (assuming the > alternatives are not mostly the same). Whereas with backtracking you > always have to iterate over all of them. > > That said, I think you have answered my question - regex also uses a > backtracking implementation of NFA and iterates in case of > alternations. > OK :-) > > Have you tried timing them (re, re2, regex, and possibly others) to > see whether > it's a problem in practice? > > > I have not; this page - http://swtch.com/~rsc/regexp/regexp1.html - > explains the problems with backtracking, but focuses on a different > aspect (where backtracking leads to exponential behavior). > > The problem did, however, come up in practice in pss (see > https://github.com/eliben/pss/issues/4); there was an attempt to make > heavier use of regex alternations and the result ended up being much > slower than one would expect. I had this experience in a different place > as well (writing regex-based lexers). > > There is a performance improvement gained from moving from explicit > Python-level looping to alternations, but it's a small gain - something > that makes sense for just moving a loop from Python into C, but not for > doing something actually smart with the alternation (like looking at > each incoming character only once). > I'd welcome any realistic speed tests that would help me improve the performance of regex. From mertz at gnosis.cx Tue Jul 16 19:48:48 2013 From: mertz at gnosis.cx (David Mertz) Date: Tue, 16 Jul 2013 10:48:48 -0700 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: References: Message-ID: On Tue, Jul 16, 2013 at 10:34 AM, Serhiy Storchaka wrote: > 16.07.13 16:01, Nick Coghlan ???????(??): > > In effect, PEP 448 goes further than making chain a builtin: it gives >> it syntax! With PEP 448, the generator expression: >> >> (*itr for itr in iterables) >> >> would be equivalent to either of the current: >> >> itertools.chain(*iterables) >> itertools.chain.from_iterable(**iterables) >> > > This looks as an argument against PEP 448. Why we need a new syntax if we > can do this just with existent function? PEP 448 would make a bunch of other contexts work differently too, and arguably improved. It's just an edge-case side-effect that it gives us syntax for a function in itertools in the process. The large majority of what PEP 448 would change has no relation to itertools.chain. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Wed Jul 17 06:30:02 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 17 Jul 2013 13:30:02 +0900 Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol In-Reply-To: References: Message-ID: <87k3kpst9x.fsf@uwakimon.sk.tsukuba.ac.jp> David Mertz writes: > It's just an edge-case side-effect that it gives us syntax for a > function in itertools in the process. Serendipitous syntax unification means it *may* be a good idea! 
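(For reference, the unification being talked about - the first form is already valid, while the starred forms are what PEP 448 proposes; those are shown commented out since they are not legal syntax today:)

    from itertools import chain

    a, b = [1, 2], [3, 4]

    list(chain(a, b))          # today: [1, 2, 3, 4]
    # [*a, *b]                 # proposed: unpacking in list displays
    # f(*a, *b)                # proposed: multiple unpackings in one call
    # (*it for it in [a, b])   # proposed: unpacking in comprehensions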
From sergemp at mail.ru  Wed Jul 17 16:28:31 2013
From: sergemp at mail.ru (Sergey)
Date: Wed, 17 Jul 2013 17:28:31 +0300
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <20130716072237.GA31779@ando>
References: <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey> <20130712043419.1f5c59e5@sergey> <20130716083605.16da9f9f@sergey> <20130716072237.GA31779@ando>
Message-ID: <20130717172831.6f6d3ad2@sergey>

On Jul 16, 2013 Steven D'Aprano wrote:

> Right now, sum() behaves in a certain way. There are certain things
> which people expect sum() to do. Some of those things are documented
> explicitly. Some of them are implied. Some of them have regression
> tests. Some of them don't. But regardless, we can tell how sum() behaves
> right now by running it and seeing what it does.
>
> Your suggested optimizations change that behaviour. It does not just
> speed sum() up, they lead to an actual semantic change. So we are not
> just arguing about speed, we are arguing about behaviour as well.

All my suggestions produce NO behaviour change. If they lead to some
semantic changes - it's a bug, that should be fixed, or the patch should
be removed from the suggestions list.

> You are worried about sum() being slow for people who call it with list
> arguments. That is a valid concern. Nobody here *wants* sum() to be
> slow. If it was a pure speed optimization, then we would all be 100%
> behind it. But it is not a pure speed optimization, it also changes
> behaviour, sometimes in subtle, hard to see ways.
>
> So there are three approaches we can take:
>
> - Do nothing. sum() continues to work exactly the same way as it
>   currently works, even if that means sometimes it is slow.
>
> - Accept your patches. sum() changes its behaviour, which will break
>   somebody's working code, but it will be fast, at least for some
>   objects.

If you are talking about the "+=" patch then I already removed it from my
suggestion list exactly because it changed the behaviour. Others should
not change sum() in any way except performance.

> - Accept a compromise position. We can make sum() faster for built-in
>   lists, and maybe built-in tuples, while keeping the same behaviour.
>   Everything else, including subclasses of list and tuple, keep the
>   same behaviour, which may mean it remains slow.

That would probably cover most of the use-cases, at least most of those I
could find, but yes, it does not help other types and subclasses.
(are there any known use cases for additions of tuple subclasses?)

> They are the only choices.

No, there're lots of others! That's what I'm trying to do here! I'm
searching for the best choice to make sum performance consistent.

If while doing that we'll also improve overall python performance,
reduce its memory usage or find a "more obvious" way for sequence
concatenation - great!

> You are concerned more about sum() being slow than you are about
> breaking code that today works. Some of us here disagree, and
> think that breaking code is worse than slow code, especially for
> something as uncommon as sum(list_of_lists).

Then I agree with some of us. :) Because I believe that backward
compatibility is more important than speed. That's why I'm mainly
focused on those options that do not break it.
> But, a compromise patch that speeds up some code without breaking any
> code may be acceptable.

Yes, but which one?
* Special case for lists and tuples is easy to do and breaks nothing,
  but does nothing good for subclasses and other types.
* Direct optimization of lists/tuples looks like a great idea, works
  for subclasses, should also change nothing, but it's a large patch,
  that is harder to write and test compared to a special case one.
* Universal concatenation interface (which one? there were at least
  3 suggested) looks interesting, but needs lots of additional
  polishing.

>> Same applies to sum(): even if it's impossible to make it fast for
>> all collection types, it does not mean that it should not be fast for
>> some of them, e.g. lists and tuples.
>
> That is a change from your previous posts where you said you could make
> it fast for "everything". I am glad to see you have accepted this.

I have not really changed my mind about that. :) I still think that
for every real-world type you name I can probably find a way to make
it O(N) summable with sum() patch or without it.

But of course, I do not expect that any of my suggestions implemented
will instantly make all the types in the world O(N) summable. It may
help them become O(N) however.

>>> [1] http://bugs.python.org/file30917/fasttuple.py

>> I do not like that implementation, because it shares the underlying
>> storage. This means that tuples which ought to be small will grow and
>> grow and grow just because you have called __add__ on a different tuple.
>>
>> Using Python 2.7 and your implementation above:
>>
>> py> a = ft([])  # empty tuple
>> py> len(a._store)
>> 0
>> py> b = ft([1])
>> py> c = a + b
>> py> d = ft([2]*10000)
>> py> c = c + d
>> py> len(a._store)
>> 10001
>>
>> So adding a big tuple to c changes the internal storage of a.

Yes, that's right. All 3 variables `a`, `b` and `c` share the same
storage, so you effectively get 3 variables for the price of one. :)
That's the concept. Why is that bad?

-- 

From sergemp at mail.ru  Wed Jul 17 17:03:50 2013
From: sergemp at mail.ru (Sergey)
Date: Wed, 17 Jul 2013 18:03:50 +0300
Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol
In-Reply-To:
References:
Message-ID: <20130717180350.24565872@sergey>

On Jul 16, 2013 Oscar Benjamin wrote:
> On 16 July 2013 07:50, Nick Coghlan wrote:
>> I haven't been following the sum() threads fully, but something Ron
>> suggested gave me an idea for a concatenation API and protocol. I
>> think we may also be able to use a keyword-only argument to solve the
>> old string.join vs str.join problem in a more intuitive way.
>>
>> def concat(start, iterable, *, interleave=None):
>>     try:
>>         build = start.__concat__
>>     except AttributeError:
>>         result = start
>>         if interleave is None:
>>             for x in iterable:
>>                 result += x
>>         else:
>>             for x in iterable:
>>                 result += interleave
>>                 result += x
>>     else:
>>         result = build(iterable, interleave=interleave)

(I assume `return result` at the end)

That's an interesting idea. Somewhat similar to my #4 suggestion with the
awful name __init_concatenable_sequence_from_iterable__.

Two questions about this idea:
* What is obj.__concat__ expected to mean? E.g.
    class X:
        def __add__(self, other):
            returns new object being sum of `self` and `other`
  But:
    class X:
        def __concat__(self, ):
* What should happen for mixed lists, i.e. code:
    concat(["str1", "str2", "str3"])
  looks rather obvious, but what about code:
    concat(["string", some_object, some_other_object])
  Would it raise an error or not?
If not, what type would the result of such an operation be? What if
`some_object` is somehow "concatenable" with a string, while the string
has no idea how to concatenate that some_object?

> The sum() threads have highlighted one and only one problem which is
> that people are often using (or at least suggesting to use) sum() in
> order to concatenate sequences even though it has quadratic
> performance for this. The stdlib already has a solution for this:
> chain. No one in the sum threads has raised any issue with using chain
> (or chain.from_iterable) except to argue that it is not widely used.

I did. Here's one of the issues. Imagine a type that somehow modifies
the items it stores, removes duplicates, or sorts them, or something
else, e.g.:

    class aset(set):
        def __add__(self, other):
            return self|other

Now we have a code:

    list_of_sets = [ aset(["item1","item2","item3"]) ] * 1000
    [...]
    for i in sum(list_of_sets, aset()):
        deal_with(i)

If you replace `sum` with `chain` you get something like:

    for i in chain.from_iterable(list_of_sets):
        deal_with(i)

Which works! (that's the worst part) but produces a WRONG result! This
example makes `chain` an error-prone replacement for `sum`. It does not
make `chain` bad - if you understand what you are doing you're free to
use `chain`. It just makes `chain` not such a good general replacement.

-- 

From mertz at gnosis.cx  Wed Jul 17 17:23:47 2013
From: mertz at gnosis.cx (David Mertz)
Date: Wed, 17 Jul 2013 08:23:47 -0700
Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol
In-Reply-To: <20130717180350.24565872@sergey>
References: <20130717180350.24565872@sergey>
Message-ID:

> Imagine a type that somehow modifies the items it stores, removes
> duplicates, or sorts them, or something else, e.g.:
>     class aset(set):
>         def __add__(self, other):
>             return self|other
>
> Now we have a code:
>     list_of_sets = [ aset(["item1","item2","item3"]) ] * 1000
>     [...]
>     for i in sum(list_of_sets, aset()):
>         deal_with(i)
>
> If you replace `sum` with `chain` you get something like:
>     for i in chain.from_iterable(list_of_sets):
>         deal_with(i)
>
> Which works! (that's the worst part) but produces a WRONG result!

In this example you can use:

    aset(chain(list_of_sets))

This gives the same answer with the same big-O runtime. It's possible to
come up with more perverse customizations where this won't hold.
But I think all of them involve redefining __add__ as something with
little relation to its normal meaning. Odd behavior in those cases is to
be expected.

From mertz at gnosis.cx  Wed Jul 17 18:01:07 2013
From: mertz at gnosis.cx (David Mertz)
Date: Wed, 17 Jul 2013 09:01:07 -0700
Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol
In-Reply-To:
References: <20130717180350.24565872@sergey>
Message-ID:

>> class aset(set):
>>     def __add__(self, other):
>>         return self|other
>>
>> Now we have a code:
>>     list_of_sets = [ aset(["item1","item2","item3"]) ] * 1000
>>     [...]
>>     for i in sum(list_of_sets, aset()):
>>         deal_with(i)
>>
>> If you replace `sum` with `chain` you get something like:
>>     for i in chain.from_iterable(list_of_sets):
>>         deal_with(i)
>>
>> Which works! (that's the worst part) but produces a WRONG result!
>
> In this example you can use:
>
>     aset(chain(list_of_sets))
>
> This gives the same answer with the same big-O runtime. It's possible
> to come up with more perverse customizations where this won't hold.
> But I think all of them involve redefining __add__ as something with
> little relation to its normal meaning. Odd behavior in those cases is
> to be expected.

I perpetually forget the signature of chain. I mean,

    aset(chain(*list_of_sets))

But I have a slight excuse that it's a PITA to type code on phone.

From stefan_ml at behnel.de  Wed Jul 17 20:07:09 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 17 Jul 2013 20:07:09 +0200
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To: <51E3A094.9020107@egenix.com>
References: <20130714222606.0f61f16e@sergey> <51E3A094.9020107@egenix.com>
Message-ID:

M.-A. Lemburg, 15.07.2013 09:11:
> I don't understand why people try to use sum() for anything
> other than a sequence of numbers.
>
> If you want to flatten a list, use a flatten function.

+1

And, while we're at it, we can just as well ask why

    sum([[1,2,1], [2,1,2], [3,4,5,[6,7]], [[4,3], 1]])

doesn't return 42. IMHO, this would make a lot more sense than returning a
concatenated list.

Stefan

From ron3200 at gmail.com  Wed Jul 17 22:51:59 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Wed, 17 Jul 2013 15:51:59 -0500
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
Message-ID:

[May be a duplicate... access to news server acting up.]

On 07/16/2013 08:01 AM, Nick Coghlan wrote:
> On 16 July 2013 22:28, Joshua Landau wrote:
>> On 16 July 2013 11:21, Oscar Benjamin wrote:
>>> On 16 July 2013 07:50, Nick Coghlan wrote:
>>>
>>> If people are using sum() to concatenate lists then this should be
>>> taken not as evidence that a new solution needs to be found but as
>>> evidence that chain is not sufficiently well-known. The obvious
>>> solution to that is not to implement a new protocol but to make the
>>> existing solution more well known i.e. move chain.from_iterable to
>>> builtins and rename it (the obvious choice being concat).
>>
>> You could wait for PEP 448, which will let you use [*sublist for
>> sublist in list_to_be_flattened].
>
> Ah, true, I forgot about that. Too many interesting things going on
> for me to keep track of everything :)
>
> In effect, PEP 448 goes further than making chain a builtin: it gives
> it syntax! With PEP 448, the generator expression:
>
>     (*itr for itr in iterables)
>
> would be equivalent to either of the current:
>
>     itertools.chain(*iterables)
>     itertools.chain.from_iterable(iterables)
>
> That's pretty cool. It also means I can go back to happily ignoring
> the sum threads :)
>
> Cheers,
> Nick.
>
> P.S. Something about this should probably be added to the rationale
> section of PEP 448

I played around with trying to find something that would work like the
example Nick put up and found out that the different python types are not
similar enough in how they do things to make a function that takes a
method or other operator work well.

What happens is you either end up with widely varying results depending on
how the methods are implemented on each type, or an error because only a
few methods are very common on all types. Mostly introspection methods.

I believe this to be a stronger underlying reason why functions like
reduce and map were removed. And it's also a good reason not to recommend
functions like sum() for things other than numbers.
To use functions similar to that, you really have to think about what will
happen in each case because the gears of the functions and methods are not
visible in the same way a comprehension or generator expression is.

It's too late to change how a lot of those methods work and I'm not sure
it will still work very well.

One of the most consistent protocols python has is the iterator and
generator protocols. The reason they work so well is that they need to
interface with for-loops and nearly all containers support that.

examples...

    >>> a = [1,2,3]
    >>> iter(a)
    <list_iterator object at 0x...>

    >>> b = (1,2,3)
    >>> iter(b)
    <tuple_iterator object at 0x...>

    >>> c = {1:2, 3:4}
    >>> iter(c)
    <dict_keyiterator object at 0x...>

    >>> d = {1, 2, 3}
    >>> iter(d)
    <set_iterator object at 0x...>

    >>> e = "123"
    >>> iter(e)
    <str_iterator object at 0x...>

And this is why chain is the recommended method of joining multiple
containers. This really only addresses getting stuff OUT of containers.

PEP 448's * unpacking in comprehensions helps with the problem of putting
things into containers. But that isn't the PEP's main point.

What I'm thinking of is the inverse operation of an iter. Let's call it a
"getter".

You would get a getter the same way you get an iter.

    g = getter(obj)

But instead of __next__ or send() methods, it would have an iter_send(),
or isend() method. The isend method would take an iter object, or an
object that iter() can be called on.

The getter would return either the object it came from, or a new object
depending on whether or not it was created from a mutable or immutable
obj.

Mutable objects...

    g = getter(A)     # A needs a __getter__ method.
    A = g.isend(B)

    A += B            # extend

Immutable objects...

    g = getter(A)
    C = g.isend(B)

    C = A + B         # join

The point is to have something that works on many types and is as
consistent in how it's defined as the iter protocol. Having a strict and
clear definition is very important!

The internal implementation of a getter could do a direct copy to make it
faster, like slicing does, but that would be a private implementation
detail.

They don't replace generator expressions or comprehensions. Those
generally will do something with each item.

Functions like extend() and concat() could be implemented with
*getter-iters*, and work with a larger variety of objects with much less
work and special handling.

    def extend(A, B):
        return getter(A).isend(B)

    def concat(A, B):
        """ Extend A with multiple containers from B. """
        g = getter(A)
        if g.isend() is not A:
            raise TypeError("can't concat immutable arg, use merge()")
        for x in B:
            g.isend(x)
        return A

    def merge(A, B):
        """ Combine A with containers in B, return new container. """
        a = list(A)
        g = getter(a)
        for x in B:
            g.isend(x)
        return type(A)(a)

Expecting many holes to be punched in this idea ...
But hope not too many. ;-)

Ron

From ron3200 at gmail.com  Thu Jul 18 00:47:07 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Wed, 17 Jul 2013 17:47:07 -0500
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References:
Message-ID:

Heh... Apologies for the poor writing. I think I haven't had enough sleep
lately.

Ron

From steve at pearwood.info  Thu Jul 18 04:27:51 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 18 Jul 2013 12:27:51 +1000
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References:
Message-ID: <51E752A7.1060906@pearwood.info>

On 18/07/13 06:51, Ron Adam wrote:

> I played around with trying to find something that would work like the
> example Nick put up and found out that the different python types are not
> similar enough in how they do things to make a function that takes a
> method or other operator work well.
I don't understand this paragraph. Functions that take
methods/operators/other functions work perfectly well, they're called
second-order functions. Decorators, factory functions, map, filter, and
functools.reduce are all good examples of this.

> What happens is you either end up with widely varying results depending
> on how the methods are implemented on each type, or an error because only
> a few methods are very common on all types. Mostly introspection methods.

Yes. If you call a function f on arbitrary objects, some of those objects
may be appropriate arguments to f, some may not. What's your point?

> I believe this to be a stronger underlying reason why functions like
> reduce and map were removed. And it's also a good reason not to recommend
> functions like sum() for things other than numbers.

reduce and map have not been removed. map hasn't even been moved out of
the builtins.

> To use functions similar to that, you really have to think about what
> will happen in each case because the gears of the functions and methods
> are not visible in the same way a comprehension or generator expression is.

I don't understand this sentence.

> It's too late to change how a lot of those methods work and I'm not sure
> it will still work very well.
>
> One of the most consistent protocols python has is the iterator and
> generator protocols. The reason they work so well is that they need to
> interface with for-loops and nearly all containers support that.
>
> examples...
>
>>>> a = [1,2,3]
>>>> iter(a)
> <list_iterator object at 0x...>

What point are you trying to make? Builtins have custom iterator types.
And? That's an implementation choice. One might make different choices:

py> type(iter(set([]))) is type(iter(frozenset([])))
True

Sets and frozen sets, despite being different types, share the same
iterator type.

> And this is why chain is the recommended method of joining multiple
> containers. This really only addresses getting stuff OUT of containers.
>
> PEP 448's * unpacking in comprehensions helps with the problem of putting
> things into containers. But that isn't the PEP's main point.

Now we come to your actual proposal:

> What I'm thinking of is the inverse operation of an iter. Let's call it
> a "getter".
> You would get a getter the same way you get an iter.
>
> g = getter(obj)
>
> But instead of __next__ or send() methods, it would have an iter_send(),
> or isend() method. The isend method would take an iter object, or an
> object that iter() can be called on.
>
> The getter would return either the object it came from, or a new object
> depending on whether or not it was created from a mutable or immutable obj.
>
>
> Mutable objects...
>
> g = getter(A) # A needs a __getter__ method.
> A = g.isend(B)

What's B? Why is it needed as an argument, since g was fully specified by
A only. That is:

g.isend(B)
g.isend(None)
g.isend(42)

etc. should all return the same A, so what is the purpose of passing the
argument?

> A += B # extend

Since we don't know what A is, we cannot know in advance that it has an
__iadd__ method that is the same as extend.

I don't really understand why I would want to do this:

start with an object A
call getter(A) to create a "getter" object g
call g.isend() to get A back again
call some method on A

when I could just do this:

start with an object A
call some method on A

Nor do I understand what this has to do with iterators and generators, or
why the method is called "isend" (iter_send).
As far as I can tell, the only similarity between your getter and the
built-in iter is that they are both functions that take a single argument.

> Immutable objects...
>
> g = getter(A)
> C = g.isend(B)
>
> C = A + B # join
>
> The point is to have something that works on many types and is as
> consistent in how it's defined as the iter protocol. Having a strict and
> clear definition is very important!

This last sentence is very true. Would you like to give us a strict and
clear definition of your getter proposal?

> The internal implementation of a getter could do a direct copy to make
> it faster, like slicing does, but that would be a private implementation
> detail.

A direct copy of what? A? Then why not spell it like this?

A = copy.copy(A)

instead of

A = getter(A).isend(23)

> They don't replace generator expressions or comprehensions. Those
> generally will do something with each item.
>
> Functions like extend() and concat() could be implemented with
> *getter-iters*, and work with a larger variety of objects with much less
> work and special handling.
>
> def extend(A, B):
>     return getter(A).isend(B)
>
> def concat(A, B):
>     """ Extend A with multiple containers from B. """
>     g = getter(A)
>     if g.isend() is not A:
>         raise TypeError("can't concat immutable arg, use merge()")
>     for x in B:
>         g.isend(x)
>     return A

How is that better than this?

def concat(A, B):
    """ Extend A with multiple containers from B. """
    for x in B:
        A.extend(x)
    return A

(But note, that's not the definition of concat() I would expect. I would
expect concat to return a new object, not modify A in place.)

> Expecting many holes to be punched in this idea ...
> But hope not too many. ;-)

I'm afraid that to me the idea seems too incoherent to punch holes in it.

-- 
Steven

From steve at pearwood.info  Thu Jul 18 04:52:36 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 18 Jul 2013 12:52:36 +1000
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To:
References: <20130714222606.0f61f16e@sergey> <51E3A094.9020107@egenix.com>
Message-ID: <51E75874.5030009@pearwood.info>

On 18/07/13 04:07, Stefan Behnel wrote:
> M.-A. Lemburg, 15.07.2013 09:11:
>> I don't understand why people try to use sum() for anything
>> other than a sequence of numbers.
>>
>> If you want to flatten a list, use a flatten function.
>
> +1
>
> And, while we're at it, we can just as well ask why
>
> sum([[1,2,1], [2,1,2], [3,4,5,[6,7]], [[4,3], 1]])
>
> doesn't return 42.

Why would it return 42? List addition is not defined as element-by-element
addition, it is defined as concatenation. To put it another way, since

[1, 2, 1] + [2, 1, 2]

returns [1, 2, 1, 2, 1, 2], not 9, sum() returns the same.

> IMHO, this would make a lot more sense than returning a
> concatenated list.

I'm afraid that this doesn't make sense to me. What you're showing is
neither list addition as it is defined now (concatenation), nor
element-by-element addition, both of which would raise exceptions, but a
recursive flatten immediately followed by a summation. And this is just
after you agreed that if you want to flatten a list, you should call a
flatten function, not sum!

Regardless of how sum() might have been defined, or should have been
designed, or whether list addition should be defined as element-by-element
addition rather than concatenation, sum() has been around for about ten
years now, and we're constrained by backwards compatibility.
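The behaviour being defended here is easy to check in current Python; a
quick demonstration:

    assert [1, 2, 1] + [2, 1, 2] == [1, 2, 1, 2, 1, 2]

    # sum() applies that same + operator, so given a [] start value it
    # concatenates rather than adding elementwise:
    assert sum([[1, 2, 1], [2, 1, 2]], []) == [1, 2, 1, 2, 1, 2]

    # Without a start value it fails outright, since 0 + [1, 2, 1] is
    # undefined:  TypeError: unsupported operand type(s) for +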
If anyone wants to seriously argue for breaking backwards compatibility,
please don't argue here until you have got at least the beginnings of a
PEP written.

-- 
Steven

From joshua at landau.ws  Thu Jul 18 07:11:11 2013
From: joshua at landau.ws (Joshua Landau)
Date: Thu, 18 Jul 2013 06:11:11 +0100
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References:
Message-ID:

I believe that I understand what you are saying better than Steven
D'Aprano does, so I thought I'd try and explain it. If I am wrong, at
least you know what my misconceptions are then.

On 17 July 2013 21:51, Ron Adam wrote:
> I played around with trying to find something that would work like the
> example Nick put up and found out that the different python types are not
> similar enough in how they do things to make a function that takes a method
> or other operator work well.
>
> What happens is you either end up with widely varying results depending on
> how the methods are implemented on each type, or an error because only a few
> methods are very common on all types. Mostly introspection methods.

"Appending items to generic sequences is hard."

> I believe this to be a stronger underlying reason why functions like reduce
> and map were removed.
...
> It's too late to change how a lot of those methods work and I'm not sure it
> will still work very well.

"???"

> One of the most consistent protocols python has is the iterator and
> generator protocols. The reason they work so well is that they need to
> interface with for-loops and nearly all containers support that.

"The iterator protocol is really good at what it does -- perhaps we
could make item-appending repurpose that idea."

> And this is why chain is the recommended method of joining multiple containers.

"Chain is good for joining containers because that's exactly what it
does -- use the iterator protocol."

> This really only addresses getting stuff OUT of containers.

??? (I think I know what you are saying, I don't get how that's true for chain)

> What I'm thinking of is the inverse operation of an iter. Let's call it a
> "getter".
>
> You would get a getter the same way you get an iter.
>
> g = getter(obj)
>
> But instead of __next__ or send() methods, it would have an iter_send(), or
> isend() method. The isend method would take an iter object, or an object
> that iter() can be called on.

"There should be a generic interface to extending and appending that
uses something similar to the iterator protocol."

> The getter would return either the object it came from, or a new object
> depending on whether or not it was created from a mutable or immutable obj
...

"This would return the original type when the original type was
mutable, so that you extend the original object. For immutable types
it should return a new mutable object that "contains" the original
immutable object's items."
> Functions like extend() and concat() could be implemented with
> *getter-iters*, and work with a larger variety of objects with much less
> work and special handling.

"This would be useful for implementing extend and concat functions."

> Expecting many holes to be punched in this idea ...
> But hope not too many. ;-)

AFAICT, this is just like a mutable chain, but that can affect the
original items.

I'm out of power, so I have to go, but this has basically led me to
think: "Hey, why doesn't itertools.chain have .append() and .extend()
-- I would use those loads!" Sorry that it's not actually a comment on
your proposal.

From tjreedy at udel.edu  Thu Jul 18 07:59:14 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 18 Jul 2013 01:59:14 -0400
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <20130717172831.6f6d3ad2@sergey>
References: <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey> <20130712043419.1f5c59e5@sergey> <20130716083605.16da9f9f@sergey> <20130716072237.GA31779@ando> <20130717172831.6f6d3ad2@sergey>
Message-ID:

On 7/17/2013 10:28 AM, Sergey wrote:
[snip]
>>> [1] http://bugs.python.org/file30917/fasttuple.py
>
>> I do not like that implementation, because it shares the underlying
>> storage. This means that tuples which ought to be small will grow and
>> grow and grow just because you have called __add__ on a different tuple.
>>
>> Using Python 2.7 and your implementation above:
>>
>> py> a = ft([])  # empty tuple
>> py> len(a._store)
>> 0
>> py> b = ft([1])
>> py> c = a + b
>> py> d = ft([2]*10000)
>> py> c = c + d
>> py> len(a._store)
>> 10001
>>
>> So adding a big tuple to c changes the internal storage of a.
>
> Yes, that's right. All 3 variables `a`, `b` and `c` share the same
> storage, so you effectively get 3 variables for the price of one. :)
> That's the concept. Why is that bad?

What happens to len(a._store) after del c?

-- 
Terry Jan Reedy

From ron3200 at gmail.com  Thu Jul 18 08:12:38 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Thu, 18 Jul 2013 01:12:38 -0500
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To: <51E752A7.1060906@pearwood.info>
References: <51E752A7.1060906@pearwood.info>
Message-ID:

On 07/17/2013 09:27 PM, Steven D'Aprano wrote:
> On 18/07/13 06:51, Ron Adam wrote:
>
>> I played around with trying to find something that would work like the
>> example Nick put up and found out that the different python types are not
>> similar enough in how they do things to make a function that takes a
>> method or other operator work well.
>
> I don't understand this paragraph.

That's what the next paragraph was explaining.
But not very well I guess.

> Functions that take
> methods/operators/other functions work perfectly well, they're called
> second-order functions. Decorators, factory functions, map, filter, and
> functools.reduce are all good examples of this.

Yes. In the example I was referring to, it took an operator that was used
to select a method to combine, append, or extend a group of other objects.
While it looked simple, I think beginners would have a lot of trouble
using it.

>> What happens is you either end up with widely varying results depending
>> on how the methods are implemented on each type, or an error because only
>> a few methods are very common on all types. Mostly introspection methods.
>
> Yes. If you call a function f on arbitrary objects, some of those objects
> may be appropriate arguments to f, some may not. What's your point?

This proposal may improve some of those cases.

Correct, and we can do better when it comes to moving content into
containers. The iter protocol is pretty good for getting content out.
It's a single way of extracting data that is the same for a lot of types.
What is missing is the inverse of that, a single uniform way of getting
data into containers.

>> I believe this to be a stronger underlying reason why functions like
>> reduce and map were removed. And it's also a good reason not to recommend
>> functions like sum() for things other than numbers.
>
> reduce and map have not been removed. map hasn't even been moved out of
> the builtins.

I should have checked; it was discussed at one time. I tend to write my
own functions of that sort. Usually with a comprehension or generator
expression.

>> To use functions similar to that, you really have to think about what
>> will happen in each case because the gears of the functions and methods
>> are not visible in the same way a comprehension or generator expression is.
>
> I don't understand this sentence.

When you look at a generator expression, you can see what it does.

>> It's too late to change how a lot of those methods work and I'm not sure
>> it will still work very well.
>>
>> One of the most consistent protocols python has is the iterator and
>> generator protocols. The reason they work so well is that they need to
>> interface with for-loops and nearly all containers support that.
>>
>> examples...
>>
>>>>> a = [1,2,3]
>>>>> iter(a)
>> <list_iterator object at 0x...>
>
> What point are you trying to make? Builtins have custom iterator types.
> And? That's an implementation choice. One might make different choices:

And a good choice. We should do more of that. :-)

> py> type(iter(set([]))) is type(iter(frozenset([])))
> True
>
> Sets and frozen sets, despite being different types, share the same
> iterator type.

No problem, that's good too.

>> And this is why chain is the recommended method of joining multiple
>> containers. This really only addresses getting stuff OUT of containers.
>>
>> PEP 448's * unpacking in comprehensions helps with the problem of
>> putting things into containers. But that isn't the PEP's main point.
>
> Now we come to your actual proposal:

I probably could have left out most of the above and put the proposal
first. It's just how it came to me.

>> What I'm thinking of is the inverse operation of an iter. Let's call it
>> a "getter".
>> You would get a getter the same way you get an iter.
>>
>> g = getter(obj)
>>
>> But instead of __next__ or send() methods, it would have an iter_send(),
>> or isend() method. The isend method would take an iter object, or an
>> object that iter() can be called on.
>> The getter would return either the object it came from, or a new object
>> depending on whether or not it was created from a mutable or immutable obj.
>>
>> Mutable objects...
>>
>> g = getter(A) # A needs a __getter__ method.
>> A = g.isend(B)
>
> What's B? Why is it needed as an argument, since g was fully specified by A
> only. That is:

The example is moving items from B to A. The object B is fed into isend,
then its contents are read into A. So the getter is an input interface to
A. Just as an iter is an output interface for the object it is created
from.

Any iter can work with any getter. So you can transfer the contents of
any iterable to any other iterable. (Dictionaries would still need
(key, value) pairs.)

In the case of immutable objects, the getter creates a new object of the
same type as the one it's from. So in the above, A = g.isend(B), 'A', is
a new object containing (A+B) if A was immutable.

> g.isend(B)
> g.isend(None)
> g.isend(42)
>
> etc. should all return the same A, so what is the purpose of passing the
> argument?

Only g.isend(B) would work here, if B can be iterated. The others would
raise an exception. The getter, from A, reads the contents of B into A's
storage space.

In the case of ordered objects though it always adds to the end. So you
might need the reversed() function to add to the beginning. There is an
option of telling the getter how to start when it's created. Possibly by
passing it a slice object. Unordered types could just ignore it.

>> A += B # extend
>
> Since we don't know what A is, we cannot know in advance that it has an
> __iadd__ method that is the same as extend.

It doesn't need to know... and it doesn't need an __iadd__, it needs a
__getter__. I was just trying to show the equivalent operation.

The __getter__ always takes an iterator, or an object that can be
iterated. This is it.

    """A getter is an object's iterator input interface."""

> I don't really understand why I would want to do this:
>
> start with an object A
> call getter(A) to create a "getter" object g
> call g.isend() to get A back again
> call some method on A
>
> when I could just do this:
>
> start with an object A
> call some method on A

Sorry, I wasn't clear. No need to call a method on A. A getter could
insert a lot of data into an object very fast. It works like the extend
method on lists, but could work on many other types and even transfer
data from different types. It creates a uniform way to move data into
objects, just like there is a uniform way to get data out of objects.

> Nor do I understand what this has to do with iterators and generators, or
> why the method is called "isend" (iter_send). As far as I can tell, the
> only similarity between your getter and the built-in iter is that they are
> both functions that take a single argument.

They are both generators. And iter uses next(g), while a getter uses
g.send(seq). It doesn't need to be called isend(), but in this case I
think it's a helpful hint that it requires a sequence or an iterator of
some type.

>> Immutable objects...
>>
>> g = getter(A)
>> C = g.isend(B)
>>
>> C = A + B # join
>>
>> The point is to have something that works on many types and is as
>> consistent in how it's defined as the iter protocol. Having a strict and
>> clear definition is very important!
>
> This last sentence is very true. Would you like to give us a strict and
> clear definition of your getter proposal?
>> The internal implementation of a getter could do a direct copy to make
>> it faster, like slicing does, but that would be a private implementation
>> detail.
>
> A direct copy of what? A? Then why not spell it like this?

def extend_items(A, B):
    """ Add the content of the sub_items in B to A. """
    # A must be mutable for this function to work.
    g = A.getter()              # g is an input interface to "A"
    for sub_list in B:
        g.isend(sub_list)       # inserts the contents of sub_list into A

A = [list of very many items.]
B = [bunch of large sub lists to add to A]
extend_items(A, B)

You could write that using list.extend() with the same result. But
consider the same example but with dictionaries.

A = {dictionary of words with frequency counts for index}
B = [{dictionaries of word counts for each chapter}, {...}, {...}, ...]

B is a list of dictionaries to be added to A.

extend_items(A, B)

In this case, it would work like the dictionaries' update() method. But
we didn't need to change the function to get that. The dictionary's
getter does that part for us.

So it's exactly the same interface and the same function works on both of
them without any special casing or testing of types. This is the main
part I'm trying to communicate.

> A = copy.copy(A)
> instead of
>
> A = getter(A).isend(23)

23 isn't a container. It won't work if the object can't be iterated. So
you would need to write that as...

A = getter(A).isend([23])    # append 23 to A using a getter.
                             # or extend([23])

That is why the method on a getter is named 'isend' rather than just
'send'. It's really the same thing, but the 'isend' is a reminder that it
needs an iterator.

A __getter__ method on a list object might be..

    def __getter__(self):
        def g():
            seq = yield
            self.extend(seq)
            return self
        gtr = g()
        next(gtr)    # start it, so send method will work.
        return gtr

And on a dictionary:

    def __getter__(self):
        def g():
            seq = yield
            self.update(seq)
            return self
        getter = g()
        next(getter)
        return getter

on a string: (bytes and tuples are very much like this.)

    def __getter__(self):
        def g():
            seq = yield
            return self + seq
        getter = g()
        next(getter)
        return getter

etc... It's pretty simple, but builtin versions of these would not need
to use the 'extend', 'update', or '__add__' methods, but can do the
equivalent directly bypassing the method calls.

Then what you have is an input protocol that complements the iter output
protocol.

>> They don't replace generator expressions or comprehensions. Those
>> generally will do something with each item.
>>
>> Functions like extend() and concat() could be implemented with
>> *getter-iters*, and work with a larger variety of objects with much less
>> work and special handling.
>>
>> def extend(A, B):
>>     return getter(A).isend(B)
>>
>> def concat(A, B):
>>     """ Extend A with multiple containers from B. """
>>     g = getter(A)
>>     if g.isend() is not A:
>>         raise TypeError("can't concat immutable arg, use merge()")
>>     for x in B:
>>         g.isend(x)
>>     return A
>
> How is that better than this?
>
> def concat(A, B):
>     """ Extend A with multiple containers from B. """
>     for x in B:
>         A.extend(x)
>     return A

Try it with a dictionary, a set, tuples, and other objects that don't
have an extend method.

> (But note, that's not the definition of concat() I would expect. I would
> expect concat to return a new object, not modify A in place.)

concat is just an example of how a getter could be used. To write a
general purpose concat function that works with different container types
without getters isn't as easy.
>> Expecting many holes to be punched in this idea ...
>> But hope not too many. ;-)
>
> I'm afraid that to me the idea seems too incoherent to punch holes in it.

Yes, and that is why I apologised for the not-so-concise writing. :/

Cheers,
   Ron

From ron3200 at gmail.com  Thu Jul 18 08:26:08 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Thu, 18 Jul 2013 01:26:08 -0500
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References:
Message-ID:

On 07/18/2013 12:11 AM, Joshua Landau wrote:
> I believe that I understand what you are saying better than Steven
> D'Aprano does, so I thought I'd try and explain it. If I am wrong, at
> least you know what my misconceptions are then.

You got most of it. I really shouldn't write proposals while I'm dead
tired. ;-)

I did reply to Steven and explained it in more detail.

Basically the idea is to get a getter obj from a container and be able to
use it to send things into that container just like we use iter objects
to get things out. They are both generators.

Cheers,
   Ron

> On 17 July 2013 21:51, Ron Adam wrote:
>> I played around with trying to find something that would work like the
>> example Nick put up and found out that the different python types are not
>> similar enough in how they do things to make a function that takes a method
>> or other operator work well.
>>
>> What happens is you either end up with widely varying results depending on
>> how the methods are implemented on each type, or an error because only a few
>> methods are very common on all types. Mostly introspection methods.
>
> "Appending items to generic sequences is hard."
>
>> I believe this to be a stronger underlying reason why functions like reduce
>> and map were removed.
> ...
>> It's too late to change how a lot of those methods work and I'm not sure it
>> will still work very well.
>
> "???"
>
>> One of the most consistent protocols python has is the iterator and
>> generator protocols. The reason they work so well is that they need to
>> interface with for-loops and nearly all containers support that.
>
> "The iterator protocol is really good at what it does -- perhaps we
> could make item-appending repurpose that idea."
>
>> And this is why chain is the recommended method of joining multiple containers.
>
> "Chain is good for joining containers because that's exactly what it
> does -- use the iterator protocol."
>
>> This really only addresses getting stuff OUT of containers.
>
> ??? (I think I know what you are saying, I don't get how that's true for chain)
>
>> What I'm thinking of is the inverse operation of an iter. Let's call it a
>> "getter".
>>
>> You would get a getter the same way you get an iter.
>>
>> g = getter(obj)
>>
>> But instead of __next__ or send() methods, it would have an iter_send(), or
>> isend() method. The isend method would take an iter object, or an object
>> that iter() can be called on.
>
> "There should be a generic interface to extending and appending that
> uses something similar to the iterator protocol."
>
>> The getter would return either the object it came from, or a new object
>> depending on whether or not it was created from a mutable or immutable obj
> ...
>
> "This would return the original type when the original type was
> mutable, so that you extend the original object. For immutable types
> it should return a new mutable object that "contains" the original
> immutable object's items."
>
>> The point is to have something that works on many types and is as
>> consistent in how it's defined as the iter protocol. Having a strict and
>> clear definition is very important!
>
> "$ditto"
>
>> The internal implementation of a getter could do a direct copy to make it
>> faster, like slicing does, but that would be a private implementation
>> detail.
>
> "For immutables, this can copy if it wishes, but does not have to."
>
>> They don't replace generator expressions or comprehensions. Those generally
>> will do something with each item.
>
> ???
>
>> Functions like extend() and concat() could be implemented with
>> *getter-iters*, and work with a larger variety of objects with much less
>> work and special handling.
>
> "This would be useful for implementing extend and concat functions."
>
>> Expecting many holes to be punched in this idea ...
>> But hope not too many. ;-)
>
> AFAICT, this is just like a mutable chain, but that can affect the
> original items.
>
> I'm out of power, so I have to go, but this has basically led me to
> think: "Hey, why doesn't itertools.chain have .append() and .extend()
> -- I would use those loads!" Sorry that it's not actually a comment on
> your proposal.

From ubershmekel at gmail.com  Thu Jul 18 08:45:24 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Thu, 18 Jul 2013 09:45:24 +0300
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References:
Message-ID:

On Thu, Jul 18, 2013 at 9:26 AM, Ron Adam wrote:
>
> Basically the idea is to get a getter obj from a container and be able to
> use it to send things into that container just like we use iter objects
> to get things out.

This is an interesting summary of a perhaps interesting idea. I don't
understand why you would call this a "getter" interface. Perhaps it's an
"inserter", "adder" or "sender".

Yuval

PS I had a hard time editing the top-post. Please try and stick with the
convention.

From ron3200 at gmail.com  Thu Jul 18 08:46:46 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Thu, 18 Jul 2013 01:46:46 -0500
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References: <51E752A7.1060906@pearwood.info>
Message-ID:

These methods would be called by a getter function which starts it by
calling next on it before returning it.

    def getter(container):
        """ Get a getter from a container object. """
        g = container.__getter__()
        next(g)
        return g

On 07/18/2013 01:12 AM, Ron Adam wrote:
> A __getter__ method on a list object might be..
>
>     def __getter__(self):
>         def g():
>             seq = yield
>             self.extend(seq)
>             return self
>         gtr = g()
>         next(gtr)    # start it, so send method will work.
>         return gtr

Replace these last three lines with...

    return g()

And the same for the rest of these.

Ron

> And on a dictionary:
>
>     def __getter__(self):
>         def g():
>             seq = yield
>             self.update(seq)
>             return self
>         getter = g()
>         next(getter)
>         return getter
>
> on a string: (bytes and tuples are very much like this.)
>
>     def __getter__(self):
>         def g():
>             seq = yield
>             return self + seq
>         getter = g()
>         next(getter)
>         return getter
>
> etc... It's pretty simple, but builtin versions of these would not need
> to use the 'extend', 'update', or '__add__' methods, but can do the
> equivalent directly bypassing the method calls.
>
> Then what you have is an input protocol that complements the iter output
> protocol.
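Pulling the fragments above together, a self-contained sketch of the
protocol as described so far. Nothing here exists in the stdlib: getter(),
isend() and the GList/GTuple classes are all illustrative, and isend() is
written as a helper function because plain generator objects cannot be
given extra methods:

    def getter(container):
        """Get a getter (input interface) from a container object."""
        g = container.__getter__()
        next(g)  # advance to the yield so send() will work
        return g

    def isend(g, seq):
        """Send an iterable into a getter; the generator's return value
        (delivered via StopIteration) is the resulting container."""
        try:
            g.send(seq)
        except StopIteration as exc:
            return exc.value

    class GList(list):
        def __getter__(self):
            def g():
                seq = yield          # isend() delivers an iterable here
                self.extend(seq)     # mutable: extend in place
                return self
            return g()

    class GTuple(tuple):
        def __getter__(self):
            def g():
                seq = yield
                return GTuple(tuple(self) + tuple(seq))  # immutable: new object
            return g()

    a = GList([1, 2])
    assert isend(getter(a), [3, 4]) is a and a == [1, 2, 3, 4]

    b = GTuple((1, 2))
    assert isend(getter(b), (3, 4)) == (1, 2, 3, 4)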
From clay.sweetser at gmail.com  Thu Jul 18 08:51:08 2013
From: clay.sweetser at gmail.com (Clay Sweetser)
Date: Thu, 18 Jul 2013 02:51:08 -0400
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References: <51E752A7.1060906@pearwood.info>
Message-ID:

What is the difference between this "getter protocol" and using a
generator's send method?

On Jul 18, 2013 2:47 AM, "Ron Adam" wrote:
> These methods would be called by a getter function which starts it by
> calling next on it before returning it.
> [snip]

From joshua at landau.ws  Thu Jul 18 10:56:32 2013
From: joshua at landau.ws (Joshua Landau)
Date: Thu, 18 Jul 2013 09:56:32 +0100
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References: <51E752A7.1060906@pearwood.info>
Message-ID:

On 18 July 2013 07:12, Ron Adam wrote:
>
> def extend_items(A, B):
>     """ Add the content of the sub_items in B to A. """
>     # A must be mutable for this function to work.
>     g = A.getter()              # g is an input interface to "A"
>     for sub_list in B:
>         g.isend(sub_list)       # inserts the contents of sub_list into A

1) Say that A is a tuple -- how do you get the "aggregate" tuple out
from this at the end?

2) Personally, I just don't see enough of a use-case to want this. Our
standard "universal duck-types" includes "+" which does the same thing
for most containers.

3) It needs better names.

4) Passing in a dictionary is odd, because iter(dictionary) iterates
over keys; if you're passing in an iterable (hence "i"send) that
wouldn't work great. It just seems inconsistent in a way.

5) Chain works fine most of the time I could imagine wanting this.

From ncoghlan at gmail.com  Thu Jul 18 11:08:18 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 18 Jul 2013 19:08:18 +1000
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References: <51E752A7.1060906@pearwood.info>
Message-ID:

On 18 July 2013 16:46, Ron Adam wrote:
> These methods would be called by a getter function which starts it by
> calling next on it before returning it.
>
> def getter(container):
>     """ Get a getter from a container object.
""" > g = container.__getter__() > next(g) > return g Let's call it the "builder" protocol instead, since "getter" makes me think of "itemgetter" and "attrgetter", and this is well worn territory in Java with "StringBuilder" :) Let's say we defined the builder protocol this way: 1. Containers may define a "__builder__" method: def __builder__(self): "Returns a builder for a *new* instance of this type, pre-initialised with the contents of self" 2. Builders must define the following methods: def append(self, item): "Appends a single item to the builder" def extend(self, iterable): "Extends the builder with the contents of an iterable" __iadd__ = extend def finish(self): "Converts the contents of the builder to the final desired type" And added a new "builder" builtin with the following behaviour: def builder(obj): try: meth = obj.__builder__ except AttributeError: pass else: return meth return DefaultBuilder(obj) class DefaultBuilder: def __init__(self, obj): if not (hasattr(obj, "copy") and hasattr(obj, "append") and hasattr(obj, "extend")): raise TypeError("%r instance cannot be converted to a builder" % type(r)) self._obj = obj.copy() def append(self, item): if self._obj is None: raise RuntimeError("Cannot append to finished builder") self._obj.append(item) def extend(self, iterable): if self._obj is None: raise RuntimeError("Cannot extend finished builder") self._obj.extend(iterable) __iadd__ = extend def finish(self): if self._obj is None: raise RuntimeError("Builder already finished") result = self._obj self._obj = None return result Then builtin sum() would have the following case added to handle the builder protocol: try: bldr = builder(start) except TypeError: pass else: for item in iterable: bldr += item return bldr.finish() Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Thu Jul 18 14:54:43 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 18 Jul 2013 22:54:43 +1000 Subject: [Python-ideas] Adding __getter__ to compliment __iter__. In-Reply-To: References: Message-ID: <51E7E593.8070000@pearwood.info> On 18/07/13 16:26, Ron Adam wrote: > Basically the idea is to get a getter obj from a container and be able to use it to send things into that container just like we use iter objects to get thing out. I think your view of this is confused. The iterator protocol isn't about getting things out of something. It's about *iterating over collections*. Some iteratables consume their objects, "getting them out" as you put it, but others do not (e.g. range, sequences, dict views, etc). "Getting items out" is usually called pop, or popitem, or delete, or similar. The opposite of getting items out is pushing them in (extend, or append, or similar), but the opposite of iterating over a collection is not iterating over it. Your proposal is more about assembling collections than it is the opposite of iteration. I'm not sure that I like your proposal, but I'd dislike it a lot less if you treated it as an assembly protocol and dropped the "opposite of iteration" rationale. I don't think that rationale makes sense. It seems to me that your proposal is about creating some sort of proxy to arbitrary collection types, with a standard "add elements to you" interface, in-place if possible. I don't think this is really worthwhile. 
When I was first starting off learning Python, I was disturbed that
lists, tuples, and dicts (there were no sets back then) had no common
API for adding elements, so I wrote a helper function that looked
something like this:

    def add(obj, elements):
        # This was Python 1.5, so no isinstance
        if type(obj) == type([]):
            # or obj.extend(elements)
            for item in elements:
                obj.append(item)
        elif type(obj) == type({}):
            obj.update(elements)
        elif type(obj) == type(()):
            for item in elements:
                obj = obj + (item,)
        return obj

I soon discovered that I never used this function. Like a lot of my
early code, it was more useful in theory than in practice. I rarely
(never?) wanted to "add elements" to some arbitrary list, tuple or dict
without knowing what it was beforehand. And my add function was both
*too* general (it applied to too many types, which I learned I didn't
need) and *not general enough* (sometimes I wanted to insert at the
start of a list, not append to the end; sometimes I wanted to replace
dict items, sometimes I didn't). So it turned out to be useless.

So I'm afraid that you will have an uphill battle convincing me that
your suggested protocol is useful and not an over-generalization.

--
Steven

From ron3200 at gmail.com  Thu Jul 18 15:46:12 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Thu, 18 Jul 2013 08:46:12 -0500
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To: 
References: <51E752A7.1060906@pearwood.info>
Message-ID: 

On 07/18/2013 01:51 AM, Clay Sweetser wrote:
> What is the difference between this "getter protocol" and using a
> generator's send method?

A protocol specifies how something is done and what's required to do it.
A generator's send method is just a send method on any generator. The
protocol in this case is that each iterable object has a __getter__
method which uses its send() method to receive an iterable object, for
the purpose of extending the instance (or returning a new instance if
it's an immutable object).

Cheers,
Ron

> On Jul 18, 2013 2:47 AM, "Ron Adam" wrote:
>
> These methods would be called by a getter function which starts it by
> calling next on it before returning it.
>
>     def getter(container):
>         """ Get a getter from a container object. """
>         g = container.__getter__()
>         next(g)
>         return g
>
> On 07/18/2013 01:12 AM, Ron Adam wrote:
>
>>     A __getter__ method on a list object might be..
>>
>>     def __getter__(self):
>>         def g():
>>             seq = yield
>>             self.extend(seq)
>>             return self
>>         gtr = g()
>>         next(gtr)   # start it, so send method will work.
>>         return gtr
>
> Replace these last three lines with...
>
>     return g()
>
> And the same for the rest of these.
> Ron
>
>> And on a dictionary:
>>
>>     def __getter__(self):
>>         def g():
>>             seq = yield
>>             self.update(seq)
>>             return self
>>         getter = g()
>>         next(getter)
>>         return getter
>>
>> on a string: (bytes and tuples are very much like this.)
>>
>>     def __getter__(self):
>>         def g():
>>             seq = yield
>>             return self + seq
>>         getter = g()
>>         next(getter)
>>         return getter
>>
>> etc... It's pretty simple, but builtin versions of these would not
>> need to use the 'extend', 'update', or '__add__' methods, but can do
>> the equivalent directly, bypassing the method calls.
>>
>> Then what you have is an input protocol that complements the iter
>> output protocol.
> > > _________________________________________________ > Python-ideas mailing list > Python-ideas at python.org > > http://mail.python.org/__mailman/listinfo/python-ideas > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From ron3200 at gmail.com Thu Jul 18 15:59:45 2013 From: ron3200 at gmail.com (Ron Adam) Date: Thu, 18 Jul 2013 08:59:45 -0500 Subject: [Python-ideas] Adding __getter__ to compliment __iter__. In-Reply-To: References: <51E752A7.1060906@pearwood.info> Message-ID: On 07/18/2013 03:56 AM, Joshua Landau wrote: > On 18 July 2013 07:12, Ron Adam wrote: >> >> def extend_items(A, B): >> """ Add the content of the sub_items in B to A. """ >> # A must be mutable for this function to work. >> g = A.getter() # g is an input interface to "A" >> for sub_list in B: >> g.isend(sub_list) # inserts the contents of sub_list into A > > 1) Say that A is a tuple -- how do you get the "aggregate" tuple out > from this at the end? You can't in this example. Which is why the comment is there. This function isn't meant to be in the proposal, the proposal allow you to write functions like this easier. A mutable version of this.. that also would work with immutables would be... def extend_items(A, B): """ Add the content of the sub_items in B to A. """ for sub_list in B: g = A.getter() # A Obj changes on each loop. A = g.isend(sub_list) # new A copy here. return A > 2) Personally, I just don't see enough of a use-case to want this. Our > standard "universal duck-types" includes "+" which does the same thing > for most containers. And we have slicing too. It's a suggestion, and there may not be enough in it to justify it. But they just don't work on as many things as iter() does for getting things out of containers. So I think something like this is a nice addition. > 3) It needs better names. Yes, getter is temporary, I'm not worried about the name. > 4) Passing in a dictionary is odd, because iter(dictionary) iterates > over keys; If you're passing in an iterable (hence "i"send) that > wouldn't work great. It just seems inconsistent in a way. Yes, that's unfortunate. You need to call the dictionaries .items() method in this case. g = getter(dict.items()) It is a view, so it would still work. > 5) Chain works fine most of the time I could imagine wanting this. Chain is a the other direction. It gets stuff out of a list of things by using iter(). A getter()'s would create that same flexibility for putting things into containers. Cheers, Ron From ron3200 at gmail.com Thu Jul 18 16:13:35 2013 From: ron3200 at gmail.com (Ron Adam) Date: Thu, 18 Jul 2013 09:13:35 -0500 Subject: [Python-ideas] Adding __getter__ to compliment __iter__. In-Reply-To: <51E7E593.8070000@pearwood.info> References: <51E7E593.8070000@pearwood.info> Message-ID: On 07/18/2013 07:54 AM, Steven D'Aprano wrote: > On 18/07/13 16:26, Ron Adam wrote: > >> Basically the idea is to get a getter obj from a container and be able to >> use it to send things into that container just like we use iter objects >> to get thing out. > > > I think your view of this is confused. The iterator protocol isn't about > getting things out of something. It's about *iterating over collections*. > Some iteratables consume their objects, "getting them out" as you put it, > but others do not (e.g. range, sequences, dict views, etc). I would have said *remove* or *delete* if I meant this. 
> "Getting items out" is usually called pop, or popitem, or delete, or > similar. The opposite of getting items out is pushing them in (extend, or > append, or similar), but the opposite of iterating over a collection is not > iterating over it. > > Your proposal is more about assembling collections than it is the opposite > of iteration. I'm not sure that I like your proposal, but I'd dislike it a > lot less if you treated it as an assembly protocol and dropped the > "opposite of iteration" rationale. I don't think that rationale makes sense. You are understanding what I meant, and yes it is better described as assembling, or as Nick says. building. What I was meaning by getting things out.. is that of getting references out. Which is what iter() does. In the title I described getters (assemblers) as a complementing iter(), which is a better view point. > It seems to me that your proposal is about creating some sort of proxy to > arbitrary collection types, with a standard "add elements to you" > interface, in-place if possible. I don't think this is really worthwhile. > When I was first starting off learning Python, I was disturbed that lists, > tuples, and dicts (there were no sets back then) had no common API for > adding elements, so I wrote a helper function that looked something like this: > > def add(obj, elements): > # This was Python 1.5, so no isinstance > if type(obj) == type([]): > # or extend > for item in element: > obj.append(element) > elif type(obj) == type({}): > obj.update(elements) > elif type(obj) == type(()): > for item in elements: > obj = obj + (element,) > return obj > > I soon discovered that I never used this function. Like a lot of my early > code, it was more useful in theory than in practice. I rarely (never?) > wanted to "add elements" to some arbitrary list, tuple or dict without > knowing what it was before hand. And my add function was both *too* general > (it applied to too many types, which I learned I didn't need) and *not > general enough* (sometimes I wanted to insert at the start of a list, not > append to the end; sometimes I wanted to replace dict items, sometimes I > didn't). So it turned out to be useless. So I'm afraid that you will have > an uphill battle convincing me that your suggested protocol is useful and > not an over-generalization. It's still early in the discussion, and you may be correct. Cheers, Ron From ron3200 at gmail.com Thu Jul 18 16:29:37 2013 From: ron3200 at gmail.com (Ron Adam) Date: Thu, 18 Jul 2013 09:29:37 -0500 Subject: [Python-ideas] Adding __getter__ to compliment __iter__. In-Reply-To: References: <51E752A7.1060906@pearwood.info> Message-ID: On 07/18/2013 04:08 AM, Nick Coghlan wrote: > On 18 July 2013 16:46, Ron Adam wrote: >> >> >> These methods would be called by a getter function which starts it by >> calling next on it before returning it. >> >> >> def getter(container): >> """ Get a getter from a container object. """ >> g = container.__getter__() >> next(g) >> return g > > Let's call it the "builder" protocol instead, since "getter" makes me > think of "itemgetter" and "attrgetter", and this is well worn > territory in Java with "StringBuilder" :) Is fine with me. The name "getter" was the first thing I thought of. > Let's say we defined the builder protocol this way: > > 1. Containers may define a "__builder__" method: > > def __builder__(self): > "Returns a builder for a *new* instance of this type, > pre-initialised with the contents of self" > > 2. 
Builders must define the following methods: > > def append(self, item): > "Appends a single item to the builder" > > def extend(self, iterable): > "Extends the builder with the contents of an iterable" > > __iadd__ = extend This is interesting. > def finish(self): > "Converts the contents of the builder to the final desired type" I was thinking a generator would be more efficient if it's called many times. But I think this is easier to understand. If there is more interest we can test both to see how much of a difference it makes. > And added a new "builder" builtin with the following behaviour: > > def builder(obj): > try: > meth = obj.__builder__ > except AttributeError: > pass > else: > return meth > return DefaultBuilder(obj) > > class DefaultBuilder: > def __init__(self, obj): > if not (hasattr(obj, "copy") and hasattr(obj, "append") > and hasattr(obj, "extend")): > raise TypeError("%r instance cannot be converted to a > builder" % type(r)) > self._obj = obj.copy() > > def append(self, item): > if self._obj is None: raise RuntimeError("Cannot append to > finished builder") > self._obj.append(item) > > def extend(self, iterable): > if self._obj is None: raise RuntimeError("Cannot extend > finished builder") > self._obj.extend(iterable) > > __iadd__ = extend > > def finish(self): > if self._obj is None: raise RuntimeError("Builder already finished") > result = self._obj > self._obj = None > return result > > Then builtin sum() would have the following case added to handle the > builder protocol: > > try: > bldr = builder(start) > except TypeError: > pass > else: > for item in iterable: > bldr += item > return bldr.finish() Yes, that would do it... and is a good example. Cheers, Ron From oscar.j.benjamin at gmail.com Thu Jul 18 17:06:44 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 18 Jul 2013 16:06:44 +0100 Subject: [Python-ideas] Adding __getter__ to compliment __iter__. In-Reply-To: References: <51E752A7.1060906@pearwood.info> Message-ID: On 18 July 2013 10:08, Nick Coghlan wrote: > > Then builtin sum() would have the following case added to handle the > builder protocol: > > try: > bldr = builder(start) > except TypeError: > pass > else: > for item in iterable: > bldr += item > return bldr.finish() What use cases would the builder protocol have apart from using sum with collections (since that particular case is already well-covered by chain/join)? Wouldn't it be easier to put that logic into the constructor for type(collection) or into a factory function. Then you wouldn't need an additional protocol or an additional class (for each buildable collection). Why would you want to do this bldr = builder(()) # Build a tuple for val in stuff: bldr += item # Or append/extend result = bldr.finish() when you can just do this result = tuple(chain(stuff)) # or tuple(stuff) Most non-string collections already support this interface in their constructors or in a factory function. Oscar From stefan_ml at behnel.de Thu Jul 18 17:48:05 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 18 Jul 2013 17:48:05 +0200 Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers In-Reply-To: <51E75874.5030009@pearwood.info> References: <20130714222606.0f61f16e@sergey> <51E3A094.9020107@egenix.com> <51E75874.5030009@pearwood.info> Message-ID: Steven D'Aprano, 18.07.2013 04:52: > On 18/07/13 04:07, Stefan Behnel wrote: >> M.-A. Lemburg, 15.07.2013 09:11: >>> I don't understand why people try to use sum() for anything >>> other than a sequence of numbers. 
>>> >>> If you want to flatten a list, use a flatten function. >> >> +1 >> >> And, while we're at it, we can just as well ask why >> >> sum([[1,2,1], [2,1,2], [3,4,5,[6,7]], [[4,3], 1]]) >> >> doesn't return 42. > > > Why would it return 42? List addition is not defined as element-by-element > addition, it is defined as concatenation. To put it another way, since [1, > 2, 1] + [2, 1, 2] returns [1, 2, 1, 2, 1, 2], not 9, sum() returns the same. > > >> IMHO, this would make a lot more sense than returning a >> concatenated list. > > I'm afraid that this doesn't make sense to me. The point I was trying to make is that a "sum" is well defined for numbers but not for lists, so even a recursive sum over lists of lists makes more sense than calling sum() on a sequence of lists and expecting it to concatenate those lists. I really don't see the link between summing up items and concatenating them. If the function was called "concatenate()", then, yes, ok, I'd expect it to concatenate lists that I feed into it. But it's called sum(). And it's called that because the name describes what it does. The mere attempt to "sum" lists is just so misguided that it's really not worth any discussion, especially not about making it easier! Just to be clear, I definitely wasn't proposing to extend sum() to support recursive summing. Stefan From joshua at landau.ws Thu Jul 18 18:00:07 2013 From: joshua at landau.ws (Joshua Landau) Date: Thu, 18 Jul 2013 17:00:07 +0100 Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers In-Reply-To: References: <20130714222606.0f61f16e@sergey> <51E3A094.9020107@egenix.com> <51E75874.5030009@pearwood.info> Message-ID: On 18 July 2013 16:48, Stefan Behnel wrote: > The point I was trying to make is that a "sum" is well defined for numbers > but not for lists, False. > so even a recursive sum over lists of lists makes more > sense than calling sum() on a sequence of lists and expecting it to > concatenate those lists. I disagree. > I really don't see the link between summing up > items and concatenating them. If the function was called "concatenate()", > then, yes, ok, I'd expect it to concatenate lists that I feed into it. But > it's called sum(). And it's called that because the name describes what it > does. The mere attempt to "sum" lists is just so misguided that it's really > not worth any discussion, especially not about making it easier! You're discussing it, and it isn't misguided. From stefan_ml at behnel.de Thu Jul 18 18:08:56 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 18 Jul 2013 18:08:56 +0200 Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers In-Reply-To: <20130714222606.0f61f16e@sergey> References: <20130714222606.0f61f16e@sergey> Message-ID: Sergey, 14.07.2013 21:26: > It's worth to note, that sum() is one of the most commonly suggested > options to add lists [1], despite usually someone also comes and says > that it may be slow. This means, that people at least often try to > use sum() for lists. That case was also explicitly mentioned in > comments to sum() sources. So the problem is not hypothetical. IMHO, the reason why sum() supports other input types than numbers is that in Python 2, at the time when it was implemented, there was no clear definition of a "number", so it was impossible to correctly distinguish between numberish input and input that should be rejected. I guess that explicitly rejecting arbitrary builtin types was considered overly, well, arbitrary, so it wasn't done at the time. 
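Today the ABCs give us the missing definition. A sketch of what a
number-only sum could look like, purely for illustration (this is not
what the builtin does):

    from numbers import Number

    def strict_sum(iterable, start=0):
        total = start
        for item in iterable:
            if not isinstance(item, Number):
                raise TypeError("strict_sum() only accepts numbers, "
                                "got %r" % type(item).__name__)
            total += item
        return total

    print(strict_sum([1, 2.5, 3]))    # 6.5
    # strict_sum([[1], [2]])          # would raise TypeError immediately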
If sum() was designed today, with the ABCs in Python 3 available, it
would be easy to restrict it to meaningful input types, i.e. those that
declare themselves as being numbers. However, changing this now would
arbitrarily break existing code, so it's unlikely that this will happen.

So I guess we'll just have to live with that little wart that sum()
doesn't always reject stupid input. We can still tell those who try it
that what they do is stupid, and show them better ways to do it. There
were lots of suitable examples of better code presented over the last
decade or so, some of which showed up in the recent threads again.

Stefan

From ron3200 at gmail.com  Thu Jul 18 19:51:22 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Thu, 18 Jul 2013 12:51:22 -0500
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To: 
References: <51E752A7.1060906@pearwood.info>
Message-ID: 

On 07/18/2013 10:06 AM, Oscar Benjamin wrote:
> On 18 July 2013 10:08, Nick Coghlan wrote:
>>
>> Then builtin sum() would have the following case added to handle the
>> builder protocol:
>>
>>     try:
>>         bldr = builder(start)
>>     except TypeError:
>>         pass
>>     else:
>>         for item in iterable:
>>             bldr += item
>>         return bldr.finish()
>
> What use cases would the builder protocol have apart from using sum
> with collections (since that particular case is already well-covered
> by chain/join)?
>
> Wouldn't it be easier to put that logic into the constructor for
> type(collection) or into a factory function? Then you wouldn't need an
> additional protocol or an additional class (for each buildable
> collection).
>
> Why would you want to do this
>
>     bldr = builder(())       # Build a tuple
>     for val in stuff:
>         bldr += val          # Or append/extend
>     result = bldr.finish()
>
> when you can just do this
>
>     result = tuple(chain(stuff))   # or tuple(stuff)
>
> Most non-string collections already support this interface in their
> constructors or in a factory function.

If you know the result will always be a tuple, then you can certainly do
that. And probably should.

But if you want the result to be the same type as the parts you start
with, you need to write that for every type your routine may handle. It
would look closer to this...

    result = stuff[0]
    result += chain(*stuff[1:])

Which wouldn't always work, because not everything has an __iadd__. So
you would need to do...

    result = stuff[0]
    rest = chain(*stuff[1:])
    result = result + type(result)(rest)

But that creates a new object rather than extending the result. One of
the points is to be able to extend an existing obj easily. We could do...

    result = stuff[0]
    rest = chain(*stuff[1:])
    result[len(result):] = rest

But here again, not every type supports slice assignment.

The current version I'm testing, based on Nick's example...

    class DefaultBuilder(list):
        def __init__(self, obj):
            if hasattr(obj, "__iter__"):
                self._obj = obj
            else:
                raise TypeError("%r is not iterable" % obj)

        def __iadd__(self, iterable):
            # defined on the class so that += finds it
            self.extend(iterable)
            return self

        def finish(self):
            if self._obj is None:
                raise RuntimeError("Builder already finished")
            if isinstance(self._obj, str):
                result = self._obj + ''.join(self)
            else:
                result = self._obj
                result += type(self._obj)(self)
            self._obj = None
            return result

    def builder(obj):
        try:
            meth = obj.__builder__
        except AttributeError:
            pass
        else:
            return meth()
        return DefaultBuilder(obj)

The advantage over chain is that builtin builders (written in C) could
have access to both objects (the destination and the stuff to add)
internally, and may be able to do fast memory copies and/or moves of
the objects.
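A quick check of how the sketch above behaves:

    parts = [[3, 4], [5, 6]]
    b = builder([1, 2])
    for p in parts:
        b += p
    print(b.finish())        # [1, 2, 3, 4, 5, 6]

    b = builder((1, 2))      # immutables get copy-and-concatenate
    b += (3, 4)
    print(b.finish())        # (1, 2, 3, 4)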
If we can reach that point with this suggestion, then: 1. More consistent way to copy and move data around. (Although it may not be obvious right now.) 2. Do it much faster due to not iterating over each object in many cases. 3. Makes it easier to write generalised routines like sum(). (*) (* Although we are using sum() as an example, there isn't any intention of replacing the current builtin sum() function at this time. It makes a nice test case though.) Cheers, Ron From g.rodola at gmail.com Thu Jul 18 20:13:46 2013 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Thu, 18 Jul 2013 20:13:46 +0200 Subject: [Python-ideas] Enum and serialization Message-ID: I just received this bug report in psutil: https://code.google.com/p/psutil/issues/detail?id=408 Long story short, in psutil I have defined a constant type: http://code.activestate.com/recipes/577984-named-constant-type/ https://code.google.com/p/psutil/source/browse/psutil/_common.py#31 ...behaving along these lines: >>> FOO = constant(0, 'FOO') >>> FOO 0 >>> str(FOO) 'FOO' It's a nasty problem and I'm still not sure how to solve it in psutil but this got me thinking about PEP 435 (Adding an Enum type to the Python standard library) since the two things are kind of related. I haven't properly gone through the PEP yet (I will) but I notice it doesn't talk about serialization. Has it been considered? Regards, - Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/ From mertz at gnosis.cx Thu Jul 18 20:42:51 2013 From: mertz at gnosis.cx (David Mertz) Date: Thu, 18 Jul 2013 14:42:51 -0400 Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers In-Reply-To: References: <20130714222606.0f61f16e@sergey> Message-ID: OK... I have an update to my experiment. I asked by email another friend of mine (also middle-aged adult, well-educated, a doctorate in humanities FWIW [like me]), this one has had an introductory exposure to programming, and in Python specifically, but just one semester-long intro course (the wonderful edX MIT Intro to CS, a few months ago). This is what I sent her (I also attached my report of the physical experiment that I subjected our mutual friend to): Before you read the long post below, just consider this part up here and tell me what you think. Consider this Python code: list_of_lists = [ [4, 5, 6, 2, 1], [6, 12, 13, 19, 100], [100, 200, 300] ] result = sum(list_of_lists) You may not have used the built-in function sum before. But here is an example of its most common usage: >>> sum([10,20,30,40]) 100 So the question is: what is your intuition about what 'result' *should* be? If you happen to know that Python does one thing, but feel like it *should* do a different thing, that is very interesting too. One answer that you might feel is that it doesn't make sense and there should be an exception raised on that line. For example, this happens: >>> sum(['foo','bar']) Traceback (most recent call last): File "", line 1, in TypeError: unsupported operand type(s) for +: 'int' and 'str' Her reply: "I would want an exception raised cause it doesn't make sense." Yes, I know the code actually *does* raise an exception already; but I spoke with her to, and her answer isn't "it should raise an exception because the start= is missing" ... at a beginner level she explicitly means "It doesn't make sense to sum lists, only numbers." 
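For reference, here is what the interpreter actually does with her
example, with and without a start value:

    list_of_lists = [[4, 5, 6, 2, 1], [6, 12, 13, 19, 100],
                     [100, 200, 300]]
    # sum(list_of_lists) raises:
    #   TypeError: unsupported operand type(s) for +: 'int' and 'list'
    print(sum(list_of_lists, []))
    # [4, 5, 6, 2, 1, 6, 12, 13, 19, 100, 100, 200, 300]
    # [] acts as the identity element that makes the sum well-defined.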
Actually, this makes me think even more strongly that an identity element for a type is necessary for sum() to make ANY sense. You need to have an implied identity as the starting state... yes, in principle we could let sum infer the list identity [], but for other types where that doesn't even exist, it seems even worse... even where it will work for technical reasons. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Thu Jul 18 20:24:26 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 18 Jul 2013 11:24:26 -0700 Subject: [Python-ideas] Enum and serialization In-Reply-To: References: Message-ID: <51E832DA.30703@stoneleaf.us> On 07/18/2013 11:13 AM, Giampaolo Rodola' wrote: > I just received this bug report in psutil: > https://code.google.com/p/psutil/issues/detail?id=408 > > Long story short, in psutil I have defined a constant type: > http://code.activestate.com/recipes/577984-named-constant-type/ > https://code.google.com/p/psutil/source/browse/psutil/_common.py#31 > ...behaving along these lines: > >>>> FOO = constant(0, 'FOO') >>>> FOO > 0 >>>> str(FOO) > 'FOO' > > It's a nasty problem and I'm still not sure how to solve it in psutil > but this got me thinking about PEP 435 (Adding an Enum type to the > Python standard library) since the two things are kind of related. > I haven't properly gone through the PEP yet (I will) but I notice it > doesn't talk about serialization. > Has it been considered? See http://bugs.python.org/issue18264. The basic issue is that json only knows how to serialize built-in types. As soon as we build on that, json barfs. One way around that is to write your own json handler that knows about your custom types. For the 3.4 stdlib there are two proposals on the table: 1) for IntEnum and FloatEnum cast the member to int or float, then procede; 2) for any Enum, extract the value and proceed. -- ~Ethan~ From g.rodola at gmail.com Fri Jul 19 01:54:07 2013 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Fri, 19 Jul 2013 01:54:07 +0200 Subject: [Python-ideas] Enum and serialization In-Reply-To: <51E832DA.30703@stoneleaf.us> References: <51E832DA.30703@stoneleaf.us> Message-ID: On Thu, Jul 18, 2013 at 8:24 PM, Ethan Furman wrote: > See http://bugs.python.org/issue18264. Oh, OK. Glad to see this is being tracked. > The basic issue is that json only knows how to serialize built-in types. As > soon as we build on that, json barfs. > > One way around that is to write your own json handler that knows about your > custom types. > > For the 3.4 stdlib there are two proposals on the table: > > 1) for IntEnum and FloatEnum cast the member to int or float, then > procede; > > 2) for any Enum, extract the value and proceed. What about third party serialization libs though? --- Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/ From ethan at stoneleaf.us Fri Jul 19 02:01:31 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 18 Jul 2013 17:01:31 -0700 Subject: [Python-ideas] Enum and serialization In-Reply-To: References: <51E832DA.30703@stoneleaf.us> Message-ID: <51E881DB.3070400@stoneleaf.us> On 07/18/2013 04:54 PM, Giampaolo Rodola' wrote: > On Thu, Jul 18, 2013 at 8:24 PM, Ethan Furman wrote: >> See http://bugs.python.org/issue18264. > > Oh, OK. Glad to see this is being tracked. > >> The basic issue is that json only knows how to serialize built-in types. As >> soon as we build on that, json barfs. 
>> >> One way around that is to write your own json handler that knows about your >> custom types. >> >> For the 3.4 stdlib there are two proposals on the table: >> >> 1) for IntEnum and FloatEnum cast the member to int or float, then >> procede; >> >> 2) for any Enum, extract the value and proceed. > > What about third party serialization libs though? They are in the same boat, and have the same options. Short of defining a new protocol, I don't think there's much we can do for other serialization libs, besides making it easy for them to do the same thing we are. -- ~Ethan~ From sergemp at mail.ru Fri Jul 19 03:50:56 2013 From: sergemp at mail.ru (Sergey) Date: Fri, 19 Jul 2013 04:50:56 +0300 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <87mwpnro8r.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey> <20130712043419.1f5c59e5@sergey> <20130716083605.16da9f9f@sergey> <87mwpnro8r.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20130719045056.160b123f@sergey> On Jul 16, 2013 Stephen J. Turnbull wrote: >> I don't understand it. It makes no sense to me. > > Just accept that many people, *for several different reasons*, dislike > your proposal. The technical objections are the least of those > reasons. Sure, I already did. I just tried to understand, what part of my proposal they don't like and why, in that case I could fix that part, so that they liked id. If I don't understand ? I don't know what to fix. > Please just write the PEP and ask for a pronouncement. If you don't > feel confident in your PEP-writing skills, ask for help. (If you > don't get any, you probably should take that as "the community says > No".) Well, my current PEP-writing-skills are kind of zero, but the problem is that I don't know what to write there yet. PEP probably implies that I want to change something, but my simplest patch [1] covers most of use cases and changes nothing, except performance, and should need no PEP. The opposite to it is a unified container filling protocol. But it has lots of options, and still heavily discussed. >> Do you like having many broken tools? > > And please stop this. sum() is not broken, any more than a > screwdriver is broken just because it is rather inefficient when used > to pound in nails. No, that would be the case if I was using sum for something that it's not intended to do, e.g. instead of: x = 60*70 I could try: x = sum([60].__mul__(70)) That would work, but would be obviously inefficient, because sum() is not intended to do multiplication. `a + b + c + .. + z` is exactly what sum is supposed to do. But due to some implementation-specific details it does that slowly for some a...z types. It's not a technical problem, it's a political (or personal ideology?) problem to allow sum being fast for types that it was slow for last 10 years. It's more like a drill, that is good for metal and bad for wood, because it was equipped with metal drill bits, but not with wood drill bits. It could be just fine for wood too, but its developer believes, that his drill should not be used to make holes in wood. Would you call such a drill "broken"? I would. 
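Anyone can see the problem on their own machine with a tiny benchmark
(a sketch; absolute numbers will differ, the shape won't):

    import timeit

    for n in (100, 200, 400):
        lists = [[0]] * n
        t = timeit.timeit(lambda: sum(lists, []), number=100)
        print(n, round(t, 4))
    # Doubling n roughly quadruples the time, because every + builds a
    # new list and copies everything accumulated so far.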
This bug was known for a long time, I just raised a question of what
would be the best way to fix that. Should we just equip it with a few of
the most common wood drill bits (e.g. lip&spur + spade)? Or should we
put a grinding stone in, so that anyone could make a drill bit himself?
Or maybe both?

--
[1] http://bugs.python.org/file30897/fastsum-special-tuplesandlists.patch

From sergemp at mail.ru  Fri Jul 19 04:16:45 2013
From: sergemp at mail.ru (Sergey)
Date: Fri, 19 Jul 2013 05:16:45 +0300
Subject: [Python-ideas] Another attempt at a sum() alternative: the
 concatenation protocol
In-Reply-To: 
References: <20130717180350.24565872@sergey>
Message-ID: <20130719051645.5af5495e@sergey>

On Jul 17, 2013 David Mertz:

>> Imagine a type, that somehow modifies items that it stores, removes
>> duplicates, or sorts them, or something else, e.g.:
>>     class aset(set):
>>         def __add__(self, other):
>>             return self | other
>>
>> Now we have a code:
>>     list_of_sets = [ aset(["item1","item2","item3"]) ] * 1000
>>     [...]
>>     for i in sum(list_of_sets, aset()):
>>         deal_with(i)
>>
>> If you replace `sum` with `chain` you get something like:
>>     for i in chain.from_iterable(list_of_sets):
>>         deal_with(i)
>>
>> Which works! (that's the worst part) but produces WRONG result!
>
> In this example you can use:
>
>     aset(chain(*list_of_sets))
>
> This gives the same answer with the same big-O runtime.

Sure, that's why I called it an "error-prone" replacement. When you have
code like:

    >> for i in sum(list_of_sets, aset()):
    >>     deal_with(i)

you have pretty much no place for error. Well, it would be much better
if it was just:

    >> for i in sum(list_of_sets):
    >>     deal_with(i)

but for historical reasons we already have a second parameter, so we
have to deal with it.

And now some newbie tries to use chain. So she does:

    >> for i in chain(list_of_sets):
    >>     deal_with(i)

Oops, does not work. Ah, missing star (you missed it yourself!):

    >> for i in chain(*list_of_sets):
    >>     deal_with(i)

Works, but incorrectly. Ok, let's hope that our newbie was careful
enough with tests and noticed that it does not do what it should. She
reads the tutorial again, and notices that the example there was like:

    all_elems = list(chain(*list_of_lists))

So she tries:

    >> for i in list(chain(*list_of_sets)):
    >>     deal_with(i)

Nope, still wrong. Just in case, she tries to remove the star that she
doesn't understand anyway:

    >> for i in list(chain(list_of_sets)):
    >>     deal_with(i)

Still no go. So after all these attempts she asks someone smart and
finally gets the correct code:

    >> for i in aset(chain(*list_of_sets)):
    >>     deal_with(i)

As I said, `chain` is a nice feature for smart people. But it is neither
good for beginners, nor obvious, nor is it good as a sum replacement.

> It's possible to come up with more perverse customizations where
> this won't hold. But I think all of them involve redefining
> __add__ as something with little relation to it's normal meaning.
> Odd behavior in those cases is to be expected.

Hah. Easy. Even for a commonly used type, strings:

    str(chain(*list_of_strings))

does not work. So we have:
* chain(*list_of_something) may or may not be correct
* something(chain(*list_of_something)) may or may not be correct

And technically it's easy to have a type where both of those cases are
incorrect.
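To see it with real output (aset as above, except that __add__ returns
an aset so that repeated + keeps working):

    from itertools import chain

    class aset(set):
        def __add__(self, other):
            return aset(self | other)

    list_of_sets = [aset(["item1", "item2"]), aset(["item2", "item3"])]

    print(len(sum(list_of_sets, aset())))    # 3 -- duplicates merged
    print(len(list(chain(*list_of_sets))))   # 4 -- duplicates kept!
    print(len(aset(chain(*list_of_sets))))   # 3 -- correct again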
From sergemp at mail.ru Fri Jul 19 04:36:08 2013 From: sergemp at mail.ru (Sergey) Date: Fri, 19 Jul 2013 05:36:08 +0300 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: References: <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey> <20130712043419.1f5c59e5@sergey> <20130716083605.16da9f9f@sergey> <20130716072237.GA31779@ando> <20130717172831.6f6d3ad2@sergey> Message-ID: <20130719053608.378fa3c5@sergey> On Jul 18, 2013 Terry Reedy wrote: >>>> [1] http://bugs.python.org/file30917/fasttuple.py >>> >>> I do not like that implementation, because it shares the underlying >>> storage. This means that tuples which ought to be small will grow and >>> grow and grow just because you have called __add__ on a different tuple. >>> >>> Using Python 2.7 and your implementation above: >>> >>> py> a = ft([]) # empty tuple >>> py> len(a._store) >>> 0 >>> py> b = ft([1]) >>> py> c = a + b >>> py> d = ft([2]*10000) >>> py> c = c + d >>> py> len(a._store) >>> 10001 >>> >>> So adding a big tuple to c changes the internal storage of a. >> >> Yes, that's right. All 3 variables `a`, `b` and `c` share the same >> storage, so you effectively get 3 variables for the price of one. :) >> That's the concept. Why is that bad? > > What happens to len(a._store) after del c? In that proof-of-concept implementation? Nothing. I tried to keep it simple, so that the idea was easier to understand. Its technically possible to have __del__ resizing internal store, but is it really needed? -- From stephen at xemacs.org Fri Jul 19 06:12:14 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 19 Jul 2013 13:12:14 +0900 Subject: [Python-ideas] Fast sum() for non-numbers In-Reply-To: <20130719045056.160b123f@sergey> References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey> <20130712043419.1f5c59e5@sergey> <20130716083605.16da9f9f@sergey> <87mwpnro8r.fsf@uwakimon.sk.tsukuba.ac.jp> <20130719045056.160b123f@sergey> Message-ID: <87sizbqjc1.fsf@uwakimon.sk.tsukuba.ac.jp> Sergey writes: > >> Do you like having many broken tools? > > > > And please stop this. sum() is not broken, any more than a > > screwdriver is broken just because it is rather inefficient when used > > to pound in nails. > > No, that would be the case if I was using sum for something that it's > not intended to do, [...] I'll say it one last time: this kind of answer does not help your case at all. I assure you "proof by repeated assertion" doesn't work here. The "intent" of sum() is clearly documented: it computes the sum of an iterable of numbers. From the library reference for 2.6 (it hasn't changed up to 3.4.0a, except to refer to itertools.chain() and remove the reference to reduce()): sum(iterable[, start]) Sums start and the items of an iterable from left to right and returns the total. start defaults to 0. The iterable?s items are normally numbers, and are not allowed to be strings. 
The fast, correct way to concatenate a sequence of strings is by calling ''.join(sequence). Note that sum(range(n), m) is equivalent to reduce(operator.add, range(n), m) To add floating point values with extended precision, see math.fsum(). True, it does not deny that sum() *could* be used for certain non- numbers, but *intent* is clear: sum() adds up a sequence of numbers. Generalizing it to efficiently handle concatenation of sequences is an enhancement, not a bugfix. Your argument is simply that we *could* use sum() for anything that provides the __add__ method, and with __iadd__ it can be efficient in time and space in many cases. You point to the fact that some programmers do use sum() inefficiently, and suggest that we remove this pitfall by making sum() efficient for as many cases as possible. Such enhancements are certainly of interest to python-dev, but that's not sufficient. The counterclaim that matters is that "sum" is a *bad name* for functions that aggregate iterables, unless the type of the elements is (or can be coerced to) numerical. It follows that use of that name makes programs hard to read, and it should be deprecated in favor of readable idioms.[1] Until you successfully address that counterclaim, you are going to fail to persuade enough of the people who matter. Footnotes: [1] Note that backward compatibility, not weakness of the "bad name" argument, is why we compromise by deprecating in words rather than making it impossible to use sum on "wrong" types. From steve at pearwood.info Fri Jul 19 06:07:25 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 19 Jul 2013 14:07:25 +1000 Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers In-Reply-To: References: <20130714222606.0f61f16e@sergey> Message-ID: <51E8BB7D.30908@pearwood.info> On 19/07/13 04:42, David Mertz wrote: > OK... I have an update to my experiment. I asked by email another friend of > mine (also middle-aged adult, well-educated, a doctorate in humanities FWIW > [like me]), this one has had an introductory exposure to programming, and > in Python specifically, but just one semester-long intro course (the > wonderful edX MIT Intro to CS, a few months ago). While we're swapping anecdotes instead of doing proper UI testing *grin* (or perhaps API testing would be a better term), I asked two of my programmers at work what they would do if they had a list of lists and needed to concatenate them into a single list. Both are moderately familiar with Python, although one (call him R) is more familiar with Ruby and Objective-C. R started with some Ruby code, which was basically the equivalent of reduce, then when I told him that lists supported + for concatenation, said he'd try sum(). At this point, the other programmer (call him M) rolled his eyes and said "You've got to be kidding!". He did not know that lists use + for concatenation, nor that sum() supports lists, and thought that both were terrible ideas and that he'd use a for-loop with extend. When I hinted that sum() might have performance issues, R was at first incredulous, then once he understood the reason for the performance issues, shrugged and said something along the lines of "Oh well, I'd use sum(), and if it turned out to be too slow for my data, I'd refactor it to use a for-loop. No big deal." 
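(For the record, the refactoring both of them had in mind is the usual
linear-time loop:

    def concat(list_of_lists):
        result = []
        for sublist in list_of_lists:
            result.extend(sublist)
        return result

nothing clever about it.)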
Neither thought that they would gain much if sum() was sped up, M because he wouldn't use it to join lists regardless of speed, and R because he thought it would be unlikely that he'd ever be in a position of needing to join sufficient numbers of lists to really make a difference, but if he was, refactoring was such a trivial exercise that he didn't care one way or another. -- Steven From ncoghlan at gmail.com Fri Jul 19 06:59:29 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 19 Jul 2013 14:59:29 +1000 Subject: [Python-ideas] Adding __getter__ to compliment __iter__. In-Reply-To: References: <51E752A7.1060906@pearwood.info> Message-ID: On 19 July 2013 00:29, Ron Adam wrote: > I was thinking a generator would be more efficient if it's called many > times. But I think this is easier to understand. If there is more interest > we can test both to see how much of a difference it makes. Suspending and resuming a generator is quite an expensive operation. send() has the triple whammy of method call + resume generator + suspend generator, so it's unlikely to outperform a simple method call (even one that redirects to another method). Independent of performance though, I think the mutable sequence API inspired append(), extend() and += are a better API for what you're trying to achieve than "send", so it doesn't make sense to me to try to shoehorn this into the generator API. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Jul 19 07:11:13 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 19 Jul 2013 15:11:13 +1000 Subject: [Python-ideas] Enum and serialization In-Reply-To: <51E832DA.30703@stoneleaf.us> References: <51E832DA.30703@stoneleaf.us> Message-ID: On 19 July 2013 04:24, Ethan Furman wrote: > On 07/18/2013 11:13 AM, Giampaolo Rodola' wrote: >> >> I just received this bug report in psutil: >> https://code.google.com/p/psutil/issues/detail?id=408 >> >> Long story short, in psutil I have defined a constant type: >> http://code.activestate.com/recipes/577984-named-constant-type/ >> https://code.google.com/p/psutil/source/browse/psutil/_common.py#31 >> ...behaving along these lines: >> >>>>> FOO = constant(0, 'FOO') >>>>> FOO >> >> 0 >>>>> >>>>> str(FOO) >> >> 'FOO' >> >> It's a nasty problem and I'm still not sure how to solve it in psutil >> but this got me thinking about PEP 435 (Adding an Enum type to the >> Python standard library) since the two things are kind of related. >> I haven't properly gone through the PEP yet (I will) but I notice it >> doesn't talk about serialization. >> Has it been considered? > > > See http://bugs.python.org/issue18264. > > The basic issue is that json only knows how to serialize built-in types. As > soon as we build on that, json barfs. > > One way around that is to write your own json handler that knows about your > custom types. > > For the 3.4 stdlib there are two proposals on the table: > > 1) for IntEnum and FloatEnum cast the member to int or float, then > procede; > > 2) for any Enum, extract the value and proceed. 3) Only change repr (not str) in IntEnum I know Guido doesn't like it, but I still think the backwards compatibility risk is too high to use them to replace constants in the standard library if they change the output of __str__. Debugging only needs repr, and we can make sure stdlib error messages use repr, too. Cheers, Nick. 
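P.S. To spell out the difference, using the enum module as it currently
stands for 3.4:

    from enum import IntEnum

    class Color(IntEnum):
        RED = 1

    print(str(Color.RED))     # Color.RED  (the part option 3 would revert)
    print(repr(Color.RED))    # <Color.RED: 1>
    print("%d" % Color.RED)   # 1  (the integer value is unaffected)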
--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From gotoalanlu at gmail.com  Fri Jul 19 08:19:25 2013
From: gotoalanlu at gmail.com (Hua Lu)
Date: Fri, 19 Jul 2013 01:19:25 -0500
Subject: [Python-ideas] A limited exec
Message-ID: 

Hi, I've attempted to make exec/eval a bit safer. May I please have some
feedback?

https://github.com/cag/execgate

Thanks,
Alan

From masklinn at masklinn.net  Fri Jul 19 09:08:08 2013
From: masklinn at masklinn.net (Masklinn)
Date: Fri, 19 Jul 2013 07:08:08 +0000
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To: 
References: <51E752A7.1060906@pearwood.info>
Message-ID: <6D232852-01C3-4EFF-A21C-B594F5872CE3@masklinn.net>

On 19 juil. 2013, at 04:59, Nick Coghlan wrote:
> On 19 July 2013 00:29, Ron Adam wrote:
>> I was thinking a generator would be more efficient if it's called
>> many times. But I think this is easier to understand. If there is
>> more interest we can test both to see how much of a difference it
>> makes.
>
> Suspending and resuming a generator is quite an expensive operation.
> send() has the triple whammy of method call + resume generator +
> suspend generator, so it's unlikely to outperform a simple method call
> (even one that redirects to another method).
>
> Independent of performance though, I think the mutable sequence API
> inspired append(), extend() and += are a better API for what you're
> trying to achieve than "send", so it doesn't make sense to me to try
> to shoehorn this into the generator API.

There's an issue with append though: it kind of implies collection
ordering, which I expect is why sets use "add" instead.

From _ at lvh.io  Fri Jul 19 09:25:38 2013
From: _ at lvh.io (Laurens Van Houtven)
Date: Fri, 19 Jul 2013 09:25:38 +0200
Subject: [Python-ideas] A limited exec
In-Reply-To: 
References: 
Message-ID: 

Hi Alan,

I've pretty much broken it, just translating it to 3.x.

Please document that it's 3.x only, that's why it's taking longer than a
few minutes. func_globals is named differently, chr doesn't exist
anymore...

Anyway, this approach doesn't work well: if you want secure execution,
please look at PyPy's sandbox mode :)

cheers
lvh

On Fri, Jul 19, 2013 at 8:19 AM, Hua Lu wrote:
> Hi, I've attempted to make exec/eval a bit safer. May I please have
> some feedback?
>
> https://github.com/cag/execgate
>
> Thanks,
> Alan

From gotoalanlu at gmail.com  Fri Jul 19 09:43:05 2013
From: gotoalanlu at gmail.com (Hua Lu)
Date: Fri, 19 Jul 2013 02:43:05 -0500
Subject: [Python-ideas] A limited exec
In-Reply-To: 
References: 
Message-ID: 

Hey Laurens,

Thanks for the feedback. I am still waiting for NumPy in PyPy mostly.

I know this blacklist approach is a losing battle, but for the time
being, maybe it could be of value? I am aware that f_globals could break
things. However, barring 'private' attribute access, I am wondering if
it is reachable. I used the code from
http://www.reddit.com/r/Python/comments/hftnp/ask_rpython_recovering_cleared_globals/c1v3l4i
as a test.

Can you give me a code snippet which breaks this?

Sincerely,
Alan

On Fri, Jul 19, 2013 at 2:25 AM, Laurens Van Houtven <_ at lvh.io> wrote:
> Hi Alan,
>
> I've pretty much broken it, just translating it to 3.x.
> > Please document that it's 3.x only, that's why it's taking longer than a > few minutes. func_globals is named differently, chr doesn't exist anymore... > > Anyway, this approach doesn't work well: if you want secure execution, > please look at PyPy's sandbox mode :) > > cheers > lvh > > > On Fri, Jul 19, 2013 at 8:19 AM, Hua Lu wrote: > >> Hi, I've attempted to make exec/eval a bit safer. May I please have some >> feedback? >> >> https://github.com/cag/execgate >> >> Thanks, >> Alan >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From _ at lvh.io Fri Jul 19 10:19:00 2013 From: _ at lvh.io (Laurens Van Houtven) Date: Fri, 19 Jul 2013 10:19:00 +0200 Subject: [Python-ideas] A limited exec In-Reply-To: References: Message-ID: On Fri, Jul 19, 2013 at 9:43 AM, Hua Lu wrote: > Hey Laurens, > > Thanks for the feedback. I am still waiting for NumPy in PyPy mostly. I > know this blacklist approach is a losing battle, but for the time being, > maybe it could be of value? > Maybe :) I just wouldn't put any money on it just because I haven't figured out how to break it yet, since, historically, like you implied, every time we've tried it it didn't actually work :) Do you need NumPy + PyPy for you to be able to switch *entirely* to PyPy, or do you actually need it for the things that would do safe eval? A different approach that someone in #python has used with some success is running the interpreter in Qemu with an immutable filesystem and COW Qemu image. He'd then communicate with it over (I believe) a fake serial port. > I am aware that f_globals could break things. However, barring 'private' > attribute access, I am wondering if it is reachable. I used the code from > http://www.reddit.com/r/Python/comments/hftnp/ask_rpython_recovering_cleared_globals/c1v3l4ias a test. > As above: I haven't figured out how yet, since I can't get getattr and accessing __globals__ *seemingly* needs two underscores. > Can you give me a code snippet which breaks this? > I don't have time to finish it right now, but here's the WIP from some "working" 2.x code that I've tried to port to 3.x. ===== dunder = "\x5f\x5f" globals = getattr((lambda: None), dunder + "globals" + dunder) builtins = globals[dunder + "builtins" + dunder] dunder_import = getattr(builtins, dunder + "import" + dunder) new = dunder_import("new") kaboom_code = new.code(0, 5, 8, 0, "KABOOM", (), (),(), "", "", 0, "") kaboom_fun = new.function(kaboom_code, {}) kaboom_fun() ===== As you can see, the eventual thing really can be a single expression, it's just expanded into a bunch of assignments since that makes it easier to read. There's two remaining issues for it to be ported to Py3k: 1. Figure out how to access __globals__, which is the new spelling of func_globals 2. Replace the "new" module with the "types" module I'm also not sure if "KABOOM" maps to a sequence of opcodes in 3.x that will actually go KABOOM ;-) In 2.x, it should map to a bunch of binary opcodes, which, on an empty stack, will obviously segfault :-) My other approach was to just base64 the entire thing and exec the unbase64-d version, but that doesn't work in 3.x either since exec is a function now instead of a statement, so clearing globals is actually an effective way to get rid of it... 
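The string-smuggling half of the trick works on its own, for the record
(plain Python, outside any sandbox):

    # Build "__globals__" without ever writing two consecutive
    # underscores in the source, then reach it via getattr.
    dunder = "\x5f\x5f"                  # "__"
    name = dunder + "globals" + dunder   # "__globals__"
    f = lambda: None
    print(getattr(f, name) is f.__globals__)  # True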
So, in closing: for now I haven't quite figured out how to break it, but that's with spending like 10 minutes on it, and I'm not particularly familiar with 3.x... It appears to work, and it certainly will foil most attempts to get past it, but it's a very thin spit-and-sticks kind of "works", so I wouldn't trust it too much ;-) cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua at landau.ws Fri Jul 19 11:05:51 2013 From: joshua at landau.ws (Joshua Landau) Date: Fri, 19 Jul 2013 10:05:51 +0100 Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers In-Reply-To: <51E8BB7D.30908@pearwood.info> References: <20130714222606.0f61f16e@sergey> <51E8BB7D.30908@pearwood.info> Message-ID: On 19 July 2013 05:07, Steven D'Aprano wrote: > On 19/07/13 04:42, David Mertz wrote: >> >> OK... I have an update to my experiment. I asked by email another friend >> of >> mine (also middle-aged adult, well-educated, a doctorate in humanities >> FWIW >> [like me]), this one has had an introductory exposure to programming, and >> in Python specifically, but just one semester-long intro course (the >> wonderful edX MIT Intro to CS, a few months ago). > > > While we're swapping anecdotes instead of doing proper UI testing *grin* (or > perhaps API testing would be a better term), I asked two of my programmers > at work what they would do if they had a list of lists and needed to > concatenate them into a single list. Both are moderately familiar with > Python, although one (call him R) is more familiar with Ruby and > Objective-C. Thank you both, again. One conclusion I think it's safe to take from this is that it is *not* a clear-cut issue as many people, myself included, had assumed. I think it's fair to say that people claiming (Stefan Behnel in this case, because it was the first quotation I found): > IMHO, the reason why sum() supports other input types than numbers is that > in Python 2, at the time when it was implemented, there was no clear > definition of a "number" or similar have missed this point. My claims that it obviously does make sense (although it's not necessarily a good idea) was equally wrong in this regard. Also, > Neither thought that they would gain much if sum() was sped up, M > because he wouldn't use it to join lists regardless of speed, and R > because he thought it would be unlikely that he'd ever be in a position > of needing to join sufficient numbers of lists to really make a difference, > but if he was, refactoring was such a trivial exercise that he didn't care > one way or another. I agree in large with R, except that I'd never opt for sum() over chain.from_iterable(). From guido at python.org Fri Jul 19 16:43:21 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 19 Jul 2013 07:43:21 -0700 Subject: [Python-ideas] Enum and serialization In-Reply-To: References: <51E832DA.30703@stoneleaf.us> Message-ID: So you want to make print() output '42' instead of 'THE_ANSWER'? I am strongly against that change that. 
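And the existing spellings are hardly burdensome anyway. A quick sketch:

    def running_total():
        total = 0
        while True:
            value = yield total
            total += value

    gen = running_total()
    next(gen)            # prime it; the proposal would spell this gen[]
    print(gen.send(5))   # 5
    print(gen.send(3))   # 8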
On Thu, Jul 18, 2013 at 10:11 PM, Nick Coghlan wrote: > On 19 July 2013 04:24, Ethan Furman wrote: >> On 07/18/2013 11:13 AM, Giampaolo Rodola' wrote: >>> >>> I just received this bug report in psutil: >>> https://code.google.com/p/psutil/issues/detail?id=408 >>> >>> Long story short, in psutil I have defined a constant type: >>> http://code.activestate.com/recipes/577984-named-constant-type/ >>> https://code.google.com/p/psutil/source/browse/psutil/_common.py#31 >>> ...behaving along these lines: >>> >>>>>> FOO = constant(0, 'FOO') >>>>>> FOO >>> >>> 0 >>>>>> >>>>>> str(FOO) >>> >>> 'FOO' >>> >>> It's a nasty problem and I'm still not sure how to solve it in psutil >>> but this got me thinking about PEP 435 (Adding an Enum type to the >>> Python standard library) since the two things are kind of related. >>> I haven't properly gone through the PEP yet (I will) but I notice it >>> doesn't talk about serialization. >>> Has it been considered? >> >> >> See http://bugs.python.org/issue18264. >> >> The basic issue is that json only knows how to serialize built-in types. As >> soon as we build on that, json barfs. >> >> One way around that is to write your own json handler that knows about your >> custom types. >> >> For the 3.4 stdlib there are two proposals on the table: >> >> 1) for IntEnum and FloatEnum cast the member to int or float, then >> procede; >> >> 2) for any Enum, extract the value and proceed. > > 3) Only change repr (not str) in IntEnum > > I know Guido doesn't like it, but I still think the backwards > compatibility risk is too high to use them to replace constants in the > standard library if they change the output of __str__. Debugging only > needs repr, and we can make sure stdlib error messages use repr, too. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- --Guido van Rossum (python.org/~guido) From barry at python.org Fri Jul 19 15:24:35 2013 From: barry at python.org (Barry Warsaw) Date: Fri, 19 Jul 2013 09:24:35 -0400 Subject: [Python-ideas] Enum and serialization References: <51E832DA.30703@stoneleaf.us> <51E881DB.3070400@stoneleaf.us> Message-ID: <20130719092435.44aef28f@anarchist> On Jul 18, 2013, at 05:01 PM, Ethan Furman wrote: >> What about third party serialization libs though? > >They are in the same boat, and have the same options. Third party serializations (and built-in JSON serialization) already has to be taught about several common built-in data types, e.g. datetimes and timedeltas. If it weren't for the rather icky json module API for extensions, it's not that hard. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From guido at python.org Fri Jul 19 17:30:50 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 19 Jul 2013 08:30:50 -0700 Subject: [Python-ideas] Enum and serialization In-Reply-To: <20130719092435.44aef28f@anarchist> References: <51E832DA.30703@stoneleaf.us> <51E881DB.3070400@stoneleaf.us> <20130719092435.44aef28f@anarchist> Message-ID: Could we start using @singledispatch? --Guido van Rossum (sent from Android phone) On Jul 19, 2013 8:27 AM, "Barry Warsaw" wrote: > On Jul 18, 2013, at 05:01 PM, Ethan Furman wrote: > > >> What about third party serialization libs though? 
>
> They are in the same boat, and have the same options.

Third party serializations (and built-in JSON serialization) already have to be taught about several common built-in data types, e.g. datetimes and timedeltas. If it weren't for the rather icky json module API for extensions, it's not that hard.

-Barry

From guido at python.org Fri Jul 19 17:30:50 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 19 Jul 2013 08:30:50 -0700
Subject: [Python-ideas] Enum and serialization
In-Reply-To: <20130719092435.44aef28f@anarchist>
References: <51E832DA.30703@stoneleaf.us> <51E881DB.3070400@stoneleaf.us> <20130719092435.44aef28f@anarchist>
Message-ID:

Could we start using @singledispatch?

--Guido van Rossum (sent from Android phone)

On Jul 19, 2013 8:27 AM, "Barry Warsaw" wrote:
> On Jul 18, 2013, at 05:01 PM, Ethan Furman wrote:
>
> >> What about third party serialization libs though?
> >
> > They are in the same boat, and have the same options.
>
> Third party serializations (and built-in JSON serialization) already have to be taught about several common built-in data types, e.g. datetimes and timedeltas. If it weren't for the rather icky json module API for extensions, it's not that hard.
>
> -Barry

From ron3200 at gmail.com Fri Jul 19 17:58:04 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Fri, 19 Jul 2013 10:58:04 -0500
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References: <51E752A7.1060906@pearwood.info>
Message-ID:

On Thu, Jul 18, 2013 at 11:59 PM, Nick Coghlan wrote:
>
> On 19 July 2013 00:29, Ron Adam wrote:
> > I was thinking a generator would be more efficient if it's called many times. But I think this is easier to understand. If there is more interest we can test both to see how much of a difference it makes.
>
> Suspending and resuming a generator is quite an expensive operation. send() has the triple whammy of method call + resume generator + suspend generator, so it's unlikely to outperform a simple method call (even one that redirects to another method).

I thought one of the things about generators was that they didn't have to create a new frame, or store attributes. And those features would make them a bit faster than a function call. Hmmm, but that is wrapped in a method call.

I did some tests with send(), and it is about 3% slower than a function call. Some of that difference is due to the loop in the generator. So it's pretty close.

Could we have syntax for generators to bypass the method calls?

    x = gen[]               # next
    gen[] = x               # send; or "gen[] x"
    x = gen[]; gen[] = x    # or "x = gen[] x"

Currently using empty brackets like this is a syntax error. The brackets imply it is a type of iterator. Which is the most common use for generators.

> Independent of performance though, I think the mutable sequence API inspired append(), extend() and += are a better API for what you're trying to achieve than "send", so it doesn't make sense to me to try to shoehorn this into the generator API.

I'm 50/50 on this right now. So I want to test both out in different situations. Keeping the behavior as dead simple as possible may be better because then it creates an object that would be used in a very consistent way. I think that is one of the benefits of the simple iterator design.

Ron

From barry at python.org Fri Jul 19 16:01:31 2013
From: barry at python.org (Barry Warsaw)
Date: Fri, 19 Jul 2013 10:01:31 -0400
Subject: [Python-ideas] Enum and serialization
References: <51E832DA.30703@stoneleaf.us> <51E881DB.3070400@stoneleaf.us> <20130719092435.44aef28f@anarchist>
Message-ID: <20130719100131.28bf7ba0@anarchist>

On Jul 19, 2013, at 08:30 AM, Guido van Rossum wrote:

>Could we start using @singledispatch?

Yeah, that's a really interesting idea.

-Barry
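To make the @singledispatch idea concrete, here is a minimal sketch of how it could plug into json.dumps through the existing default= hook. The helper name to_serializable is invented for illustration, and @singledispatch itself is the functools addition from PEP 443 targeted at 3.4; this is a sketch of the idea, not an agreed stdlib API:

    import json
    from datetime import datetime
    from functools import singledispatch  # PEP 443, planned for 3.4

    @singledispatch
    def to_serializable(obj):
        # Fallback: mirror json's own error for unknown types
        raise TypeError('%r is not JSON serializable' % (obj,))

    @to_serializable.register(datetime)
    def _(obj):
        return obj.isoformat()

    # json.dumps consults default= only for objects it can't already handle
    print(json.dumps({'when': datetime(2013, 7, 19)}, default=to_serializable))
    # -> {"when": "2013-07-19T00:00:00"}

Enum members (or any other third-party type) could be registered the same way, which is what makes the single-dispatch spelling attractive here.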
From ethan at stoneleaf.us Fri Jul 19 18:07:19 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 19 Jul 2013 09:07:19 -0700
Subject: [Python-ideas] Enum and serialization
In-Reply-To:
References: <51E832DA.30703@stoneleaf.us>
Message-ID: <51E96437.4060605@stoneleaf.us>

On 07/18/2013 10:11 PM, Nick Coghlan wrote:
> On 19 July 2013 04:24, Ethan Furman wrote:
>>
>> For the 3.4 stdlib there are two proposals on the table:
>>
>> 1) for IntEnum and FloatEnum cast the member to int or float, then proceed;
>>
>> 2) for any Enum, extract the value and proceed.
>
> 3) Only change repr (not str) in IntEnum
>
> I know Guido doesn't like it, but I still think the backwards compatibility risk is too high to use them to replace constants in the standard library if they change the output of __str__. Debugging only needs repr, and we can make sure stdlib error messages use repr, too.

The issue with that solution is json uses repr, not str, for floats. So we would still have a problem.

--
~Ethan~

From stefan_ml at behnel.de Fri Jul 19 19:06:15 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 19 Jul 2013 19:06:15 +0200
Subject: [Python-ideas] Intermediate Summary: Fast sum() for non-numbers
In-Reply-To:
References: <20130714222606.0f61f16e@sergey> <51E8BB7D.30908@pearwood.info>
Message-ID:

Joshua Landau, 19.07.2013 11:05:
> One conclusion I think it's safe to take from this is that it is *not* as clear-cut an issue as many people, myself included, had assumed. I think it's fair to say that people claiming (Stefan Behnel in this case, because it was the first quotation I found):
>
>> IMHO, the reason why sum() supports other input types than numbers is that in Python 2, at the time when it was implemented, there was no clear definition of a "number"
>
> or similar have missed this point.

Not sure what point you mean exactly here, but as I said before, using sum() on lists of lists seems like one of those cool little hacks at first, but actually isn't. It's really just a bad idea. Even if you personally don't consider it weird that lists can be summed up, it definitely violates the "don't make me think" principle for many people.

Rephrasing the quote above, I actually consider sum(lists) an implementation artefact. Definitely not a feature, and most likely not even deliberate.

Stefan

From steve at pearwood.info Fri Jul 19 19:32:15 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 20 Jul 2013 03:32:15 +1000
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References: <51E752A7.1060906@pearwood.info>
Message-ID: <51E9781F.60007@pearwood.info>

On 20/07/13 01:58, Ron Adam wrote:

> Could we have syntax for generators to bypass the method calls?
>
>     x = gen[]    # next

Please no.

What's so special about generators that they should have magic syntax for bypassing methods? Good uses for syntax include doing things that you can't otherwise do, not as a mere alias for a method. Operators are a special case, because we want to write "1 + 1" not "(1).add(1)". But gen[] is less readable than next(gen).

Python is nice, clean and simple because it eschews vast numbers of special cases. Only a few of the most common, important features are syntax. There are more interesting ways to optimize code than adding magic syntax (e.g. PyPy, Mypy), and I doubt very much that the overhead of calling generator methods will be the bottleneck in many non-toy applications.
If you're actually doing something useful with the generator data, chances are that doing that will far outweigh calling the method. For those who really care about optimizing such calls in tight loops, there is the same option available as for any other method:

    nxt = gen.__next__
    for thing in tight_loop:
        x = nxt()

> Currently using empty brackets like this is a syntax error. The brackets imply it is a type of iterator. Which is the most common use for generators.

How do the brackets imply it is a type of iterator? To me, square brackets imply it is a list (like list displays, and list comprehensions), or sequence/mapping __getitem__.

[...]

> I think that is one of the benefits of the simple iterator design.

We already have a very simple iterator design. Making it more complex makes it less simple.

--
Steven

From ethan at stoneleaf.us Fri Jul 19 21:34:31 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 19 Jul 2013 12:34:31 -0700
Subject: [Python-ideas] Fwd: Allow Enum members to refer to each other during execution of body
In-Reply-To:
References: <51DB5573.5070004@stoneleaf.us> <6fc3f1ea-e643-40c4-aa2c-6e0d42bd7b6e@googlegroups.com> <51DE0FD8.4050301@stoneleaf.us> <334b5e00-2f0b-4231-9b86-1e82105c5e28@googlegroups.com> <51DF2AF5.6050804@stoneleaf.us>
Message-ID: <51E994C7.4040703@stoneleaf.us>

On 07/14/2013 02:36 PM, Antony Lee wrote:
> Is there any specific reason why you do not wish to change the behavior of Enum to this one (which does seem more logical to me)? The patch is fairly simple in its logic (compared to the rest of the implementation, at least...), and I could even change it to remove the requirement of defining __new__ before the members as long as there are no references to other members (because as long as there are no references to other members, I obviously don't need to actually create the members), thus making it fully compatible with the current version.

My apologies for the delay in replying.

Getting Enum into the stdlib was a very careful balancing act:

- Make it powerful enough to meet most needs as-is
- Make it extensible enough that custom enumerations could be easily implemented
- Make it simple enough to not create a large cognitive burden

How this relates to your patch:

1) With your patch, referencing another enum member either returns the member itself (pure Enum), or the value of the Enum (mixed Enum) -- which means two different behaviors from the same syntax.

2) The patch fails with the pure Enum with auto-numbering test case. It fails because __new__ is looking at the __member__ data structure which is empty for the duration of __prepare__. While work arounds are possible, they would not be simpler, or even as simple.

Summary: The resulting behavior is inconsistent, and the complexity added to the code, but mostly to the mind, is much greater than the minor benefit.

--
~Ethan~

From ethan at stoneleaf.us Fri Jul 19 23:32:35 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 19 Jul 2013 14:32:35 -0700
Subject: [Python-ideas] Enum and serialization
In-Reply-To: <20130719100131.28bf7ba0@anarchist>
References: <51E832DA.30703@stoneleaf.us> <51E881DB.3070400@stoneleaf.us> <20130719092435.44aef28f@anarchist> <20130719100131.28bf7ba0@anarchist>
Message-ID: <51E9B073.7040406@stoneleaf.us>

On 07/19/2013 07:01 AM, Barry Warsaw wrote:
> On Jul 19, 2013, at 08:30 AM, Guido van Rossum wrote:
>
>> Could we start using @singledispatch?
>
> Yeah, that's a really interesting idea.
Possibly a (very) stupid question, but how would @singledispatch work with the _json accelerator module?

--
~Ethan~

From victor.stinner at gmail.com Sat Jul 20 01:55:12 2013
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sat, 20 Jul 2013 01:55:12 +0200
Subject: [Python-ideas] Fast sum() for non-numbers
In-Reply-To: <20130719045056.160b123f@sergey>
References: <20130702211209.6dbde663@sergey> <51D41C82.2040301@pearwood.info> <51D46647.3080100@pearwood.info> <20130704125419.6230332d@sergey> <1372976712.76395.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130705094341.7c1c84de@sergey> <1373020478.44137.YahooMailNeo@web184703.mail.ne1.yahoo.com> <20130708232234.72de4688@sergey> <1373330756.74168.YahooMailNeo@web184704.mail.ne1.yahoo.com> <20130709164235.7fe21a7d@sergey> <20130712043419.1f5c59e5@sergey> <20130716083605.16da9f9f@sergey> <87mwpnro8r.fsf@uwakimon.sk.tsukuba.ac.jp> <20130719045056.160b123f@sergey>
Message-ID:

2013/7/19 Sergey:
> Well, my current PEP-writing-skills are kind of zero, but the problem is that I don't know what to write there yet.

To write a PEP, I write 4 main sections:

- Abstract
- Rationale: why do you consider that something must be changed
- Proposal: describe your proposition
- Alternatives: list most alternatives proposed on python-ideas

You may list advantages and drawbacks of each alternative. This is just a template to write a first draft ;-)

I read only a few emails of the sum() threads, but I remember the following options:

- only sum numbers: reject all other types
- "if start is not None: x = start else x = first(items); for item in items: x += item; return x" your proposal
- a new protocol (__concat__? I don't remember its name) with a fallback on "x=start; for item in items: x += item" or to your proposal
- implement __iadd__ for immutable types (tuple += tuple) or something like that
- etc.

> PEP probably implies that I want to change something, but my simplest patch [1] covers most of use cases and changes nothing, except performance, and should need no PEP.

You don't need an implementation to write a PEP. In your case, you have an implementation. It's better to wait for a consensus on the PEP before updating the implementation to the latest PEP, or you may waste your time.

Victor

From ncoghlan at gmail.com Sat Jul 20 08:38:00 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 20 Jul 2013 16:38:00 +1000
Subject: [Python-ideas] Enum and serialization
In-Reply-To: <51E96437.4060605@stoneleaf.us>
References: <51E832DA.30703@stoneleaf.us> <51E96437.4060605@stoneleaf.us>
Message-ID:

On 20 July 2013 02:07, Ethan Furman wrote:
> On 07/18/2013 10:11 PM, Nick Coghlan wrote:
>> On 19 July 2013 04:24, Ethan Furman wrote:
>>>
>>> For the 3.4 stdlib there are two proposals on the table:
>>>
>>> 1) for IntEnum and FloatEnum cast the member to int or float, then proceed;
>>>
>>> 2) for any Enum, extract the value and proceed.
>>
>> 3) Only change repr (not str) in IntEnum
>>
>> I know Guido doesn't like it, but I still think the backwards compatibility risk is too high to use them to replace constants in the standard library if they change the output of __str__. Debugging only needs repr, and we can make sure stdlib error messages use repr, too.
>
> The issue with that solution is json uses repr, not str, for floats. So we would still have a problem.

Gah, I forgot about that (and I think it came up on the issue tracker, too).
Perhaps we should include a "Converting existing constants to Enums" section in the enum docs, noting some of the backwards compatibility implications for serialisation when using IntEnum or similar subtypes?

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ron3200 at gmail.com Sat Jul 20 23:17:26 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 20 Jul 2013 16:17:26 -0500
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To: <51E9781F.60007@pearwood.info>
References: <51E752A7.1060906@pearwood.info> <51E9781F.60007@pearwood.info>
Message-ID:

On 07/19/2013 12:32 PM, Steven D'Aprano wrote:
> On 20/07/13 01:58, Ron Adam wrote:
>
>> Could we have syntax for generators to bypass the method calls?
>>
>>     x = gen[]    # next
>
> Please no.
>
> What's so special about generators that they should have magic syntax for bypassing methods?

Generators are quite special. Because they suspend and resume, and the values that are passed on each yield are limited to a single object in followed by a single object out.

> Good uses for syntax include doing things that you can't otherwise do, not as a mere alias for a method.

This is what I was trying to convey. Inside a generator, a yield expression is the following byte code.

    "x = yield y"

          0 LOAD_FAST           0 (y)
          3 YIELD_VALUE
          4 STORE_FAST          1 (x)

And its external counterpart is...

    "y = g.send(x)"

          0 LOAD_GLOBAL         0 (g)
          3 LOAD_ATTR           1 (send)
          6 LOAD_FAST           0 (x)
          9 CALL_FUNCTION       1 (1 positional, 0 keyword pair)
         12 STORE_FAST          1 (y)

My question is: can we make the above into this?

          0 LOAD_GLOBAL         0 (g)
          6 LOAD_FAST           0 (x)
          9 RESUME_YIELD
         12 STORE_FAST          1 (y)

It avoids the attribute lookup, and passing the values through the function call and signature parsing code in ceval.c.

Cheers,
Ron

From steve at pearwood.info Sun Jul 21 02:47:51 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 21 Jul 2013 10:47:51 +1000
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References: <51E752A7.1060906@pearwood.info> <51E9781F.60007@pearwood.info>
Message-ID: <51EB2FB7.2030301@pearwood.info>

On 21/07/13 07:17, Ron Adam wrote:
>
> On 07/19/2013 12:32 PM, Steven D'Aprano wrote:
>> On 20/07/13 01:58, Ron Adam wrote:
>>
>>> Could we have syntax for generators to bypass the method calls?
>>>
>>>     x = gen[]    # next
>>
>> Please no.
>>
>> What's so special about generators that they should have magic syntax for bypassing methods?
>
> Generators are quite special. Because they suspend and resume, and the values that are passed on each yield are limited to a single object in followed by a single object out.

I know what generators do. I asked, what is so special that they need *syntax for bypassing methods*. That's the part that you didn't answer. Your syntax suggestion doesn't change either the fact that they suspend and resume, or that the values passed are limited to a single object. We already have an idiom for passing multiple objects at a time: the tuple.

>> Good uses for syntax include doing things that you can't otherwise do, not as a mere alias for a method.
>
> This is what I was trying to convey.

But your proposal is exactly that, a mere alias. It doesn't add any new functionality. It doesn't let you do anything that can't already be done. That's my point.
Instead of things that read like Python code and that you can easily look up in the docs:

    next(gen)
    gen.send(x)

you have this mysterious syntax, where one form looks like a key/item lookup missing an argument, and the other just looks like a key/item lookup:

    gen[]
    gen[x]

I haven't even mentioned that this proposal can't fly because the Python compiler cannot tell ahead of time which code is intended. You could get around that by changing the syntax:

    gen!!
    gen!x!

I'm questioning the need for this to be syntax in the first place.

--
Steven

From ron3200 at gmail.com Sun Jul 21 06:04:40 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Sat, 20 Jul 2013 23:04:40 -0500
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To: <51EB2FB7.2030301@pearwood.info>
References: <51E752A7.1060906@pearwood.info> <51E9781F.60007@pearwood.info> <51EB2FB7.2030301@pearwood.info>
Message-ID:

On 07/20/2013 07:47 PM, Steven D'Aprano wrote:
> On 21/07/13 07:17, Ron Adam wrote:
>>
>> On 07/19/2013 12:32 PM, Steven D'Aprano wrote:
>>> On 20/07/13 01:58, Ron Adam wrote:
>>>
>>>> Could we have syntax for generators to bypass the method calls?
>>>>
>>>>     x = gen[]    # next
>>>
>>> Please no.
>>>
>>> What's so special about generators that they should have magic syntax for bypassing methods?
>>
>> Generators are quite special. Because they suspend and resume, and the values that are passed on each yield are limited to a single object in followed by a single object out.
>
> I know what generators do. I asked, what is so special that they need *syntax for bypassing methods*. That's the part that you didn't answer. Your syntax suggestion doesn't change either the fact that they suspend and resume, or that the values passed are limited to a single object. We already have an idiom for passing multiple objects at a time: the tuple.

Unfortunately I don't think we can generate the specific bytecode change I suggested without also adding new syntax. (Does that help?)

>>> Good uses for syntax include doing things that you can't otherwise do, not as a mere alias for a method.
>>
>> This is what I was trying to convey.
>
> But your proposal is exactly that, a mere alias. It doesn't add any new functionality. It doesn't let you do anything that can't already be done. That's my point. Instead of things that read like Python code and that you can easily look up in the docs:

Correct, it doesn't add functionality; it adds a way to accomplish that same functionality in a more efficient way.

>     next(gen)
>     gen.send(x)
>
> you have this mysterious syntax, where one form looks like a key/item lookup missing an argument, and the other just looks like a key/item lookup:
>
>     gen[]
>     gen[x]

You seem to be stuck on this point. The exact syntax isn't important. You are clearly -1 on this particular spelling. That's fine.

This was just a side comment in this thread. The example syntax wasn't important. If there was some interest in the underlying idea of producing more efficient byte code for generator next and send calls, then we can start another thread about it. So far there isn't any.

Maybe I'll try to implement it sometime and see how much difference it makes. If it's more than a few percent, I'll come back here with the results. It may not be anytime soon though. I need to refresh my memory on how to add new grammar and syntax.

> I haven't even mentioned that this proposal can't fly because the Python compiler cannot tell ahead of time which code is intended.
> You could get around that by changing the syntax:

Well, you just did... ;-)

>     gen!!
>     gen!x!

> I'm questioning the need for this to be syntax in the first place.

It's what the syntax represents that I would like. A bit faster generator suspend, resume, and value passing. If it can be done without new syntax, that's even better. ;-)

The thing that got me on this is: if generators aren't faster than a class with method calls, then why do we use generators?

Cheers,
Ron

From tjreedy at udel.edu Sun Jul 21 07:10:51 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 21 Jul 2013 01:10:51 -0400
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References: <51E752A7.1060906@pearwood.info> <51E9781F.60007@pearwood.info> <51EB2FB7.2030301@pearwood.info>
Message-ID:

On 7/21/2013 12:04 AM, Ron Adam wrote:

> The thing that got me on this is: if generators aren't faster than a class with method calls.

For the following generator function, iterator class pair, this is a radically wrong premise. (3.3.2)

    def countgen(max):
        n = 0
        while n < max:
            yield n
            n += 1

    class countit:
        def __iter__(self):
            return self
        def __init__(self, max):
            self.max = max
            self.n = 0
        def __next__(self):
            n = self.n
            if n < self.max:
                self.n = n+1
                return n
            else:
                raise StopIteration

    print(list(countgen(100)) == list(countit(100)))

    import timeit
    print(timeit.repeat('for i in it: pass', "from __main__ import countgen; it = countgen(100)"))
    print(timeit.repeat('for i in it: pass', "from __main__ import countit; it = countit(100)"))

    >>> True
    [0.055593401451847824, 0.047402394989421934, 0.04742083974032252]
    [0.8787592707929215, 0.8775440691210892, 0.8786535208877584]

As the complexity of the code in the loop or next method increases, the relative difference will go down, but should not disappear.

> Then why do we use generators?

In addition, the generator function is *much* easier to write than the equivalent iterator class because 1) it omits several lines of boilerplate (including attribute to local name and back) and 2) the loop body can yield the value exactly when it is available while the next method must end with the return.

I am not familiar with sending stuff to a 'generator' and how that affects timings. This is outside of the iterator protocol that generators were originally designed for.

--
Terry Jan Reedy

From mertz at gnosis.cx Sun Jul 21 07:32:22 2013
From: mertz at gnosis.cx (David Mertz)
Date: Sun, 21 Jul 2013 01:32:22 -0400
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References: <51E752A7.1060906@pearwood.info> <51E9781F.60007@pearwood.info> <51EB2FB7.2030301@pearwood.info>
Message-ID:

There really shouldn't be any need for new syntax to test this idea. A peephole optimizer should be able to make this improvement to bytecode using the existing method call syntax, I believe.
From steve at pearwood.info Sun Jul 21 07:33:59 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 21 Jul 2013 15:33:59 +1000
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References: <51E752A7.1060906@pearwood.info> <51E9781F.60007@pearwood.info> <51EB2FB7.2030301@pearwood.info>
Message-ID: <51EB72C7.1030204@pearwood.info>

On 21/07/13 14:04, Ron Adam wrote:

> You seem to be stuck on this point. The exact syntax isn't important. You are clearly -1 on this particular spelling. That's fine.
Ron, either I haven't explained myself clearly enough, or you haven't been paying attention :-) I'm not opposed to this particular spelling. I'm opposed to optimizing method calls with syntax *regardless of the spelling*.

This opposition should be considered provisional. Obviously if there is a good enough reason to give something syntax, like item lookups have syntax seq[item], then I do not oppose it. But as a general rule, Python does not add syntax for every little thing that might happen to be micro-optimized by a special byte-code.

Let me put it this way, to give an analogy... in some of my code, I have a class with a "calculate" method. I use this class *all the time*, it is really important to me. If I were to propose special syntax to call the "calculate" method, so as to avoid the method lookup and argument passing overhead, I would expect that most people would ask the same question I asked earlier:

What's so special about this that it needs dedicated syntax?

"It will be a tiny bit faster, because it avoids the method call lookup" is not an answer. That would be true for *any* method. What is so special about *this method* (whether it is "calculate" or "send") that it is worth paying the extra cost of new syntax in order to avoid that overhead? Normally Python only uses syntax for things that people absolutely expect to be syntax (like operators), or to give features that can't (conveniently, or at all) work as regular function calls (like del and import).

If you have a good answer to that question, then I might change my mind and support your proposal.

[...]
> The thing that got me on this is: if generators aren't faster than a class with method calls, then why do we use generators?

Because typically a generator is much easier to read and write. And also because once you have the infrastructure to support generators, that can be generalised to give you coroutines as well.

Besides, are you sure that generators aren't faster?

--
Steven

From raymond.hettinger at gmail.com Sun Jul 21 07:54:56 2013
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Sat, 20 Jul 2013 22:54:56 -0700
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To: <51E752A7.1060906@pearwood.info>
References: <51E752A7.1060906@pearwood.info>
Message-ID:

On Jul 17, 2013, at 7:27 PM, Steven D'Aprano wrote:

> I'm afraid that to me the idea seems too incoherent to punch holes in it.

Quote of the week :-)

Raymond

From ron3200 at gmail.com Sun Jul 21 19:43:57 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Sun, 21 Jul 2013 12:43:57 -0500
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To: <51EB72C7.1030204@pearwood.info>
References: <51E752A7.1060906@pearwood.info> <51E9781F.60007@pearwood.info> <51EB2FB7.2030301@pearwood.info> <51EB72C7.1030204@pearwood.info>
Message-ID:

On 07/21/2013 12:33 AM, Steven D'Aprano wrote:
> On 21/07/13 14:04, Ron Adam wrote:
>
>> You seem to be stuck on this point. The exact syntax isn't important. You are clearly -1 on this particular spelling. That's fine.
>
> Ron, either I haven't explained myself clearly enough, or you haven't been paying attention :-) I'm not opposed to this particular spelling. I'm opposed to optimizing method calls with syntax *regardless of the spelling*.

Yes, I understood this, and was paying attention.
Often, the less ordered a person's mind is, the more creative they are, and they are more likely to find new ways to think about things, or find new solutions that are not readily apparent to others. (And solutions that don't always work, too.) It's a trade-off... The other side of that is someone who has a highly ordered mind; they are often very literate, and have exceptional memories. (I'm not one of these types. ;-)

Looking through opcode.h, there is LIST_APPEND. When and where is that used? It doesn't appear to be used with the append method.

> This opposition should be considered provisional. Obviously if there is a good enough reason to give something syntax, like item lookups have syntax seq[item], then I do not oppose it. But as a general rule, Python does not add syntax for every little thing that might happen to be micro-optimized by a special byte-code.

Yes, and I agree.

> Let me put it this way, to give an analogy... in some of my code, I have a class with a "calculate" method. I use this class *all the time*, it is really important to me. If I were to propose special syntax to call the "calculate" method, so as to avoid the method lookup and argument passing overhead, I would expect that most people would ask the same question I asked earlier:
>
> What's so special about this that it needs dedicated syntax?

I don't have a concrete answer for you. It's not a completely new feature, and it isn't a bug. The best I can do is tell you that it doesn't feel (to me) like it should be a function call.

When we call a function, what usually happens is a new frame instance is created, and a name space is initialised with the values passed through the signature. And then after that... the code object is started and evaluated. When a generator is resumed, the value just replaces the yield expression. That is quite different from creating a new frame and initialising a name space. It feels more like a communication port to me. Smalltalk is an interesting comparison, where everything is done with messages. But that's the other extreme.

David mentioned that it might be possible to use the peepholer to optimise the bytecode. I think that would be a good first experiment to try. But I'm not sure it can do this... I think it has a limitation of not optimising over blocks or frames.

> "It will be a tiny bit faster, because it avoids the method call lookup" is not an answer. That would be true for *any* method. What is so special about *this method* (whether it is "calculate" or "send") that it is worth paying the extra cost of new syntax in order to avoid that overhead? Normally Python only uses syntax for things that people absolutely expect to be syntax (like operators), or to give features that can't (conveniently, or at all) work as regular function calls (like del and import).
>
> If you have a good answer to that question, then I might change my mind and support your proposal.

One method can't be a special case. Is this what you are trying to say here? If so, then I agree.

Cheers,
Ron

> [...]
>> The thing that got me on this is: if generators aren't faster than a class with method calls, then why do we use generators?
>
> Because typically a generator is much easier to read and write. And also because once you have the infrastructure to support generators, that can be generalised to give you coroutines as well.
>
> Besides, are you sure that generators aren't faster?
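As a point of reference for Steven's closing question, the send() round-trip is easy to measure against a plain method call with timeit. A rough sketch follows; the echo generator and Echo class are stand-ins invented for the test, and absolute numbers depend on the interpreter and machine:

    import timeit

    def echo():
        x = None
        while True:
            x = yield x   # hand back whatever was sent in

    class Echo:
        def send(self, x):
            return x

    gen = echo()
    next(gen)   # prime the generator to the first yield
    obj = Echo()

    # Three runs of a million calls each, same spelling on both sides
    print(timeit.repeat('gen.send(1)', 'from __main__ import gen'))
    print(timeit.repeat('obj.send(1)', 'from __main__ import obj'))

This measures exactly the triple whammy Nick described (method call + resume + suspend) against the single method call, which is the overhead the proposed RESUME_YIELD opcode would be trying to shave.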
From ron3200 at gmail.com Mon Jul 22 07:16:58 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Mon, 22 Jul 2013 00:16:58 -0500
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References: <51E752A7.1060906@pearwood.info>
Message-ID:

On 07/18/2013 11:59 PM, Nick Coghlan wrote:
> On 19 July 2013 00:29, Ron Adam wrote:
>> I was thinking a generator would be more efficient if it's called many times. But I think this is easier to understand. If there is more interest we can test both to see how much of a difference it makes.
>
> Suspending and resuming a generator is quite an expensive operation. send() has the triple whammy of method call + resume generator + suspend generator, so it's unlikely to outperform a simple method call (even one that redirects to another method).

Hmmm, I think generators did work faster in earlier versions of Python on older computers. Maybe it's because today's Python, and computers, use caches much better, so there isn't as much of a difference.

> Independent of performance though, I think the mutable sequence API inspired append(), extend() and += are a better API for what you're trying to achieve than "send", so it doesn't make sense to me to try to shoehorn this into the generator API.

'+=' is an additive/append type of operation. What is needed for cat... or sum is an extend operation. I don't think '+=' is right for that. And it is already used quite a lot in Python programs.

A much less used symbol is __lshift__, '<<'. I think it also carries the right meaning of sending the value on the right to the object on the left, and it doesn't have too many other meanings in the same context. Binary left shift is hardly ever used in higher level programs.

With that, the example sum function can become this...

    def sum_items(start, items):
        try:
            c = Cat()
            return c << start << items << c.end
        except:
            for n in items:
                start += n
            return start

Quite a bit simpler and shorter. It can work with strings, lists, tuples, and any type that can be converted to a list. It returns a modified start if it's mutable, or a new object of the start type if it's immutable. (Or a sum of values in an iterable.)

Explicit string concatenation...

    >>> c = Cat()
    >>> c << "One " << "Two " << "Three!" << c.end
    'One Two Three!'

Good for a temporary buffer, and it possibly has other uses with other types.

An interesting alternative possibility is a concatenation_comprehension. Or cat_comp for short.

    >>> ("One " << "Two " << "Three!")
    'One Two Three!'

It would work with other types as well. You would almost always want the parentheses anyway.

Cheers,
Ron

From stephen at xemacs.org Mon Jul 22 08:37:14 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 22 Jul 2013 15:37:14 +0900
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References: <51E752A7.1060906@pearwood.info>
Message-ID: <871u6rqew5.fsf@uwakimon.sk.tsukuba.ac.jp>

Ron Adam writes:

> '+=' is an additive/append type of operation.

Ouch. When '+=' is additive, it's a binary operator on a single set. As such, it should take a sequence on both sides (assuming for the moment one wants to use it on sequences at all), and thus would correspond to concatenation (ie, list.extend()). list.append() and set.add() take things of different types, a container on the left and an object on the right (at least, if you made them into assignment operators, that's what they'd do).

> A much less used symbol is __lshift__, '<<'.
But *shift is precisely the opposite: it takes a container on the left (eg, a word thought of as a bitstream) and a shifter (usually a number) on the right. I guess that's why Stroustrup (or whoever) chose left-shift for his iostream operators. So my intuition is precisely the opposite of yours as far as choice of spelling goes. For strings or lists, "<<=" would append an element, and "+=" would extend using a list.

In any case the discussion is moot: the operator "+=" already has the definition I find more intuitive for the built-in sequence types.

> An interesting alternative possibility is a concatenation_comprehension. Or cat_comp for short.
>
>     >>> ("One " << "Two " << "Three!")
>     'One Two Three!'

That's not a comprehension. A comprehension aggregates (and possibly transforms and filters) the elements of a container. Here the strings are elements listed explicitly; it's just an ordinary expression as far as anybody who doesn't know how magic "<<" is would know.

From steve at pearwood.info Mon Jul 22 09:27:07 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 22 Jul 2013 17:27:07 +1000
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References: <51E752A7.1060906@pearwood.info>
Message-ID: <51ECDECB.9070307@pearwood.info>

On 22/07/13 15:16, Ron Adam wrote:

> A much less used symbol is __lshift__, '<<'. I think it also carries the right meaning of sending the value on the right to the object on the left.

I don't think that "sending the value" has any connection to concatenation, or at least no more than *any* method call can be considered "sending" a message to an object.

--
Steven

From ron3200 at gmail.com Tue Jul 23 01:29:18 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Mon, 22 Jul 2013 18:29:18 -0500
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To: <871u6rqew5.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <51E752A7.1060906@pearwood.info> <871u6rqew5.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On 07/22/2013 01:37 AM, Stephen J. Turnbull wrote:
> Ron Adam writes:
>
>> '+=' is an additive/append type of operation.
>
> Ouch. When '+=' is additive, it's a binary operator on a single set. As such, it should take a sequence on both sides (assuming for the moment one wants to use it on sequences at all), and thus would correspond to concatenation (ie, list.extend()). list.append() and set.add() take things of different types, a container on the left and an object on the right (at least, if you made them into assignment operators, that's what they'd do).

Hmmm, there was a reason I ruled that out... I think in the initial version I was doing this...

    result = start << (a, b, c)

Where a, b, and c are sequences to be joined to start.

>> A much less used symbol is __lshift__, '<<'.
>
> But *shift is precisely the opposite: it takes a container on the left (eg, a word thought of as a bitstream) and a shifter (usually a number) on the right. I guess that's why Stroustrup (or whoever) chose left-shift for his iostream operators. So my intuition is precisely the opposite of yours as far as choice of spelling goes. For strings or lists, "<<=" would append an element, and "+=" would extend using a list.

'<<=' is a binary shift. Why would that mean append?

Normally you can't chain '+=', or '<<=' in an expression.

I think the meaning or context I'm trying to get is ...

    send y to x == x <-- y

And have what x does with y not be so strictly defined.
The '<<' symbol is closer to '<--' visually without adding a new symbol to Python.

> In any case the discussion is moot: the operator "+=" already has the definition I find more intuitive for the built-in sequence types.
>
>> An interesting alternative possibility is a concatenation_comprehension. Or cat_comp for short.
>>
>>     >>> ("One " << "Two " << "Three!")
>>     'One Two Three!'
>
> That's not a comprehension. A comprehension aggregates (and possibly transforms and filters) the elements of a container. Here the strings are elements listed explicitly; it's just an ordinary expression as far as anybody who doesn't know how magic "<<" is would know.

I agree, it would need something to signify it isn't just an ordinary expression.

Cheers,
Ron

From stephen at xemacs.org Tue Jul 23 03:06:37 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 23 Jul 2013 10:06:37 +0900
Subject: [Python-ideas] Adding __getter__ to compliment __iter__.
In-Reply-To:
References: <51E752A7.1060906@pearwood.info> <871u6rqew5.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <87r4eqozj6.fsf@uwakimon.sk.tsukuba.ac.jp>

Ron Adam writes:

> I think the meaning or context I'm trying to get is ...
>
>     send y to x == x <-- y

That's a notation, not a "meaning". R uses the very similar '<-', for one example. But in R it means assignment. Other notations used to "send" one object to another, with semantics to "not be so strictly defined," include what in Python is attribute access x.y, function call x(y), mapping x[y], and extended assignment x+=y. Syntactically these are all the same: noncommutative operators on the set of objects, which expresses the directionality of abstract "send".[1]

>>>> ("One " << "Two " << "Three!")
>>>> 'One Two Three!'
>
>> That's not a comprehension. A comprehension aggregates (and possibly transforms and filters) the elements of a container. Here the strings are elements listed explicitly; it's just an ordinary expression as far as anybody who doesn't know how magic "<<" is would know.
>
> I agree, it would need something to signify it isn't just an ordinary expression.

But it *is* just an ordinary expression. Why does it need to be anything else?

BTW, we already have a way to create generators that send a sequence of objects to a particular object:

    z = start_object    # eg, "" or []
    g = (z.receive(x) for x in iterable)

Consider:

    MacPorts 08:22$ python3.3
    >>> z = []
    >>> g = (z.append(c) for c in "abcdefghij")
    >>> for x in g: pass
    ...
    >>> z
    ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
    >>>

Note that x is a pure dummy; the loop could also be expressed

    while True: next(g)

But this is an exercise in futility, as the whole point of a generator is that it generates. Here we just throw that generated object away, and the idea is better expressed explicitly:

    z = []
    for c in "abcdefghij":
        z.append(c)

(BTW, the generated object in this example is always None.)

Footnotes:
[1] Actually, in most languages attribute access doesn't send one object to another, it invokes a binding in a hierarchical namespace. But consider `get' in Lisp. Also these are "partial" operators: there are object pairs that raise an error rather than producing a result.

I'm using this language only to focus on the *syntactic* similarity.
From ron3200 at gmail.com Tue Jul 23 08:22:49 2013 From: ron3200 at gmail.com (Ron Adam) Date: Tue, 23 Jul 2013 01:22:49 -0500 Subject: [Python-ideas] Adding __getter__ to compliment __iter__. In-Reply-To: <87r4eqozj6.fsf@uwakimon.sk.tsukuba.ac.jp> References: <51E752A7.1060906@pearwood.info> <871u6rqew5.fsf@uwakimon.sk.tsukuba.ac.jp> <87r4eqozj6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 07/22/2013 08:06 PM, Stephen J. Turnbull wrote: > Ron Adam writes: > > > I think the meaning or context I'm trying to get is ... > > > > send y to x == x <-- y > > That's a notation, not a "meaning". Yes, And you understood the meaning I was referring to from the notation. ;-) > R uses the very similar '<-', for > one example. But in R it means assignment. Other notations used to > "send" one object to another, with semantics to "not be so strictly > defined," include what in Python is attribute access x.y, function > call x(y), mapping x[y], and extended assignment x+=y. Syntactically > these are all the same: noncommutative operators on the set of > objects, which expresses the directionality of abstract "send".[1] > > > > > >>> ("One " << "Two " << "Three!") > > > > 'One Two Three!' > > > > > That's not a comprehension. A comprehension aggregates (and possibly > > > transforms and filters) the elements of a container. Here the strings > > > are elements listed explicitly; it's just an ordinary expression as > > > far as anybody who doesn't know how magic "<<" is would know. > > > > I agree, it would need something to signify it isn't just an ordinary > > expression. > > But it *is* just an ordinary expression. Why does it need to be > anything else? What I wrote is an ordinary expression. But what I was thinking of, was more than that. Consider a syntax that would have the effect of overriding methods on objects temporarily within the expressions boundaries. It might look something like the above expression, but it would need something more to convey that there was something different about it. This was just one possible alternative way to implement the (second version) of a suggestion like the one the subject of this thread describes. But it's all just too abstract at the moment. So I'm going to give it a break and maybe come back t it later if I can find a more practical and useful idea in all of this. The rest of your replay was was very interesting, thanks. Cheers, Ron > BTW, we already have a way to create generators that send a sequence > of objects to a particular object: > > z = start_object # eg, "" or [] > g = (z.receive(x) for x in iterable) > > Consider: > > MacPorts 08:22$ python3.3 > >>> z = [] > >>> g = (z.append(c) for c in "abcdefghij") > >>> for x in g: pass > ... > >>> z > ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'] > >>> > > Note that x is a pure dummy; the loop could also be expressed > > while True: next(g) > > But this is an exercise in futility, as the whole point of a generator > is that it generates. Here we just throw that generated object away, > and the idea is better expressed explicitly: > > z = [] > for c in "abcdefghij": > z.append(c) > > (BTW, the generated object in this example is always None.) > > Footnotes: > [1] Actually, in most languages attribute access doesn't send one > object to another, it invokes a binding in a hierarchical namespace. > But consider `get' in Lisp. Also these are "partial" operators: there > are object pairs that raise an error rather than producing a result. > > I'm using this language only to focus on the *syntactic* similarity. 
From abarnert at yahoo.com Tue Jul 23 09:24:56 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 23 Jul 2013 09:24:56 +0200
Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol
In-Reply-To: <20130719051645.5af5495e@sergey>
References: <20130717180350.24565872@sergey> <20130719051645.5af5495e@sergey>
Message-ID: <18CE30F1-F0BB-47A4-85D4-207F055D6F0F@yahoo.com>

On Jul 19, 2013, at 4:16, Sergey wrote:

> On Jul 17, 2013 David Mertz:
>
>>> Imagine a type, that somehow modifies items that it stores, removes duplicates, or sorts them, or something else, e.g.:
>>>
>>>     class aset(set):
>>>         def __add__(self, other):
>>>             return self|other

Why would you do that? When sets were added, there was a long discussion about what operator to use for union, and | was chosen over + because + would misleadingly imply concatenation.

>>> Now we have a code:
>>>
>>>     list_of_sets = [ aset(["item1","item2","item3"]) ] * 1000
>>>     [...]
>>>     for i in sum(list_of_sets, aset()):
>>>         deal_with(i)
>>>
>>> If you replace `sum` with `chain` you get something like:
>>>
>>>     for i in chain.from_iterable(list_of_sets):
>>>         deal_with(i)
>>>
>>> Which works! (that's the worst part) but produces WRONG result!

No, it's not the wrong result. Nobody in his right mind would expect a function called "chain" to union a bunch of iterables; they'd expect it to chain a bunch of iterables. Which is exactly what it does.

>> In this example you can use:
>>
>>     aset(chain(*list_of_sets))
>>
>> This gives the same answer with the same big-O runtime.
>
> Sure, that's why I called it "error-prone" replacement. When you have a code like:
>
>     for i in sum(list_of_sets, aset()):
>         deal_with(i)
>
> You have pretty much no place for error.
>
> Well, it would be much better, if it was just:
>
>     for i in sum(list_of_sets):
>         deal_with(i)
>
> but for historical reasons we already have second parameter, so we have to deal with it.

It's not just historical reasons. It's the only way you can handle a potentially empty iterable. With reduce, it's an error to call it with an empty iterable and no start value; with sum, because it's about summing numbers rather than about general folding, you get 0. But there's no third alternative in a dynamically typed language.

> And now some newbie tries to use chain. So she does:
>
>     for i in chain(list_of_sets):
>         deal_with(i)
>
> oops, does not work. Ah, missing star (you miss it yourself!)
>
>     for i in chain(*list_of_sets):
>         deal_with(i)
>
> works, but incorrectly. Ok, let's hope that our newbie was careful enough with tests and noticed, that it does not do what it should. She reads the tutorial again, and notices that the example there was like:
>
>     all_elems = list(chain(*list_of_lists))
>
> So she tries:
>
>     for i in list(chain(*list_of_sets)):
>         deal_with(i)

This is a mistake right off the bat, and shows a fundamental misunderstanding of iterables. It's the exact same problem we always see with people writing "for i in list(my_str)" to iterate characters, or "for i in list(my_file)" to iterate lines. People will presumably run into it and learn that list(iterable) gives you the same iteration as iterable itself before they get to chain. But if not, this is as good a time to learn as any.

> Nope, still wrong. Just in case she tries to remove a star, that she doesn't understand anyway:
>
>     for i in list(chain(list_of_sets)):
>         deal_with(i)
>
> Still no go.
> So after all these attempts she asks someone smart and finally gets the correct code:
>
>     for i in aset(chain(list_of_sets)):
>         deal_with(i)

This isn't really a good solution. It may work, but if you want to union a bunch of sets, you shouldn't try to spell it as chaining iterables into a set constructor. For example:

    for i in union(list_of_sets):

    for i in aset.union(list_of_sets):

If you really want to write it as an expression over the 2-element union operator, you can:

    for i in reduce(aset.union, list_of_sets, aset()):

But really, as with many such uses of reduce, this is probably more readable as a loop. Especially when you consider that there is no reason this needs to be an expression inside the for loop. So:

    bigset = aset()
    for i in list_of_sets:
        bigset |= i
    for i in bigset:

All of these make it clear that we're creating the union of a bunch of sets. Note that in mathematical notation, you'd use a big U with the set of sets, not a sigma.

More generally, you're trying to make it possible for people to write looping code without understanding looping. This is silly. Chain is a function for chaining iterables. If that's not what you want, don't use it.

Meanwhile, if your hypothetical newbie created the aset class himself, he's not a newbie--novices don't know how to create classes that implement the Iterable and Sequence protocols. If he is at the stage where he's learning about that, it's a good time to learn that he's implemented an incorrect class. On the other hand, if he's using a class created by someone else, this will teach him that the class is buggy. Either way, the right way for him to use a class that misleadingly acts like a sequence even though it isn't is to stop using the class, or use it very carefully. For a newbie, the first answer is the answer.

> As I said, `chain` is a nice feature for smart people. But it is neither good for beginners, nor obvious, nor it's good as a sum replacement.

It's not good as a sum replacement, because it doesn't do the same thing. One sums numbers, the other chains iterables. Why should either one be a good replacement for the other? Needless to say, neither of the two is good as a union replacement. So what?

>> It's possible to come up with more perverse customizations where this won't hold. But I think all of them involve redefining __add__ as something with little relation to its normal meaning. Odd behavior in those cases is to be expected.
>
> Hah. Easy. Even for a commonly used type -- for strings:
>
>     str(chain(*list_of_strings))
>
> it does not work.
>
> So we have:
>
> * chain(*list_of_something) may be correct or may be not

No, it's always correct. If you want to iterate over a list of strings, this does exactly what you want.

> * something(chain(*list_of_something)) may be correct or may be not

This is not something you should generally want to do.

Remember that the whole point of the iteration protocols is that you generally don't care what type you're iterating over. And when you do care, you usually want to build a collection of some specific type out of an iterable, again without caring about the original type. You want a list, or a blist.sortedlist, or whatever, and it doesn't matter that what was passed in was a list, a tuple, or something else.
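For concreteness, the alternatives discussed above can be put side by side, with the built-in set standing in for aset. All three spellings below produce the same union using only stdlib calls (a sketch, not a recommendation of any one of them):

    from functools import reduce
    from itertools import chain

    list_of_sets = [{1, 2}, {2, 3}, {3, 4}]

    u1 = set().union(*list_of_sets)               # set.union accepts any iterables
    u2 = reduce(set.union, list_of_sets, set())   # fold over the 2-element union
    u3 = set(chain.from_iterable(list_of_sets))   # chain, then build the set
    assert u1 == u2 == u3 == {1, 2, 3, 4}

Note that the first spelling makes the "big U over a set of sets" reading explicit, which is the point being argued for.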
From joshua at landau.ws Tue Jul 23 11:21:11 2013
From: joshua at landau.ws (Joshua Landau)
Date: Tue, 23 Jul 2013 10:21:11 +0100
Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol
In-Reply-To: <18CE30F1-F0BB-47A4-85D4-207F055D6F0F@yahoo.com>
References: <20130717180350.24565872@sergey> <20130719051645.5af5495e@sergey> <18CE30F1-F0BB-47A4-85D4-207F055D6F0F@yahoo.com>
Message-ID:

On 23 July 2013 08:24, Andrew Barnert wrote:
> On Jul 19, 2013, at 4:16, Sergey wrote:
>
>> Nope, still wrong. Just in case she tries to remove a star, that she doesn't understand anyway:
>>
>>     for i in list(chain(list_of_sets)):
>>         deal_with(i)
>>
>> Still no go. So after all these attempts she asks someone smart and finally gets the correct code:
>>
>>     for i in aset(chain(list_of_sets)):
>>         deal_with(i)

That doesn't work because you want chain.from_iterable instead of chain. Or the star you're so fond of.

I really think, in retrospect, that chain.from_iterable should be chain (you can always use chain([things]) to get the old behaviour). It's the behaviour people normally want, and it seems people often accidentally fall back to plain chain too easily. Plus, loads of people don't know of chain.from_iterable, for some strange reason.

> This isn't really a good solution. It may work, but if you want to union a bunch of sets, you shouldn't try to spell it as chaining iterables into a set constructor.

Why? It makes sense to me, and it's not less efficient than the loop (except that it's not a C loop, but that's not normally important) and I think it's more readable simply because it's shorter, so you can move on to the actual point of whatever you're doing.

>> * something(chain(*list_of_something)) may be correct or may be not
>
> This is not something you should generally want to do.
>
> Remember that the whole point of the iteration protocols is that you generally don't care what type you're iterating over. And when you do care, you usually want to build a collection of some specific type out of an iterable, again without caring about the original type. You want a list, or a blist.sortedlist, or whatever, and it doesn't matter that what was passed in was a list, a tuple, or something else.

+1

Note that in the sortedlist situation, I believe there are merge sorts you should be doing, but they aren't generalisable.

From techtonik at gmail.com Tue Jul 23 12:35:27 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Tue, 23 Jul 2013 13:35:27 +0300
Subject: [Python-ideas] list.copy() for consistency with dict.copy()
Message-ID:

Almost everybody knows that you need to copy structures in Python if you need to fork and modify them without consequences.

    >>> x = [1,2,3]
    >>> d = x
    >>> d[1] = 3
    >>> x
    [1, 3, 3]

For dict() the fastest way is to use the dict.copy() method - http://stackoverflow.com/questions/5861498/fast-way-to-copy-dictionary-in-python

For list() it appears that the fastest is to use the non-obvious slicing operator: http://stackoverflow.com/questions/2612802/how-to-clone-a-list-in-python

The idea is to add list.copy() and make it as fast as slicing.

--
anatoly t.
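For reference, the common copy spellings are easy to compare directly with timeit (a sketch; the numbers depend on the interpreter build and machine, so run it locally rather than trusting any single figure):

    import timeit

    setup = 'import copy; x = list(range(1000))'
    for stmt in ('x[:]', 'list(x)', 'copy.copy(x)'):
        print(stmt, min(timeit.repeat(stmt, setup, number=100000)))

All three produce a new shallow copy of the list; the claim in the message above is only about which spelling is fastest, which a run of this snippet can confirm or refute on a given build.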
--
anatoly t.

From amauryfa at gmail.com Tue Jul 23 12:48:23 2013
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Tue, 23 Jul 2013 12:48:23 +0200
Subject: [Python-ideas] list.copy() for consistency with dict.copy()
In-Reply-To:
References:
Message-ID:

2013/7/23 anatoly techtonik
> The idea is to add list.copy() and make it as fast as slicing.

Already done in 3.3, see http://bugs.python.org/issue10516 and http://docs.python.org/3.3/whatsnew/3.3.html#other-language-changes

And the implementation in C uses list_slice(), so your wishes are fulfilled!

--
Amaury Forgeot d'Arc

From techtonik at gmail.com Tue Jul 23 14:35:09 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Tue, 23 Jul 2013 15:35:09 +0300
Subject: [Python-ideas] list.copy() for consistency with dict.copy()
In-Reply-To:
References:
Message-ID:

On Tue, Jul 23, 2013 at 1:48 PM, Amaury Forgeot d'Arc wrote:
> 2013/7/23 anatoly techtonik
>>
>> The idea is to add list.copy() and make it as fast as slicing.
>
> Already done in 3.3, see http://bugs.python.org/issue10516
> and http://docs.python.org/3.3/whatsnew/3.3.html#other-language-changes
>
> And the implementation in C uses list_slice(), so your wishes are fulfilled!

That's wonderful. =)
--
anatoly t.

From tjreedy at udel.edu Tue Jul 23 18:33:48 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 23 Jul 2013 12:33:48 -0400
Subject: [Python-ideas] Another attempt at a sum() alternative: the concatenation protocol
In-Reply-To: <18CE30F1-F0BB-47A4-85D4-207F055D6F0F@yahoo.com>
References: <20130717180350.24565872@sergey> <20130719051645.5af5495e@sergey> <18CE30F1-F0BB-47A4-85D4-207F055D6F0F@yahoo.com>
Message-ID:

On 7/23/2013 3:24 AM, Andrew Barnert wrote:
> Remember that the whole point of the iteration protocols is that you
> generally don't care what type you're iterating over. And when you do
> care, you usually want to build a collection of some specific type
> out of an iterable, again without caring about the original type. You
> want a list, or a blist.sortedlist, or whatever, and it doesn't
> matter that what was passed in was a list, a tuple, or something
> else.

Right. Chaining iterators is the iterator equivalent of summing lists or tuples or other homogeneous collections of some specific concrete collection type. If we were designing sum() today, I think we would either restrict input to numbers *or* divide inputs into numbers and iterables and chain the latter. However, returning either a number or an iterator seems awkward, so two functions is probably better. I am warming to the idea of making chain a builtin and possibly deprecating non-number inputs to sum.

--
Terry Jan Reedy

From jon.brandvein at gmail.com Wed Jul 24 03:35:40 2013
From: jon.brandvein at gmail.com (Jonathan Brandvein)
Date: Tue, 23 Jul 2013 21:35:40 -0400
Subject: [Python-ideas] Arbitrary constants in ASTs
Message-ID:

In the deprecated (removed in Python 3) compiler library, it was possible to construct a "Const" AST node. This node took in an arbitrary Python object, and got compiled to a LOAD_CONST opcode. This functionality is not available with the modern ast library. I wonder if there are any objections to adding it.

Embedding objects in the AST can be handy. In a program transformation, you can avoid having to create code to define and assign to a new variable name. In code that is meant for exec or eval, you don't need to add the object to the global or local namespaces that you pass in.
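For illustration, the namespace approach looks something like this minimal sketch with the modern ast module (the _const name is made up):

    import ast

    obj = object()  # any arbitrary Python object

    # Without a Const node, the object has to travel through the namespace:
    tree = ast.Expression(body=ast.Name(id='_const', ctx=ast.Load()))
    ast.fix_missing_locations(tree)
    code = compile(tree, '<ast>', 'eval')
    assert eval(code, {'_const': obj}) is obj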
You also avoid the runtime overhead of a global namespace lookup, although your semantics are slightly different in that the object cannot be replaced at runtime. For a program transformation that introduces its own fixed objects that are not meant to be hot-swapped or monkey patched, this is a good tradeoff.

I bring this up because I just ported a project from Python 2.4 to 3.3, and this was one feature where there was no good modern substitute available.

The one drawback I can think of is that there is no general canonical source representation for an arbitrary constant. I will note that there are already ASTs that don't have a unique source representation: think of two ast.Name nodes that differ in their Store/Load context.

Are there any limitations in CPython concerning what values can be placed in co_consts and accessed by LOAD_CONST?

Jon

From haoyi.sg at gmail.com Wed Jul 24 03:54:04 2013
From: haoyi.sg at gmail.com (Haoyi Li)
Date: Wed, 24 Jul 2013 09:54:04 +0800
Subject: [Python-ideas] Arbitrary constants in ASTs
In-Reply-To:
References:
Message-ID:

As someone who's re-implemented this myself (not as performantly as a dedicated opcode, but what the hell) using hygienic macros (https://github.com/lihaoyi/macropy#interned), I think it's a neat idea, but...

How many other people in the world are doing AST manipulations to want this? I know I could use it, but I suspect I'm part of a very (very) small pool of developers who would use this kind of functionality.

For what it's worth, my implementation uses pickling to serialize the object(s) being passed in, and inserts a (hygienically renamed) top-level unpickle-and-assign statement into the module being worked with; references to that value are then just Name() nodes referencing the assigned variable.

-Haoyi

On Wed, Jul 24, 2013 at 9:35 AM, Jonathan Brandvein wrote:
> In the deprecated (removed in Python 3) compiler library, it was possible
> to construct a "Const" AST node. This node took in an arbitrary Python
> object, and got compiled to a LOAD_CONST opcode. This functionality is not
> available with the modern ast library. I wonder if there are any objections
> to adding it.
>
> Embedding objects in the AST can be handy. In a program transformation,
> you can avoid having to create code to define and assign to a new variable
> name. In code that is meant for exec or eval, you don't need to add the
> object to the global or local namespaces that you pass in.
>
> You also avoid the runtime overhead of a global namespace lookup, although
> your semantics are slightly different in that the object cannot be replaced
> at runtime. For a program transformation that introduces its own fixed
> objects that are not meant to be hot-swapped or monkey patched, this is a
> good tradeoff.
>
> I bring this up because I just ported a project from Python 2.4 to 3.3,
> and this was one feature where there was no good modern substitute
> available.
>
> The one drawback I can think of is that there is no general canonical
> source representation for an arbitrary constant. I will note that there are
> already ASTs that don't have a unique source representation: think of two
> ast.Name nodes that differ in their Store/Load context.
>
> Are there any limitations in CPython concerning what values can be placed
> in co_consts and accessed by LOAD_CONST?
>
> Jon

From jon.brandvein at gmail.com Wed Jul 24 04:11:57 2013
From: jon.brandvein at gmail.com (Jonathan Brandvein)
Date: Tue, 23 Jul 2013 22:11:57 -0400
Subject: [Python-ideas] Arbitrary constants in ASTs
In-Reply-To:
References:
Message-ID:

> As someone who's re-implemented this myself (not as performantly as a
> dedicated opcode, but what the hell) using hygienic macros
> (https://github.com/lihaoyi/macropy#interned)

Oh, so you're the macropy guy! :) I remember seeing your library, and I have meant to give it a thorough look one of these days. I think there's some overlap between what you do, and the shortcuts I use to make AST manipulations more palatable in my programming.

> How many other people in the world are doing AST manipulations to want
> this? I know I could use it, but I suspect I'm part of a very (very) small
> pool of developers who would use this kind of functionality.

True, the gain might not be worth the developer effort. But if people are directly manipulating ASTs in the first place, they may find this useful.

My work is on a static transformation for implementing set comprehensions efficiently. A friend of mine is working on a distributed programming environment that is based on Python and gets compiled to Python code. Though, I suppose my sample is biased since I happen to be in a language research lab. ;)

Jon

From techtonik at gmail.com Wed Jul 24 09:50:51 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Wed, 24 Jul 2013 10:50:51 +0300
Subject: [Python-ideas] namedlist() or Record type
Message-ID:

Rationale: There is a lot of 2D data with a table layout where order is important, and it is hard to work with such data in Python.

The use case:
1. Get data row from a table
2. Change row column by name
3. Save data back

For example, termios.tcgetattr() returns the list [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]. I need to modify some bits in the lflag value and save this list back. In my code it looks like:

    newattr[3] &= ~termios.ICANON

I want to get rid of the magic number 3 without defining additional variables just for names. I thought that namedtuple could help here, but it is read-only and seems not serializeable. What I want is:

>>> data = [1, 'john', 23434]
>>> named = Record(data, ['id', 'name', 'number'])
>>> named.id = 3
>>> named.name
'john'
>>> named
[3, 'john', 23434]
>>> named.naem
AttributeError(...)
>>> named.dict()
{'id': 3, 'name': 'john', 'number': 23434}
>>> named.json()
...

I hacked OrderedDict to accept a list as param and allow attribute access. It doesn't behave as a named list - you still need to call .values() to get the list back, and indexed access doesn't work as well.

http://bugs.python.org/file31026/DictRecord.py
--
anatoly t.

From techtonik at gmail.com Wed Jul 24 10:43:36 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Wed, 24 Jul 2013 11:43:36 +0300
Subject: [Python-ideas] Remove tty module
Message-ID:

http://hg.python.org/cpython/file/74fd1959cf44/Lib/tty.py

The tty module is cryptic. The low-level interface it exposes can be covered by appropriate recipes in the termios module documentation. You can't understand tty without understanding termios.
You also can't understand tty without being a Unix guru - http://en.wikipedia.org/wiki/TTY. I doubt that the raw mode function (which is 50% of this stuff) is really used by anyone.

In an ideal world, tty should be replaced with an interface with less cryptic terminology, replacing cbreak mode, cooked mode and raw mode with user-oriented concepts. Otherwise this functionality is already covered by the termios interface.
--
anatoly t.

From mal at egenix.com Wed Jul 24 11:01:42 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 24 Jul 2013 11:01:42 +0200
Subject: [Python-ideas] Remove tty module
In-Reply-To:
References:
Message-ID: <51EF97F6.5080302@egenix.com>

On 24.07.2013 10:43, anatoly techtonik wrote:
> http://hg.python.org/cpython/file/74fd1959cf44/Lib/tty.py
>
> The tty module is cryptic. The low-level interface it exposes can be
> covered by appropriate recipes in the termios module documentation. You
> can't understand tty without understanding termios. You also can't
> understand tty without being a Unix guru -
> http://en.wikipedia.org/wiki/TTY. I doubt that the raw mode function
> (which is 50% of this stuff) is really used by anyone.
>
> In an ideal world, tty should be replaced with an interface with less
> cryptic terminology, replacing cbreak mode, cooked mode and raw mode
> with user-oriented concepts. Otherwise this functionality is already
> covered by the termios interface.

Better references:
http://en.wikipedia.org/wiki/POSIX_terminal_interface
http://en.wikipedia.org/wiki/Cooked_mode

cbreak and raw modes are needed for e.g. games or editors that need low-level access to the keyboard.

For a user oriented TTY interface, have a look at http://en.wikipedia.org/wiki/Ncurses and http://docs.python.org/2.7/library/curses.html

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jul 24 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From ubershmekel at gmail.com Wed Jul 24 11:09:04 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Wed, 24 Jul 2013 12:09:04 +0300
Subject: [Python-ideas] namedlist() or Record type
In-Reply-To:
References:
Message-ID:

On Wed, Jul 24, 2013 at 10:50 AM, anatoly techtonik wrote:
>
> newattr[3] &= ~termios.ICANON
>
> I want to get rid of the magic number 3 without defining additional
> variables just for names. I thought that namedtuple could help here, but
> it is read-only and seems not serializeable.

This indeed would be useful in the csv module, as DictReader/DictWriter are cute but the [""] lookups are a bit extraneous. And it'd be a useful impromptu, mutable replacement for namedtuple.

The only minor problem I foresee is when you have illegal Python words as headers (e.g. named.class or named.break), but then you can fall back to [""]. But now you have more than one way to do it.

Yuval
From ronaldoussoren at mac.com Wed Jul 24 11:25:42 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Wed, 24 Jul 2013 11:25:42 +0200
Subject: [Python-ideas] namedlist() or Record type
In-Reply-To:
References:
Message-ID: <2B229179-E1D4-45E8-8EDD-E571CA610E13@mac.com>

On 24 Jul, 2013, at 9:50, anatoly techtonik wrote:
> Rationale: There is a lot of 2D data with a table layout where order is
> important, and it is hard to work with such data in Python.
>
> The use case:
> 1. Get data row from a table
> 2. Change row column by name
> 3. Save data back
>
> For example, termios.tcgetattr() returns the list [iflag, oflag, cflag,
> lflag, ispeed, ospeed, cc]. I need to modify some bits in the lflag value
> and save this list back. In my code it looks like:
>
> newattr[3] &= ~termios.ICANON
>
> I want to get rid of the magic number 3 without defining additional
> variables just for names. I thought that namedtuple could help here, but
> it is read-only and seems not serializeable.

Namedtuple has a _replace method, which would allow you to write the code below if tcgetattr returned a named tuple:

    newattr = newattr._replace(lflag=newattr.lflag & ~termios.ICANON)

However, tcgetattr does not return a namedtuple :-)

What do you mean by "namedtuple [...] seems not serializable"? Both pickle and json work fine with named tuples.

Ronald

From greg.ewing at canterbury.ac.nz Wed Jul 24 09:47:31 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 24 Jul 2013 19:47:31 +1200
Subject: [Python-ideas] Arbitrary constants in ASTs
In-Reply-To:
References:
Message-ID: <51EF8693.3010003@canterbury.ac.nz>

Jonathan Brandvein wrote:
> Are there any limitations in CPython concerning what values can be
> placed in co_consts and accessed by LOAD_CONST?

Probably any marshallable object will work.

--
Greg

From abarnert at yahoo.com Wed Jul 24 13:02:17 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 24 Jul 2013 13:02:17 +0200
Subject: [Python-ideas] Remove tty module
In-Reply-To: <51EF97F6.5080302@egenix.com>
References: <51EF97F6.5080302@egenix.com>
Message-ID: <9BB724B2-C361-47E0-8AA8-B235F6F9E568@yahoo.com>

On Jul 24, 2013, at 11:01, "M.-A. Lemburg" wrote:

> On 24.07.2013 10:43, anatoly techtonik wrote:
>> http://hg.python.org/cpython/file/74fd1959cf44/Lib/tty.py
>>
>> The tty module is cryptic. The low-level interface it exposes can be
>> covered by appropriate recipes in the termios module documentation. You
>> can't understand tty without understanding termios. You also can't
>> understand tty without being a Unix guru -
>> http://en.wikipedia.org/wiki/TTY. I doubt that the raw mode function
>> (which is 50% of this stuff) is really used by anyone.

Many novices want a way to read a character from the tty without waiting, to create simple keyboard menus or just "press any key to continue" apps without using curses or a GUI app.

It would be even nicer if there were an easier way to do this (especially one that worked on both unix and windows); it's a pretty common question on StackOverflow and people are always disappointed at how complicated the answer is. But making it even harder is probably not a good idea.

So, maybe if we had a higher level "consoleio" module along with the low level termios we wouldn't need the mid level tty.

>> In an ideal world, tty should be replaced with an interface with less
>> cryptic terminology, replacing cbreak mode, cooked mode and raw mode
>> with user-oriented concepts. Otherwise this functionality is already
>> covered by the termios interface.
I'm not sure providing the existing mid level functionality with friendlier but less standard (and therefore harder to look up) names would really help much.

> Better references:
> http://en.wikipedia.org/wiki/POSIX_terminal_interface
> http://en.wikipedia.org/wiki/Cooked_mode
>
> cbreak and raw modes are needed for e.g. games or editors that
> need low-level access to the keyboard.
>
> For a user oriented TTY interface, have a look at
> http://en.wikipedia.org/wiki/Ncurses and
> http://docs.python.org/2.7/library/curses.html

Curses is great, but sometimes you want raw io without it. For one thing, occasionally you need to deal with a platform that doesn't have curses (like iOS). But more generally, sometimes all you want is raw input, without console windowing and other fancy stuff--again, think of those novices with their "press any key to continue" apps. So, a very simple pure python library on top of unix termios and windows conio might be nice as well.

From victor.stinner at gmail.com Wed Jul 24 13:23:46 2013
From: victor.stinner at gmail.com (Victor Stinner)
Date: Wed, 24 Jul 2013 13:23:46 +0200
Subject: [Python-ideas] Arbitrary constants in ASTs
In-Reply-To:
References:
Message-ID:

2013/7/24 Haoyi Li:
> How many other people in the world are doing AST manipulations to want this?

I wrote a project called astoptimizer which rewrites AST to implement some static optimizations. http://bitbucket.org/haypo/astoptimizer/

I need a get_constant() method checking if a node is "a constant". It creates a tuple from an ast.Tuple tree, for example. Having to "encode/decode" (create tuple <=> create ast.Tuple) regularly is not efficient. I would also like a first pass in the optimizer which would replace ast.Num, ast.Str, ast.NameConstant, ast.Tuple, etc. with a generic ast.Constant which contains the Python object directly.

See my get_constant() method: https://bitbucket.org/haypo/astoptimizer/src/dfb2c702cb14785320ac34b868d5e1270f910825/astoptimizer/optimizer.py?at=default#cl-289

Victor

From steve at pearwood.info Wed Jul 24 14:54:07 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 24 Jul 2013 22:54:07 +1000
Subject: [Python-ideas] namedlist() or Record type
In-Reply-To:
References:
Message-ID: <51EFCE6F.3060708@pearwood.info>

On 24/07/13 17:50, anatoly techtonik wrote:
> Rationale: There is a lot of 2D data with a table layout where order is
> important, and it is hard to work with such data in Python.
[...]

It sounds to me that what you want is a mutable namedtuple. At least, that's what I often want. namedtuple was a fantastically useful addition to the standard library; I think a mutable record type would be too.

Here's a quick and dirty (emphasis on the dirty) proof of concept of the sort of thing I'd like:

    def record(name, fields):
        def __init__(self, *args):
            for slot, arg in zip(self.__slots__, args):
                setattr(self, slot, arg)
        return type(name, (),
                    {'__slots__': fields.split(), '__init__': __init__}
                    )

I don't put this forward as a production-ready solution, it is missing a nice repr, and doesn't support indexing or sequence unpacking, and I'm not really sure if it should use slots, but it gives the idea of the sort of thing that can be done.
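For illustration, usage would look something like this:

>>> Point = record('Point', 'x y')
>>> p = Point(1, 2)
>>> p.x, p.y
(1, 2)
>>> p.x = 5        # mutable, unlike namedtuple
>>> p.z = 3        # typos are caught, thanks to __slots__
Traceback (most recent call last):
  ...
AttributeError: 'Point' object has no attribute 'z'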
[...]
> I hacked OrderedDict to accept a list as param and allow attribute
> access. It doesn't behave as a named list - you still need to call
> .values() to get the list back, and indexed access doesn't work as well.
>
> http://bugs.python.org/file31026/DictRecord.py

I don't think that OrderedDict is a good base class for this. Dicts have lots of methods that are completely inappropriate for a record/struct-like object.

--
Steven

From eric at trueblade.com Wed Jul 24 15:28:23 2013
From: eric at trueblade.com (Eric V. Smith)
Date: Wed, 24 Jul 2013 09:28:23 -0400
Subject: [Python-ideas] namedlist() or Record type
In-Reply-To: <51EFCE6F.3060708@pearwood.info>
References: <51EFCE6F.3060708@pearwood.info>
Message-ID: <51EFD677.8030303@trueblade.com>

On 07/24/2013 08:54 AM, Steven D'Aprano wrote:
> On 24/07/13 17:50, anatoly techtonik wrote:
>> Rationale: There is a lot of 2D data with a table layout where order is
>> important, and it is hard to work with such data in Python.
> [...]
>
> It sounds to me that what you want is a mutable namedtuple. At least,
> that's what I often want. namedtuple was a fantastically useful addition
> to the standard library; I think a mutable record type would be too.

There are a number of implementations of this on PyPi, and probably elsewhere. Here's one I wrote:

https://pypi.python.org/pypi/recordtype/

It could use some polishing. I don't particularly like that it conflates mutability along with default parameters, but I couldn't think of an easy way to separate them, and for my particular use I needed both.

Eric.

From haoyi.sg at gmail.com Wed Jul 24 16:41:31 2013
From: haoyi.sg at gmail.com (Haoyi Li)
Date: Wed, 24 Jul 2013 22:41:31 +0800
Subject: [Python-ideas] namedlist() or Record type
In-Reply-To: <51EFD677.8030303@trueblade.com>
References: <51EFCE6F.3060708@pearwood.info> <51EFD677.8030303@trueblade.com>
Message-ID:

While we're on the subject of possible alternatives, I think macropy's case classes are a pretty nice balance between namedtuple-ish concerns (conciseness, auto-{__str__, __eq__, __init__, etc.}) and class-ish concerns (being able to give them methods and initializers), with a pretty nice (macro-powered) syntax. It's a syntax that looks equally pretty whether you're writing a simple Point(x, y) struct or a large-ish class with many members, initialization logic and methods, which I think is pretty nice.
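For those who haven't tried them, a case class definition looks roughly like this - quoting from memory of the macropy README, and it only runs with macropy's import hooks active, so treat it as a sketch:

    from macropy.case_classes import macros, case

    @case
    class Point(x, y): pass

    p = Point(1, 2)
    p.x        # 1
    str(p)     # 'Point(1, 2)'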
On Wed, Jul 24, 2013 at 9:28 PM, Eric V. Smith wrote:
> There are a number of implementations of this on PyPi, and probably
> elsewhere. Here's one I wrote:
>
> https://pypi.python.org/pypi/recordtype/
>
> It could use some polishing. I don't particularly like that it conflates
> mutability along with default parameters, but I couldn't think of an
> easy way to separate them, and for my particular use I needed both.
>
> Eric.

From ncoghlan at gmail.com Wed Jul 24 17:24:42 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 25 Jul 2013 01:24:42 +1000
Subject: [Python-ideas] Arbitrary constants in ASTs
In-Reply-To:
References:
Message-ID:

On 24 July 2013 21:23, Victor Stinner wrote:
> 2013/7/24 Haoyi Li:
>> How many other people in the world are doing AST manipulations to want this?
>
> I wrote a project called astoptimizer which rewrites AST to implement
> some static optimizations.
> http://bitbucket.org/haypo/astoptimizer/
>
> I need a get_constant() method checking if a node is "a constant". It
> creates a tuple from an ast.Tuple tree, for example. Having to
> "encode/decode" (create tuple <=> create ast.Tuple) regularly is not
> efficient. I would also like a first pass in the optimizer which would
> replace ast.Num, ast.Str, ast.NameConstant, ast.Tuple, etc. with a
> generic ast.Constant which contains the Python object directly.
>
> See my get_constant() method:
> https://bitbucket.org/haypo/astoptimizer/src/dfb2c702cb14785320ac34b868d5e1270f910825/astoptimizer/optimizer.py?at=default#cl-289

There are multiple patches on the tracker to improve the AST (most notably Eugene Toder's work in http://bugs.python.org/issue11549). Anyone interested in improving this would be well-advised to familiarise themselves with those patches, with a view to updating them in response to the comments already received and bringing them into line with the current default branch (also keep in mind that the dis module in 3.4 has new capabilities to make bytecode analysis easier, and there is a test.bytecode_helper module to make it easier to write regression tests for the code generator).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From cs at zip.com.au Thu Jul 25 00:56:09 2013
From: cs at zip.com.au (Cameron Simpson)
Date: Thu, 25 Jul 2013 08:56:09 +1000
Subject: [Python-ideas] Remove tty module
In-Reply-To: <51EF97F6.5080302@egenix.com>
References: <51EF97F6.5080302@egenix.com>
Message-ID: <20130724225609.GA1675@cskk.homeip.net>

On 24Jul2013 11:01, M.-A. Lemburg wrote:
| cbreak and raw modes are needed for e.g. games or editors that
| need low-level access to the keyboard.

Raw mode is also needed to talk to arbitrary serial devices. I've written code for such, and having some line discipline in the way would be disastrous. Not everything is a keyboard:-) (Disclaimer: that app was in Java, but the situation in Python would be unchanged.)

I see "tty" only supplies two functions and explicitly requires termios anyway; why weren't these functions just provided as convenience routines inside termios? (I can imagine someone forward thinking saying "this could work elsewhere".)

That said, termios could be ported to other platforms fairly easily, at least for most of it.

I certainly wrote myself a termios layer in C for V7 UNIX, which didn't have it but did have the older tty setup interfaces. Any other OS presenting a serial line control library could probably be targeted nearly as easily for the basics (raw mode, tty speed, parity et al).

Cheers,
--
Cameron Simpson

That is 27 years ago, or about half an eternity in computer years. - Alan Tibbetts

From mal at egenix.com Thu Jul 25 09:53:32 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 25 Jul 2013 09:53:32 +0200
Subject: [Python-ideas] Remove tty module
In-Reply-To: <20130724225609.GA1675@cskk.homeip.net>
References: <51EF97F6.5080302@egenix.com> <20130724225609.GA1675@cskk.homeip.net>
Message-ID: <51F0D97C.2000108@egenix.com>

On 25.07.2013 00:56, Cameron Simpson wrote:
> On 24Jul2013 11:01, M.-A. Lemburg wrote:
> | cbreak and raw modes are needed for e.g. games or editors that
> | need low-level access to the keyboard.
>
> Raw mode is also needed to talk to arbitrary serial devices. I've
> written code for such, and having some line discipline in the way
> would be disastrous. Not everything is a keyboard:-) (Disclaimer:
> that app was in Java, but the situation in Python would be unchanged.)
>
> I see "tty" only supplies two functions and explicitly requires
> termios anyway; why weren't these functions just provided as
> convenience routines inside termios?
> (I can imagine someone forward thinking saying "this could work elsewhere".)
>
> That said, termios could be ported to other platforms fairly easily,
> at least for most of it.
>
> I certainly wrote myself a termios layer in C for V7 UNIX, which
> didn't have it but did have the older tty setup interfaces. Any
> other OS presenting a serial line control library could probably
> be targeted nearly as easily for the basics (raw mode, tty speed,
> parity et al).

Looking at the code, it seems that the tty module was meant as a higher-level Python module for terminal interaction, whereas termios implemented the C parts - much like we have with ssl.py and the _ssl C module. tty does an import * of all termios symbols.

Hard to say whether that was the original motivation, though, since the two modules are really old (both were added in 1994).

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jul 25 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

From clay.sweetser at gmail.com Thu Jul 25 10:08:44 2013
From: clay.sweetser at gmail.com (Clay Sweetser)
Date: Thu, 25 Jul 2013 04:08:44 -0400
Subject: [Python-ideas] Remove tty module
In-Reply-To: <9BB724B2-C361-47E0-8AA8-B235F6F9E568@yahoo.com>
References: <51EF97F6.5080302@egenix.com> <9BB724B2-C361-47E0-8AA8-B235F6F9E568@yahoo.com>
Message-ID:

On Jul 24, 2013 7:08 AM, "Andrew Barnert" wrote:
>
> On Jul 24, 2013, at 11:01, "M.-A. Lemburg" wrote:
>
> > On 24.07.2013 10:43, anatoly techtonik wrote:
> >> http://hg.python.org/cpython/file/74fd1959cf44/Lib/tty.py
> >>
> >> The tty module is cryptic. The low-level interface it exposes can be
> >> covered by appropriate recipes in the termios module documentation. You
> >> can't understand tty without understanding termios. You also can't
> >> understand tty without being a Unix guru -
> >> http://en.wikipedia.org/wiki/TTY. I doubt that the raw mode function
> >> (which is 50% of this stuff) is really used by anyone.
>
> Many novices want a way to read a character from the tty without waiting, to create simple keyboard menus or just "press any key to continue" apps without using curses or a GUI app.
>
> It would be even nicer if there were an easier way to do this (especially one that worked on both unix and windows); it's a pretty common question on StackOverflow and people are always disappointed at how complicated the answer is. But making it even harder is probably not a good idea.

Yeah, I'm with you on the Windows compatibility; it's always annoyed me that tty and termios (along with most of curses) are Unix only, when I'm fairly sure that Windows has enough functionality in its API that replication is possible. Still, I'm no expert on the Windows API, so I may be wrong.

> So, maybe if we had a higher level "consoleio" module along with the low level termios we wouldn't need the mid level tty.
>
> >> In an ideal world, tty should be replaced with an interface with less
> >> cryptic terminology, replacing cbreak mode, cooked mode and raw mode
> >> with user-oriented concepts. Otherwise this functionality is already
> >> covered by the termios interface.
>
> I'm not sure providing the existing mid level functionality with friendlier but less standard (and therefore harder to look up) names would really help much.
>
> > Better references:
> > http://en.wikipedia.org/wiki/POSIX_terminal_interface
> > http://en.wikipedia.org/wiki/Cooked_mode
> >
> > cbreak and raw modes are needed for e.g. games or editors that
> > need low-level access to the keyboard.
> >
> > For a user oriented TTY interface, have a look at
> > http://en.wikipedia.org/wiki/Ncurses and
> > http://docs.python.org/2.7/library/curses.html
>
> Curses is great, but sometimes you want raw io without it. For one thing, occasionally you need to deal with a platform that doesn't have curses (like iOS). But more generally, sometimes all you want is raw input, without console windowing and other fancy stuff--again, think of those novices with their "press any key to continue" apps. So, a very simple pure python library on top of unix termios and windows conio might be nice as well.

From abarnert at yahoo.com Thu Jul 25 17:58:29 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 25 Jul 2013 17:58:29 +0200
Subject: [Python-ideas] Remove tty module
In-Reply-To:
References: <51EF97F6.5080302@egenix.com> <9BB724B2-C361-47E0-8AA8-B235F6F9E568@yahoo.com>
Message-ID:

On Jul 25, 2013, at 10:08, Clay Sweetser wrote:
> On Jul 24, 2013 7:08 AM, "Andrew Barnert" wrote:
> >
> > On Jul 24, 2013, at 11:01, "M.-A. Lemburg" wrote:
> >
> > > On 24.07.2013 10:43, anatoly techtonik wrote:
> > >> http://hg.python.org/cpython/file/74fd1959cf44/Lib/tty.py
> > >>
> > >> The tty module is cryptic. The low-level interface it exposes can be
> > >> covered by appropriate recipes in the termios module documentation. You
> > >> can't understand tty without understanding termios. You also can't
> > >> understand tty without being a Unix guru -
> > >> http://en.wikipedia.org/wiki/TTY. I doubt that the raw mode function
> > >> (which is 50% of this stuff) is really used by anyone.
> >
> > Many novices want a way to read a character from the tty without waiting, to create simple keyboard menus or just "press any key to continue" apps without using curses or a GUI app.
> >
> > It would be even nicer if there were an easier way to do this (especially one that worked on both unix and windows); it's a pretty common question on StackOverflow and people are always disappointed at how complicated the answer is. But making it even harder is probably not a good idea.
>
> Yeah, I'm with you on the Windows compatibility; it's always annoyed me that tty and termios (along with most of curses) are Unix only, when I'm fairly sure that Windows has enough functionality in its API that replication is possible. Still, I'm no expert on the Windows API, so I may be wrong.

IIRC, there are two different curses near-clones for Windows, so it's obviously possible. But unless Python included one of them, a wrapper wouldn't be that useful.

Faking termios on Windows (and presumably faking attributes for the cmd.exe console window) would probably be almost as much work as faking curses, and a lot less useful, so I'm not sure that would be worth doing.

Also, the serial port stuff isn't as easy to integrate with the tty stuff on Windows. I think a separate module (which could also do things like enumerating serial files--which is also nontrivial on Macs and some other Unix systems) might be more useful than trying to do it together with tty.

Just raw and echo flags on and off on stdin might be sufficient (together with isatty and a way to get the binary file underneath stdin, both of which we already have)? Or even just a single raw_read(maxcount=1) function? Switching back and forth for each read wouldn't exactly be super-efficient (especially if the tty is actually a 1200 baud SLIP or PPP or something--is that ever an issue nowadays?), but it would be very easy to implement, and novice-friendly.

From rymg19 at gmail.com Thu Jul 25 21:22:39 2013
From: rymg19 at gmail.com (Ryan)
Date: Thu, 25 Jul 2013 14:22:39 -0500
Subject: [Python-ideas] shlex escapes without Posix mode
Message-ID:

Note: This is my first post to the mailing list, so I'm not sure if I'm doing something wrong or something.

I've been playing around with shlex lately, and I mostly like it, but I have an idea.

Have an option with the ability to enable certain Posix mode features selectively, most particularly character escapes. It could be something like: if Posix mode is disabled, the string of escape characters is set to empty or None, and assigning a value to it enables that feature in non-Posix mode.

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

From steve at pearwood.info Fri Jul 26 05:35:15 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 26 Jul 2013 13:35:15 +1000
Subject: [Python-ideas] shlex escapes without Posix mode
In-Reply-To:
References:
Message-ID: <51F1EE73.3010608@pearwood.info>

Hi Ryan, and welcome.

On 26/07/13 05:22, Ryan wrote:
> Note: This is my first post to the mailing list, so I'm not sure if I'm
> doing something wrong or something.
>
> I've been playing around with shlex lately, and I mostly like it, but I
> have an idea.
>
> Have an option with the ability to enable certain Posix mode features
> selectively, most particularly character escapes.
> It could be something like: if Posix mode is disabled, the string of
> escape characters is set to empty or None, and assigning a value to it
> enables that feature in non-Posix mode.

That's a good start, but it's awfully vague. "Something like"? Concrete ideas will help. Actual working code is best (although be cautious about posting large amounts of code here -- a few lines is fine, pages of code, not so much), or at least pseudo-code demonstrating how and when somebody might use this proposed feature.

Good use-cases for why you might want the feature also help. Under what circumstances would you say "Well, I don't want POSIX mode, but I do want POSIX escape sequences"?

Ultimately, don't be surprised or disappointed at negative reactions. Negative reactions are better than silence -- at least it means that people have read, and care enough to comment on, your post, while silence may mean that nobody cares, or simply don't understand what you're talking about and are too polite to say so.

We tend to be rather conservative about adding new features. Sometimes it takes *years* for features to be added, or they are never added, if nobody who cares about the feature steps up to program it. Remember too that new code has to carry its weight: code not only has one-off costs (code doesn't write itself, neither does the documentation), but also on-going costs (maintenance, bug-fixes, new features for users to learn, etc.), and no matter how low that cost is, it is never zero, so if the benefit from that feature is not more than the cost, it will probably be rejected.

Two good posts you should read, by one of the senior core developers, are:

http://www.boredomandlaziness.org/2011/04/musings-on-culture-of-python-dev.html
http://www.boredomandlaziness.org/2011/02/status-quo-wins-stalemate.html

If you take nothing else from my reply, at least take from it these two questions:

"Under what circumstances would this feature be useful to you? And would it be useful enough that you personally would program this feature, if you had the skills?"

--
Steven

From techtonik at gmail.com Fri Jul 26 12:36:14 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 26 Jul 2013 13:36:14 +0300
Subject: [Python-ideas] namedlist() or Record type
In-Reply-To: <2B229179-E1D4-45E8-8EDD-E571CA610E13@mac.com>
References: <2B229179-E1D4-45E8-8EDD-E571CA610E13@mac.com>
Message-ID:

On Wed, Jul 24, 2013 at 12:25 PM, Ronald Oussoren wrote:
>
> Namedtuple has a _replace method, which would allow you to
> write the code below if tcgetattr returned a named tuple:
>
> newattr = newattr._replace(lflag=newattr.lflag & ~termios.ICANON)
>
> However, tcgetattr does not return a namedtuple :-)

Converting is possible, even if it looks ugly to call a private method outside of the class, but it is also unclear whether tcsetattr() can accept namedtuples.

> What do you mean by "namedtuple [...] seems not serializable"? Both
> pickle and json work fine with named tuples.

While searching the archive for a similar topic I've found this one, which says that namedtuple pickling is "sometimes possible": https://groups.google.com/forum/#!topic/python-ideas/Pw0hNdiTu8A
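As far as I can tell, the caveat is that pickle has to be able to import the class definition - a quick sketch of the case that works (json, meanwhile, flattens it to a plain list):

    import json
    import pickle
    from collections import namedtuple

    # Module-level definition, bound to the matching name --
    # this is the case where pickling works.
    Point = namedtuple('Point', ['x', 'y'])

    p = Point(1, 2)
    assert pickle.loads(pickle.dumps(p)) == p

    print(json.dumps(p))  # prints [1, 2] -- field names are lost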
--
anatoly t.

From techtonik at gmail.com Fri Jul 26 12:48:49 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 26 Jul 2013 13:48:49 +0300
Subject: [Python-ideas] namedlist() or Record type
In-Reply-To: <51EFCE6F.3060708@pearwood.info>
References: <51EFCE6F.3060708@pearwood.info>
Message-ID:

On Wed, Jul 24, 2013 at 3:54 PM, Steven D'Aprano wrote:
>
> Here's a quick and dirty (emphasis on the dirty) proof of concept of the
> sort of thing I'd like:
>
> def record(name, fields):
>     def __init__(self, *args):
>         for slot, arg in zip(self.__slots__, args):
>             setattr(self, slot, arg)
>     return type(name, (),
>                 {'__slots__': fields.split(), '__init__': __init__}
>                 )
>
> I don't put this forward as a production-ready solution, it is missing a
> nice repr, and doesn't support indexing or sequence unpacking, and I'm not
> really sure if it should use slots, but it gives the idea of the sort of
> thing that can be done.

Indexing is vital for lists; the idea is to have a list replacement, just with convenient names, to write more readable code. Slots look like a good solution to throw errors about typos in attribute names as early as possible.

> [...]
>
>> I hacked OrderedDict to accept a list as param and allow attribute
>> access. It doesn't behave as a named list - you still need to call
>> .values() to get the list back, and indexed access doesn't work as well.
>>
>> http://bugs.python.org/file31026/DictRecord.py
>
> I don't think that OrderedDict is a good base class for this. Dicts have
> lots of methods that are completely inappropriate for a record/struct-like
> object.

I agree, but it was the only way I could think of to get a clean definition for list fields, such as:

    class TermiosState(DictRecord):
        NAMES = ['iflag', 'oflag', 'cflag', 'lflag', 'ispeed', 'ospeed', 'cc']

This definition can be easily parsed by static analyzer tools like pyflakes. The standard namedtuple definition is not so clear to them:

    Point = namedtuple('Point', ['x', 'y'], verbose=True)
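To make the idea concrete, here is a rough sketch of such a "named list" without dict in the picture (untested-quality code, just to show the intent):

    class Record(object):
        """Sketch of a mutable 'named list': index and attribute access."""
        NAMES = []  # subclasses override this

        def __init__(self, values):
            if len(values) != len(self.NAMES):
                raise ValueError('expected %d values' % len(self.NAMES))
            # bypass our own __setattr__ for the internal storage
            object.__setattr__(self, '_values', list(values))

        def __getattr__(self, name):
            try:
                return self._values[self.NAMES.index(name)]
            except ValueError:
                raise AttributeError(name)  # catches typos like .naem

        def __setattr__(self, name, value):
            try:
                self._values[self.NAMES.index(name)] = value
            except ValueError:
                raise AttributeError(name)  # typos on assignment, too

        def __getitem__(self, index):
            return self._values[index]

        def __setitem__(self, index, value):
            self._values[index] = value

        def __len__(self):
            return len(self._values)

        def __iter__(self):
            return iter(self._values)

        def __repr__(self):
            return repr(self._values)

    class TermiosState(Record):
        NAMES = ['iflag', 'oflag', 'cflag', 'lflag', 'ispeed', 'ospeed', 'cc']

Assuming stdin is a real tty, the termios use case would then read:

    state = TermiosState(termios.tcgetattr(fd))
    state.lflag &= ~termios.ICANON   # instead of the magic state[3]
    termios.tcsetattr(fd, termios.TCSANOW, list(state))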
Default are not needed for such "mutable namedtuple", because it should always be initialized with list of the fixed length. What does the recordtype module do is helping to build a record given only partial set of data fields. This use case is often seen in DB oriented applications (Django models [1], etc.) and if you want a clear definition of your defaults, IMHO that's a way to go. Other static definitions I can think about look ugly and not extensible (i.e. with additional field properties besides defaults): class TermiosState(DictRecord): NAMES = ['iflag', 'oflag', 'cflag', 'lflag', 'ispeed', 'ospeed', 'cc'] DEFAULTS = [0, 0, 0, 0, 0, 0, 0] or class TermiosState(DictRecord): NAMES = ['iflag', 'oflag', 'cflag', 'lflag', 'ispeed', 'ospeed', 'cc'] DEFAULTS = dict(lflag=0) or class TermiosState(DictRecord): NAMES = dict( iflag=0, oflag=0, ... ) I like the definition of macropy "case" classes [2] mentioned by Haoyi Li. The name is the most unfortunate among all the names I can remember - "case" means so many things in English and in programming in particular, that people new to the concept may hack their brain before they find an appropriate tutorial. These "case classes" surely look like a hack, as they redefine the role of class definition parameters, but they hold more practical value than a multiple inheritance to me: @case class Point(x, y): pass The TermiosState can then be rewritten as: @record class TermiosState(iflag, oflag, cflag, lflag, ispeed, ospeed, cc): iflag = 0 # default Of course this is not as extensible as Django model definition. I am not sure if such syntax is portable across Python implementations, but it is the most concise. Of course there should be the proper check for name typos in section with defaults. 1. https://docs.djangoproject.com/en/dev/topics/db/models/ 2. https://github.com/lihaoyi/macropy#case-classes -- anatoly t. From techtonik at gmail.com Fri Jul 26 16:39:18 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 26 Jul 2013 17:39:18 +0300 Subject: [Python-ideas] sys.args Message-ID: sys.argv is an atavism with not intuitive name, how about sys.args for program arguments only? if not sys.args: print("usage: ...") -- anatoly t. From g.rodola at gmail.com Fri Jul 26 16:47:44 2013 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Fri, 26 Jul 2013 16:47:44 +0200 Subject: [Python-ideas] sys.args In-Reply-To: References: Message-ID: On Fri, Jul 26, 2013 at 4:39 PM, anatoly techtonik wrote: > sys.argv is an atavism with not intuitive name, how about sys.args for > program arguments only? > > if not sys.args: > print("usage: ...") > > -- > anatoly t. I'd say the cost in terms of compatibility breakage is way higher than the benefits. --- Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/ From techtonik at gmail.com Fri Jul 26 16:51:15 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 26 Jul 2013 17:51:15 +0300 Subject: [Python-ideas] sys.args In-Reply-To: References: Message-ID: On Fri, Jul 26, 2013 at 5:47 PM, Giampaolo Rodola' wrote: > On Fri, Jul 26, 2013 at 4:39 PM, anatoly techtonik wrote: >> sys.argv is an atavism with not intuitive name, how about sys.args for >> program arguments only? >> >> if not sys.args: >> print("usage: ...") > > I'd say the cost in terms of compatibility breakage is way higher than > the benefits. It's not sys.argv replacement, but an alternative that is easy to remember. -- anatoly t. 
From phd at phdru.name Fri Jul 26 16:52:17 2013 From: phd at phdru.name (Oleg Broytman) Date: Fri, 26 Jul 2013 18:52:17 +0400 Subject: [Python-ideas] sys.args In-Reply-To: References: Message-ID: <20130726145217.GA12658@iskra.aviel.ru> On Fri, Jul 26, 2013 at 05:39:18PM +0300, anatoly techtonik wrote: > sys.argv is an atavism with not intuitive name, how about sys.args for > program arguments only? > > if not sys.args: > print("usage: ...") 1. How are you going to handle backward compatibility? I.e., how are you going to fix millions scripts out there? 2. If you use argparse -- and it's *the way* to parse command-line args -- you don't need to use sys.argv at all, argparse does all necessary magic for you. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From fuzzyman at gmail.com Fri Jul 26 16:54:50 2013 From: fuzzyman at gmail.com (Michael Foord) Date: Fri, 26 Jul 2013 17:54:50 +0300 Subject: [Python-ideas] sys.args In-Reply-To: References: Message-ID: On 26 July 2013 17:51, anatoly techtonik wrote: > On Fri, Jul 26, 2013 at 5:47 PM, Giampaolo Rodola' > wrote: > > On Fri, Jul 26, 2013 at 4:39 PM, anatoly techtonik > wrote: > >> sys.argv is an atavism with not intuitive name, how about sys.args for > >> program arguments only? > >> > >> if not sys.args: > >> print("usage: ...") > > > > I'd say the cost in terms of compatibility breakage is way higher than > > the benefits. > > It's not sys.argv replacement, but an alternative that is easy to remember. > In principle I like it. I agree that argv is an unintuitive hang over from earlier days. The only compatibility issue is that a lot of code out there (especially tests but not just tests) manipulates the contents of sys.argv to modify behaviour. Code might have to change to modifying sys.args and sys.argv, taking care to keep them in sync. All the best, Michael Foord > -- > anatoly t. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From fuzzyman at gmail.com Fri Jul 26 16:55:49 2013 From: fuzzyman at gmail.com (Michael Foord) Date: Fri, 26 Jul 2013 17:55:49 +0300 Subject: [Python-ideas] sys.args In-Reply-To: <20130726145217.GA12658@iskra.aviel.ru> References: <20130726145217.GA12658@iskra.aviel.ru> Message-ID: On 26 July 2013 17:52, Oleg Broytman wrote: > On Fri, Jul 26, 2013 at 05:39:18PM +0300, anatoly techtonik < > techtonik at gmail.com> wrote: > > sys.argv is an atavism with not intuitive name, how about sys.args for > > program arguments only? > > > > if not sys.args: > > print("usage: ...") > > 1. How are you going to handle backward compatibility? I.e., how are you > going to fix millions scripts out there? > > How will adding another sys attribute break those scripts? > 2. If you use argparse -- and it's *the way* to parse command-line args -- > you don't need to use sys.argv at all, argparse does all necessary magic > for you. > Sometimes it's overkill. Or you have custom needs that argparse doesn't handle. Michael > > Oleg. > -- > Oleg Broytman http://phdru.name/ phd at phdru.name > Programmers don't die, they just GOSUB without RETURN. 
> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Fri Jul 26 16:59:16 2013 From: phd at phdru.name (Oleg Broytman) Date: Fri, 26 Jul 2013 18:59:16 +0400 Subject: [Python-ideas] sys.args In-Reply-To: References: Message-ID: <20130726145916.GA12720@iskra.aviel.ru> On Fri, Jul 26, 2013 at 05:51:15PM +0300, anatoly techtonik wrote: > On Fri, Jul 26, 2013 at 5:47 PM, Giampaolo Rodola' wrote: > > On Fri, Jul 26, 2013 at 4:39 PM, anatoly techtonik wrote: > >> sys.argv is an atavism with not intuitive name, how about sys.args for > >> program arguments only? > >> > >> if not sys.args: > >> print("usage: ...") > > > > I'd say the cost in terms of compatibility breakage is way higher than > > the benefits. > > It's not sys.argv replacement, but an alternative that is easy to remember. There shouldn't be two slightly different but very similar ways to access command-line arguments. Too much confusion for too little gain. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From solipsis at pitrou.net Fri Jul 26 17:02:32 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 26 Jul 2013 17:02:32 +0200 Subject: [Python-ideas] sys.args References: Message-ID: <20130726170232.24c378c8@pitrou.net> Le Fri, 26 Jul 2013 17:54:50 +0300, Michael Foord a ?crit : > > In principle I like it. I agree that argv is an unintuitive hang over > from earlier days. > > The only compatibility issue is that a lot of code out there > (especially tests but not just tests) manipulates the contents of > sys.argv to modify behaviour. Code might have to change to modifying > sys.args and sys.argv, taking care to keep them in sync. Yeah, and visually it can be quite... bothersome to distinguish between the two. This is a recipe for stupid, nerve-grating bugs. Regards Antoine. From phd at phdru.name Fri Jul 26 17:04:05 2013 From: phd at phdru.name (Oleg Broytman) Date: Fri, 26 Jul 2013 19:04:05 +0400 Subject: [Python-ideas] sys.args In-Reply-To: References: <20130726145217.GA12658@iskra.aviel.ru> Message-ID: <20130726150405.GB12720@iskra.aviel.ru> On Fri, Jul 26, 2013 at 05:55:49PM +0300, Michael Foord wrote: > On 26 July 2013 17:52, Oleg Broytman wrote: > > > On Fri, Jul 26, 2013 at 05:39:18PM +0300, anatoly techtonik < > > techtonik at gmail.com> wrote: > > > sys.argv is an atavism with not intuitive name, how about sys.args for > > > program arguments only? > > > > > > if not sys.args: > > > print("usage: ...") > > > > 1. How are you going to handle backward compatibility? I.e., how are you > > going to fix millions scripts out there? > > > > > How will adding another sys attribute break those scripts? Adding another way to access sys.argv without removing argv itself would lead to major confusion; "args" is too similar to "argv", the difference is one letter in spelling and one index in accessing. And removing sys.argv is a major compatibility problem. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. 
From rymg19 at gmail.com Fri Jul 26 18:40:51 2013 From: rymg19 at gmail.com (Ryan) Date: Fri, 26 Jul 2013 11:40:51 -0500 Subject: [Python-ideas] sys.args In-Reply-To: References: Message-ID: <6d9c7034-ab57-4408-9f33-359865106342@email.android.com> But that violates the Python Zen: 'There should be one, and preferably only one, way to do it'. The name argv isn't meaningless. I believe it comes from C. argv was the argument vector, i.e. the container for the arguments. anatoly techtonik wrote: >On Fri, Jul 26, 2013 at 5:47 PM, Giampaolo Rodola' >wrote: >> On Fri, Jul 26, 2013 at 4:39 PM, anatoly techtonik > wrote: >>> sys.argv is an atavism with not intuitive name, how about sys.args >for >>> program arguments only? >>> >>> if not sys.args: >>> print("usage: ...") >> >> I'd say the cost in terms of compatibility breakage is way higher >than >> the benefits. > >It's not sys.argv replacement, but an alternative that is easy to >remember. >-- >anatoly t. >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Jul 26 18:41:06 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 26 Jul 2013 09:41:06 -0700 Subject: [Python-ideas] sys.args In-Reply-To: References: Message-ID: On Fri, Jul 26, 2013 at 7:39 AM, anatoly techtonik wrote: > sys.argv is an atavism with not intuitive name, how about sys.args for > program arguments only? > > if not sys.args: > print("usage: ...") Apart from what everyone else already said, you should really use argparse. -- --Guido van Rossum (python.org/~guido) From rymg19 at gmail.com Fri Jul 26 18:43:18 2013 From: rymg19 at gmail.com (Ryan) Date: Fri, 26 Jul 2013 11:43:18 -0500 Subject: [Python-ideas] shlex escapes without Posix mode In-Reply-To: <51F1EE73.3010608@pearwood.info> References: <51F1EE73.3010608@pearwood.info> Message-ID: The main thing is that this: ("d") In Posix mode gets split into this: (d) But, say the language has callable functions. I'd have to re-shlex.split the line to split the arguments. And, even then, the quotes already got destroyed. Escapes, however, are useful in practically every language. Restricting them to POSIX mode just kills it. And I had tried to see if I could implement it myself, but reading source code on Android SL4A is absolutely painful. And, whenever I pull up a computer, I always have a goal in mind and haven't got a chance to tweak it. I've never quite come across a language without some form of escapes. And, I can't think of an occasion where I'd use POSIX mode. Therefore, in the end, it would end up being better if you could enable the escapes individually. POSIX mode would have priority over the escape option. The instance could.be created like this: lex = shlex.shlex(escape='\\') The default value would be None. That would change the value of pex.escape to '\\'. If the value is None, escapes are disabled. Steven D'Aprano wrote: >Hi Ryan, and welcome. > > >On 26/07/13 05:22, Ryan wrote: >> Note: This is my first post to the mailing list, so I'm not sure if >I'm doing something wrong or something. >> >> I've been playing around with shlex.lately, and I mostly like it, but >I have an idea. >> >> Have an option with the ability to enable certain Posix mode features >selectively, most particularly character escapes. 
It could be something >like, if Posix mode is disabled, the string of escape characters is set >to empty or None, and assigning a value to it enables that feature in >non-Posix mode. > > >That's a good start, but it's awfully vague. "Something like"? Concrete >ideas will help. Actual working code is best (although be cautious >about posting large amounts of code here -- a few lines is fine, pages >of code, not so much), or at least pseudo-code demonstrating how and >when somebody might use this proposed feature. > >Good use-cases for why you might want the feature also helps. Under >what circumstances would you say "Well, I don't want POSIX mode, but I >do want POSIX escape sequences"? > >Ultimately, don't be surprised or disappointed at negative reactions. >Negative reactions are better than silence -- at least it means that >people have read, and care enough to comment, on your post, while >silence may mean that nobody cares, or simply don't understand what >you're talking about and are too polite to say so. > >We tend to be rather conservative about adding new features. Sometimes >it takes *years* for features to be added, or they are never added, if >nobody who cares about the feature steps up to program it. Remember too >that new code has to carry its weight: code not only has one-off costs >(code doesn't write itself, neither does the documentation), but also >on-going costs (maintenance, bug-fixes, new features for users to >learn, etc.), and no matter how low that cost is, it is never zero, so >if the benefit from that feature is not more than the cost, it will >probably be rejected. > >Two good posts you should read, by one of the senior core developers, >are: > >http://www.boredomandlaziness.org/2011/04/musings-on-culture-of-python-dev.html > >http://www.boredomandlaziness.org/2011/02/status-quo-wins-stalemate.html > > >If you take nothing else from my reply, at least take from it these two >questions: > >"Under what circumstances would this feature be useful to you? And >would they be useful enough that you personally would program this >feature, if you had the skills?" > > > >-- >Steven >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >http://mail.python.org/mailman/listinfo/python-ideas -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Jul 26 19:10:51 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 26 Jul 2013 19:10:51 +0200 Subject: [Python-ideas] shlex escapes without Posix mode In-Reply-To: References: <51F1EE73.3010608@pearwood.info> Message-ID: <97715D1A-C9F6-4519-8CEB-E0CC4F409808@yahoo.com> Are you trying to use shlex to parse code for some language other than sh or another shell language? It's not meant to be useful for perl or C or whatever. A general-purpose quoting, escaping, splitting, and joining module that could be configured to handle everything from sh to C to CSV could be cool, but shlex isn't it. On Jul 26, 2013, at 18:43, Ryan wrote: > The main thing is that this: > > ("d") > > In Posix mode gets split into this: > > (d) > > But, say the language has callable functions. I'd have to re-shlex.split the line to split the arguments. And, even then, the quotes already got destroyed. > > Escapes, however, are useful in practically every language. Restricting them to POSIX mode just kills it. 
And I had tried to see if I could implement it myself, but reading source code on Android SL4A is absolutely painful. And, whenever I pull up a computer, I always have a goal in mind and haven't got a chance to tweak it.
>
> I've never quite come across a language without some form of escapes. And, I can't think of an occasion where I'd use POSIX mode. Therefore, in the end, it would end up being better if you could enable the escapes individually. POSIX mode would have priority over the escape option. The instance could be created like this:
>
> lex = shlex.shlex(escape='\\')
>
> The default value would be None. That would change the value of lex.escape to '\\'. If the value is None, escapes are disabled.
>
> Steven D'Aprano wrote:
>>
>> Hi Ryan, and welcome.
>>
>> On 26/07/13 05:22, Ryan wrote:
>>> Note: This is my first post to the mailing list, so I'm not sure if I'm doing something wrong or something.
>>>
>>> I've been playing around with shlex lately, and I mostly like it, but I have an idea.
>>>
>>> Have an option with the ability to enable certain Posix mode features selectively, most particularly character escapes. It could be something like, if Posix mode is disabled, the string of escape characters is set to empty or None, and assigning a value to it enables that feature in non-Posix mode.
>>
>> That's a good start, but it's awfully vague. "Something like"? Concrete ideas will help. Actual working code is best (although be cautious about posting large amounts of code here -- a few lines is fine, pages of code, not so much), or at least pseudo-code demonstrating how and when somebody might use this proposed feature.
>>
>> Good use-cases for why you might want the feature also helps. Under what circumstances would you say "Well, I don't want POSIX mode, but I do want POSIX escape sequences"?
>>
>> Ultimately, don't be surprised or disappointed at negative reactions. Negative reactions are better than silence -- at least it means that people have read, and care enough to comment, on your post, while silence may mean that nobody cares, or simply don't understand what you're talking about and are too polite to say so.
>>
>> We tend to be rather conservative about adding new features. Sometimes it takes *years* for features to be added, or they are never added, if nobody who cares about the feature steps up to program it. Remember too that new code has to carry its weight: code not only has one-off costs (code doesn't write itself, neither does the documentation), but also on-going costs (maintenance, bug-fixes, new features for users to learn, etc.), and no matter how low that cost is, it is never zero, so if the benefit from that feature is not more than the cost, it will probably be rejected.
>>
>> Two good posts you should read, by one of the senior core developers, are:
>>
>> http://www.boredomandlaziness.org/2011/04/musings-on-culture-of-python-dev.html
>>
>> http://www.boredomandlaziness.org/2011/02/status-quo-wins-stalemate.html
>>
>> If you take nothing else from my reply, at least take from it these two questions:
>>
>> "Under what circumstances would this feature be useful to you? And would they be useful enough that you personally would program this feature, if you had the skills?"
>
> --
> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Fri Jul 26 19:28:08 2013 From: eric at trueblade.com (Eric V. Smith) Date: Fri, 26 Jul 2013 13:28:08 -0400 Subject: [Python-ideas] sys.args In-Reply-To: <20130726150405.GB12720@iskra.aviel.ru> References: <20130726145217.GA12658@iskra.aviel.ru> <20130726150405.GB12720@iskra.aviel.ru> Message-ID: <51F2B1A8.1070303@trueblade.com> On 7/26/2013 11:04 AM, Oleg Broytman wrote: > On Fri, Jul 26, 2013 at 05:55:49PM +0300, Michael Foord wrote: >> On 26 July 2013 17:52, Oleg Broytman wrote: >> >>> On Fri, Jul 26, 2013 at 05:39:18PM +0300, anatoly techtonik < >>> techtonik at gmail.com> wrote: >>>> sys.argv is an atavism with not intuitive name, how about sys.args for >>>> program arguments only? >>>> >>>> if not sys.args: >>>> print("usage: ...") >>> >>> 1. How are you going to handle backward compatibility? I.e., how are you >>> going to fix millions scripts out there? >>> >>> >> How will adding another sys attribute break those scripts? > > Adding another way to access sys.argv without removing argv itself > would lead to major confusion; "args" is too similar to "argv", the > difference is one letter in spelling and one index in accessing. Indeed, I read the subject as "sys.argv", and couldn't figure out what was being discussed. So at least one person has already fallen victim to this. -- Eric. From rosuav at gmail.com Sat Jul 27 00:08:55 2013 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 26 Jul 2013 23:08:55 +0100 Subject: [Python-ideas] sys.args In-Reply-To: <6d9c7034-ab57-4408-9f33-359865106342@email.android.com> References: <6d9c7034-ab57-4408-9f33-359865106342@email.android.com> Message-ID: On Fri, Jul 26, 2013 at 5:40 PM, Ryan wrote: > But that violates the Python Zen: 'There should be one, and preferably only > one, way to do it'. The name argv isn't meaningless. I believe it comes from > C. argv was the argument vector, i.e. the container for the arguments. > The name comes from the C pair of argc/argv, where you need a separate integer count of args. So it's not wholly appropriate to Python. Ironically, it's not even mandated by C (it's just a convention, equivalent to Python methods having a first argument named 'self'), but it's a fixed name in Python, where it's inaccurate. But it's something that's found in many other languages too, so it's something plenty of people will understand (particularly as 'argv' (almost) invariably refers to the program's arguments, while other uses of 'arg' might mean some particular function's args), so the name isn't terrible; and the backward compatibility loss (or the clarity loss of keeping both) is the strongest ... uhh, argument. ChrisA From rymg19 at gmail.com Sat Jul 27 00:11:32 2013 From: rymg19 at gmail.com (Ryan) Date: Fri, 26 Jul 2013 17:11:32 -0500 Subject: [Python-ideas] shlex escapes without Posix mode In-Reply-To: <97715D1A-C9F6-4519-8CEB-E0CC4F409808@yahoo.com> References: <51F1EE73.3010608@pearwood.info> <97715D1A-C9F6-4519-8CEB-E0CC4F409808@yahoo.com> Message-ID: <9273a76f-e842-4a70-b3ab-027073269090@email.android.com> ksh has parenthesis and shell functions. Still a shell language. And shell languages pretty much always have escapes. Andrew Barnert wrote: >Are you trying to use shlex to parse code for some language other than >sh or another shell language? 
It's not meant to be useful for perl or C >or whatever. > >A general-purpose quoting, escaping, splitting, and joining module that >could be configured to handle everything from sh to C to CSV could be >cool, but shlex isn't it. > >On Jul 26, 2013, at 18:43, Ryan wrote: > >> The main thing is that this: >> >> ("d") >> >> In Posix mode gets split into this: >> >> (d) >> >> But, say the language has callable functions. I'd have to >re-shlex.split the line to split the arguments. And, even then, the >quotes already got destroyed. >> >> Escapes, however, are useful in practically every language. >Restricting them to POSIX mode just kills it. And I had tried to see if >I could implement it myself, but reading source code on Android SL4A is >absolutely painful. And, whenever I pull up a computer, I always have a >goal in mind and haven't got a chance to tweak it. >> >> I've never quite come across a language without some form of escapes. >And, I can't think of an occasion where I'd use POSIX mode. Therefore, >in the end, it would end up being better if you could enable the >escapes individually. POSIX mode would have priority over the escape >option. The instance could.be created like this: >> >> lex = shlex.shlex(escape='\\') >> >> The default value would be None. That would change the value of >pex.escape to '\\'. If the value is None, escapes are disabled. >> >> Steven D'Aprano wrote: >>> >>> Hi Ryan, and welcome. >>> >>> >>> On 26/07/13 05:22, Ryan wrote: >>>> Note: This is my first post to the mailing list, so I'm not sure if >I'm doing something wrong or something. >>>> >>>> I've been playing around with shlex.lately, and I mostly like it, >but I have an idea. >>>> >>>> Have an option with the ability to enable certain Posix mode >features selectively, most particularly character escapes. It could be >something like, if Posix mode is disabled, the string of escape >characters is set to empty or None, and assigning a value to it enables >that feature in non-Posix mode. >>> >>> >>> That's a good start, but it's awfully vague. "Something like"? >Concrete ideas will help. Actual working code is best (although be >cautious about posting large >>> amounts of code here -- a few lines is fine, pages of code, not so >much), or at least pseudo-code demonstrating how and when somebody >might use this proposed feature. >>> >>> Good use-cases for why you might want the feature also helps. Under >what circumstances would you say "Well, I don't want POSIX mode, but I >do want POSIX escape sequences"? >>> >>> Ultimately, don't be surprised or disappointed at negative >reactions. Negative reactions are better than silence -- at least it >means that people have read, and care enough to comment, on your post, >while silence may mean that nobody cares, or simply don't understand >what you're talking about and are too polite to say so. >>> >>> We tend to be rather conservative about adding new features. >Sometimes it takes *years* for features to be added, or they are never >added, if nobody who cares about the feature steps up to program it. >Remember too that new code has to carry its weight: code not only has >one-off costs (code doesn't >>> write itself, neither does the documentation), but also on-going >costs (maintenance, bug-fixes, new features for users to learn, etc.), >and no matter how low that cost is, it is never zero, so if the benefit >from that feature is not more than the cost, it will probably be >rejected. 
>>> Two good posts you should read, by one of the senior core developers, are:
>>>
>>> http://www.boredomandlaziness.org/2011/04/musings-on-culture-of-python-dev.html
>>>
>>> http://www.boredomandlaziness.org/2011/02/status-quo-wins-stalemate.html
>>>
>>> If you take nothing else from my reply, at least take from it these two questions:
>>>
>>> "Under what circumstances would this feature be useful to you? And would they be useful enough that you personally would program this feature, if you had the skills?"
>>
>> --
>> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> http://mail.python.org/mailman/listinfo/python-ideas

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

From abarnert at yahoo.com Sat Jul 27 03:03:18 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 27 Jul 2013 03:03:18 +0200
Subject: [Python-ideas] sys.args
In-Reply-To: References: <6d9c7034-ab57-4408-9f33-359865106342@email.android.com>
Message-ID: <1E3B1562-46CD-4EC0-A549-760126BE4DB5@yahoo.com>

Sent from a random iPhone

On Jul 27, 2013, at 0:08, Chris Angelico wrote:
> On Fri, Jul 26, 2013 at 5:40 PM, Ryan wrote:
>> But that violates the Python Zen: 'There should be one, and preferably only
>> one, way to do it'. The name argv isn't meaningless. I believe it comes from
>> C. argv was the argument vector, i.e. the container for the arguments.
>
> The name comes from the C pair of argc/argv, where you need a separate
> integer count of args. So it's not wholly appropriate to Python.
> Ironically, it's not even mandated by C (it's just a convention,
> equivalent to Python methods having a first argument named 'self'),

While the names of the arguments to main aren't mandated by C or POSIX, the names of the arguments to related functions ranging from getopt to execv are mandated by one or the other, and universally argc and argv. (For main, those names _are_ used in the signature and the text, but with the parenthetical comment "referred to here as argc and argv, although any names may be used, as they are local to the function".)

The funny thing is that the C standard goes out of its way to avoid using the words "count" and "vector", and even "argument". There's some pretty convoluted verbiage to explain that argc is a count without ever saying so.

Interestingly, Python's higher level wrappers like subprocess and argparse, and even some of the lower level wrappers, use args instead. So there might be some precedent for this idea if sys.argv were considered a higher-level interface. But of course the higher level interface is argparse or fileinput. IIRC the argv docs even recommend using one of them in place of trying to process arguments manually. And if not, maybe the appropriate fix is that they should.

> but it's a fixed name in Python, where it's inaccurate. But it's
> something that's found in many other languages too, so it's something
> plenty of people will understand (particularly as 'argv' (almost)
> invariably refers to the program's arguments,

Well, it also often refers to related things--if the list I pass to subprocess, the fake args I pass to argparse for testing, etc. are stored in a variable in my code, it's probably called argv... But that just strengthens your point.
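For illustration, a minimal sketch of the higher-level spelling Andrew mentions (the argument names here are invented for the example):

    import argparse

    parser = argparse.ArgumentParser(description='demo')
    parser.add_argument('paths', nargs='*', help='input files')
    args = parser.parse_args()   # reads sys.argv[1:] by default
    print(args.paths)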
From stephen at xemacs.org Sat Jul 27 06:42:04 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 27 Jul 2013 13:42:04 +0900
Subject: [Python-ideas] sys.args
In-Reply-To: <51F2B1A8.1070303@trueblade.com>
References: <20130726145217.GA12658@iskra.aviel.ru> <20130726150405.GB12720@iskra.aviel.ru> <51F2B1A8.1070303@trueblade.com>
Message-ID: <877ggcpqar.fsf@uwakimon.sk.tsukuba.ac.jp>

Eric V. Smith writes:
> Indeed, I read the subject as "sys.argv", and couldn't figure out what
> was being discussed. So at least one person has already fallen victim to
> this.

+1, except I assumed the subject spelling was a typo. If one really feels this needs to be deliberately snafu'd, at least choosing longer names would minimize the inevitable confusion. Better than "sys.args" would be "sys.arguments" or "sys.arg_list". IMHO, etc.

From steve at pearwood.info Sat Jul 27 12:01:43 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 27 Jul 2013 20:01:43 +1000
Subject: [Python-ideas] Support Unicode code point notation
Message-ID: <51F39A87.5030209@pearwood.info>

Unicode's standard notation for code points is U+ followed by a 4, 5 or 6 hex digit string, such as π = U+03C0. This notation is found throughout the Unicode Consortium's website, e.g.:

http://www.unicode.org/versions/corrigendum2.html

as well as in third party sites that have reason to discuss Unicode code points, e.g.:

https://en.wikipedia.org/wiki/Eth#Computer_input

I propose that Python strings support this as the preferred escape notation for Unicode code points:

'\U+03C0'
=> 'π'

The existing \U and \u variants must be kept for backwards compatibility, but should be (mildly) discouraged in new code.

Doesn't this violate "Only One Way To Do It"?
---------------------------------------------

That's not what the Zen says. The Zen says there should be One Obvious Way to do it, not Only One. It is my hope that we can agree that the One Obvious Way to refer to a Unicode character by its code point is by using the same notation that the Unicode Consortium uses:

d <=> U+0064

and leave legacy escape sequences as the not-so-obvious ways to do it:

\x64 \144 \u0064 \U00000064

Why do we need yet another way of writing escape sequences?
-----------------------------------------------------------

We don't need another one, we need a better one. U+xxxx is the standard Unicode notation, while existing Python escapes have various problems.

One-byte hex and oct escapes are a throwback to the old one-byte ASCII days, and reflect an obsolete idea of strings being equivalent to bytes. Backwards compatibility requires that we continue to support them, but they shouldn't be encouraged in strings.

Two-byte \u escapes are harmless, so long as you imagine that Unicode is a 16-bit character set. Unfortunately, it is not. \u does not support code points in the Supplementary Multilingual Planes (those with ordinal value greater than 0xFFFF), and can silently give the wrong result if you make a mistake in counting digits:

# I want EGYPTIAN HIEROGLYPH D010 (Eye of Horus)
s = '\u13080'
=> oops, I get 'ገ0' (ETHIOPIC SYLLABLE GA, ZERO)

Four-byte \U escape sequences support the entire Unicode character set, but they are terribly verbose, and the first three digits are *always* zero. Python doesn't (and shouldn't) support \U escapes beyond 10FFFF, so the first three digits of the eight digit hex value are pointless.

What is the U+ escape specification?
------------------------------------

http://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals lists the escape sequences, including:

\uxxxx     Character with 16-bit hex value xxxx
\Uxxxxxxxx Character with 32-bit hex value xxxxxxxx

To this should be added:

\U+xxxx    Character at code point xxxx (hex)

with the note:

Exactly 4, 5 or 6 hexadecimal digits are required.

Upper or lower case?
--------------------

Uppercase should be preferred, as the Unicode Consortium uses it, but both should be accepted.

Variable number of digits? Isn't that a bad thing?
--------------------------------------------------

It's neither good nor bad. Octal escapes already support from 1 to 3 oct digits. In some languages (but not Python), hex escapes support from 1 to an unlimited number of hex digits.

Is this backwards compatible?
-----------------------------

I believe it is. As of Python 3.3, strings using \U+ give a syntax error:

py> '\U+13080'
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-7: end of string in escape sequence

What deprecation schedule are you proposing?
--------------------------------------------

I'm not. At least, the existing features should not be considered for removal before Python 4000. In the meantime, the U+ form should be noted as the preferred way, and perhaps blessed in PEP 8.

Should string reprs use the U+ form?
------------------------------------

\u escapes are sometimes used in string reprs, e.g. for private-use characters:

py> chr(0xE034)
'\ue034'

Should this change to '\U+E034'? My personal preference is that it should, but I fear backwards compatibility may prevent it. Even if the exact form of str.__repr__ is not guaranteed, changing the repr would break (e.g.) some doctests.

This proposal defers any discussion of changing the repr of strings to use U+ escapes.

--
Steven

From mal at egenix.com Sat Jul 27 12:07:22 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Sat, 27 Jul 2013 12:07:22 +0200
Subject: [Python-ideas] Support Unicode code point notation
In-Reply-To: <51F39A87.5030209@pearwood.info>
References: <51F39A87.5030209@pearwood.info>
Message-ID: <51F39BDA.8070108@egenix.com>

Steven D'Aprano wrote:
> Unicode's standard notation for code points is U+ followed by a 4, 5 or 6 hex digit string, such as
> π = U+03C0. This notation is found throughout the Unicode Consortium's website, e.g.:
>
> http://www.unicode.org/versions/corrigendum2.html
>
> as well as in third party sites that have reason to discuss Unicode code points, e.g.:
>
> https://en.wikipedia.org/wiki/Eth#Computer_input
>
> I propose that Python strings support this as the preferred escape notation for Unicode code points:
>
> '\U+03C0'
> => 'π'

-1.

The \u and \U notations are standard in several programming languages, e.g. Java and C++, so we're in good company.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From abarnert at yahoo.com Sat Jul 27 12:18:46 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 27 Jul 2013 12:18:46 +0200 Subject: [Python-ideas] shlex escapes without Posix mode In-Reply-To: <9273a76f-e842-4a70-b3ab-027073269090@email.android.com> References: <51F1EE73.3010608@pearwood.info> <97715D1A-C9F6-4519-8CEB-E0CC4F409808@yahoo.com> <9273a76f-e842-4a70-b3ab-027073269090@email.android.com> Message-ID: On Jul 27, 2013, at 0:11, Ryan wrote: > ksh has parenthesis and shell functions. Still a shell language. And you have to use posix mode with it. Otherwise it'll get quotes within words, empty strings, etc. wrong. Also, posix mode will handle the parentheses the same way as legacy mode. The parsing rules don't have any differences with parens. Or Try shlex('foo("bar")') with both modes, and call get_token repeatedly and see. So parens aren't relevant, and you aren't trying to do anything with ksh you wouldn't do the same way with sh, unless I'm misunderstanding you. It sounds like what you want is the legacy internal-quote-stripping mode with posix everything else? But you don't even really want that; if you're parsing ksh, you need to handle internal quotes the same way that sh does; you just don't want to consider parenthesize arguments "internal". And turning off posix mode doesn't do that--it seems to do the right thing in trivial cases, but not in general. More importantly, it sounds like you want to parse parens, which means you really need to use a shlex instance manually rather than calling split. For example: >>> s=shlex.shlex('foo("spam eggs" bar)') >>> list(iter(s.get_token, None)) ['foo', '(', 'spam eggs', 'bar', ')'] Those are the tokens you want, right? There's no way to get that with split. > And shell languages pretty much always have escapes. > > Andrew Barnert wrote: >> >> Are you trying to use shlex to parse code for some language other than sh or another shell language? It's not meant to be useful for perl or C or whatever. >> >> A general-purpose quoting, escaping, splitting, and joining module that could be configured to handle everything from sh to C to CSV could be cool, but shlex isn't it. >> >> On Jul 26, 2013, at 18:43, Ryan wrote: >> >>> The main thing is that this: >>> >>> ("d") >>> >>> In Posix mode gets split into this: >>> >>> (d) >>> >>> But, say the language has callable functions. I'd have to re-shlex.split the line to split the arguments. And, even then, the quotes already got destroyed. >>> >>> Escapes, however, are useful in practically every language. Restricting them to POSIX mode just kills it. And I had tried to see if I could implement it myself, but reading source code on Android SL4A is absolutely painful. And, whenever I pull up a computer, I always have a goal in mind and haven't got a chance to tweak it. >>> >>> I've never quite come across a language without some form of escapes. And, I can't think of an occasion where I'd use POSIX mode. Therefore, in the end, it would end up being better if you could enable the escapes individually. POSIX mode would have priority over the escape option. The instance could.be created like this: >>> >>> lex = shlex.shlex(escape='\\') >>> >>> The default value would be None. That would change the value of pex.escape to '\\'. If the value is None, escapes are disabled. >>> >>> Steven D'Aprano wrote: >>>> >>>> Hi Ryan, and welcome. 
>>>> On 26/07/13 05:22, Ryan wrote:
>>>>> Note: This is my first post to the mailing list, so I'm not sure if I'm doing something wrong or something.
>>>>>
>>>>> I've been playing around with shlex lately, and I mostly like it, but I have an idea.
>>>>>
>>>>> Have an option with the ability to enable certain Posix mode features selectively, most particularly character escapes. It could be something like, if Posix mode is disabled, the string of escape characters is set to empty or None, and assigning a value to it enables that feature in non-Posix mode.
>>>>
>>>> That's a good start, but it's awfully vague. "Something like"? Concrete ideas will help. Actual working code is best (although be cautious about posting large amounts of code here -- a few lines is fine, pages of code, not so much), or at least pseudo-code demonstrating how and when somebody might use this proposed feature.
>>>>
>>>> Good use-cases for why you might want the feature also helps. Under what circumstances would you say "Well, I don't want POSIX mode, but I do want POSIX escape sequences"?
>>>>
>>>> Ultimately, don't be surprised or disappointed at negative reactions. Negative reactions are better than silence -- at least it means that people have read, and care enough to comment, on your post, while silence may mean that nobody cares, or simply don't understand what you're talking about and are too polite to say so.
>>>>
>>>> We tend to be rather conservative about adding new features. Sometimes it takes *years* for features to be added, or they are never added, if nobody who cares about the feature steps up to program it. Remember too that new code has to carry its weight: code not only has one-off costs (code doesn't write itself, neither does the documentation), but also on-going costs (maintenance, bug-fixes, new features for users to learn, etc.), and no matter how low that cost is, it is never zero, so if the benefit from that feature is not more than the cost, it will probably be rejected.
>>>>
>>>> Two good posts you should read, by one of the senior core developers, are:
>>>>
>>>> http://www.boredomandlaziness.org/2011/04/musings-on-culture-of-python-dev.html
>>>>
>>>> http://www.boredomandlaziness.org/2011/02/status-quo-wins-stalemate.html
>>>>
>>>> If you take nothing else from my reply, at least take from it these two questions:
>>>>
>>>> "Under what circumstances would this feature be useful to you? And would they be useful enough that you personally would program this feature, if you had the skills?"
>>>
>>> --
>>> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org
>>> http://mail.python.org/mailman/listinfo/python-ideas

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

From ian at feete.org Sat Jul 27 12:22:56 2013
From: ian at feete.org (Ian Foote)
Date: Sat, 27 Jul 2013 11:22:56 +0100
Subject: [Python-ideas] Support Unicode code point notation
In-Reply-To: <51F39A87.5030209@pearwood.info>
References: <51F39A87.5030209@pearwood.info>
Message-ID: <51F39F80.4070108@feete.org>

On 27/07/13 11:01, Steven D'Aprano wrote:
> Unicode's standard notation for code points is U+ followed by a 4, 5 or
> 6 hex digit string, such as π = U+03C0.
This notation is found > throughout the Unicode Consortium's website, e.g.: > > http://www.unicode.org/versions/corrigendum2.html > > as well as in third party sites that have reason to discuss Unicode code > points, e.g.: > > https://en.wikipedia.org/wiki/Eth#Computer_input > > I propose that Python strings support this as the preferred escape > notation for Unicode code points: > > '\U+03C0' > => '?' > > The existing \U and \u variants must be kept for backwards > compatibility, but should be (mildly) discouraged in new code. > > > Doesn't this violate "Only One Way To Do It"? > --------------------------------------------- > > That's not what the Zen says. The Zen says there should be One Obvious > Way to do it, not Only One. It is my hope that we can agree that the One > Obvious Way to refer to a Unicode character by its code point is by > using the same notation that the Unicode Consortium uses: > > d <=> U+0064 > > and leave legacy escape sequences as the not-so-obvious ways to do it: > > \x64 \144 \u0064 \U00000064 > > > Why do we need yet another way of writing escape sequences? > ----------------------------------------------------------- > > We don't need another one, we need a better one. U+xxxx is the standard > Unicode notation, while existing Python escapes have various problems. > > One-byte hex and oct escapes are a throwback to the old one-byte ASCII > days, and reflect an obsolete idea of strings being equivalent to bytes. > Backwards compatibility requires that we continue to support them, but > they shouldn't be encouraged in strings. > > Two-byte \u escapes are harmless, so long as you imagine that Unicode is > a 16-bit character set. Unfortunately, it is not. \u does not support > code points in the Supplementary Multilingual Planes (those with ordinal > value greater than 0xFFFF), and can silently give the wrong result if > you make a mistake in counting digits: > > # I want EGYPTIAN HIEROGLYPH D010 (Eye of Horus) > s = '\u13080' > => oops, I get '?0' (ETHIOPIC SYLLABLE GA, ZERO) > > Four-byte \U escape sequences support the entire Unicode character set, > but they are terribly verbose, and the first three digits are *always* > zero. Python doesn't (and shouldn't) support \U escapes beyond 10FFFF, > so the first three digits of the eight digit hex value are pointless. > > > What is the U+ escape specification? > ------------------------------------ > > http://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals > > > lists the escape sequences, including: > > \uxxxx Character with 16-bit hex value xxxx > \Uxxxxxxxx Character with 32-bit hex value xxxxxxxx > > > To this should be added: > > \U+xxxx Character at code point xxxx (hex) > > > with the note: > > Exactly 4, 5 or 6 hexadecimal digits are required. > > > Upper or lower case? > -------------------- > > Uppercase should be preferred, as the Unicode Consortium uses it, but > both should be accepted. > > > Variable number of digits? Isn't that a bad thing? > -------------------------------------------------- > > It's neither good nor bad. Octal escapes already support from 1 to 3 oct > digits. In some languages (but not Python), hex escapes support from 1 > to an unlimited number of hex digits. > > > Is this backwards compatible? > ----------------------------- > > I believe it is. 
As of Python 3.3, strings using \U+ give a syntax error: > > py> '\U+13080' > File "", line 1 > SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in > position 0-7: end of string in escape sequence > > > What deprecation schedule are you proposing? > -------------------------------------------- > > I'm not. At least, the existing features should not be considered for > removal before Python 4000. In the meantime, the U+ form should be noted > as the preferred way, and perhaps blessed in PEP 8. > > > Should string reprs use the U+ form? > ------------------------------------ > > \u escapes are sometimes used in string reprs, e.g. for private-use > characters: > > py> chr(0xE034) > '\ue034' > > Should this change to '\U+E034'? My personal preference is that it > should, but I fear backwards compatibility may prevent it. Even if the > exact form of str.__repr__ is not guaranteed, changing the repr would > break (e.g.) some doctests. > > This proposal defers any discussion of changing the repr of strings to > use U+ escapes. > > > What should 'U+12345' be? U+12345 CUNEIFORM SIGN URU TIMES KI or U+1234 ETHIOPIC SYLLABLE SEE and a digit 5? -1 without a clear way to disambiguate. Regards, Ian From ian at feete.org Sat Jul 27 12:26:40 2013 From: ian at feete.org (Ian Foote) Date: Sat, 27 Jul 2013 11:26:40 +0100 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51F39F80.4070108@feete.org> References: <51F39A87.5030209@pearwood.info> <51F39F80.4070108@feete.org> Message-ID: <51F3A060.80606@feete.org> On 27/07/13 11:22, Ian Foote wrote: > > What should 'U+12345' be? U+12345 CUNEIFORM SIGN URU TIMES KI or U+1234 > ETHIOPIC SYLLABLE SEE and a digit 5? > Oops, I meant '\U+12345'. Regards, Ian From abarnert at yahoo.com Sat Jul 27 13:01:47 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 27 Jul 2013 13:01:47 +0200 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51F39F80.4070108@feete.org> References: <51F39A87.5030209@pearwood.info> <51F39F80.4070108@feete.org> Message-ID: <4518F168-1AAC-4310-B44F-F7D3C0294667@yahoo.com> On Jul 27, 2013, at 12:22, Ian Foote wrote: > On 27/07/13 11:01, Steven D'Aprano wrote: >> Unicode's standard notation for code points is U+ followed by a 4, 5 or >> 6 hex digit string, such as ? = U+03C0. This notation is found >> throughout the Unicode Consortium's website, e.g.: >> >> http://www.unicode.org/versions/corrigendum2.html >> >> as well as in third party sites that have reason to discuss Unicode code >> points, e.g.: >> >> https://en.wikipedia.org/wiki/Eth#Computer_input >> >> I propose that Python strings support this as the preferred escape >> notation for Unicode code points: >> >> '\U+03C0' >> => '?' >> >> The existing \U and \u variants must be kept for backwards >> compatibility, but should be (mildly) discouraged in new code. >> >> >> Doesn't this violate "Only One Way To Do It"? >> --------------------------------------------- >> >> That's not what the Zen says. The Zen says there should be One Obvious >> Way to do it, not Only One. It is my hope that we can agree that the One >> Obvious Way to refer to a Unicode character by its code point is by >> using the same notation that the Unicode Consortium uses: >> >> d <=> U+0064 >> >> and leave legacy escape sequences as the not-so-obvious ways to do it: >> >> \x64 \144 \u0064 \U00000064 >> >> >> Why do we need yet another way of writing escape sequences? 
>> ----------------------------------------------------------- >> >> We don't need another one, we need a better one. U+xxxx is the standard >> Unicode notation, while existing Python escapes have various problems. >> >> One-byte hex and oct escapes are a throwback to the old one-byte ASCII >> days, and reflect an obsolete idea of strings being equivalent to bytes. >> Backwards compatibility requires that we continue to support them, but >> they shouldn't be encouraged in strings. >> >> Two-byte \u escapes are harmless, so long as you imagine that Unicode is >> a 16-bit character set. Unfortunately, it is not. \u does not support >> code points in the Supplementary Multilingual Planes (those with ordinal >> value greater than 0xFFFF), and can silently give the wrong result if >> you make a mistake in counting digits: >> >> # I want EGYPTIAN HIEROGLYPH D010 (Eye of Horus) >> s = '\u13080' >> => oops, I get '?0' (ETHIOPIC SYLLABLE GA, ZERO) >> >> Four-byte \U escape sequences support the entire Unicode character set, >> but they are terribly verbose, and the first three digits are *always* >> zero. Python doesn't (and shouldn't) support \U escapes beyond 10FFFF, >> so the first three digits of the eight digit hex value are pointless. >> >> >> What is the U+ escape specification? >> ------------------------------------ >> >> http://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals >> >> >> lists the escape sequences, including: >> >> \uxxxx Character with 16-bit hex value xxxx >> \Uxxxxxxxx Character with 32-bit hex value xxxxxxxx >> >> >> To this should be added: >> >> \U+xxxx Character at code point xxxx (hex) >> >> >> with the note: >> >> Exactly 4, 5 or 6 hexadecimal digits are required. >> >> >> Upper or lower case? >> -------------------- >> >> Uppercase should be preferred, as the Unicode Consortium uses it, but >> both should be accepted. >> >> >> Variable number of digits? Isn't that a bad thing? >> -------------------------------------------------- >> >> It's neither good nor bad. Octal escapes already support from 1 to 3 oct >> digits. In some languages (but not Python), hex escapes support from 1 >> to an unlimited number of hex digits. >> >> >> Is this backwards compatible? >> ----------------------------- >> >> I believe it is. As of Python 3.3, strings using \U+ give a syntax error: >> >> py> '\U+13080' >> File "", line 1 >> SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in >> position 0-7: end of string in escape sequence >> >> >> What deprecation schedule are you proposing? >> -------------------------------------------- >> >> I'm not. At least, the existing features should not be considered for >> removal before Python 4000. In the meantime, the U+ form should be noted >> as the preferred way, and perhaps blessed in PEP 8. >> >> >> Should string reprs use the U+ form? >> ------------------------------------ >> >> \u escapes are sometimes used in string reprs, e.g. for private-use >> characters: >> >> py> chr(0xE034) >> '\ue034' >> >> Should this change to '\U+E034'? My personal preference is that it >> should, but I fear backwards compatibility may prevent it. Even if the >> exact form of str.__repr__ is not guaranteed, changing the repr would >> break (e.g.) some doctests. >> >> This proposal defers any discussion of changing the repr of strings to >> use U+ escapes. > > What should 'U+12345' be? U+12345 CUNEIFORM SIGN URU TIMES KI or U+1234 ETHIOPIC SYLLABLE SEE and a digit 5? > > -1 without a clear way to disambiguate. 
We already have the exact same problem with octal literals. They can be one to three digits, ending at the first non-octal-digit character (or end of string). So '\123' is unambiguously 'S', while '\128' is unambiguously '\n8'. Not exactly beautiful, but simple, and a precedent going back to the earliest days of Python, and beyond it to C.

So if we followed the same rule, '\U+12345' would unambiguously be character U+12345, while '\U+1234@' would be U+1234 and a @.

That doesn't mean it's necessarily a good idea. After all, we don't allow 1-char hex escapes. And octal escapes are already pretty weird, in that they don't encode only characters up to 127 (as in C) or all of Unicode, but everything up to 511 (because that happens to be the max you can fit into the rules), so maybe they're not a great precedent to follow.

> Regards,
> Ian
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

From kwpolska at gmail.com Sat Jul 27 13:17:00 2013
From: kwpolska at gmail.com (Chris "Kwpolska" Warrick)
Date: Sat, 27 Jul 2013 13:17:00 +0200
Subject: [Python-ideas] Support Unicode code point notation
In-Reply-To: <51F39A87.5030209@pearwood.info>
References: <51F39A87.5030209@pearwood.info>
Message-ID:

On Sat, Jul 27, 2013 at 12:01 PM, Steven D'Aprano wrote:
> Unicode's standard notation for code points is U+ followed by a 4, 5 or 6
> hex digit string, such as π = U+03C0. This notation is found throughout the
> Unicode Consortium's website, e.g.:
>
> http://www.unicode.org/versions/corrigendum2.html
>
> as well as in third party sites that have reason to discuss Unicode code
> points, e.g.:
>
> https://en.wikipedia.org/wiki/Eth#Computer_input
>
> I propose that Python strings support this as the preferred escape notation
> for Unicode code points:
>
> '\U+03C0'
> => 'π'
>
> The existing \U and \u variants must be kept for backwards compatibility,
> but should be (mildly) discouraged in new code.

As Marc-Andre Lemburg said, C, C++ and Java use the same notation as Python does. And there is NO programming language implementing the U+ syntax. Why should we? Why should we violate de-facto standards? Existing programming languages use one or more of:

a) \uHHHH
b) \UHHHHHHHH
c) \u{H..HHHHHH} (e.g. Ruby)
d) \xH..HH
e) \x{H..HHHHHH}
f) \O..OOO

and probably some more variants I am not aware of or forgot about, but there is probably no programming language that does \U+{H..HHHHHH}, so why should we?

> Doesn't this violate "Only One Way To Do It"?
> ---------------------------------------------
>
> That's not what the Zen says. The Zen says there should be One Obvious Way
> to do it, not Only One. It is my hope that we can agree that the One Obvious
> Way to refer to a Unicode character by its code point is by using the same
> notation that the Unicode Consortium uses:
>
> d <=> U+0064
>
> and leave legacy escape sequences as the not-so-obvious ways to do it:
>
> \x64 \144 \u0064 \U00000064

For C, C++, Java and other programmers, the ABOVE ways are the obvious ways to do it. \U+ definitely is not. Even something as basic as GNU echo uses the \u \U syntax.

> Why do we need yet another way of writing escape sequences?
> -----------------------------------------------------------
>
> We don't need another one, we need a better one. U+xxxx is the standard
> Unicode notation, while existing Python escapes have various problems.
...standard notation that NO programming language uses. In English, sure thing -- go for those fancy U+H..HHHHHH things, that's what they are for.

> One-byte hex and oct escapes are a throwback to the old one-byte ASCII days,
> and reflect an obsolete idea of strings being equivalent to bytes. Backwards
> compatibility requires that we continue to support them, but they shouldn't
> be encouraged in strings.

Py2k's str or Py3k's bytes still exist and are used. This is also where you would use \xHH or \OOO.

> Two-byte \u escapes are harmless, so long as you imagine that Unicode is a
> 16-bit character set. Unfortunately, it is not. \u does not support code
> points in the Supplementary Multilingual Planes (those with ordinal value
> greater than 0xFFFF), and can silently give the wrong result if you make a
> mistake in counting digits:
>
> # I want EGYPTIAN HIEROGLYPH D010 (Eye of Horus)
> s = '\u13080'
> => oops, I get 'ገ0' (ETHIOPIC SYLLABLE GA, ZERO)
>
> Four-byte \U escape sequences support the entire Unicode character set, but
> they are terribly verbose, and the first three digits are *always* zero.
> Python doesn't (and shouldn't) support \U escapes beyond 10FFFF, so the
> first three digits of the eight digit hex value are pointless.

Ruby handles this wonderfully with what I called syntax (c) above. So, maybe instead of this, let's get working on \u{H..HHHHHH}?

> [snip]
>
> Variable number of digits? Isn't that a bad thing?
> --------------------------------------------------
>
> It's neither good nor bad. Octal escapes already support from 1 to 3 oct
> digits. In some languages (but not Python), hex escapes support from 1 to an
> unlimited number of hex digits.

This is bad, because of hex digits. Consider this:

'\U+0002two'

would get us Start of Text (aka ^B), and the letters 't', 'w' and 'o'. And when we wanted to go with French,

'\U+0002deux'

we will find ourselves with MODIFIER LETTER RHOTIC HOOK, 'u' and 'x'. Uh-oh! (Example above based on another one from the Unicode mailing list archives.)

> [snip]

Overall, huge nonsense. If you care about some wasted zeroes, why not propose to steal Ruby's syntax, denoted as (c) in this message?

--
Chris "Kwpolska" Warrick
PGP: 5EAAEA16
stop html mail | always bottom-post | only UTF-8 makes sense

From steve at pearwood.info Sat Jul 27 13:22:50 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 27 Jul 2013 21:22:50 +1000
Subject: [Python-ideas] Support Unicode code point notation
In-Reply-To: <51F39F80.4070108@feete.org>
References: <51F39A87.5030209@pearwood.info> <51F39F80.4070108@feete.org>
Message-ID: <51F3AD8A.2070401@pearwood.info>

On 27/07/13 20:22, Ian Foote wrote:
> On 27/07/13 11:01, Steven D'Aprano wrote:
>> Variable number of digits? Isn't that a bad thing?
>> --------------------------------------------------
>>
>> It's neither good nor bad. Octal escapes already support from 1 to 3 oct
>> digits. In some languages (but not Python), hex escapes support from 1
>> to an unlimited number of hex digits.

> What should 'U+12345' be? U+12345 CUNEIFORM SIGN URU TIMES KI or U+1234 ETHIOPIC SYLLABLE SEE and a digit 5?

There is no ambiguity. Just like oct escapes, the longest valid sequence (up to the maximum) would be used. If you used the shortest, then there would be no way to specify 5 or 6 digit sequences.
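To make the longest-match rule concrete, here is a tiny sketch (decode_uplus is a made-up helper; Python itself has no \U+ escape):

    import re

    def decode_uplus(s):
        # Greedy {4,6} grabs the longest run of 4-6 hex digits, the same
        # way octal escapes grab up to three digits.
        return re.sub(r'\\U\+([0-9A-Fa-f]{4,6})',
                      lambda m: chr(int(m.group(1), 16)),
                      s)

    print(decode_uplus(r'\U+03C0'))          # the single character U+03C0
    print(ascii(decode_uplus(r'\U+12345')))  # '\U00012345' -- five digits, one character
    print(ascii(decode_uplus(r'\U+1234@')))  # '\u1234@' -- stops at the non-hex '@'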
-- Steven From steve at pearwood.info Sat Jul 27 13:25:12 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 27 Jul 2013 21:25:12 +1000 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51F39BDA.8070108@egenix.com> References: <51F39A87.5030209@pearwood.info> <51F39BDA.8070108@egenix.com> Message-ID: <51F3AE18.90707@pearwood.info> On 27/07/13 20:07, M.-A. Lemburg wrote: > The \u and \U notations are standard in several programming > languages, e.g. Java and C++, so we're in good company. Given the problems with both \u and \U escapes, I think it is better to say we're in bad company. -- Steven From rosuav at gmail.com Sat Jul 27 14:37:58 2013 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 27 Jul 2013 13:37:58 +0100 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51F3AD8A.2070401@pearwood.info> References: <51F39A87.5030209@pearwood.info> <51F39F80.4070108@feete.org> <51F3AD8A.2070401@pearwood.info> Message-ID: On Sat, Jul 27, 2013 at 12:22 PM, Steven D'Aprano wrote: > On 27/07/13 20:22, Ian Foote wrote: >> >> On 27/07/13 11:01, Steven D'Aprano wrote: > > >>> Variable number of digits? Isn't that a bad thing? >>> -------------------------------------------------- >>> >>> It's neither good nor bad. Octal escapes already support from 1 to 3 oct >>> digits. In some languages (but not Python), hex escapes support from 1 >>> to an unlimited number of hex digits. > > >> What should 'U+12345' be? U+12345 CUNEIFORM SIGN URU TIMES KI or U+1234 >> ETHIOPIC SYLLABLE SEE and a digit 5? > > > > There is no ambiguity. Just like oct escapes, the longest valid sequence (up > to the maximum) would be used. If you used the shortest, then there would be > no way to specify 5 or 6 digit sequences. In a vacuum, \U+12345 seems like a good thing. But two issues dog it: incompatibility with *every other language*, and the inability to follow it with a hex digit. With octal escapes, there's a limit of three digits, so you can simply stuff in an extra zero or two: >>> "\1234" 'S4' >>> "\01234" '\n34' >>> "\001234" '\x01234' Granted, this isn't the case in all languages, but it's a reasonable convention to stick to. How many digits should be permitted in \U+ notation? Six? Eight? Will a quick eyeball of a string literal be able to figure out the correct interpretation of "\U+0012345678"? Also, this is a problem with a lot more characters than it is with octal, which unambiguously stops after any non-digit; in hex, there are two additional digits (8, 9) and twelve very common ASCII letters (A-F, a-f) which can cause problems. I foresee issues like with Windows paths in non-raw strings: >>> "c:\qwer" 'c:\\qwer' >>> "c:\asdf" 'c:\x07sdf' Some work, some don't. You'll put in a convenient four or five digit Unicode escape, follow it with a non-hex letter, and then later on come and edit and confuse yourself no end. I'm -1 on the proposal, primarily because it's different from everything else without being a significant improvement over them. On Sat, Jul 27, 2013 at 12:25 PM, Steven D'Aprano wrote: > On 27/07/13 20:07, M.-A. Lemburg wrote: > >> The \u and \U notations are standard in several programming >> languages, e.g. Java and C++, so we're in good company. > > > Given the problems with both \u and \U escapes, I think it is better to say > we're in bad company. Good or bad, it's a large company, and that *in itself* is of value. 
ChrisA From steve at pearwood.info Sat Jul 27 15:57:47 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 27 Jul 2013 23:57:47 +1000 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: References: <51F39A87.5030209@pearwood.info> <51F39F80.4070108@feete.org> <51F3AD8A.2070401@pearwood.info> Message-ID: <51F3D1DB.9000602@pearwood.info> On 27/07/13 22:37, Chris Angelico wrote: > In a vacuum, \U+12345 seems like a good thing. But two issues dog it: > incompatibility with *every other language*, Every language is incompatible with every other language. That's why they are different languages. Some languages happen to share a few (or many) similarities, but they are dwarfed by the differences. And yet we manage. Do you really mean to suggest that a C programmer is capable of interpreting U+2345 when reading about code points on Wikipedia, but will be confused when reading '\U+2345' in Python code? Surely not. But if so, I suggest that Python's \x escapes will also confuse him, since Python's \x is incompatible with C's \x. (We even mention that difference in the docs.) As well as Python's significant indentation, duck typing, and, most of all, lack of braces. > and the inability to > follow it with a hex digit. With octal escapes, there's a limit of > three digits, so you can simply stuff in an extra zero or two: You would simply do the same as you already do for octal escapes: stuff in an extra zero or two: '\U+0003B82' => U+03B8 followed by 2 There's never any need to add more than two zeroes, since you can't use fewer than four or more than six digits in total. > How many digits should be permitted in \U+ > notation? Six? Eight? The Unicode standard uses exactly four, five or six hex digits for code points. The smallest code point is U+0000, and the largest is U+10FFFF. So: '\U+FFpq' will be a SyntaxError, just like '\uFFpq' today; '\U+FFFFFF' will be a SyntaxError, just like '\U00FFFFFF' today; '\U+00F2' will be unambiguously interpreted as a four digit hex escape; '\U+00FF2' will be unambiguously interpreted as a five digit hex escape; '\U+00FFF2' will be unambiguously interpreted as a six digit hex escape; '\U+00FFFF2' will be unambiguously interpreted as U+FFFF followed by 2. > Will a quick eyeball of a string literal be able > to figure out the correct interpretation of "\U+0012345678"? I don't think that the existing hex escapes pass the "quick eyeball" test: 'M\u00fcller' but your example above will be parsed as U+1234 followed by 5678. > Also, > this is a problem with a lot more characters than it is with octal, > which unambiguously stops after any non-digit; in hex, there are two > additional digits (8, 9) and twelve very common ASCII letters (A-F, > a-f) which can cause problems. I foresee issues like with Windows > paths in non-raw strings: > >>>> "c:\qwer" > 'c:\\qwer' >>>> "c:\asdf" > 'c:\x07sdf' > > Some work, some don't. You'll put in a convenient four or five digit > Unicode escape, follow it with a non-hex letter, and then later on > come and edit and confuse yourself no end. 'C:\Products\Umbrellas' has the same problem. This is an issue with Windows path names, not my proposal. You don't even need Unicode to be bitten by this issue, just a name starting with n, t, x, etc. 
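A quick demonstration that the backslash pitfall exists with or without any new escape (plain Python 3):

    # \t silently becomes a tab; \P is not an escape, so it survives:
    print(len('C:\temp'), len(r'C:\temp'))    # 6 7
    print('C:\Products' == 'C:\\Products')    # True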
--
Steven

From rymg19 at gmail.com Sat Jul 27 17:05:12 2013
From: rymg19 at gmail.com (Ryan)
Date: Sat, 27 Jul 2013 10:05:12 -0500
Subject: [Python-ideas] shlex escapes without Posix mode
In-Reply-To: References: <51F1EE73.3010608@pearwood.info> <97715D1A-C9F6-4519-8CEB-E0CC4F409808@yahoo.com> <9273a76f-e842-4a70-b3ab-027073269090@email.android.com>
Message-ID: <7cfefdfc-0cd0-4f5b-9af5-e4d9ef6fb6de@email.android.com>

Sorry, I'm a little rusty at explaining things. I was using ksh as an example. What I'm really trying to parse is a simple whitespace-delimited language for writing C++ documentation.

The basic thing is that Posix mode didn't work right, but I really need those escapes. Normal mode set to split only at whitespace works great, but having the user write \" for every quote is a little annoying. And it's a simple language; PLY and PyParsing are overkill. Also, I used split in the example, but I'm really using the shlex class itself.

Back on the escapes, I still think shlex needs them. If you give me some time, I can write a modified shlex with support for that, but it might take a bit.

I also have a more refined concept. Instead of being None, the escape parameter I used in my last example could be an empty string by default. Posix mode has priority over the parameter; if Posix mode is enabled, the parameter will be set to '\\'. If not, the parameter is left alone. That way, shlex wouldn't have to check for None. The feature is off if the string is empty.

Andrew Barnert wrote:
>On Jul 27, 2013, at 0:11, Ryan wrote:
>
>> ksh has parentheses and shell functions. Still a shell language.
>
>And you have to use posix mode with it. Otherwise it'll get quotes
>within words, empty strings, etc. wrong.
>
>Also, posix mode will handle the parentheses the same way as legacy
>mode. The parsing rules don't have any differences with parens. Or try
>shlex('foo("bar")') with both modes, and call get_token repeatedly and
>see. So parens aren't relevant, and you aren't trying to do anything
>with ksh you wouldn't do the same way with sh, unless I'm
>misunderstanding you.
>
>It sounds like what you want is the legacy internal-quote-stripping
>mode with posix everything else? But you don't even really want that;
>if you're parsing ksh, you need to handle internal quotes the same way
>that sh does; you just don't want to consider parenthesized arguments
>"internal". And turning off posix mode doesn't do that--it seems to do
>the right thing in trivial cases, but not in general.
>
>More importantly, it sounds like you want to parse parens, which means
>you really need to use a shlex instance manually rather than calling
>split. For example:
>
>>>> s=shlex.shlex('foo("spam eggs" bar)')
>>>> list(iter(s.get_token, None))
>['foo', '(', 'spam eggs', 'bar', ')']
>
>Those are the tokens you want, right? There's no way to get that with
>split.
>
>> And shell languages pretty much always have escapes.
>>
>> Andrew Barnert wrote:
>>>
>>> Are you trying to use shlex to parse code for some language other
>>> than sh or another shell language? It's not meant to be useful for perl
>>> or C or whatever.
>>>
>>> A general-purpose quoting, escaping, splitting, and joining module
>>> that could be configured to handle everything from sh to C to CSV could
>>> be cool, but shlex isn't it.
>>>
>>> On Jul 26, 2013, at 18:43, Ryan wrote:
>>>
>>>> The main thing is that this:
>>>>
>>>> ("d")
>>>>
>>>> In Posix mode gets split into this:
>>>>
>>>> (d)
>>>>
>>>> But, say the language has callable functions.
I'd have to
>re-shlex.split the line to split the arguments. And, even then, the
>quotes already got destroyed.
>>>>
>>>> Escapes, however, are useful in practically every language.
>Restricting them to POSIX mode just kills it. And I had tried to see if
>I could implement it myself, but reading source code on Android SL4A is
>absolutely painful. And, whenever I pull up a computer, I always have a
>goal in mind and haven't got a chance to tweak it.
>>>>
>>>> I've never quite come across a language without some form of
>escapes. And, I can't think of an occasion where I'd use POSIX mode.
>Therefore, in the end, it would end up being better if you could enable
>the escapes individually. POSIX mode would have priority over the
>escape option. The instance could be created like this:
>>>>
>>>> lex = shlex.shlex(escape='\\')
>>>>
>>>> The default value would be None. That would change the value of
>lex.escape to '\\'. If the value is None, escapes are disabled.
>>>>
>>>> Steven D'Aprano wrote:
>>>>>
>>>>> Hi Ryan, and welcome.
>>>>>
>>>>>
>>>>> On 26/07/13 05:22, Ryan wrote:
>>>>>> Note: This is my first post to the mailing list, so I'm not sure
>if I'm doing something wrong or something.
>>>>>>
>>>>>> I've been playing around with shlex lately, and I mostly like it,
>but I have an idea.
>>>>>>
>>>>>> Have an option with the ability to enable certain Posix mode
>features selectively, most particularly character escapes. It could be
>something like, if Posix mode is disabled, the string of escape
>characters is set to empty or None, and assigning a value to it enables
>that feature in non-Posix mode.
>>>>>
>>>>>
>>>>> That's a good start, but it's awfully vague. "Something like"?
>Concrete ideas will help. Actual working code is best (although be
>cautious about posting large
>>>>> amounts of code here -- a few lines is fine, pages of code, not so
>much), or at least pseudo-code demonstrating how and when somebody
>might use this proposed feature.
>>>>>
>>>>> Good use-cases for why you might want the feature also help.
>Under what circumstances would you say "Well, I don't want POSIX mode,
>but I do want POSIX escape sequences"?
>>>>>
>>>>> Ultimately, don't be surprised or disappointed at negative
>reactions. Negative reactions are better than silence -- at least it
>means that people have read, and care enough to comment, on your post,
>while silence may mean that nobody cares, or simply doesn't understand
>what you're talking about and is too polite to say so.
>>>>>
>>>>> We tend to be rather conservative about adding new features.
>Sometimes it takes *years* for features to be added, or they are never
>added, if nobody who cares about the feature steps up to program it.
>Remember too that new code has to carry its weight: code not only has
>one-off costs (code doesn't
>>>>> write itself, neither does the documentation), but also on-going
>costs (maintenance, bug-fixes, new features for users to learn, etc.),
>and no matter how low that cost is, it is never zero, so if the benefit
>from that feature is not more than the cost, it will probably be
>rejected.
>>>>>
>>>>> Two good posts you should read, by one of the senior core
>developers, are:
>>>>>
>>>>> http://www.boredomandlaziness.org/2011/04/musings-on-culture-of-python-dev.html
>>>>>
>>>>> http://www.boredomandlaziness.org/2011/02/status-quo-wins-stalemate.html
>>>>>
>>>>>
>>>>> If you take nothing else from my reply, at least take from it
>these two questions:
>>>>>
>>>>> "Under what circumstances would this feature be useful to you?
And
>would they be useful enough that you personally would program this
>feature, if you had the
>>>>> skills?"
>>>>
>>>> --
>>>> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
>>>> _______________________________________________
>>>> Python-ideas mailing list
>>>> Python-ideas at python.org
>>>> http://mail.python.org/mailman/listinfo/python-ideas
>>
>> --
>> Sent from my Android phone with K-9 Mail. Please excuse my brevity.

-- Sent from my Android phone with K-9 Mail. Please excuse my brevity.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From python at mrabarnett.plus.com Sat Jul 27 17:28:14 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 27 Jul 2013 16:28:14 +0100
Subject: [Python-ideas] Support Unicode code point notation
In-Reply-To:
References: <51F39A87.5030209@pearwood.info>
Message-ID: <51F3E70E.7030909@mrabarnett.plus.com>

On 27/07/2013 12:17, Chris “Kwpolska” Warrick wrote:
> On Sat, Jul 27, 2013 at 12:01 PM, Steven D'Aprano wrote:
>> Unicode's standard notation for code points is U+ followed by a 4, 5 or 6
>> hex digit string, such as π = U+03C0. This notation is found throughout the
>> Unicode Consortium's website, e.g.:
>>
>> http://www.unicode.org/versions/corrigendum2.html
>>
>> as well as in third party sites that have reason to discuss Unicode code
>> points, e.g.:
>>
>> https://en.wikipedia.org/wiki/Eth#Computer_input
>>
>> I propose that Python strings support this as the preferred escape notation
>> for Unicode code points:
>>
>> '\U+03C0'
>> => 'π'
>>
>> The existing \U and \u variants must be kept for backwards compatibility,
>> but should be (mildly) discouraged in new code.
>
> As Marc-Andre Lemburg said, C, C++ and Java use the same notation as
> Python does.
>
> And there is NO programming language implementing the U+ syntax. Why
> should we? Why should we violate de-facto standards?
>
> Existing programming languages use one or more of:
>
> a) \uHHHH
> b) \UHHHHHHHH
> c) \u{H..HHHHHH} (eg. Ruby)
> d) \xH..HH
> e) \x{H..HHHHHH}
> f) \O..OOO
>
> and probably some more variants I am not aware of or forgot about, but
> there is probably no programming language that does \U+{H..HHHHHH}, so
> why should we?
> [snip]
Perl supports "\N{U+1234}" and "\x{1234}", and I believe that some languages also support "\x{41 42 43}" as an abbreviation for "\x{41}\x{42}\x{43}". As others have said, "\U+1234" suffers from the same problem as octal escapes. -1

From steve at pearwood.info Sat Jul 27 17:47:57 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 28 Jul 2013 01:47:57 +1000
Subject: [Python-ideas] Support Unicode code point notation
In-Reply-To:
References: <51F39A87.5030209@pearwood.info>
Message-ID: <51F3EBAD.8050300@pearwood.info>

On 27/07/13 21:17, Chris “Kwpolska” Warrick wrote:
> And there is NO programming language implementing the U+ syntax.

That is incorrect. LispWorks and MIT Scheme both support #\U+ syntax: http://www.lispworks.com/documentation/lw50/LWUG/html/lwuser-352.htm http://web.mit.edu/scheme_v9.0.1/doc/mit-scheme-ref/External-Representation-of-Characters.html as does a project "CLforJava": https://groups.google.com/forum/#!topic/comp.lang.lisp/pUjKLYLgrVA (The leading # is Lisp syntax to create a character.) CSS supports U+ syntax for both individual characters and ranges: http://www.w3.org/TR/css3-fonts/#unicode-range-desc BitC does something similar to what I am suggesting: http://www.bitc-lang.org/docs/bitc/spec.html#stringlit There may be others I am unaware of.
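As a quick point of comparison (an illustrative interactive check, not taken from the original message), all of the following already denote the same character in Python 3 today; the proposal would add one more spelling of the same code point:

>>> '\u03c0', '\U000003c0', '\N{GREEK SMALL LETTER PI}', chr(0x3C0)
('π', 'π', 'π', 'π')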
So if you're worried about Python breaking new ground by supporting the standard Unicode notation for code points, don't worry, others have done so first.

> Why should we? Why should we violate de-facto standards?

"The great thing about standards is there are so many to choose from." You listed six. Here are a few more: http://billposer.org/Software/ListOfRepresentations.html None of them are language-independent standards. There is only one language-independent standard for representing code points, and that is the U+xxxx standard used by the Unicode Consortium. There is a whole universe of Unicode discussion that makes no reference to C or Java escapes, but does reference U+xxxx code points. U+xxxx is the language-independent standard that *any* person familiar with Unicode should be able to understand, regardless of what programming language they use. We're not "violating" anything. Python doesn't support Ruby's \u{xxxxxx} escape, does that mean we're "violating" Ruby's de facto standard? Or are they violating ours? No to both of those, of course. Python is not Ruby, and nothing we do can violate Ruby's standard. Or C's, or Java's.

[...]
>> Doesn't this violate "Only One Way To Do It"?
>> ---------------------------------------------
>>
>> That's not what the Zen says. The Zen says there should be One Obvious Way
>> to do it, not Only One. It is my hope that we can agree that the One Obvious
>> Way to refer to a Unicode character by its code point is by using the same
>> notation that the Unicode Consortium uses:
>>
>> d <=> U+0064
>>
>> and leave legacy escape sequences as the not-so-obvious ways to do it:
>>
>> \x64 \144 \u0064 \U00000064
>
> For a C, C++, Java or some other programmers, the ABOVE ways are the
> obvious ways to do it. \U+ definitely is not. Even something as
> basic as GNU echo uses the \u \U syntax.

If you want C, C++, Java, Pascal, Forth, ... you know where to get them. This is Python, not C or Java, and we're discussing what is right for the Python language, not for C or Java. (Java still treats Unicode as a 16-bit charset. It isn't.) While we can, and should, consider what other languages do, we should neither slavishly follow them into bad decisions, nor should we be scared to introduce features that they don't have. Whether my proposal is good or bad, it is what it is regardless of what other languages do. C programmers find braces obvious. If you're unaware of the Pythonic response to the argument "we should do what C does", try this: from __future__ import braces

[...]
>> One-byte hex and oct escapes are a throwback to the old one-byte ASCII days,
>> and reflect an obsolete idea of strings being equivalent to bytes. Backwards
>> compatibility requires that we continue to support them, but they shouldn't
>> be encouraged in strings.
>
> Py2k's str or Py3k's bytes still exist and are used. This is also
> where you would use \xHH or \OOO.

This proposal has nothing to do with bytes nor Python 2. Python 2 is closed to new features. Unicode escapes are irrelevant to bytes. Your comment here is a red herring.

>> Four-byte \U escape sequences support the entire Unicode character set, but
>> they are terribly verbose, and the first three digits are *always* zero.
>> Python doesn't (and shouldn't) support \U escapes beyond 10FFFF, so the
>> first three digits of the eight digit hex value are pointless.

Correction: I obviously can't count, it is only the first two digits that are always zero.

> Ruby handles this wonderfully with what I called syntax (c) above.
> So, maybe instead of this, let's get working on \u{H..HHHHHH}?

Aside: you keep writing H..HHHHHH for Unicode code points. Unicode code points go up to hex 10FFFF, so an absolute maximum of six digits, not seven or more as you keep writing (four times, not that I'm counting :-) As for Ruby's syntax, by your own argument, it "violates the de facto standard" of C, C++, Java, and, yes, Python. Perhaps you would like to tell Matz that it's a terrible idea because it is violating Python's standard? But seriously, the biggest benefit I see from the Ruby syntax is you can write a sequence of code points: \u{00E0 00E9 00EE 00F5 00FC} => àéîõü but that's not my proposal. If somebody else wants to champion that, be my guest.

>> [snip]
>>
>> Variable number of digits? Isn't that a bad thing?
>> --------------------------------------------------
>>
>> It's neither good nor bad. Octal escapes already support from 1 to 3 oct
>> digits. In some languages (but not Python), hex escapes support from 1 to an
>> unlimited number of hex digits.
>
> This is bad, because of hex digits.

I have covered this objection in my reply to Chris Angelico. In short, you are no worse off than you already are if you use octal escapes. A U+ hex escape will, at most, need two extra leading zeroes to avoid running past the end, so to speak. Your example:

> '\U+0002deux'

could be written as '\U+000002deux'. (Or any of the existing ways of writing it would continue to work. Since U+0002 is an ASCII control character, I would not object to it being written as '\x02' or '\2'.) While I acknowledge the issue you raise, I don't think much of this example. Surely in nearly any real-world example there would be some sort of separator between the control character and the word? '\U+0002 deux' Yes, the issue of digits following octal or U+ escapes is a real issue, but it is not a common issue, and the solution is *exactly* the same in both cases: add one or two extra zeroes.

>> [snip]
>
> Overall, huge nonsense. If you care about some wasted zeroes, why not
> propose to steal Ruby's syntax, denoted as (c) in this message?

I don't merely care about wasted zeroes. I care about improving Python's Unicode model. What we call "characters" in Python actually are code points, and I believe we should support the standard notation for code points, even if we support other notation as well. If you go to the Unicode.org website, or Wikipedia, or any other site that actually understands Unicode, they invariably talk about code points and use the U+ notation. But in Python, we use our own notation that is *just slightly different*, for little or no good reason. ("C programmers use it" is not a good reason for a language which is not a variant of C.) While we must continue to support existing ways of escaping Unicode characters, I'd like to be able to tell people: "To enter a Unicode code point in a string, put a backslash in front of it." instead of telling them to count the number of hex digits, then either use \u or \U, and don't forget to pad it to eight digits if you use \U but not \u. Oh, and if you're tempted to copy and paste the code point from somewhere, you have to drop the U+ or it won't work. Unicode's notation is nice and simple. If we had it first, would we prefer \uxxxx and \U00xxxxxx over it? I don't think so.
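To make the digit-counting concrete, today's escapes are fixed-width, so a trailing digit is never ambiguous but the leading zeroes are mandatory; a quick interactive check (my own illustration):

>>> '\u03b82'          # \u consumes exactly four hex digits; the trailing 2 is literal
'θ2'
>>> '\U000003b8' '2'   # \U consumes exactly eight
'θ2'
>>> '\N{GREEK SMALL LETTER THETA}2'
'θ2'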
-- Steven

From rosuav at gmail.com Sat Jul 27 18:03:22 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 27 Jul 2013 17:03:22 +0100
Subject: [Python-ideas] Support Unicode code point notation
In-Reply-To: <51F3EBAD.8050300@pearwood.info>
References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info>
Message-ID:

On Sat, Jul 27, 2013 at 4:47 PM, Steven D'Aprano wrote:
> Unicode's notation is nice and simple. If we had it first, would we prefer
> \uxxxx and \U00xxxxxx over it? I don't think so.

Almost certainly not. Like I said, I think your idea is great *in a vacuum*. Obviously the removal of the current notations is out of the question, which means that this is yet another way to specify a codepoint; and it's one that most programmers won't be looking for. (I stand corrected, though: I had thought that there were *no* other languages using this notation. Of course, this is a silly thought. There is almost nothing that hasn't already been done, somewhere.) If Python had supported this notation from the beginning of Unicode strings, or at least since 3.0, then adding \uxxxx would have been purely as a sop to C/Java/etc programmers, and it would likely have gone nowhere. How much value is gained by creating a new syntax, which now Python programmers have to understand in addition to the existing ones? Consistency across languages is fairly important; have you ever used \123 notation in a BIND file? http://rosuav.blogspot.com/2012/12/i-want-my-octal.html Maybe Python will start a new trend, and \U+1234 will become the new convention. Maybe that's a good thing. But how beneficial will it be, and how complicating? I'm weakening my stance to -0.

ChrisA

From stephen at xemacs.org Sat Jul 27 18:46:28 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sun, 28 Jul 2013 01:46:28 +0900
Subject: [Python-ideas] Support Unicode code point notation
In-Reply-To: <51F39A87.5030209@pearwood.info>
References: <51F39A87.5030209@pearwood.info>
Message-ID: <8761vwosrf.fsf@uwakimon.sk.tsukuba.ac.jp>

Steven D'Aprano writes:
> I propose that Python strings support this as the preferred escape notation for Unicode code points:
>
> '\U+03C0'
> => 'π'

-1. Because:

> The existing \U and \u variants must be kept for backwards
> compatibility, but should be (mildly) discouraged in new code.

OTOH, supporting "\N{U+03C0}" seems harmless, if not particularly useful, to me. However, I don't find it hard to imagine that some people would use it in preference to the \U and \u escapes, despite being somewhat verbose.

From python at mrabarnett.plus.com Sat Jul 27 19:06:08 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 27 Jul 2013 18:06:08 +0100
Subject: [Python-ideas] Support Unicode code point notation
In-Reply-To: <8761vwosrf.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <51F39A87.5030209@pearwood.info> <8761vwosrf.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <51F3FE00.9080904@mrabarnett.plus.com>

On 27/07/2013 17:46, Stephen J. Turnbull wrote:
> Steven D'Aprano writes:
>
> > I propose that Python strings support this as the preferred escape notation for Unicode code points:
> >
> > '\U+03C0'
> > => 'π'
>
> -1. Because:
>
> > The existing \U and \u variants must be kept for backwards
> > compatibility, but should be (mildly) discouraged in new code.
>
> OTOH, supporting "\N{U+03C0}" seems harmless, if not particularly useful,
> to me. However, I don't find it hard to imagine that some people
> would use it in preference to the \U and \u escapes, despite being
> somewhat verbose.
>
I think the point of "\N{U+03C0}" is that it lets you name all of the codepoints, even those that are as yet unnamed. :-)

From joshua at landau.ws Sat Jul 27 22:39:13 2013
From: joshua at landau.ws (Joshua Landau)
Date: Sat, 27 Jul 2013 21:39:13 +0100
Subject: [Python-ideas] Support Unicode code point notation
In-Reply-To: <8761vwosrf.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <51F39A87.5030209@pearwood.info> <8761vwosrf.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On 27 July 2013 17:46, Stephen J. Turnbull wrote:
> Steven D'Aprano writes:
>
> > I propose that Python strings support this as the preferred escape
> notation for Unicode code points:
> >
> > '\U+03C0'
> > => 'π'
>
> -1. Because:
>
> > The existing \U and \u variants must be kept for backwards
> > compatibility, but should be (mildly) discouraged in new code.
>
> OTOH, supporting "\N{U+03C0}" seems harmless, if not particularly useful,
> to me. However, I don't find it hard to imagine that some people
> would use it in preference to the \U and \u escapes, despite being
> somewhat verbose.

As a quick guess, I would. I don't like counting.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tjreedy at udel.edu Sat Jul 27 23:58:44 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 27 Jul 2013 17:58:44 -0400
Subject: [Python-ideas] Support Unicode code point notation
In-Reply-To: <51F3AD8A.2070401@pearwood.info>
References: <51F39A87.5030209@pearwood.info> <51F39F80.4070108@feete.org> <51F3AD8A.2070401@pearwood.info>
Message-ID:

On 7/27/2013 7:22 AM, Steven D'Aprano wrote:
> On 27/07/13 20:22, Ian Foote wrote:
>> On 27/07/13 11:01, Steven D'Aprano wrote:
>
>>> Variable number of digits? Isn't that a bad thing?
>>> --------------------------------------------------
>>>
>>> It's neither good nor bad.

It is wretched. In the Unicode standard, the U+ notation is used for single codepoints and as near as I can tell from checking a few chapters, always has a trailing delimiter (space or punctuation). This is true even for successive codepoints. For example: "katakana letter ainu to can simply be mapped to the Unicode character sequence <U+30C8, U+309A>". Note that the authors did not simply write "U+30C8U+309A" as in this proposal. In other words, the proposal does not conform to the usage of the notation in the standard. In tables, the 'U+' is omitted. Sequential codepoints are separated by spaces for readability. For instance, '0069 0307 0301' in one table stands for the single grapheme 'i̇́' (Lithuanian char) == '\u0069\u0307\u0301' Even though a computer could parse 'U+0069U+0307U+0301' correctly, most human eyes will see '+' as the separator. I find this more painful to read than the '\' form.

>>> Octal escapes already support from 1 to 3 oct digits.

And they are awful to use in string literals, as opposed to numbers.

>>> In some languages (but not Python), hex escapes support from 1
>>> to an unlimited number of hex digits.

That is fine for numbers. For strings, 2*n hex digits often (typically?) means n bytes.

>> What should 'U+12345' be? U+12345 CUNEIFORM SIGN URU TIMES KI or
>> U+1234 ETHIOPIC SYLLABLE SEE and a digit 5?

> There is no ambiguity.

But there is a problem. What if a person (an Ethiopian?) *wants* to write U+1234 ETHIOPIC SYLLABLE SEE and a digit 5 as a 2 character identifier? You really expect someone to translate '5' into 'U+00xx'?

> Just like oct escapes, the longest valid sequence
> (up to the maximum) would be used. If you used the shortest, then there
> would be no way to specify 5 or 6 digit sequences.
If you used the shortest, then there > would be no way to specify 5 or 6 digit sequences. As I said above, there is no ambiguity in the standard because they do not jam codepoints (with or without 'U+') together without non-alphanumeric delimiters. -- Terry Jan Reedy From greg.ewing at canterbury.ac.nz Sun Jul 28 01:14:50 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 28 Jul 2013 11:14:50 +1200 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51F3EBAD.8050300@pearwood.info> References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> Message-ID: <51F4546A.2070808@canterbury.ac.nz> Steven D'Aprano wrote: > Aside: you keep writing H..HHHHHH for Unicode code points. Unicode code > points go up to hex 10FFFF, They do *now*, but we can't be sure that they will stay that way in the future. This isn't a problem for the U+XXXX notation in informal usage, since it's usually written with surrounding whitespace or punctuation that makes it clear where the digits end. But the \U+XXXX syntax as currently proposed would bake in an absolute 6-digit limit that's impossible to ever extend. > I'd like to be able to tell people: > > "To enter a Unicode code point in a string, put a backslash in front of > it." > > instead of telling them to count the number of hex digits, But they're *still* going to have to count hex digits, and pad to 6 if it happens to be followed by a problematic character. If we're going to introduce something new, we might as well design it not to have silly, awkward properties like that. The Ruby \U{...} syntax has the following advantages: * Very clear, not prone to editing errors * No fixed limit on number of digits * Extends easily to multiple code points * Can optionally accept U+ for those who like that * Precedent exists in at least one other language Or we could invent something of our own, such as using another backslash as a delimiter: \U+1234\ Multiple characters could be written as: \U+1234+5678+9abc\ -- Greg From rosuav at gmail.com Sun Jul 28 01:18:12 2013 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 28 Jul 2013 00:18:12 +0100 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51F4546A.2070808@canterbury.ac.nz> References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> Message-ID: On Sun, Jul 28, 2013 at 12:14 AM, Greg Ewing wrote: > Steven D'Aprano wrote: >> >> Aside: you keep writing H..HHHHHH for Unicode code points. Unicode code >> points go up to hex 10FFFF, > > They do *now*, but we can't be sure that they will stay that > way in the future. They will for as long as UTF-16 is supported. Really, it would have been better all round if UTF-16 had never existed, and everyone just had to switch up to UTF-32; sure, memory would have been wasted, but concepts like PEP 393 would have been devised to deal with that, and we wouldn't have stupid bugs in 99% of programming languages. ChrisA From abarnert at yahoo.com Sun Jul 28 02:30:25 2013 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 28 Jul 2013 02:30:25 +0200 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> Message-ID: On Jul 28, 2013, at 1:18, Chris Angelico wrote: > On Sun, Jul 28, 2013 at 12:14 AM, Greg Ewing > wrote: >> Steven D'Aprano wrote: >>> >>> Aside: you keep writing H..HHHHHH for Unicode code points. 
Unicode code >>> points go up to hex 10FFFF, >> >> They do *now*, but we can't be sure that they will stay that >> way in the future. > > They will for as long as UTF-16 is supported. Really, it would have > been better all round if UTF-16 had never existed, and everyone just > had to switch up to UTF-32; sure, memory would have been wasted, but > concepts like PEP 393 would have been devised to deal with that, and > we wouldn't have stupid bugs in 99% of programming languages. UTF-16 wouldn't have been a problem if it weren't almost compatible with UCS2, allowing all kinds of Unicode 1.0 software to misleadingly claim Unicode 2.0 support. (For example, for a long time, both Windows and Java "supported" UTF-16 by treating surrogate pairs as two characters instead of one, which is like "supporting" UTF-8 by treating it like ASCII--except that the bugs are much less likely to hit developers early in the cycle.) There are use cases for which UTF-16 is perfectly reasonable. For example, strings with lots of BMP CJK characters and an occasional non-BMP character aren't helped by PEP 393, or by UTF-8, but they are helped by UTF-16. (So long as you can rely on software not treating it as UCS2?) But anyway, this is pretty far off topic. Unicode could go past 10FFFF without dropping UTF-16, either by adding more surrogate pair ranges, or by adding surrogate triplets. It's really no different from extending UTF-8, which is no problem. The problem is that we have no way to predict how they will extend UTF-16, UTF-8, or code point notation if that ever happens. Assuming that the max length for a code point is six nibbles does sound like assuming nobody will ever need more than 640k characters. From ncoghlan at gmail.com Sun Jul 28 02:57:50 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 28 Jul 2013 10:57:50 +1000 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> Message-ID: On 28 Jul 2013 10:34, "Andrew Barnert" wrote: > > On Jul 28, 2013, at 1:18, Chris Angelico wrote: > > > On Sun, Jul 28, 2013 at 12:14 AM, Greg Ewing > > wrote: > >> Steven D'Aprano wrote: > >>> > >>> Aside: you keep writing H..HHHHHH for Unicode code points. Unicode code > >>> points go up to hex 10FFFF, > >> > >> They do *now*, but we can't be sure that they will stay that > >> way in the future. > > > > They will for as long as UTF-16 is supported. Really, it would have > > been better all round if UTF-16 had never existed, and everyone just > > had to switch up to UTF-32; sure, memory would have been wasted, but > > concepts like PEP 393 would have been devised to deal with that, and > > we wouldn't have stupid bugs in 99% of programming languages. > > UTF-16 wouldn't have been a problem if it weren't almost compatible with UCS2, allowing all kinds of Unicode 1.0 software to misleadingly claim Unicode 2.0 support. (For example, for a long time, both Windows and Java "supported" UTF-16 by treating surrogate pairs as two characters instead of one, which is like "supporting" UTF-8 by treating it like ASCII--except that the bugs are much less likely to hit developers early in the cycle.) There are use cases for which UTF-16 is perfectly reasonable. For example, strings with lots of BMP CJK characters and an occasional non-BMP character aren't helped by PEP 393, or by UTF-8, but they are helped by UTF-16. (So long as you can rely on software not treating it as UCS2?) 
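For anyone following along, the surrogate-pair arithmetic under discussion is small enough to sketch in Python; to_surrogates is an illustrative name, not an existing API:

def to_surrogates(cp):
    # Standard UTF-16 encoding of a supplementary-plane code point.
    assert 0x10000 <= cp <= 0x10FFFF
    cp -= 0x10000
    return 0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF)

>>> [hex(u) for u in to_surrogates(0x10400)]   # U+10400, a non-BMP letter
['0xd801', '0xdc00']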
> But anyway, this is pretty far off topic.
>
> Unicode could go past 10FFFF without dropping UTF-16, either by adding more surrogate pair ranges, or by adding surrogate triplets. It's really no different from extending UTF-8, which is no problem.
>
> The problem is that we have no way to predict how they will extend UTF-16, UTF-8, or code point notation if that ever happens. Assuming that the max length for a code point is six nibbles does sound like assuming nobody will ever need more than 640k characters.

The idea of enhancing name-based lookup by accepting the "U+" prefix as specifying a code point sounds good to me. It's already a delimited notation, doesn't require a new escape and, as someone else pointed out, allows \N to be used consistently, even if a code point doesn't have a name yet.

Cheers, Nick.

> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From steve at pearwood.info Sun Jul 28 05:43:39 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 28 Jul 2013 13:43:39 +1000
Subject: [Python-ideas] Support Unicode code point notation
In-Reply-To: <51F4546A.2070808@canterbury.ac.nz>
References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz>
Message-ID: <51F4936B.8020500@pearwood.info>

On 28/07/13 09:14, Greg Ewing wrote:
> Steven D'Aprano wrote:
>> Aside: you keep writing H..HHHHHH for Unicode code points. Unicode code points go up to hex 10FFFF,
>
> They do *now*, but we can't be sure that they will stay that
> way in the future.

Yes we can. The Unicode Consortium have guaranteed that Unicode will never be extended past code point U+10FFFF. I quote:

Q: Will UTF-16 ever be extended to more than a million characters?

A: No. Both Unicode and ISO 10646 have policies in place that formally limit future code assignment to the integer range that can be expressed with current UTF-16 (0 to 1,114,111).

http://www.unicode.org/faq/utf_bom.html#utf16-6

Supporting some hypothetical "Super-hyper-mega-Code" in 2035 will be as big a change as adding Unicode in the first place. It will probably require a PEP :-)

[...]
>> I'd like to be able to tell people:
>>
>> "To enter a Unicode code point in a string, put a backslash in front of it."
>>
>> instead of telling them to count the number of hex digits,
>
> But they're *still* going to have to count hex digits, and pad
> to 6 if it happens to be followed by a problematic character.

Most uses of hex escapes aren't followed by another hex digit: there are in excess of a million Unicode code points, and less than 50 are hex digits (less than 30 if you exclude East-Asian full-width forms). To return to the example that keeps being given, if you're writing Ethiopian text, I don't think it is actually very likely that you will want to follow ETHIOPIC SYLLABLE SEE by a Latin digit 5 with no separator between them. Yes, it "might" happen, but there are trivial ways to deal with that, in no particular order:

- pad the code point to six digits
- don't use \U+, use a fixed-width \u or \U escape
- use string concatenation '\U+1234' '5'
- use string substitutions (% or format or $ templates).

> If we're going to introduce something new, we might as well
> design it not to have silly, awkward properties like that.
> > The Ruby \U{...} syntax has the following advantages: > > * Very clear, not prone to editing errors > * No fixed limit on number of digits > * Extends easily to multiple code points > * Can optionally accept U+ for those who like that > * Precedent exists in at least one other language As I said earlier, if someone wants to champion that idea, I won't object. > Or we could invent something of our own, such as using another > backslash as a delimiter: > > \U+1234\ > > Multiple characters could be written as: > > \U+1234+5678+9abc\ > Another suggestion which was made is: \N{U+xxxx} (Sorry, I have forgotten who made that suggestion originally.) That could be extended to allow multiple space-separated code points: \N{U+xxxx U+yyyy U+zzzzz} or \N{U+xxxx yyyy zzzzz} -- Steven From steve at pearwood.info Sun Jul 28 05:57:11 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 28 Jul 2013 13:57:11 +1000 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> Message-ID: <51F49697.1020305@pearwood.info> On 28/07/13 10:30, Andrew Barnert wrote: > Unicode could go past 10FFFF without dropping UTF-16, either by adding more surrogate pair ranges, or by adding surrogate triplets. It's really no different from extending UTF-8, which is no problem. > > The problem is that we have no way to predict how they will extend UTF-16, UTF-8, or code point notation if that ever happens. Assuming that the max length for a code point is six nibbles does sound like assuming nobody will ever need more than 640k characters. The Unicode Consortium formally guarantees stability of the character range U+0000 - U+10FFFF. http://www.unicode.org/faq/utf_bom.html#utf16-6 -- Steven From stefan_ml at behnel.de Sun Jul 28 08:21:49 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 28 Jul 2013 08:21:49 +0200 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51F4936B.8020500@pearwood.info> References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> <51F4936B.8020500@pearwood.info> Message-ID: Steven D'Aprano, 28.07.2013 05:43: > Another suggestion which was made is: > > \N{U+xxxx} +1 > That could be extended to allow multiple space-separated code points: > > \N{U+xxxx U+yyyy U+zzzzz} > > or > > \N{U+xxxx yyyy zzzzz} If I were up for bike shedding, I'd suggest to rather use comma separated code point values here. I don't think I have a preference regarding the repetition of the "U+" prefix (it looks less clear without it and feels redundant if you require it), but thinking of the cases where a sequence of two or more code points combines into one character makes it seem like a useful thing to support in general. Stefan From rosuav at gmail.com Sun Jul 28 08:59:26 2013 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 28 Jul 2013 07:59:26 +0100 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51F49697.1020305@pearwood.info> References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> <51F49697.1020305@pearwood.info> Message-ID: On Sun, Jul 28, 2013 at 4:57 AM, Steven D'Aprano wrote: > On 28/07/13 10:30, Andrew Barnert wrote: > >> Unicode could go past 10FFFF without dropping UTF-16, either by adding >> more surrogate pair ranges, or by adding surrogate triplets. 
It's really no >> different from extending UTF-8, which is no problem. >> >> The problem is that we have no way to predict how they will extend UTF-16, >> UTF-8, or code point notation if that ever happens. Assuming that the max >> length for a code point is six nibbles does sound like assuming nobody will >> ever need more than 640k characters. > > > The Unicode Consortium formally guarantees stability of the character range > U+0000 - U+10FFFF. > > http://www.unicode.org/faq/utf_bom.html#utf16-6 And to add to this: Surrogate triplets would majorly break one of the fundamentals of UTF-16, namely that it guarantees synchronizability. You can look at any 16-bit code unit and know whether it's a lead or trail surrogate. (Obviously if you write to a file or other byte stream, you have to have some out-of-band way to synchronize on bytes, that's separate.) So there's unlikely ever to be a scheme that extends UTF-16 to more characters. UTF-8 can in theory handle longer codes (and some encoders can simply use the same mathematical technique to encode numbers larger than 10FFFF, as we've already seen). The only way would be to declare UTF-16 as a flawed system, just as UCS-2 is. It's a system that can encode only the first planes of Unicode. I doubt it'll ever happen, though, as there's no need for more space. ChrisA From stephen at xemacs.org Sun Jul 28 09:24:56 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 28 Jul 2013 16:24:56 +0900 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51F4546A.2070808@canterbury.ac.nz> References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> Message-ID: <8738qzp2nr.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > Steven D'Aprano wrote: > > Aside: you keep writing H..HHHHHH for Unicode code points. Unicode code > > points go up to hex 10FFFF, > > They do *now*, but we can't be sure that they will stay that > way in the future. In Unicode, they will. Blood was shed over the issue in the ISO 10646 committees before the standards could be unified. Huge amounts of software validate UTF-8 and UTF-16 including staying within the range, and won't easily be converted to accept extended ranges. So Unicode and ISO 10646 will stay within the current 17 pages. To go beyond that they'll need a new standard. In any case, it seems really unlikely that more than 1,000,000 code points will ever be needed, unless there's a mutation that makes all of *us* obsolete. > The Ruby \U{...} syntax has the following advantages: So does the \N{U+XXXX} proposal, and it has the further advantage of indicating the obvious semantics as a name for this character/code point, which is consistent with the actual usage of the U+XXXX syntax in the standard. From stephen at xemacs.org Sun Jul 28 09:41:45 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 28 Jul 2013 16:41:45 +0900 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51F4936B.8020500@pearwood.info> References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> <51F4936B.8020500@pearwood.info> Message-ID: <871u6jp1vq.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > if you're writing Ethiopian text, I don't think it is actually very > likely that you will want to follow ETHIOPIC SYLLABLE SEE by a > Latin digit 5 with no separator between them. 
Yes, it "might" > happen, If you're writing Ethiopic text, I doubt you'll be using escape sequences to denote Ethiopic characters in the first place. I think it's hard to predict how these sequences are going to be used in the future. What I would worry about it not whether writers would "want" to use such sequences, but whether they'll bother to clean them up if they occur in the first place. The writer knows what she wants; it's the reader who has to parse the resulting mess. > (Sorry, I have forgotten who made that suggestion originally.) That > could be extended to allow multiple space-separated code points: > > \N{U+xxxx U+yyyy U+zzzzz} > > or > > \N{U+xxxx yyyy zzzzz} This is a modal encoding, which has proved to be a really bad idea in its past incarnations. I hope that extension is never added to Python. From kwpolska at gmail.com Sun Jul 28 10:06:32 2013 From: kwpolska at gmail.com (=?UTF-8?B?Q2hyaXMg4oCcS3dwb2xza2HigJ0gV2Fycmljaw==?=) Date: Sun, 28 Jul 2013 10:06:32 +0200 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51F4546A.2070808@canterbury.ac.nz> References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> Message-ID: A bit of clarification: On Sat, Jul 27, 2013 at 5:47 PM, Steven D'Aprano wrote: > Aside: you keep writing H..HHHHHH for Unicode code points. Unicode code > points go up to hex 10FFFF, so an absolute maximum of six digits, not seven > or more as you keep writing (four times, not that I'm counting :-) My fancy syntax meant ?up to six hex digits?. And 10FFFF is six digits long. ~~~ On Sun, Jul 28, 2013 at 1:14 AM, Greg Ewing wrote: > The Ruby \U{...} syntax has the following advantages: It?s \u{}. "\U{}" results in "U{}", i.e. does not work. > * No fixed limit on number of digits Are we still speaking of the Ruby implementation? irb(main):002:0> "\u{1234567}" SyntaxError: (irb):2: invalid Unicode codepoint (too large) "\u{1234567}" ^ from /usr/bin/irb:12:in `
'

-- Chris “Kwpolska” Warrick PGP: 5EAAEA16 stop html mail | always bottom-post | only UTF-8 makes sense

From steve at pearwood.info Sun Jul 28 10:21:13 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 28 Jul 2013 18:21:13 +1000
Subject: [Python-ideas] Support Unicode code point notation
In-Reply-To: <871u6jp1vq.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> <51F4936B.8020500@pearwood.info> <871u6jp1vq.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <51F4D479.2050000@pearwood.info>

On 28/07/13 17:41, Stephen J. Turnbull wrote:
> > (Sorry, I have forgotten who made that suggestion originally.) That
> > could be extended to allow multiple space-separated code points:
> >
> > \N{U+xxxx U+yyyy U+zzzzz}
> >
> > or
> >
> > \N{U+xxxx yyyy zzzzz}
>
> This is a modal encoding, which has proved to be a really bad idea in
> its past incarnations. I hope that extension is never added to
> Python.

Could you elaborate please? What do you mean "modal encoding", and what past incarnations are you referring to?

-- Steven

From stephen at xemacs.org Sun Jul 28 11:05:00 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sun, 28 Jul 2013 18:05:00 +0900
Subject: [Python-ideas] Support Unicode code point notation
In-Reply-To: <51F4D479.2050000@pearwood.info>
References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> <51F4936B.8020500@pearwood.info> <871u6jp1vq.fsf@uwakimon.sk.tsukuba.ac.jp> <51F4D479.2050000@pearwood.info>
Message-ID: <87zjt7njgj.fsf@uwakimon.sk.tsukuba.ac.jp>

Steven D'Aprano writes:
> On 28/07/13 17:41, Stephen J. Turnbull wrote:
> > > (Sorry, I have forgotten who made that suggestion originally.) That
> > > could be extended to allow multiple space-separated code points:
> > >
> > > \N{U+xxxx U+yyyy U+zzzzz}
> > >
> > > or
> > >
> > > \N{U+xxxx yyyy zzzzz}
> >
> > This is a modal encoding, which has proved to be a really bad idea in
> > its past incarnations. I hope that extension is never added to
> > Python.
>
> Could you elaborate please? What do you mean "modal encoding", and
> what past incarnations are you referring to?

A "modal encoding" is one in which the same combination of code units (here, ASCII characters) is interpreted differently depending on arbitrarily distant context. One only has to look at certain web pages or mail messages to see similar encodings (SGML numeric character entities, quoted-printable encoding of text using non-Latin character sets) abused to represent many lines of text. In such (ab)uses, it's very easy to corrupt the whole stream accidentally by losing one of the braces or by interpolating text encoded differently. Sure, it's easy for humans to recognize what's going on, and recover, when they encounter corrupted text interactively, but this is obviously not a convention that's intended for interactive human use! The main past incarnation is the ISO 2022 family. I see no advantage in "readability" of "\N{U+xxxx U+yyyy U+zzzzz}" or "\N{U+xxxx yyyy zzzzz}" over "\N{U+xxxx}\N{U+yyyy}\N{U+zzzzz}", and very little space savings. Worst, it violates the basic understanding that "\N{...}" is the name of one character or code point.
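The ISO 2022 point is easy to demonstrate with Python's own codecs (an illustration, not from the original message): in ISO-2022-JP, the very same payload bytes decode differently depending on an escape sequence seen earlier in the stream, which is exactly the statefulness being objected to:

>>> b'\x1b$BF|K\\\x1b(B'.decode('iso2022_jp')   # ESC $ B switches to JIS X 0208
'日本'
>>> b'F|K\\'.decode('iso2022_jp')               # same payload bytes, default ASCII mode
'F|K\\'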
From ncoghlan at gmail.com Sun Jul 28 11:02:26 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 28 Jul 2013 19:02:26 +1000 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51F4D479.2050000@pearwood.info> References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> <51F4936B.8020500@pearwood.info> <871u6jp1vq.fsf@uwakimon.sk.tsukuba.ac.jp> <51F4D479.2050000@pearwood.info> Message-ID: On 28 July 2013 18:21, Steven D'Aprano wrote: > On 28/07/13 17:41, Stephen J. Turnbull wrote: >> >> > (Sorry, I have forgotten who made that suggestion originally.) That >> > could be extended to allow multiple space-separated code points: >> > >> > \N{U+xxxx U+yyyy U+zzzzz} >> > >> > or >> > >> > \N{U+xxxx yyyy zzzzz} >> >> This is a modal encoding, which has proved to be a really bad idea in >> its past incarnations. I hope that extension is never added to >> Python. > > > Could you elaborate please? What do you mean "modal encoding", and what past > incarnations are you referring to? I believe what Stephen means is that it changes the \N{} notation from a relatively straightforward key lookup (where everything inside the "{}" refers to a single code point), to a two level parser, where the contents of the "{}" need to be further parsed to see if they refer to one code point or many. It doesn't bother me that much personally, especially if it was a general comma delimited capability that also worked with other code point names, but my inclination is to call YAGNI on the additional complexity. Using "modal encoding" to refer to that change isn't really valid though - Python string syntax is already modal, since "\N{" switches modes to "any characters until the next '}' are part of a code point name rather than part of the string contents", and similar statements can be made about the other escape sequences (especially the other Unicode related ones). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jul 28 11:47:08 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 28 Jul 2013 19:47:08 +1000 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <87zjt7njgj.fsf@uwakimon.sk.tsukuba.ac.jp> References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> <51F4936B.8020500@pearwood.info> <871u6jp1vq.fsf@uwakimon.sk.tsukuba.ac.jp> <51F4D479.2050000@pearwood.info> <87zjt7njgj.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 28 July 2013 19:05, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > On 28/07/13 17:41, Stephen J. Turnbull wrote: > > > > (Sorry, I have forgotten who made that suggestion originally.) That > > > > could be extended to allow multiple space-separated code points: > > > > > > > > \N{U+xxxx U+yyyy U+zzzzz} > > > > > > > > or > > > > > > > > \N{U+xxxx yyyy zzzzz} > > > > > > This is a modal encoding, which has proved to be a really bad idea in > > > its past incarnations. I hope that extension is never added to > > > Python. > > > > Could you elaborate please? What do you mean "modal encoding", and > > what past incarnations are you referring to? > > A "modal encoding" is one in which the same combination of code units > (here, ASCII characters) is interpreted differently depending on > arbitrarily distant context. Ah, I had missed the "arbitrarily distant" sense you intended for modal encoding. 
Agreed, the fact that unicode escapes (including \N{}) are limited in length to a single code point is a definite win in that regard.

Cheers, Nick.

P.S. It occurs to me that the str.format mini-language has no such limitation, though:

>>> def hexchr(x):
...     return chr(int(x, 16))
...
>>> def hex2str(s):
...     return "".join(hexchr(x) for x in s.split())
...
>>> class chrformat:
...     def __format__(self, fmt):
...         return hex2str(fmt)
...
>>> "{:40 60 1234 e9}".format(chrformat())
'@`ሴé'

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From stephen at xemacs.org Sun Jul 28 14:00:39 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sun, 28 Jul 2013 21:00:39 +0900
Subject: [Python-ideas] Support Unicode code point notation
In-Reply-To:
References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> <51F4936B.8020500@pearwood.info> <871u6jp1vq.fsf@uwakimon.sk.tsukuba.ac.jp> <51F4D479.2050000@pearwood.info>
Message-ID: <87y58qopw8.fsf@uwakimon.sk.tsukuba.ac.jp>

Nick Coghlan writes:
> It doesn't bother me that much personally, especially if it was a
> general comma delimited capability that also worked with other code
> point names,

I think it should bother you, though. It's not a problem for Python core developers, it's true. Similarly, ISO 2022 was a great idea in theory, and works fine for communication of text over streams. The problem is when you want to embed that stream in some higher-level protocol. So, for example, the original space-separated syntax breaks one-argument split-string, while your comma-separated version breaks CSV. You could fix both of those by using no separator and simply finishing the current code point on encountering "U+" or "}", but I doubt anybody would find that variant appealing. Now, for program literals this isn't going to matter because a string will be converted to internal representation by the compiler, and the program never sees that syntax. But what about applications like web frameworks which often eval client-supplied strings? I hope we are not going to recommend they eval them before validating them!

> but my inclination is to call YAGNI on the additional complexity.

"Using 'complexity' to refer to this syntax isn't really valid though - what it is, is 'complicated'."

> Using "modal encoding" to refer to that change isn't really valid
> though

No, it's quite correct, at least in ISO-land. There, a modal encoding is one which must maintain state across *code points*. The single-code-point "\N" syntax needs to maintain state across *code units*, but when it's done with a code *point*, it's done - there's no state to worry about before starting to parse the next one. By your definition, UTF-8 is modal, but that doesn't seem a very useful categorization to me.

From ncoghlan at gmail.com Sun Jul 28 15:06:18 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 28 Jul 2013 23:06:18 +1000
Subject: [Python-ideas] Support Unicode code point notation
In-Reply-To: <87y58qopw8.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> <51F4936B.8020500@pearwood.info> <871u6jp1vq.fsf@uwakimon.sk.tsukuba.ac.jp> <51F4D479.2050000@pearwood.info> <87y58qopw8.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On 28 July 2013 22:00, Stephen J. Turnbull wrote:
> Nick Coghlan writes:
> > Using "modal encoding" to refer to that change isn't really valid
> > though
>
> No, it's quite correct, at least in ISO-land. There, a modal encoding
There, a modal encoding > is one which must maintain state across *code points*. The single- > code-point "\N" syntax needs to maintain state across *code units*, > but when it's done with a code *point*, it's done - there's no state > to worry about before starting to parse the next one. By your > definition, UTF-8 is modal, but that doesn't seem a very useful > categorization to me. My bytes-oriented comms background is showing ;) I agree, preserving the property that "one escape sequence = one code point" is valuable, so the proposal should just be to make this resolve to the right value: "\N{U+}" It would also be more consistent if unicodedata.lookup() was updated to handle numeric code point names. Something like: >>> import unicodedata >>> def enhanced_lookup(name): ... if name.startswith("U+"): ... return chr(int(name[2:], 16)) ... return unicodedata.lookup(name) ... >>> enhanced_lookup("GREEK SMALL LETTER ALPHA") '?' >>> enhanced_lookup("U+03B1") '?' Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Sun Jul 28 19:29:45 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 29 Jul 2013 03:29:45 +1000 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> <51F4936B.8020500@pearwood.info> <871u6jp1vq.fsf@uwakimon.sk.tsukuba.ac.jp> <51F4D479.2050000@pearwood.info> <87y58qopw8.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <51F55509.2030304@pearwood.info> On 28/07/13 23:06, Nick Coghlan wrote: > It would also be more consistent if unicodedata.lookup() was updated > to handle numeric code point names. Something like: > >>>> import unicodedata >>>> def enhanced_lookup(name): > ... if name.startswith("U+"): > ... return chr(int(name[2:], 16)) > ... return unicodedata.lookup(name) > ... >>>> enhanced_lookup("GREEK SMALL LETTER ALPHA") > '?' >>>> enhanced_lookup("U+03B1") > '?' Earlier, MRAB suggested that unicodedata.name() could return the U+ code point in the case of unnamed characters. I think it would be better to have a separate unicodedata function to return the code point, and leave the current behaviour of name() alone. def codepoint(c): return 'U+{:04X}'.format(ord(c)) This should always succeed for any character. -- Steven From python at mrabarnett.plus.com Sun Jul 28 20:07:21 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 28 Jul 2013 19:07:21 +0100 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51F55509.2030304@pearwood.info> References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> <51F4936B.8020500@pearwood.info> <871u6jp1vq.fsf@uwakimon.sk.tsukuba.ac.jp> <51F4D479.2050000@pearwood.info> <87y58qopw8.fsf@uwakimon.sk.tsukuba.ac.jp> <51F55509.2030304@pearwood.info> Message-ID: <51F55DD9.9070006@mrabarnett.plus.com> On 28/07/2013 18:29, Steven D'Aprano wrote: > On 28/07/13 23:06, Nick Coghlan wrote: > >> It would also be more consistent if unicodedata.lookup() was updated >> to handle numeric code point names. Something like: >> >>>>> import unicodedata >>>>> def enhanced_lookup(name): >> ... if name.startswith("U+"): >> ... return chr(int(name[2:], 16)) >> ... return unicodedata.lookup(name) >> ... >>>>> enhanced_lookup("GREEK SMALL LETTER ALPHA") >> '?' >>>>> enhanced_lookup("U+03B1") >> '?' 
> > > Earlier, MRAB suggested that unicodedata.name() could return the U+ code point in the case of unnamed characters. What I said was: """I think the point of "\N{U+03C0}" is that it lets you name all of the codepoints, even those that are as yet unnamed.""" Whether unicodedata.name() could have a fallback is something I've never considered. Until now... :-) > I think it would be better to have a separate unicodedata function to return the code point, and leave the current behaviour of name() alone. > > def codepoint(c): > return 'U+{:04X}'.format(ord(c)) > > This should always succeed for any character. > From stephen at xemacs.org Sun Jul 28 20:46:54 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 29 Jul 2013 03:46:54 +0900 Subject: [Python-ideas] Support Unicode code point notation In-Reply-To: <51F55509.2030304@pearwood.info> References: <51F39A87.5030209@pearwood.info> <51F3EBAD.8050300@pearwood.info> <51F4546A.2070808@canterbury.ac.nz> <51F4936B.8020500@pearwood.info> <871u6jp1vq.fsf@uwakimon.sk.tsukuba.ac.jp> <51F4D479.2050000@pearwood.info> <87y58qopw8.fsf@uwakimon.sk.tsukuba.ac.jp> <51F55509.2030304@pearwood.info> Message-ID: <87txjeo735.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > Earlier, MRAB suggested that unicodedata.name() could return the U+ > code point in the case of unnamed characters. I think it would be > better to have a separate unicodedata function to return the code > point, and leave the current behaviour of name() alone. His point, and I agree, is that it's not useful to have name() error, as it does for unicodedata.name(chr(65535)). In that case I would prefer that it return "U+FFFF NOT A CHARACTER" or something like that. And for chr(65535*2) it would return "U+1FFFE UNASSIGNED IN VERSION ". Similarly for unassigned private use area code points and surrogates (with their blocks being mentioned). It would be nice if assigned private use area code points could have names added to the database. If a private use character wasn't named, it could have its name algorithmically determined as "U+XXXX PRIVATE USE: UNNAMED". > def codepoint(c): > return 'U+{:04X}'.format(ord(c)) > > This should always succeed for any character. Or code point: it will succeed for things that aren't characters, such as chr(65535). As one-liners go, this does seem a reasonable candidate for the stdlib. Steve From techtonik at gmail.com Tue Jul 30 16:58:57 2013 From: techtonik at gmail.com (anatoly techtonik) Date: Tue, 30 Jul 2013 17:58:57 +0300 Subject: [Python-ideas] Python stdlib for hackers Message-ID: stdlib.py check https://code.google.com/p/rainforce/wiki/PythonLibrarySplit -- anatoly t. From musicdenotation at gmail.com Tue Jul 30 17:19:07 2013 From: musicdenotation at gmail.com (Musical Notation) Date: Tue, 30 Jul 2013 22:19:07 +0700 Subject: [Python-ideas] Enhance definition of functions Message-ID: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> Yes, I know that multiline lambda will never be implemented in Python, but in many languages it is possible to write an anonymous function without using lambda at all. In JavaScript: Instead of "function (){code}" you can write "var name; name=function(){code}" Python (proposed): def func(a,b): print(a+b) return a+b becomes func=function a,b: print(a+b) return a+b -------------- next part -------------- An HTML attachment was scrubbed... 
From techtonik at gmail.com Tue Jul 30 16:58:57 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Tue, 30 Jul 2013 17:58:57 +0300
Subject: [Python-ideas] Python stdlib for hackers
Message-ID:

stdlib.py check https://code.google.com/p/rainforce/wiki/PythonLibrarySplit
--
anatoly t.

From musicdenotation at gmail.com Tue Jul 30 17:19:07 2013
From: musicdenotation at gmail.com (Musical Notation)
Date: Tue, 30 Jul 2013 22:19:07 +0700
Subject: [Python-ideas] Enhance definition of functions
Message-ID: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com>

Yes, I know that multiline lambda will never be implemented in Python, but in many languages it is possible to write an anonymous function without using lambda at all.

In JavaScript: Instead of "function (){code}" you can write "var name; name=function(){code}"

Python (proposed):

def func(a,b):
    print(a+b)
    return a+b

becomes

func=function a,b:
    print(a+b)
    return a+b

From mclefavor at gmail.com Tue Jul 30 17:25:10 2013
From: mclefavor at gmail.com (Matthew Lefavor)
Date: Tue, 30 Jul 2013 11:25:10 -0400
Subject: [Python-ideas] Enhance definition of functions
In-Reply-To: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com>
References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com>
Message-ID:

What's wrong with the current syntax? Why can't you just write a def? I didn't do an exact count, but it looks like the lengths of the two function definitions differ by at most a single character.

On Tue, Jul 30, 2013 at 11:19 AM, Musical Notation <musicdenotation at gmail.com> wrote:

> Yes, I know that multiline lambda will never be implemented in Python,
> but in many languages it is possible to write an anonymous function
> without using lambda at all.
> In JavaScript: Instead of "function (){code}" you can write
> "var name; name=function(){code}"
> Python (proposed):
>
> def func(a,b):
>     print(a+b)
>     return a+b
>
> becomes
>
> func=function a,b:
>     print(a+b)
>     return a+b

From mertz at gnosis.cx Tue Jul 30 17:54:36 2013
From: mertz at gnosis.cx (David Mertz)
Date: Tue, 30 Jul 2013 11:54:36 -0400
Subject: [Python-ideas] Enhance definition of functions
In-Reply-To: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com>
References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com>
Message-ID:

On Tue, Jul 30, 2013 at 11:19 AM, Musical Notation wrote:

> Yes, I know that multiline lambda will never be implemented in Python,
> but in many languages it is possible to write an anonymous function
> without using lambda at all.
> In JavaScript: Instead of "function (){code}" you can write
> "var name; name=function(){code}"

This seems like an odd misunderstanding to me. Of course Javascript has a lambda, it just happens to spell it 'f-u-n-c-t-i-o-n' rather than 'l-a-m-b-d-a'. As with every other object, a lambda object can be assigned a name in Javascript... although it need not be, of course. You can also just use a Javascript lambda inline anywhere a code object makes sense, e.g.:

higher_order_func(function(x,y){return x+y}, 2, 3);

Of course, if you wanted to, you could have written:

add2 = function(x,y){return x+y};
higher_order_func(add2, 2, 3);

Or likewise:

function add2(x,y){return x+y};
higher_order_func(add2, 2, 3);

Other than not allowing full blocks in lambdas, Python is exactly the same.

higher_order_func(lambda x,y: x+y, 2, 3)

And as with Javascript, you *could* give the passed function a name with:

add2 = lambda x,y: x+y

Or with:

def add2(x,y): return x+y

It sounds like what you are asking for--after saying Python will never have it--is multi-line, full-block lambdas in Python. That has been discussed a lot of times, and no one has found a syntax that feels widely acceptable.

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
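To make David's point runnable, a hedged sketch (higher_order_func is a stand-in we define ourselves here, not an existing API):

def higher_order_func(f, x, y):
    # apply a two-argument callable -- the "code object passed inline"
    return f(x, y)

print(higher_order_func(lambda x, y: x + y, 2, 3))  # 5

def add2(x, y):
    return x + y

print(higher_order_func(add2, 2, 3))  # 5 -- identical behaviour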
From ronaldoussoren at mac.com Tue Jul 30 17:59:54 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Tue, 30 Jul 2013 17:59:54 +0200
Subject: [Python-ideas] Enhance definition of functions
In-Reply-To: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com>
References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com>
Message-ID:

On 30 Jul, 2013, at 17:19, Musical Notation wrote:

> Yes, I know that multiline lambda will never be implemented in Python,
> but in many languages it is possible to write an anonymous function
> without using lambda at all.

"Never" is a long time. AFAIK the main reason why Python doesn't have multi-line lambda's is that nobody has proposed a suitable syntax yet (and not for lack of trying, the archives of this list and python-dev contain a lot of proposals that were found lacking).

> In JavaScript:
> Instead of "function (){code}" you can write "var name; name=function(){code}"

That's just a lambda by another name.

> Python (proposed):
> def func(a,b):
>     print(a+b)
>     return a+b
>
> becomes
>
> func=function a,b:
>     print(a+b)
>     return a+b

This has a number of problems. As Matthew noted, this isn't shorter than the corresponding named function. It also introduces a new keyword, and because of that likely breaks existing code (especially because "function" is a fairly common word in IT). The syntax appears to indicate that the new construct is an expression; if so, how would you use it in other contexts where expressions can be used (such as function arguments, list literals, ...)?

Ronald

From rymg19 at gmail.com Tue Jul 30 20:30:13 2013
From: rymg19 at gmail.com (Ryan)
Date: Tue, 30 Jul 2013 13:30:13 -0500
Subject: [Python-ideas] Enhance definition of functions
In-Reply-To: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com>
References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com>
Message-ID: <75205c49-9001-470a-b0b1-746bad8e0135@email.android.com>

How about something like:

f = indef(x, y): print x, y

It's essentially storing lambdas. The syntax was taken from http://blog.deliciousrobots.com/2010/3/24/a-simple-compiler-with-python-and-ply-step-1-parsing/. indef stands for 'inline definition'. I've never heard someone use that name before. Or you could do 'slambda' for 'stored lambda'.

Musical Notation wrote:

> Yes, I know that multiline lambda will never be implemented in Python,
> but in many languages it is possible to write an anonymous function
> without using lambda at all.
> In JavaScript:
> Instead of "function (){code}" you can write "var name; name=function(){code}"
> Python (proposed):
>
> def func(a,b):
>     print(a+b)
>     return a+b
>
> becomes
>
> func=function a,b:
>     print(a+b)
>     return a+b

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
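For comparison, a hedged aside (our own illustration, not from the thread): the proposed indef spelling already has a one-line equivalent in today's Python.

# Python 3 spelling of the same thing (print is a function here)
f = lambda x, y: print(x, y)
f(1, 2)  # prints: 1 2

# or, as PEP 8 actually recommends for a bound name, a plain def:
def g(x, y):
    print(x, y)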
From ronaldoussoren at mac.com Tue Jul 30 20:47:33 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Tue, 30 Jul 2013 20:47:33 +0200
Subject: [Python-ideas] Enhance definition of functions
In-Reply-To: <75205c49-9001-470a-b0b1-746bad8e0135@email.android.com>
References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <75205c49-9001-470a-b0b1-746bad8e0135@email.android.com>
Message-ID: <4B867837-B407-45EC-BA8F-1784044BCFDE@mac.com>

On 30 Jul, 2013, at 20:30, Ryan wrote:

> How about something like:
>
> f = indef(x, y): print x, y

How is that different from a lambda?

Ronald

From ncoghlan at gmail.com Wed Jul 31 00:22:56 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 31 Jul 2013 08:22:56 +1000
Subject: [Python-ideas] Enhance definition of functions
In-Reply-To: <4B867837-B407-45EC-BA8F-1784044BCFDE@mac.com>
References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <75205c49-9001-470a-b0b1-746bad8e0135@email.android.com> <4B867837-B407-45EC-BA8F-1784044BCFDE@mac.com>
Message-ID:

On 31 Jul 2013 04:48, "Ronald Oussoren" wrote:
>
> On 30 Jul, 2013, at 20:30, Ryan wrote:
>
> > How about something like:
> >
> > f = indef(x, y): print x, y
>
> How is that different from a lambda?

And, since assignments are statements, different from def?

Cheers, Nick.

From ron3200 at gmail.com Wed Jul 31 00:59:43 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Tue, 30 Jul 2013 17:59:43 -0500
Subject: [Python-ideas] Enhance definition of functions
In-Reply-To:
References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <75205c49-9001-470a-b0b1-746bad8e0135@email.android.com> <4B867837-B407-45EC-BA8F-1784044BCFDE@mac.com>
Message-ID:

On 07/30/2013 05:22 PM, Nick Coghlan wrote:
> On 31 Jul 2013 04:48, "Ronald Oussoren" wrote:
> >
> > On 30 Jul, 2013, at 20:30, Ryan wrote:
> >
> > > How about something like:
> > >
> > > f = indef(x, y): print x, y
> >
> > How is that different from a lambda?
>
> And, since assignments are statements, different from def?

Or the old "let" statement.

While writing a little bytecode interpreter in Python (to try some ideas out), I found that it's much easier to parse source code if a line starts with a statement keyword such as 'let' or 'def'. And I suspect that is why early languages followed that pattern.

Cheers, Ron
From tjreedy at udel.edu Wed Jul 31 03:41:31 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 30 Jul 2013 21:41:31 -0400
Subject: [Python-ideas] Enhance definition of functions
In-Reply-To:
References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com>
Message-ID:

On 7/30/2013 11:59 AM, Ronald Oussoren wrote:

> "Never" is a long time. AFAIK the main reason why Python doesn't have
> multi-line lambda's is that nobody has proposed a suitable syntax yet
> (and not for lack of trying, the archives of this list and python-dev
> contain a lot of proposals that were found lacking).

There is also the fact that a generic .__name__ attribute of '<lambda>' is inferior to a possibly unique and meaningful name. This is not just in tracebacks. Consider

[<function f at 0x...>, <function g at 0x...>]

versus

[<function <lambda> at 0x0000000003470B70>, <function <lambda> at 0x0000000003470BF8>]

-- Terry Jan Reedy
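A small sketch of the difference Terry is pointing at (the example and names here are ours):

def square(x):
    return x * x

cube = lambda x: x ** 3

print(square.__name__)  # 'square' -- meaningful in reprs and tracebacks
print(cube.__name__)    # '<lambda>' -- generic, even though it is bound to 'cube'
print([square, cube])   # [<function square at 0x...>, <function <lambda> at 0x...>]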
From steve at pearwood.info Wed Jul 31 07:20:25 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 31 Jul 2013 15:20:25 +1000
Subject: [Python-ideas] Enhance definition of functions
In-Reply-To:
References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com>
Message-ID: <51F89E99.2060509@pearwood.info>

On 31/07/13 11:41, Terry Reedy wrote:
> On 7/30/2013 11:59 AM, Ronald Oussoren wrote:
>
>> "Never" is a long time. AFAIK the main reason why Python doesn't have
>> multi-line lambda's is that nobody has proposed a suitable syntax yet
>> (and not for lack of trying, the archives of this list and python-dev
>> contain a lot of proposals that were found lacking).
>
> There is also the fact that a generic .__name__ attribute of '<lambda>' is inferior to a possibly unique and meaningful name. This is not just in tracebacks. Consider
> [<function f at 0x...>, <function g at 0x...>]
> versus
> [<function <lambda> at 0x0000000003470B70>, <function <lambda> at 0x0000000003470BF8>]

True, but if we're going to hypothesize nice syntax for multi-line lambdas, it's not much harder to imagine that there's also nice syntax to give them a name and a doc string at the same time :-)

I-don't-want-much-just-multi-line-lambda-and-ultimate-power-ly yr's,

-- Steven

(and I don't care that much about multi-line lambda)

From ronaldoussoren at mac.com Wed Jul 31 08:23:53 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Wed, 31 Jul 2013 08:23:53 +0200
Subject: [Python-ideas] Enhance definition of functions
In-Reply-To:
References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com>
Message-ID: <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com>

On 31 Jul, 2013, at 3:41, Terry Reedy wrote:

> On 7/30/2013 11:59 AM, Ronald Oussoren wrote:
>
>> "Never" is a long time. AFAIK the main reason why Python doesn't have
>> multi-line lambda's is that nobody has proposed a suitable syntax yet
>> (and not for lack of trying, the archives of this list and python-dev
>> contain a lot of proposals that were found lacking).
>
> There is also the fact that a generic .__name__ attribute of '<lambda>' is inferior to a possibly unique and meaningful name. This is not just in tracebacks.

It might be lack of imagination on my part, but I have a lot of nested functions named "function" or "callback" that are too complex to be a lambda, but too simple or specialized to bother making them proper functions. The key function for sort is one of the usecases.

I'd love to have anonymous functions for that, but haven't seen a proposal for those yet that would fit the language.

Ronald

From abarnert at yahoo.com Wed Jul 31 08:30:58 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 30 Jul 2013 23:30:58 -0700
Subject: [Python-ideas] Enhance definition of functions
In-Reply-To: <75205c49-9001-470a-b0b1-746bad8e0135@email.android.com>
References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <75205c49-9001-470a-b0b1-746bad8e0135@email.android.com>
Message-ID: <126BC0D2-605A-4775-8185-47D40BBFDD15@yahoo.com>

On Jul 30, 2013, at 11:30, Ryan wrote:

> How about something like:
>
> f = indef(x, y): print x, y
>
> It's essentially storing lambdas.

You can already store lambdas. Or, more precisely, you can store functions. A lambda isn't a type; it's just a different way of creating functions, exactly the same type of functions you get from the def statement. And functions are first class values that can be bound to a name just like any other value.

Meanwhile, the only advantages of lambda over def are that you don't have to come up with a name, and you can use it in an expression. So, trying to come up with a statement for giving lambdas names implies that there's something fundamental you're missing that would make your life a lot easier. Maybe you didn't know you could use def locally? Or you're trying to build code out of eval-ing strings because you don't know about closures? Or... Well, there are lots of possibilities. If you explain what you want to do that you think your new syntax would help with, I'm 99% sure Python already has a better way to do it.

From haoyi.sg at gmail.com Wed Jul 31 08:37:38 2013
From: haoyi.sg at gmail.com (Haoyi Li)
Date: Wed, 31 Jul 2013 14:37:38 +0800
Subject: [Python-ideas] Enhance definition of functions
In-Reply-To: <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com>
References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com>
Message-ID:

I'd like multiline lambdas too, but the syntax is a thorny problem. F# is the only language I know (are there others?) that allows you to mix whitespace-delimited and paren-delimited expressions, e.g. with whitespace-blocks inside parens:

let f n = n + (
    if n % 2 = 0 then
        printf "lol"
        1
    else
        printf "omg"
        2
    )

f 0
// lol
// 1
f 1
// omg
// 3
f 2
// lol
// 3

And it actually works as you'd expect, most of the time, so clearly doing such a thing is possible. On the other hand, even with lots of smart people behind F#, edge cases in the parsing of this sort of thing bite me pretty regularly. Although it'd be awesome if it could be implemented cleanly and predictably, I definitely do not want to be the one writing the grammar or parser to support this kind of syntax!

-Haoyi

On Wed, Jul 31, 2013 at 2:23 PM, Ronald Oussoren wrote:

> On 31 Jul, 2013, at 3:41, Terry Reedy wrote:
>
> > There is also the fact that a generic .__name__ attribute of '<lambda>'
> > is inferior to a possibly unique and meaningful name. This is not just in
> > tracebacks.
>
> It might be lack of imagination on my part, but I have a lot of nested
> functions named "function" or "callback" that are too complex to be a
> lambda, but too simple or specialized to bother making them proper
> functions. The key function for sort is one of the usecases.
>
> I'd love to have anonymous functions for that, but haven't seen a proposal
> for those yet that would fit the language.
>
> Ronald
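For reference, a hedged Python rendering of Haoyi's F# example; since Python cannot put statement blocks inside parentheses, the closest equivalent is a conditional expression (the prints are dropped, so this is our approximation rather than a direct translation):

def f(n):
    # the expression-if is the only "block" Python allows inside parens
    return n + (1 if n % 2 == 0 else 2)

print(f(0))  # 1
print(f(1))  # 3
print(f(2))  # 3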
From abarnert at yahoo.com Wed Jul 31 08:47:53 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 30 Jul 2013 23:47:53 -0700
Subject: [Python-ideas] Enhance definition of functions
In-Reply-To: <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com>
References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com>
Message-ID:

On Jul 30, 2013, at 23:23, Ronald Oussoren wrote:

> On 31 Jul, 2013, at 3:41, Terry Reedy wrote:
>
>> On 7/30/2013 11:59 AM, Ronald Oussoren wrote:
>>
>>> "Never" is a long time. AFAIK the main reason why Python doesn't have
>>> multi-line lambda's is that nobody has proposed a suitable syntax yet
>>> (and not for lack of trying, the archives of this list and python-dev
>>> contain a lot of proposals that were found lacking).
>>
>> There is also the fact that a generic .__name__ attribute of '<lambda>' is inferior to a possibly unique and meaningful name. This is not just in tracebacks. Consider
>> [<function f at 0x...>, <function g at 0x...>]
>> versus
>> [<function <lambda> at 0x0000000003470B70>, <function <lambda> at 0x0000000003470BF8>]
>
> It might be lack of imagination on my part, but I have a lot of nested functions named "function" or "callback" that are too complex to be a lambda, but too simple or specialized to bother making them proper functions. The key function for sort is one of the usecases.
>
> I'd love to have anonymous functions for that, but haven't seen a proposal for those yet that would fit the language.

Would it really help anything? If you're worried about keystrokes you can always call them "f" instead of "function". And I don't think anonymous functions would be as nice in tracebacks as even generically-named ones.

I think having to define them out of line is usually a more serious problem than having to name them, and if you solve that problem you may get the other one for free (although admittedly you may not, as the @in proposal shows...).

Of course often, when I run into this, it's a matter of trying to write something that _could_ be an expression, without even needing a lambda... except that it would be a huge mess of partial and compose and other HOF calls and uses of operator functions and sometimes itertools stuff that would all be trivial in a different language but is horribly unpleasant in Python. And then I can compare the cost of having to write an out of line function in Python to the cost of writing the surrounding code in Haskell and I feel better. :)
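A hedged sketch of the kind of partial/compose pile-up Andrew means; compose is not in the stdlib, so we define it here, and the pipeline is our own toy example:

from functools import partial, reduce
import operator

def compose(*fs):
    # compose(f, g)(x) == f(g(x))
    return reduce(lambda f, g: lambda x: f(g(x)), fs)

# "add 1, then double", point-free style...
h = compose(partial(operator.mul, 2), partial(operator.add, 1))
print(h(5))  # 12

# ...versus the plain out-of-line def it replaces:
def h2(x):
    return 2 * (x + 1)

print(h2(5))  # 12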
From tjreedy at udel.edu Wed Jul 31 10:15:02 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 31 Jul 2013 04:15:02 -0400
Subject: [Python-ideas] Enhance definition of functions
In-Reply-To: <51F89E99.2060509@pearwood.info>
References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <51F89E99.2060509@pearwood.info>
Message-ID:

On 7/31/2013 1:20 AM, Steven D'Aprano wrote:
> On 31/07/13 11:41, Terry Reedy wrote:
>> On 7/30/2013 11:59 AM, Ronald Oussoren wrote:
>>
>>> "Never" is a long time. AFAIK the main reason why Python doesn't have
>>> multi-line lambda's is that nobody has proposed a suitable syntax yet
>>> (and not for lack of trying, the archives of this list and python-dev
>>> contain a lot of proposals that were found lacking).
>>
>> There is also the fact that a generic .__name__ attribute of '<lambda>' is inferior to a possibly unique and meaningful name. This is not just in tracebacks. Consider
>> [<function f at 0x...>, <function g at 0x...>]
>> versus
>> [<function <lambda> at 0x0000000003470B70>, <function <lambda> at 0x0000000003470BF8>]
>
> True, but if we're going to hypothesize nice syntax for multi-line lambdas, it's not much harder to imagine that there's also nice syntax to give them a name and a doc string at the same time :-)

But then they would not be anonymous. When I have a multiple-line expression, I often pull out an anonymous expression and give it a local name to use within the expression, to make the expression shorter and more comprehensible. So does most everyone else. So I have no desire to do the opposite with function definitions by sticking a multiple-line definition in the middle of an expression. I know Lispers do that, and the result is often difficult to impossible to read.

In sort(key=?) calls, I use lambda for short, one-use key expressions, but I would never want to inject a multiple-line expression into the middle. To me, it makes the code less readable than separating a non-trivial key idea from the sort idea.

-- Terry Jan Reedy

From masklinn at masklinn.net Wed Jul 31 10:38:54 2013
From: masklinn at masklinn.net (Masklinn)
Date: Wed, 31 Jul 2013 10:38:54 +0200
Subject: [Python-ideas] Enhance definition of functions
In-Reply-To:
References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com>
Message-ID: <981E4F4C-4507-4B8F-8492-63D8719C2D4E@masklinn.net>

On 2013-07-31, at 08:37 , Haoyi Li wrote:

> I'd like multiline lambdas too, but the syntax is a thorny problem. F# is
> the only language I know (are there others?) that allows you to mix
> whitespace-delimited and paren-delimited expressions, e.g. with
> whitespace-blocks inside parens:
>
> let f n = n + (
>     if n % 2 = 0 then
>         printf "lol"
>         1
>     else
>         printf "omg"
>         2
>     )

Haskell can do that:

f n = n + (
    case n `mod` 2 of
        0 -> unsafePerformIO $ do
            putStrLn "lol"
            return 1
        1 -> unsafePerformIO $ do
            putStrLn "omfg"
            return 2
    )

although a difference is that it doesn't have statement blocks, only expressions (`do` blocks are sugar for monadic chain expressions)
From p.f.moore at gmail.com Wed Jul 31 11:15:33 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 31 Jul 2013 10:15:33 +0100
Subject: [Python-ideas] Enhance definition of functions
In-Reply-To:
References: <7751D7E0-2438-4E94-91F7-12C35D98B3FF@gmail.com> <76F4D890-7D8B-4633-987D-163C4BE45201@mac.com>
Message-ID:

On 31 July 2013 07:47, Andrew Barnert wrote:

> > It might be lack of imagination on my part, but I have a lot of nested
> > functions named "function" or "callback" that are too complex to be a
> > lambda, but too simple or specialized to bother making them proper
> > functions. The key function for sort is one of the usecases.
> >
> > I'd love to have anonymous functions for that, but haven't seen a
> > proposal for those yet that would fit the language.
>
> Would it really help anything? If you're worried about keystrokes you can
> always call them "f" instead of "function". And I don't think anonymous
> functions would be as nice in tracebacks as even generically-named ones.
>
> I think having to define them out of line is usually a more serious
> problem than having to name them, and if you solve that problem you may get
> the other one for free (although admittedly you may not, as the @in
> proposal shows...).

The only real reason I ever use lambdas (and would sometimes like a multiline version or similar) is for readability, where I want to pass a callback to a function and naming it and placing it before the call over-emphasises its importance. It's hard to make this objective, but to my eyes

def k(obj):
    return obj['x'] / obj['y']
s = list(sorted(l, key=k))

reads marginally worse than

s = list(sorted(l, key=k)) where:
    def k(obj):
        return obj['x'] / obj['y']

simply because the focus of the block of code (building a sorted list) is at the start in the latter. But because the difference is so subtle, it's very hard to get a syntax that improves things sufficiently to justify new syntax. And it's also not at all obvious to me that any improvement in readability that can be gained in simple example code that you can post in an email, will actually still be present in "real world" code (which, in my experience, is always far messier than constructed examples :-))

Paul

From rymg19 at gmail.com Wed Jul 31 17:40:03 2013
From: rymg19 at gmail.com (Ryan)
Date: Wed, 31 Jul 2013 10:40:03 -0500
Subject: [Python-ideas] os.path.isbinary
Message-ID:

Here's something more interesting than my shlex idea.

os.path is, pretty much, the Python FS toolbox, along with shutil. But, there's one feature missing: check if a file is binary. It isn't hard, see http://code.activestate.com/recipes/173220/. But, writing 50 lines of code for a common task isn't really Python-ish.

So...

What if os.path had a binary checker that works just like isfile:

os.path.isbinary('/nothingness/is/eternal') # Returns boolean

It's a thought...

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

From phd at phdru.name Wed Jul 31 18:02:54 2013
From: phd at phdru.name (Oleg Broytman)
Date: Wed, 31 Jul 2013 20:02:54 +0400
Subject: [Python-ideas] os.path.isbinary
In-Reply-To:
References:
Message-ID: <20130731160254.GA28425@iskra.aviel.ru>

On Wed, Jul 31, 2013 at 10:40:03AM -0500, Ryan wrote:

> What if os.path had a binary checker that works just like isfile:
> os.path.isbinary('/nothingness/is/eternal') # Returns boolean

What is a binary file? Would Russian text in koi8-r encoding be considered binary? What about utf-16? UTF16-encoded files have many zero characters. UTF32-encoded have even more.

Oleg.
--
Oleg Broytman http://phdru.name/ phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.
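Oleg's point is easy to verify interactively; a hedged sketch of our own (Python 3):

>>> 'Привет'.encode('koi8-r')   # legal Russian text, every byte has the high bit set
b'\xf0\xd2\xc9\xd7\xc5\xd4'
>>> 'abc'.encode('utf-16-le')   # plain ASCII text, yet full of zero bytes
b'a\x00b\x00c\x00'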
From eliben at gmail.com Wed Jul 31 18:21:24 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Wed, 31 Jul 2013 09:21:24 -0700
Subject: [Python-ideas] os.path.isbinary
In-Reply-To:
References:
Message-ID:

On Wed, Jul 31, 2013 at 8:40 AM, Ryan wrote:

> Here's something more interesting than my shlex idea.
>
> os.path is, pretty much, the Python FS toolbox, along with shutil. But,
> there's one feature missing: check if a file is binary. It isn't hard, see
> http://code.activestate.com/recipes/173220/. But, writing 50 lines of
> code for a common task isn't really Python-ish.
>
> So...
>
> What if os.path had a binary checker that works just like isfile:
> os.path.isbinary('/nothingness/is/eternal') # Returns boolean

Some time ago I put on a gas mask and dove into the Perl source code to figure out how its "is binary" and "is text" operators work: http://eli.thegreenplace.net/2011/10/19/perls-guess-if-file-is-text-or-binary-implemented-in-python/

I would recommend against including such a simplistic heuristic in the Python stdlib.

Eli

From masklinn at masklinn.net Wed Jul 31 18:23:47 2013
From: masklinn at masklinn.net (Masklinn)
Date: Wed, 31 Jul 2013 18:23:47 +0200
Subject: [Python-ideas] os.path.isbinary
In-Reply-To: <20130731160254.GA28425@iskra.aviel.ru>
References: <20130731160254.GA28425@iskra.aviel.ru>
Message-ID:

On 31 juil. 2013, at 18:02, Oleg Broytman wrote:

> What is a binary file? Would Russian text in koi8-r encoding be
> considered binary? What about utf-16? UTF16-encoded files have many
> zero characters. UTF32-encoded have even more.

And the recipe linked is worse than that: even with no NUL byte, if more than 30% of the file's bytes aren't ASCII it considers the file binary. Files in iso-8859 parts 5 to 8 (Cyrillic, Arabic, Greek and Hebrew) are pretty much guaranteed to be inferred as binary. Part 11 (Thai) as well. UTF-8 for any non-Latin script will also be considered binary, as the high bit is always set when encoding codepoints outside the ASCII range.

From tjreedy at udel.edu Wed Jul 31 18:37:45 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 31 Jul 2013 12:37:45 -0400
Subject: [Python-ideas] os.path.isbinary
In-Reply-To:
References:
Message-ID:

On 7/31/2013 11:40 AM, Ryan wrote:

> os.path is, pretty much, the Python FS toolbox, along with shutil. But,
> there's one feature missing: check if a file is binary. It isn't hard,
> see http://code.activestate.com/recipes/173220/. But, writing 50 lines
> of code for a common task isn't really Python-ish.

The somewhat arbitrarily defined is(ascii)textfile function and its subsidiaries come to more like 13 lines. The best discrimination function is heavily dependent upon the two groups of files one expects to be testing.

-- Terry Jan Reedy
From clay.sweetser at gmail.com Wed Jul 31 19:15:37 2013
From: clay.sweetser at gmail.com (Clay Sweetser)
Date: Wed, 31 Jul 2013 13:15:37 -0400
Subject: [Python-ideas] os.path.isbinary
In-Reply-To:
References:
Message-ID:

On Jul 31, 2013 12:22 PM, "Eli Bendersky" wrote:
>
> On Wed, Jul 31, 2013 at 8:40 AM, Ryan wrote:
>>
>> Here's something more interesting than my shlex idea.
>>
>> os.path is, pretty much, the Python FS toolbox, along with shutil. But,
>> there's one feature missing: check if a file is binary. It isn't hard, see
>> http://code.activestate.com/recipes/173220/. But, writing 50 lines of
>> code for a common task isn't really Python-ish.
>>
>> So...
>>
>> What if os.path had a binary checker that works just like isfile:
>> os.path.isbinary('/nothingness/is/eternal') # Returns boolean

Besides the high chance of false positives, what makes this method (and the problem it tries to solve) so difficult is that binary files may contain what is considered to be large amounts of text, and text files may contain pieces of binary data. For example, consider a Windows executable file - much of the data in such a file is considered binary data, but there are defined sections where strings and text resources are stored. Any heuristic algorithm like the one mentioned will be insufficient in such cases. Although I can't think of a situation off hand where the opposite may be true (binary data embedded in what is considered to be a text file) I'm pretty sure such a situation exists.

> Some time ago I put on a gas mask and dove into the Perl source code to
> figure out how its "is binary" and "is text" operators work:
> http://eli.thegreenplace.net/2011/10/19/perls-guess-if-file-is-text-or-binary-implemented-in-python/
>
> I would recommend against including such a simplistic heuristic in the
> Python stdlib.
>
> Eli

From rymg19 at gmail.com Wed Jul 31 21:03:43 2013
From: rymg19 at gmail.com (Ryan)
Date: Wed, 31 Jul 2013 14:03:43 -0500
Subject: [Python-ideas] os.path.isbinary
In-Reply-To:
References:
Message-ID:

1. The link I provided wasn't how I wanted it to be. I was using it as an example to show it wasn't impossible.
2. You yourself stated it doesn't work on UTF-8 files. If you wanted one that worked on all text files, it wouldn't work right.
3. Did no one get the 'nothingness/is/eternal' joke?

So...although that is a nice piece of code, an os.path implementation would probably be more complete and foolproof.

Eli Bendersky wrote:

> Some time ago I put on a gas mask and dove into the Perl source code to
> figure out how its "is binary" and "is text" operators work:
> http://eli.thegreenplace.net/2011/10/19/perls-guess-if-file-is-text-or-binary-implemented-in-python/
>
> I would recommend against including such a simplistic heuristic in the
> Python stdlib.
>
> Eli

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
From abarnert at yahoo.com Wed Jul 31 22:57:11 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 31 Jul 2013 13:57:11 -0700
Subject: [Python-ideas] os.path.isbinary
In-Reply-To:
References:
Message-ID: <94931857-EFA1-45F1-86B0-B83432B01BEC@yahoo.com>

On Jul 31, 2013, at 12:03, Ryan wrote:

> So...although that is a nice piece of code, an os.path implementation would probably be more complete and foolproof.

And because there is no foolproof, or even remotely close to foolproof, way to do it, there can be no os.path implementation.

From tjreedy at udel.edu Wed Jul 31 23:23:42 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 31 Jul 2013 17:23:42 -0400
Subject: [Python-ideas] os.path.isbinary
In-Reply-To:
References:
Message-ID:

On 7/31/2013 3:03 PM, Ryan wrote:

> 1. The link I provided wasn't how I wanted it to be.

And there is no 'one way' that will satisfy everyone, or even most people, as they will have different use cases for 'istext'.

> I was using it as an example to show it wasn't impossible.

It is obviously possible to apply any arbitrary predicate to any object within its input domain. No one has claimed otherwise that I know of.

> 2. You yourself stated it doesn't work on UTF-8 files. If you wanted one
> that worked on all text files, it wouldn't work right.

The problem is that the problem is ill-defined. Every file is (or can be viewed as) a sequence of binary bytes. Every file can be interpreted as a text file encoded with any of the encodings (like at least some latin-1 encodings, and the IBM PC Graphics encoding) that give a character meaning to every byte. So, to be strict, every file is both binary and text. Python allows us to open any file as either binary or text (with some encoding, with latin-1 one of the possible choices).

The pragmatic question is 'Is this file "likely" *intended* to be interpreted as text, given that the creator is a member of our *local culture*?' For the function you referenced, the 'local culture' is 'closed Western European'. For 'closed American', the threshold of allowed non-ascii text and control chars should be more like 0 or 1%. For many cultures, the referenced function is nonsensical. For an open global context, istext would have to try all standard text encodings and, for those that worked, apply the grammar rules of the languages that normally are encoded with that encoding.

-- Terry Jan Reedy
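Terry's "every file is text under some encoding" point can be demonstrated directly; a hedged sketch (our own example):

# Any byte sequence decodes successfully as latin-1, because latin-1
# assigns a character to every possible byte value.
blob = bytes(range(256))
text = blob.decode('latin-1')  # never raises
print(len(text))               # 256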
From grosser.meister.morti at gmx.net Wed Jul 31 23:20:14 2013
From: grosser.meister.morti at gmx.net (Mathias Panzenböck)
Date: Wed, 31 Jul 2013 23:20:14 +0200
Subject: [Python-ideas] os.path.isbinary
In-Reply-To:
References:
Message-ID: <51F97F8E.30507@gmx.net>

On 07/31/2013 07:15 PM, Clay Sweetser wrote:

> Besides the high chance of false positives, what makes this method (and
> the problem it tries to solve) so difficult is that binary files may
> contain what is considered to be large amounts of text, and text files may
> contain pieces of binary data.
> For example, consider a Windows executable file - much of the data in
> such a file is considered binary data, but there are defined sections where
> strings and text resources are stored. Any heuristic algorithm like the one
> mentioned will be insufficient in such cases.
> Although I can't think of a situation off hand where the opposite may be
> true (binary data embedded in what is considered to be a text file) I'm
> pretty sure such a situation exists.

One could consider PDF to be such a format (text with embedded binary data).

From ckaynor at zindagigames.com Wed Jul 31 23:29:38 2013
From: ckaynor at zindagigames.com (Chris Kaynor)
Date: Wed, 31 Jul 2013 14:29:38 -0700
Subject: [Python-ideas] os.path.isbinary
In-Reply-To: <51F97F8E.30507@gmx.net>
References: <51F97F8E.30507@gmx.net>
Message-ID:

>> Besides the high chance of false positives, what makes this method (and
>> the problem it tries to solve) so difficult is that binary files may
>> contain what is considered to be large amounts of text, and text files may
>> contain pieces of binary data.
>> Although I can't think of a situation off hand where the opposite may be
>> true (binary data embedded in what is considered to be a text file) I'm
>> pretty sure such a situation exists.
>
> One could consider PDF to be such a format (text with embedded binary
> data).

RTF is another example.
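For readers who want to see the kind of heuristic the thread is criticizing, here is a hedged sketch in the spirit of the ActiveState recipe (our rewrite, not the recipe's exact code); every objection raised above -- zero bytes in UTF-16 text, high-bit bytes in KOI8-R or non-Latin UTF-8 -- applies to it in full:

def looks_like_text(path, blocksize=512, threshold=0.30):
    """Recipe-style guess: a NUL byte or too many non-text bytes => binary.

    This is a heuristic sketch, not a reliable classifier; see the
    objections in this thread before using anything like it.
    """
    text_chars = bytes(range(32, 127)) + b'\n\r\t\f\b'
    with open(path, 'rb') as f:
        chunk = f.read(blocksize)
    if not chunk:           # empty files count as text
        return True
    if b'\x00' in chunk:    # a NUL byte rarely appears in single-byte text
        return False
    nontext = chunk.translate(None, text_chars)  # delete the "text" bytes
    return len(nontext) / len(chunk) <= threshold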